Page MenuHomeSoftware Heritage

send-to-celery: Adapt to schedule from lister name & instance_name
ClosedPublic

Authored by ardumont on Dec 6 2022, 12:25 PM.

Details

Summary

This allows to bypass the lister id retrieval step using directly the name and instance
name of the lister to discover the uuid.

This also drops the --lister-uuid flag which is somewhat difficult to use.

Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674

Test Plan

Add new test ongoing

Diff Detail

Repository
rDSCH Scheduling utilities
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 33098
Build 51883: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 51882: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8922 (id=32137)

Rebasing onto ff75e742ee...

Current branch diff-target is up to date.
Changes applied before test
commit e6b13bdf102262f91a94296f269967e55383c67f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Dec 6 12:24:33 2022 +0100

    scheduler: Open `swh scheduler lister` command to retrieve lister id
    
    Example use case:
swhscheduler@scheduler0:~/addforgenow$ swh scheduler \
  --url http://scheduler0.internal.staging.swh.network:5008/ \
  lister \
  --name gitea --instance-name git.afpy.org
d07d1c90-5016-4ab6-91ac-3300f8eb4fc6
```

Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/570/ for more details.

As I mentioned on IRC, I think we should do that "join" directly in swh scheduler origin send-to-celery (and have it error out if the lister name/instance provided don't match an existing lister); using the uuid was an "easy hack" to extend the API of grab_next_visits, but using lister name/instance in the CLI interface makes more sense.

And we should keep the argument names consistent with other endpoints that do the same filtering in that command subgroup as well.

To adapt according to suggestion (already on it).

Adapt according to suggestion (still trying to determine why the build fails)

ardumont retitled this revision from scheduler: Open `swh scheduler lister` command to retrieve lister id to scheduler: Adapt send-to-celery to allow scheduling from lister name.Dec 6 2022, 2:50 PM
ardumont edited the summary of this revision. (Show Details)
ardumont edited the test plan for this revision. (Show Details)
ardumont retitled this revision from scheduler: Adapt send-to-celery to allow scheduling from lister name to Adapt send-to-celery cli to allow scheduling from simpler lister info.Dec 6 2022, 2:56 PM
ardumont edited the summary of this revision. (Show Details)

Build was aborted

Patch application report for D8922 (id=32145)

Rebasing onto ff75e742ee...

Current branch diff-target is up to date.
Changes applied before test
commit d38f916ac5a02004276658c0aa51bc24146ff6f0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Dec 6 12:24:33 2022 +0100

    scheduler: Adapt send-to-celery to allow scheduling from lister name
    
    This is somewhat simpler to scheduler than to lookup for the lister id.
    
    Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/571/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/571/console

ardumont edited the summary of this revision. (Show Details)

Amend commit message and fix tests

Build is green

Patch application report for D8922 (id=32146)

Rebasing onto ff75e742ee...

Current branch diff-target is up to date.
Changes applied before test
commit 774d8d34af3a06cc7fc77036698b210dca49b9c6
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Dec 6 12:24:33 2022 +0100

    Adapt send-to-celery cli to allow scheduling from simpler lister info
    
    This is simpler to schedule origins from a given forge when we are allowed to provide
    both the lister name and its instance name. Currently, we are required to lookup the
    lister uuid first (through psql) and then trigger the cli appropriately.
    
    Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/572/ for more details.

swh/scheduler/backend.py
527

I expected this to be covered but the lister i used (in the test) has no instance name ¯\_(ツ)_/¯.

swh/scheduler/cli/origin.py
219

roh, i forgot to drop this...
as i implemented it in the backend in the end.

So, instead of adding more stuff to the grab_next_visits signature, I would have suggested just calling lister_get in swh/scheduler/cli/origin.py to get the lister uuid, and replacing the --lister-uuid CLI argument with --lister-name and --lister-instance-name (so, only changing the CLI function).

Does it seem like we're going to use these arguments in another caller of grab_next_visits?

swh/scheduler/cli/origin.py
219

So, instead of adding more stuff to the grab_next_visits signature, I would have
suggested just calling lister_get in swh/scheduler/cli/origin.py to get the lister
uuid, and replacing the --lister-uuid CLI argument with --lister-name and
--lister-instance-name (so, only changing the CLI function).

lol, like this "forgotten dropped implem" ^ then (my first implem' and then somehow i
went too deep probalby due to tests...).

Does it seem like we're going to use these arguments in another caller of grab_next_visits?

That's a good question. I think not.

Does it seem like we're going to use these arguments in another caller of grab_next_visits?

That's a good question. I think not.

Maybe we would, actually. For instance, it might make sense to use these options to give different scheduling weights for github/gitlab/... origins in the recurrent visit scheduler.

Either way, the uuid argument in the cli endpoint should go away!

Drop --lister-uuid flag to the benefit of using --lister-name and --lister-instance-name

ardumont retitled this revision from Adapt send-to-celery cli to allow scheduling from simpler lister info to send-to-celery: Adapt to schedule from lister name & instance_name.Dec 6 2022, 4:20 PM
ardumont edited the summary of this revision. (Show Details)

Build has FAILED

Patch application report for D8922 (id=32147)

Rebasing onto ff75e742ee...

Current branch diff-target is up to date.
Changes applied before test
commit a61b0e1e54bb7b3ce6ddeccb986e0270b860b277
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Dec 6 12:24:33 2022 +0100

    send-to-celery: Adapt to schedule from lister name & instance_name
    
    This allows to bypass the lister id retrieval step using directly the name and instance
    name of the lister to discover the uuid.
    
    This also drops the --lister-uuid flag which is somewhat difficult to use.
    
    Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/573/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/573/console

Build is green

Patch application report for D8922 (id=32149)

Rebasing onto ff75e742ee...

Current branch diff-target is up to date.
Changes applied before test
commit a7769639df76bd2de93911cf10fb9e50d7d15b64
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Dec 6 12:24:33 2022 +0100

    send-to-celery: Adapt to schedule from lister name & instance_name
    
    This allows to bypass the lister id retrieval step using directly the name and instance
    name of the lister to discover the uuid.
    
    This also drops the --lister-uuid flag which is somewhat difficult to use.
    
    Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/574/ for more details.

Does it seem like we're going to use these arguments in another caller of grab_next_visits?

That's a good question. I think not.

Maybe we would, actually. For instance, it might make sense to use these options to give different scheduling weights for github/gitlab/... origins in the recurrent visit scheduler.

I've opened another diff for it [1]

[1] D8928

This revision is now accepted and ready to land.Dec 6 2022, 5:53 PM