Page MenuHomeSoftware Heritage

cli.add_forge_now: Open `register-lister` with sensible defaults
ClosedPublic

Authored by ardumont on Dec 7 2022, 5:00 PM.

Details

Reviewers
olasd
Group Reviewers
Reviewers
Summary

This will ease scheduling of new add-forge-now requests, on:

  • staging: this will list a subset of disabled origins once
  • production: this will register recurring tasks (full, incremental if any) to list that new forge

So, the following would be enough to list appropriately in staging/production:

swh scheduler add-forge-now \
( --production | --staging \ # to list only enabled | disabled origins )
  register-lister \
    gitea \
    url=https://git.afpy.org/api/v1/

Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674

Test Plan

Diff Detail

Repository
rDSCH Scheduling utilities
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 33184
Build 52025: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 52024: arc lint + arc unit

Unit TestsFailed

TimeTest
1,062 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.scheduler.tests.test_cli_add_forge_now::test_schedule_first_visits_cli[extra_cmd_args0]
mocker = <pytest_mock.plugin.MockerFixture object at 0x7fe19d5d3160> swh_scheduler = <swh.scheduler.backend.SchedulerBackend object at 0x7fe19d5d3a90> swh_scheduler_celery_app = <Celery celery.tests at 0x7fe19d894128>
1,233 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.scheduler.tests.test_cli_add_forge_now::test_schedule_first_visits_cli[extra_cmd_args1]
mocker = <pytest_mock.plugin.MockerFixture object at 0x7fe19d3b26a0> swh_scheduler = <swh.scheduler.backend.SchedulerBackend object at 0x7fe19d3b2e10> swh_scheduler_celery_app = <Celery celery.tests at 0x7fe19d894128>
772 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.scheduler.tests.test_cli_add_forge_now::test_schedule_first_visits_cli[extra_cmd_args2]
self = <AliasedGroup scheduler> args = ['add-forge-now', 'schedule-first-visits', '--type-name', 'test-git', '--environment=staging'] prog_name = 'scheduler', complete_var = None, standalone_mode = True
171 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.scheduler.tests.test_cli_add_forge_now::test_schedule_register_lister
swh_scheduler = <swh.scheduler.backend.SchedulerBackend object at 0x7fe19ca8bdd8> def test_schedule_register_lister(swh_scheduler):
9 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.scheduler.cli.utils::swh.scheduler.cli.utils.parse_options
View Full Test Results (4 Failed · 353 Passed · 1 Skipped)

Event Timeline

Build is green

Patch application report for D8940 (id=32213)

Rebasing onto 1c34e9837f...

Current branch diff-target is up to date.
Changes applied before test
commit 63ee5c010024de47cb0e1ecd1cb2e63b68cce37d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Dec 7 16:57:32 2022 +0100

    cli.add_forge_now: Open `register-lister` with sensible defaults
    
    This should ease the scheduling of a new forge:
    - on staging, this will list a subset of disabled origins
    - on production, this will register a recurring task to list a forge
    
    So, the following would be enough to list appropriately in staging/production:
swh scheduler add-forge-now \
  register-lister \
    list-gitea-full \
    ( --production | --staging \ # to list only enabled | disabled origins )
    url=https://git.afpy.org/api/v1/

```

Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/581/ for more details.

So, I don't think we want to have to remember if there's full or incremental versions of the lister.

I would suggest only taking the forge type as argument (so, in your example, "gitea"), and pulling the list-{forge_type}-full / list-{forge_type}-incremental task types as needed. I guess these days we wouldn't even need both task types, just to pass incremental=true as argument to the task type?

For --staging, we just want to create a single oneshot full listing task. For --production, we want to create both tasks (if they exist) as recurring?

So, I don't think we want to have to remember if there's full or incremental versions
of the lister.

Yes, i did it that way without bothering with it to start small (small diff).

As not all listers have full/incremental implems (as per your own remarks too). I also
needed to go out soon, so i'd just cut to the chase ;) so we still could have something
to discuss upon.

I would suggest only taking the forge type as argument (so, in your example, "gitea"),
and pulling the list-{forge_type}-full / list-{forge_type}-incremental task types
as needed.

yes, sounds fine.

I guess these days we wouldn't even need both task types, just to pass
incremental=true as argument to the task type?

I'm not entirely sure all our listers respect that so i'd prefer checking if there is
the incremental/full tasks or not. That'd also simplify reading the logs, avoiding to
sort out whether it's got incremental argument to True or not.

For --staging, we just want to create a single oneshot full listing task.

Why bother with a full listing?

Since we now got the means to limit the listing, i'd use it. That's a faster feedback
loop and without stressing twice the upstream forge (one round for staging and another
for production at a relatively small intervals of time).

Maybe I need to configure the default to a more sensible defaults though. Like 3 pages
with 10 results (so that we can see that the pagination works too). Currently it's a tad
small (1 page of 15 or something).

For --production, we want to create both tasks (if they exist) as recurring?

Yes, if both tasks exist, let's schedule those as recurring.
Possibly with a time delta.

Like do a full listing now. And the next day, start the incremental one.

douardda added inline comments.
swh/scheduler/cli/add_forge_now.py
104

I'm not very fond of the staging/production flag here. It does not really make for the scheduler as is and is not very explicit on what/why is should be used.

Something like '--enable-origins/--disable-origins' (with the explanation similar to what's in the help message) would make more sense to me.

For --staging, we just want to create a single oneshot full listing task.

Why bother with a full listing?

Since we now got the means to limit the listing, i'd use it. That's a faster feedback
loop and without stressing twice the upstream forge (one round for staging and another
for production at a relatively small intervals of time).

Maybe I need to configure the default to a more sensible defaults though. Like 3 pages
with 10 results (so that we can see that the pagination works too). Currently it's a tad
small (1 page of 15 or something).

Sure, that's what I meant: for staging, schedule a oneshot full listing task, but with the limiting and "origin disabling" options ("full" as opposed to "incremental")

swh/scheduler/cli/add_forge_now.py
104

But that's really what it does: create the tasks with appropriate flags for staging, or for production. It doesn't just only change the enablement status of the origins.

swh/scheduler/cli/add_forge_now.py
104

And i had implemented as you wanted david yesterday and got convinced to adapt to this (for the swh scheduler add-forge-now schedule-first-visits cli. So now, i'm just unifying on that use.

Also, i'm now pondering whether those flags should go one level up...
swh scheduler add-forge-now --staging|--production <subcommand>

For --staging, we just want to create a single oneshot full listing task.

Why bother with a full listing?

Since we now got the means to limit the listing, i'd use it. That's a faster feedback
loop and without stressing twice the upstream forge (one round for staging and another
for production at a relatively small intervals of time).

Maybe I need to configure the default to a more sensible defaults though. Like 3 pages
with 10 results (so that we can see that the pagination works too). Currently it's a tad
small (1 page of 15 or something).

Sure, that's what I meant: for staging, schedule a oneshot full listing task, but with the limiting and "origin disabling" options ("full" as opposed to "incremental")

cool.

I missed the full/incremental issue. I'll adapt to that.

swh/scheduler/cli/add_forge_now.py
104

Yeah, the flag can probably move to the command group.

We could call it a "preset" and have --preset staging / --preset production (and potentially --preset something-else, in the future), instead of a flag.

swh/scheduler/cli/add_forge_now.py
104

why not --environment? (that's what i'm currently writing in my attempts to docstring-ify the commands now, that makes more sense to me than preset).

Adapt somewhat according to review (let jenkins builds stuff) [1]

[1] I encounter issue with my dev env (pytest, tox, pre-commit, the all shebang).

Build has FAILED

Patch application report for D8940 (id=32226)

Could not rebase; Attempt merge onto 1c34e9837f...

Updating 1c34e98..b6dd69a
Fast-forward
 swh/scheduler/cli/add_forge_now.py            |  87 ++++++++++++--
 swh/scheduler/cli/origin.py                   |   2 +-
 swh/scheduler/cli/task.py                     | 167 +++-----------------------
 swh/scheduler/cli/utils.py                    | 157 +++++++++++++++++++++++-
 swh/scheduler/tests/test_cli_add_forge_now.py |  56 ++++++++-
 5 files changed, 304 insertions(+), 165 deletions(-)
Changes applied before test
commit b6dd69ae764a478ab296b257356dae3e48ced118
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Dec 8 12:15:27 2022 +0100

    cli.add_forge_now: Rework according to review
    
    commit will be amended

commit 63ee5c010024de47cb0e1ecd1cb2e63b68cce37d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Dec 7 16:57:32 2022 +0100

    cli.add_forge_now: Open `register-lister` with sensible defaults
    
    This should ease the scheduling of a new forge:
    - on staging, this will list a subset of disabled origins
    - on production, this will register a recurring task to list a forge
    
    So, the following would be enough to list appropriately in staging/production:
swh scheduler add-forge-now \
  register-lister \
    list-gitea-full \
    ( --production | --staging \ # to list only enabled | disabled origins )
    url=https://git.afpy.org/api/v1/

```

Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674
Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/582/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/582/console

Build has FAILED

Patch application report for D8940 (id=32227)

Could not rebase; Attempt merge onto 1c34e9837f...

Updating 1c34e98..37fd63e
Fast-forward
 swh/scheduler/cli/add_forge_now.py            |  87 ++++++++++++--
 swh/scheduler/cli/origin.py                   |   2 +-
 swh/scheduler/cli/task.py                     | 167 +++-----------------------
 swh/scheduler/cli/utils.py                    | 157 +++++++++++++++++++++++-
 swh/scheduler/tests/test_cli_add_forge_now.py |  56 ++++++++-
 5 files changed, 304 insertions(+), 165 deletions(-)
Changes applied before test
commit 37fd63e463d4d82dbe7c1277c9ebf18d2bf8c341
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Dec 8 12:15:27 2022 +0100

    cli.add_forge_now: Rework according to review
    
    commit will be amended

commit 63ee5c010024de47cb0e1ecd1cb2e63b68cce37d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Dec 7 16:57:32 2022 +0100

    cli.add_forge_now: Open `register-lister` with sensible defaults
    
    This should ease the scheduling of a new forge:
    - on staging, this will list a subset of disabled origins
    - on production, this will register a recurring task to list a forge
    
    So, the following would be enough to list appropriately in staging/production:
swh scheduler add-forge-now \
  register-lister \
    list-gitea-full \
    ( --production | --staging \ # to list only enabled | disabled origins )
    url=https://git.afpy.org/api/v1/

```

Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674
Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/583/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/583/console

Build has FAILED

Patch application report for D8940 (id=32228)

Could not rebase; Attempt merge onto 1c34e9837f...

Updating 1c34e98..183e506
Fast-forward
 swh/scheduler/cli/add_forge_now.py            |  87 ++++++++++++--
 swh/scheduler/cli/origin.py                   |   2 +-
 swh/scheduler/cli/task.py                     | 167 +++-----------------------
 swh/scheduler/cli/utils.py                    | 157 +++++++++++++++++++++++-
 swh/scheduler/tests/test_cli_add_forge_now.py |  58 ++++++++-
 5 files changed, 306 insertions(+), 165 deletions(-)
Changes applied before test
commit 183e506c20285e8c54a5f8aaa92b03b8b0c0b237
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Dec 8 12:15:27 2022 +0100

    cli.add_forge_now: Rework according to review
    
    commit will be amended

commit 63ee5c010024de47cb0e1ecd1cb2e63b68cce37d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Dec 7 16:57:32 2022 +0100

    cli.add_forge_now: Open `register-lister` with sensible defaults
    
    This should ease the scheduling of a new forge:
    - on staging, this will list a subset of disabled origins
    - on production, this will register a recurring task to list a forge
    
    So, the following would be enough to list appropriately in staging/production:
swh scheduler add-forge-now \
  register-lister \
    list-gitea-full \
    ( --production | --staging \ # to list only enabled | disabled origins )
    url=https://git.afpy.org/api/v1/

```

Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674
Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/584/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/584/console
  • cli.add_forge_now: Open register-lister with sensible defaults
  • cli.add_forge_now: Rework according to review

Build has FAILED

Patch application report for D8940 (id=32229)

Rebasing onto 1c34e9837f...

Current branch diff-target is up to date.
Changes applied before test
commit 77851263f7b5475e31670b827efcce7dd66695bb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Dec 8 12:15:27 2022 +0100

    cli.add_forge_now: Rework according to review
    
    commit will be amended

commit 63ee5c010024de47cb0e1ecd1cb2e63b68cce37d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Dec 7 16:57:32 2022 +0100

    cli.add_forge_now: Open `register-lister` with sensible defaults
    
    This should ease the scheduling of a new forge:
    - on staging, this will list a subset of disabled origins
    - on production, this will register a recurring task to list a forge
    
    So, the following would be enough to list appropriately in staging/production:
swh scheduler add-forge-now \
  register-lister \
    list-gitea-full \
    ( --production | --staging \ # to list only enabled | disabled origins )
    url=https://git.afpy.org/api/v1/

```

Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674
Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/585/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/585/console

Amend according to discussion and fix tests

Remains one issue with my venv which refuses to see that new cli... (that's annoying)

Build is green

Patch application report for D8940 (id=32232)

Rebasing onto 1c34e9837f...

Current branch diff-target is up to date.
Changes applied before test
commit da8fe7c92fa1c6c9cca3b91b2a0096ab70e5dd96
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Dec 8 12:15:27 2022 +0100

    cli.add_forge_now: Rework according to review
    
    commit will be amended

commit 63ee5c010024de47cb0e1ecd1cb2e63b68cce37d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Dec 7 16:57:32 2022 +0100

    cli.add_forge_now: Open `register-lister` with sensible defaults
    
    This should ease the scheduling of a new forge:
    - on staging, this will list a subset of disabled origins
    - on production, this will register a recurring task to list a forge
    
    So, the following would be enough to list appropriately in staging/production:
swh scheduler add-forge-now \
  register-lister \
    list-gitea-full \
    ( --production | --staging \ # to list only enabled | disabled origins )
    url=https://git.afpy.org/api/v1/

```

Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/586/ for more details.

Build is green

Patch application report for D8940 (id=32233)

Rebasing onto 1c34e9837f...

First, rewinding head to replay your work on top of it...
Applying: cli.add_forge_now: Open `schedule-first-visits` with sensible defaults
Using index info to reconstruct a base tree...
M	swh/scheduler/cli/origin.py
M	swh/scheduler/cli/utils.py
M	swh/scheduler/tests/test_cli_origin.py
Falling back to patching base and 3-way merge...
No changes -- Patch already applied.
Applying: cli.add_forge_now: Open `register-lister` with sensible defaults
Changes applied before test
commit 18d414a089e74c9f901ab1394c95729200eadcf6
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Dec 7 16:57:32 2022 +0100

    cli.add_forge_now: Open `register-lister` with sensible defaults
    
    This will ease scheduling of new add-forge-now requests, on:
    - staging: this will list a subset of disabled origins once
    - production: this will register recurring tasks (full, incremental if any) to list that
      new forge
    
    So, the following would be enough to list appropriately in staging/production:
swh scheduler add-forge-now \
( --production | --staging \ # to list only enabled | disabled origins )
  register-lister \
    gitea \
    url=https://git.afpy.org/api/v1/
```

Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/587/ for more details.
ardumont edited the summary of this revision. (Show Details)

Adapt according to last irc discussion -> use --preset

swh/scheduler/cli/add_forge_now.py
48–53

Build is green

Patch application report for D8940 (id=32243)

Rebasing onto 1c34e9837f...

Current branch diff-target is up to date.
Changes applied before test
commit e71ce0ca797534611521d1d187a5beb0727785a7
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Dec 7 16:57:32 2022 +0100

    cli.add_forge_now: Open `register-lister` with sensible defaults
    
    This will ease scheduling of new add-forge-now requests, on:
    - staging: this will list a subset of disabled origins once
    - production: this will register recurring tasks (full, incremental if any) to list that
      new forge
    
    This also unifies the previous subcommand schedule-first-visits with the --preset flag.
    
    So, the following would be enough to list appropriately in staging/production:
swh scheduler add-forge-now \
( --preset [production|staging] \ # to enable a pre-defined set of rules )
  register-lister \
    gitea \
    url=https://git.afpy.org/api/v1/
```

Related to https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/588/ for more details.
swh/scheduler/cli/add_forge_now.py
48–53

*thumbs up*

This revision is now accepted and ready to land.Dec 9 2022, 2:54 PM