Page MenuHomeSoftware Heritage

cli.task: Migrate scheduler cli to latest storage change on iter_origins
ClosedPublic

Authored by ardumont on Sat, Aug 1, 10:04 AM.

Details

Summary
  • Drop --max-id, --min-id cli parameter which can no longer work
  • Expose --page-token and --limit instead

Related to T645

Test Plan

tox

Diff Detail

Repository
rDSCH Scheduling utilities
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.Sat, Aug 1, 10:04 AM
ardumont edited the summary of this revision. (Show Details)Sat, Aug 1, 10:06 AM

Build has FAILED

Patch application report for D3682 (id=12964)

Rebasing onto 849d063758...

Current branch diff-target is up to date.
Changes applied before test
commit e48885ce005c8bbf27bf3fc85883489248af7a31
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 10:03:23 2020 +0200

    cli.task: Migrate scheduler cli to latest storage change on iter_origins
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/50/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/50/console

vlorentz added inline comments.
swh/scheduler/cli/task.py
292–294

why not use islice directly?

383

wrong order of args btw

vlorentz added a comment.EditedSat, Aug 1, 10:09 AM

.

swh/scheduler/cli/task.py
372–376

I don't understand the issue. Just change the arguments to take the page_token as argument instead of min_id

ardumont added inline comments.Sat, Aug 1, 10:12 AM
swh/scheduler/cli/task.py
292–294

yeah, i initially took it from the doc on itertools [1] and then thought to use it directly the generator...

[1] https://docs.python.org/3/library/itertools.html#itertools-recipes

383

because my brain wanted to use islice all along ;)

ardumont edited the summary of this revision. (Show Details)Sat, Aug 1, 10:13 AM
ardumont added inline comments.
swh/scheduler/cli/task.py
372–376

well for one, iter_origins does not return any page_token after that...

vlorentz added inline comments.Sat, Aug 1, 10:33 AM
swh/scheduler/cli/task.py
372–376

Hmmm...

We could write a variant of iter_origins that yields tokens in addition to the origins.

ardumont added inline comments.Sat, Aug 1, 10:35 AM
swh/scheduler/cli/task.py
372–376

or drop iter_origins and use directly storage.origin_list which does just that...?

vlorentz added inline comments.Sat, Aug 1, 10:37 AM
swh/scheduler/cli/task.py
372–376

yeah, indeed

ardumont added inline comments.Sat, Aug 1, 10:39 AM
swh/scheduler/cli/task.py
372–376

that'd simplify even further D3681 either by:

  • dropping iter_origins altogether
  • simplifying its definition to reuse directly swh.core.api.classes.stream_results (and stop exposing the page_token)
ardumont planned changes to this revision.Sat, Aug 1, 10:46 AM
ardumont updated this revision to Diff 12965.Sat, Aug 1, 11:04 AM
  • Amend according to discussions
  • Add missing test which fails due to a most probably wrong origin-list implementation for the in-memory storage
ardumont planned changes to this revision.Sat, Aug 1, 11:04 AM
ardumont edited the summary of this revision. (Show Details)

Build has FAILED

Patch application report for D3682 (id=12965)

Rebasing onto 849d063758...

Current branch diff-target is up to date.
Changes applied before test
commit e4e94a48f184164e015f173027a3c471dfc21389
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 10:03:23 2020 +0200

    cli.task: Migrate scheduler cli to latest storage change on iter_origins
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/51/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/51/console

ardumont updated this revision to Diff 12968.EditedSat, Aug 1, 11:27 AM

Add scenario with page-token

tests need storage > v0.11.8 to pass

ardumont edited the test plan for this revision. (Show Details)Sat, Aug 1, 11:28 AM

Build has FAILED

Patch application report for D3682 (id=12968)

Rebasing onto 849d063758...

Current branch diff-target is up to date.
Changes applied before test
commit 95882a4212324f828a13c19ddc0ed8284e93362b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 10:03:23 2020 +0200

    cli.task: Migrate scheduler cli to latest storage change on iter_origins
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/53/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/53/console

ardumont marked 2 inline comments as done.Sat, Aug 1, 2:49 PM
ardumont edited the summary of this revision. (Show Details)Sat, Aug 1, 7:03 PM
ardumont edited the test plan for this revision. (Show Details)
ardumont updated this revision to Diff 12972.Sat, Aug 1, 7:03 PM

Fix tests appropriately

Build has FAILED

Patch application report for D3682 (id=12972)

Rebasing onto 849d063758...

Current branch diff-target is up to date.
Changes applied before test
commit ba058084f2e403a9c13371b51eff629f228ea23a
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 10:03:23 2020 +0200

    cli.task: Migrate scheduler cli to latest storage change on iter_origins
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/54/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/54/console

ardumont edited the test plan for this revision. (Show Details)Sat, Aug 1, 7:07 PM
ardumont updated this revision to Diff 12973.Sat, Aug 1, 7:10 PM

Fix flake8 warnings, turns out we can compose f and r prefix for strings (cool)

tests still failing since they still need a storage bump

Build has FAILED

Patch application report for D3682 (id=12973)

Rebasing onto 849d063758...

Current branch diff-target is up to date.
Changes applied before test
commit 7db78dd9bf7a3cdba950094ce850fb6144dedbaa
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 10:03:23 2020 +0200

    cli.task: Migrate scheduler cli to latest storage change on iter_origins
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/55/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/55/console

vlorentz accepted this revision.Mon, Aug 3, 9:13 AM

It would be nice to print the token from time to time so the process can be resumed if interrupted, but that's good enough for now

This revision is now accepted and ready to land.Mon, Aug 3, 9:13 AM

It would be nice to print the token from time to time so the process can be resumed if interrupted, but that's good enough for now

Yeah, it hit me during the night that i forgot that.
How do you envision this, is a logger.info instruction enough?

I was thinking of print(), but that would work too

I was thinking of print(), but that would work too

yeah, ok but now i'm wondering where.
I suppose it's common enough it should go in swh.core.api.classes.stream_results

If not, then i suppose, this can go in the iter_origins (from the storage diff)?

ardumont updated this revision to Diff 12989.Mon, Aug 3, 12:11 PM

Adapt according to review/discussion, inline iter_origins within the scheduler
so it displays the page_token during pagination.

Tests should be fine now as latest storage got bumped

ardumont edited the test plan for this revision. (Show Details)Mon, Aug 3, 12:12 PM

Build is green

Patch application report for D3682 (id=12989)

Rebasing onto 849d063758...

Current branch diff-target is up to date.
Changes applied before test
commit 2cf14951a7fc4f2a68cae9e80a9f352f6e27495c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 10:03:23 2020 +0200

    cli.task: Migrate scheduler cli to latest storage change on iter_origins
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/56/ for more details.

vlorentz added inline comments.Mon, Aug 3, 12:14 PM
swh/scheduler/cli/task.py
295

"Eventually" does not mean "éventuellement", but "finalement" ;)

Use "optionally" instead

ardumont updated this revision to Diff 12990.Mon, Aug 3, 12:18 PM

Rework docstring sentences
(thx for the heads up)

Build is green

Patch application report for D3682 (id=12990)

Rebasing onto 849d063758...

Current branch diff-target is up to date.
Changes applied before test
commit 642620845ac21c4b3fbb0aae69befc87c5ac64be
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Aug 1 10:03:23 2020 +0200

    cli.task: Migrate scheduler cli to latest storage change on iter_origins
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/57/ for more details.

vlorentz accepted this revision.Mon, Aug 3, 12:24 PM