
Refine scheduling policy for origins with no known last update
Closed, Public

Authored by olasd on Aug 26 2021, 4:50 PM.

Details

Summary

For origins that have never been visited, and for which we don't have a
queue position yet, we want to visit them in the order they've been
added.
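
The ordering described above can be sketched in a few lines of Python. This is only an illustration, not the code from this diff: the `Origin` class, its field names (`first_seen`, `last_visited`, `queue_position`) and the `order_unvisited` helper are all hypothetical and are not taken from `swh.scheduler`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Origin:
    """Hypothetical stand-in for an origin row; not the swh.scheduler model."""

    url: str
    first_seen: datetime  # assumed: when the origin was added
    last_visited: Optional[datetime] = None  # None means never visited
    queue_position: Optional[datetime] = None  # None means no position yet


def order_unvisited(origins: List[Origin]) -> List[Origin]:
    """Return never-visited origins without a queue position, oldest first."""
    candidates = [
        o for o in origins
        if o.last_visited is None and o.queue_position is None
    ]
    # Visit origins in the order they were added.
    return sorted(candidates, key=lambda o: o.first_seen)
```

Origins that already have a visit or a queue position are left out of this sketch; how the actual policy ranks them is not covered here.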

Test Plan

This is only a marginal difference that was missed in the landed
implementation of this scheduling policy.

Diff Detail

Repository
rDSCH Scheduling utilities
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6147 (id=22246)

Could not rebase; Attempt merge onto cc76a573e7...

Updating cc76a57..7cc37fa
Fast-forward
 swh/scheduler/backend.py               | 20 +++++++++++++--
 swh/scheduler/cli/origin.py            | 46 ++++++++++++++++++++++++++++++++++
 swh/scheduler/interface.py             |  3 +++
 swh/scheduler/tests/test_cli_origin.py | 34 +++++++++++++++++++++++++
 swh/scheduler/tests/test_scheduler.py  | 16 ++++++++++++
 5 files changed, 117 insertions(+), 2 deletions(-)
Changes applied before test
commit 7cc37fa233c72a4b6b8e362f563384045c657fc1
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jun 1 19:17:16 2021 +0200

    Refine scheduling policy for origins with no known last update
    
    For origins that have never been visited, and for which we don't have a
    queue position yet, we want to visit them in the order they've been
    added.

commit 2efad289833e971594833b9ed825b9acead8d254
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jun 1 20:04:11 2021 +0200

    Add a swh scheduler origin send-to-celery subcommand
    
    The subcommand bypasses the legacy task-based mechanism to directly send
    new origin visits to celery

commit 5e8007fdbfeb612ea394f97eeba25a1c4e529b7e
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jun 1 15:48:05 2021 +0200

    Add table sampling option to grab_next_visits
    
    Running common operations on all git origins is pretty intense. Using
    table sampling gives us the opportunity to at least schedule some jobs
    in (decently small) time.

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/448/ for more details.

olasd requested review of this revision. Aug 26 2021, 4:56 PM
This revision is now accepted and ready to land. Aug 26 2021, 5:50 PM