In effect, this allows running two runners:
- one for recurring tasks
- one for save code now tasks
This should reduce the likelihood that scheduling tasks for save code now get stuck
behind the main scheduler runner.
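The split described above can be illustrated with a minimal sketch. This is not the code from this diff: the `Task` class, the `priority` field, and the `select_tasks` helper are hypothetical names chosen for illustration, assuming that save code now tasks carry a priority and recurring tasks do not.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    """Hypothetical stand-in for a scheduler task (not the real swh model)."""

    id: int
    type: str
    # Assumption: save code now tasks have a priority set, recurring tasks have None.
    priority: Optional[str] = None


def select_tasks(pending: List[Task], with_priority: bool, limit: int) -> List[Task]:
    """Return at most `limit` pending tasks for one of the two runners.

    with_priority=True selects only tasks that have a priority,
    with_priority=False selects only tasks that have none.
    """
    matching = [t for t in pending if (t.priority is not None) == with_priority]
    return matching[:limit]


pending = [
    Task(1, "load-git"),
    Task(2, "load-git", priority="high"),
    Task(3, "load-pypi"),
    Task(4, "load-npm", priority="high"),
]

# Each runner only ever sees its own subset, so a long backlog of recurring
# tasks cannot delay the tasks that have a priority.
recurring_batch = select_tasks(pending, with_priority=False, limit=10)
priority_batch = select_tasks(pending, with_priority=True, limit=10)
```

Running one process per subset means the two queues drain independently, which is the effect the description is after.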
Related to T3367
Differential D5826
runner: Separate scheduling tasks with and without priority concerns
Authored by ardumont on Jun 8 2021, 5:39 PM.
Details
Test Plan: tox
Event Timeline

Build is green
Patch application report for D5826 (id=20846)
Could not rebase; Attempt merge onto 9f7ab8fcdc...
Updating 9f7ab8f..b76c647
Fast-forward
 swh/scheduler/backend.py | 90 ++++++++++++++++++++++----------
 swh/scheduler/celery_backend/config.py | 23 +++++++-
 swh/scheduler/celery_backend/runner.py | 89 +++++++++++++++----------------
 swh/scheduler/cli/admin.py | 38 ++++++++++++--
 swh/scheduler/cli/origin.py | 65 +++++++++++++++++++++++
 swh/scheduler/interface.py | 19 +++++++
 swh/scheduler/tests/test_celery_tasks.py | 14 +++--
 7 files changed, 252 insertions(+), 86 deletions(-)
Changes applied before test
commit b76c647b4fedb6ad3811a2f3c034b996db7c2a79
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Tue Jun 8 17:36:28 2021 +0200
runner: Separate scheduling tasks with and without priority concern
Related to T3367
commit 974475fa08ebf9a31e68f89398633f97040f0d3e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Thu Jun 3 16:03:26 2021 +0200
send-to-celery: Add more options to allow scheduling of edge cases
In the non-optimal case, we may want to trigger specific cases (not-yet-enabled origins,
origins from a specific lister...).
Related to T3350
commit 370ec4d66da913b409784bc949db402392594b0d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Jun 2 15:59:15 2021 +0200
Direct scheduling of origin visits in celery
Summary:
This stack of changes builds up to a CLI endpoint allowing us to schedule origin
visits directly in Celery, bypassing the legacy scheduler entirely.
This has zero test coverage save for old tests still passing, which is already
something... It's being used on the actual production database to schedule
actual tasks for git, npm and pypi.
Included changes:
- Drop duplicate docstring from backend
- Make the origin visit scheduling cooldown configurable
(Cosmetic changes)
- Add a (longer) specific cooldown for failed origin visits
- Add a specific cooldown for notfound origins
Both of these changes prevent repeating visits on failing origins. This is
necessary because, as we're using a consistent ordering with respect to the
upstream information, we'd always be trying to load them, never reaching origins
further down the stack. Listers should eventually disable these origins.
- Add table sampling option to grab_next_visits
Running common operations on all git origins is pretty intense. Using
table sampling gives us the opportunity to at least schedule some jobs
in (decently small) time.
- Add a (very basic) scheduling policy for origins with no known last update
This is especially useful for pypi, as well as some git hosters that do not
provide the right info in their APIs. We will need to implement smarter
heuristics to avoid repeated uneventful visits on these origins.
- Split off the helper for available slots in a celery queue
This is needed for the send-to-celery subcommand as well, so split it off of the
runner module.
- Add a swh scheduler origin send-to-celery subcommand
Yes, finally!
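As an illustration of the cooldown changes listed above, here is a minimal sketch of a per-status eligibility check. It is not taken from this commit: the `COOLDOWNS` values, the status keys, and the `is_eligible` helper are all assumptions made up for the example. The only point carried over from the text is that failed and notfound origins wait longer before being retried.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical cooldowns; the real values are configurable and not given here.
COOLDOWNS = {
    "successful": timedelta(days=7),
    "failed": timedelta(days=14),      # longer cooldown for failed visits
    "not_found": timedelta(days=31),   # specific cooldown for notfound origins
}


def is_eligible(
    last_status: Optional[str], last_visit: Optional[datetime], now: datetime
) -> bool:
    """Return True if an origin may be scheduled for a new visit at `now`."""
    if last_status is None or last_visit is None:
        # Never visited: nothing to cool down from.
        return True
    return now - last_visit >= COOLDOWNS[last_status]


now = datetime(2021, 6, 8, tzinfo=timezone.utc)
ten_days_ago = now - timedelta(days=10)

ok_after_success = is_eligible("successful", ten_days_ago, now)
ok_after_failure = is_eligible("failed", ten_days_ago, now)
ok_after_not_found = is_eligible("not_found", ten_days_ago, now)
ok_never_visited = is_eligible(None, None, now)
```

With a consistent ordering of origins, skipping the ones still in cooldown is what lets the scheduler reach origins further down the stack instead of retrying the same failing ones.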
Test Plan: obviously needs at least /some/ test coverage.
Reviewers: #reviewers
Subscribers: ardumont
Differential Revision: https://forge.softwareheritage.org/D5809
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/354/ for more details.

Build is green
Patch application report for D5826 (id=20886)
Rebasing onto 9d2618db8f...
Current branch diff-target is up to date.
Changes applied before test
commit 091336179afad8c4f4b97ffed18644a076893efc
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Tue Jun 8 17:36:28 2021 +0200
runner: Separate scheduling tasks with and without priority concern
Related to T3367
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/358/ for more details.

Build is green
Patch application report for D5826 (id=20897)
Rebasing onto 9d2618db8f...
Current branch diff-target is up to date.
Changes applied before test
commit 4a2adc01fcfe4a63bbf06ae406a87851a12b931b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Tue Jun 8 17:36:28 2021 +0200
runner: Separate scheduling tasks with and without priority concern
In effect, this will allow to run 2 runners:
- one for recurring tasks
- one for the save code now
This should decrease the probability of the scheduling tasks for the save code now to be
stuck behind the main scheduler runner.
Related to T3367
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/359/ for more details.

Build is green
Patch application report for D5826 (id=20908)
Rebasing onto 21c4279b99...
Current branch diff-target is up to date.
Changes applied before test
commit 0bafdccd09333aae5bdb81e496f0a09eabe51b35
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Tue Jun 8 17:36:28 2021 +0200
runner: Separate scheduling tasks with and without priority concern
In effect, this will allow to run 2 runners:
- one for recurring tasks
- one for the save code now
This should decrease the probability of the scheduling tasks for the save code now to be
stuck behind the main scheduler runner.
Related to T3367
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/361/ for more details.

Build is green
Patch application report for D5826 (id=20917)
Rebasing onto 21c4279b99...
Current branch diff-target is up to date.
Changes applied before test
commit f71a716f478ee8bfcf7f4e26f387768a89276deb
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Tue Jun 8 17:36:28 2021 +0200
runner: Separate scheduling tasks with and without priority concern
In effect, this will allow to run 2 runners:
- one for recurring tasks
- one for the save code now
This should decrease the probability of the scheduling tasks for the save code now to be
stuck behind the main scheduler runner.
Related to T3367
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/362/ for more details.

Build is green
Patch application report for D5826 (id=20925)
Rebasing onto 21c4279b99...
Current branch diff-target is up to date.
Changes applied before test
commit c7707b5c836c3f58bace115eb398599a989845aa
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Tue Jun 8 17:36:28 2021 +0200
runner: Separate scheduling tasks with and without priority concern
In effect, this will allow to run 2 runners:
- one for recurring tasks
- one for the save code now
This should decrease the probability of the scheduling tasks for the save code now to be
stuck behind the main scheduler runner.
Related to T3367
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/363/ for more details.

LGTM, still not a big fan of the usage of random in the tests ;), but otherwise, it matches what you explained to me this morning.

\o/
lol, yeah, but I'm not a big fan of hard-coding, say, the first element here, for example.