This stack of changes builds up to a CLI endpoint allowing us to schedule origin
visits directly in Celery, bypassing the legacy scheduler entirely.
This has zero test coverage, save for old tests still passing, which is already
something... It's being used on the actual production database to schedule
actual tasks for git, npm and pypi.
Included changes:
- Drop duplicate docstring from backend
- Make the origin visit scheduling cooldown configurable
(The two changes above are cosmetic.)
- Add a (longer) specific cooldown for failed origin visits
- Add a specific cooldown for notfound origins
Both of these changes prevent repeated visits to failing origins. This is
necessary because origins are ordered consistently with respect to the
upstream information: without a cooldown, we'd keep trying to load the same
failing origins and never reach those further down the stack. Listers should
eventually disable these origins.
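
To illustrate the idea behind the per-status cooldowns, here is a minimal sketch of an eligibility check. It is not the code in this stack: the constant names, the durations and the `is_eligible` helper are all hypothetical, and the real selection presumably happens in SQL inside grab_next_visits.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical cooldown values, chosen only for illustration.
SCHEDULING_COOLDOWN = timedelta(days=7)
FAILED_COOLDOWN = timedelta(days=14)
NOT_FOUND_COOLDOWN = timedelta(days=31)


def _elapsed(now: datetime, last: Optional[datetime], cooldown: timedelta) -> bool:
    """True if no event was recorded, or if the cooldown has passed since it."""
    return last is None or now - last >= cooldown


def is_eligible(
    now: datetime,
    last_scheduled: Optional[datetime] = None,
    last_failed: Optional[datetime] = None,
    last_not_found: Optional[datetime] = None,
) -> bool:
    """An origin can be scheduled only once every applicable cooldown has passed."""
    return (
        _elapsed(now, last_scheduled, SCHEDULING_COOLDOWN)
        and _elapsed(now, last_failed, FAILED_COOLDOWN)
        and _elapsed(now, last_not_found, NOT_FOUND_COOLDOWN)
    )
```

With a check like this, an origin that failed recently is skipped, so the consistent ordering no longer pins the scheduler on the same failing entries.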
- Add table sampling option to grab_next_visits
Running common operations on all git origins is expensive. Table sampling
lets us at least schedule some jobs in a reasonably short time.
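
As a rough sketch of what table sampling looks like at the query level, the helper below builds a PostgreSQL query with an optional `TABLESAMPLE SYSTEM` clause. The function name, the selected column and the psycopg2-style `%s` placeholders are assumptions made for illustration; the actual query in grab_next_visits is more involved.

```python
from typing import Optional, Tuple


def build_sampled_query(
    table: str, limit: int, tablesample: Optional[float] = None
) -> Tuple[str, tuple]:
    """Build a SELECT with an optional TABLESAMPLE clause.

    `table` must be a trusted identifier, as it is interpolated directly.
    `tablesample` is the percentage of the table to sample (0 < p <= 100).
    """
    if tablesample is not None and not 0 < tablesample <= 100:
        raise ValueError("tablesample must be a percentage in (0, 100]")

    query = f"SELECT url FROM {table}"
    params: tuple = ()
    if tablesample is not None:
        # SYSTEM sampling picks whole pages, which is cheap on large tables.
        query += " TABLESAMPLE SYSTEM (%s)"
        params += (tablesample,)
    query += " LIMIT %s"
    params += (limit,)
    return query, params
```

The trade-off is that a sampled query only sees a random subset of the table, so it returns some eligible origins rather than the globally best candidates.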
- Add a (very basic) scheduling policy for origins with no known last update
This is especially useful for pypi, as well as some git hosters that do not
provide the right info in their APIs. We will need to implement smarter
heuristics to avoid repeated uneventful visits on these origins.
- Split off the helper for available slots in a celery queue
This is needed for the send-to-celery subcommand as well, so move it out of the
runner module.
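
For readers unfamiliar with the helper, the computation it performs can be sketched roughly as follows. The signature is hypothetical (the queue length lookup is passed in as a callable so the sketch stays independent of Celery), and returning 0 when the length is unknown is a choice made for this sketch, not necessarily what the real helper does.

```python
from typing import Callable, Optional


def available_slots(
    get_queue_length: Callable[[str], Optional[int]],
    queue_name: str,
    max_length: int,
) -> int:
    """Return how many more tasks fit in `queue_name` before reaching `max_length`."""
    current = get_queue_length(queue_name)
    if current is None:
        # Length unknown: this sketch conservatively reports no free slots.
        return 0
    # Never report a negative number of slots if the queue is already over capacity.
    return max(0, max_length - current)
```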
- Add a swh scheduler origin send-to-celery subcommand
Yes, finally!