
Make GitHub lister use celery tasks instead of the command-line
Closed, Resolved (Public)

Description

The re-engineering of the GitHub lister involves using Celery tasks for the "incremental" and "full" operations, so that they can be integrated into the scheduler.

Specifically for the GitHub lister, this implies three kinds of tasks, divided into two queues (the incremental operation needs to be able to bypass the queue for the full operation):

  • swh_lister_github_incremental queue
    • incremental task maps to the current ghlister catchup operation
  • swh_lister_github_full queue
    • full "meta"-task that schedules range updates
    • update-range task that maps to the current ghlister list <start>-<stop> operation, which takes a range of ids and can be parallelized.
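
The split described above could look roughly like the following sketch. It is illustrative only, not the lister's actual code: the task names (ghlister.incremental, ghlister.full, ghlister.update_range) and the helper functions are hypothetical, and the id range is assumed to be inclusive. Only the two queue names come from this task. The routing table uses the task-name-to-queue shape that Celery's task_routes setting accepts, but it is kept as a plain dict here so the example runs without Celery installed.

```python
# Hypothetical task names; only the two queue names come from the task description.
TASK_ROUTES = {
    "ghlister.incremental": {"queue": "swh_lister_github_incremental"},
    "ghlister.full": {"queue": "swh_lister_github_full"},
    "ghlister.update_range": {"queue": "swh_lister_github_full"},
}


def split_id_range(start, stop, step):
    """Split the id range [start, stop] (assumed inclusive) into chunks of at most `step` ids."""
    if step <= 0:
        raise ValueError("step must be positive")
    ranges = []
    low = start
    while low <= stop:
        high = min(low + step - 1, stop)
        ranges.append((low, high))
        low = high + 1
    return ranges


def plan_full_listing(start, stop, step):
    """Describe the update-range tasks that a full "meta"-task would schedule."""
    queue = TASK_ROUTES["ghlister.update_range"]["queue"]
    return [
        {"task": "ghlister.update_range", "queue": queue, "args": (low, high)}
        for low, high in split_id_range(start, stop, step)
    ]


if __name__ == "__main__":
    for planned in plan_full_listing(1, 10, 4):
        print(planned)
```

Because each planned update-range entry is independent of the others, several workers can consume the full queue in parallel, while the incremental task sits in its own queue and never waits behind a long full run.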

Event Timeline

olasd updated the task description.
olasd updated the task description.
olasd removed olasd as the assignee of this task.

The GitHub lister now provides a tasks module that can be deployed to workers using the standard puppet machinery.

A swh.scheduler recurring task has been set up to perform an incremental run each day.