Lister engineering
Closed, Resolved (Public)

Description

Listers allow us to find new origins from sources of data we already know.

Instances of listers have the following characteristics:

  • a type, corresponding to the module where the lister is defined (e.g. swh.lister.github)
  • parameters, which let the same code be shared across different sources (e.g. the URL of a forge, or of a webpage containing software)
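
As an illustration of the type/parameters split, here is a minimal sketch in Python. The ListerInstance class and the "swh.lister.example_forge" module name are hypothetical and are not taken from the actual swh.lister code base; only the idea of pairing a type with parameters comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ListerInstance:
    """Hypothetical description of one lister instance (illustration only)."""

    # Module where the lister is defined, e.g. "swh.lister.github".
    type: str
    # Source-specific parameters, e.g. the URL of a forge.
    parameters: Dict[str, Any] = field(default_factory=dict)


# Two instances that share the same code (same type) but target different sources.
# Both the module name and the URLs are made up for this example.
forge_a = ListerInstance("swh.lister.example_forge", {"url": "https://forge-a.example.org"})
forge_b = ListerInstance("swh.lister.example_forge", {"url": "https://forge-b.example.org"})
```

Keeping the parameters in a plain mapping is one possible choice; it lets each lister type define its own parameter names without changing the shared structure.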

Listers should be able to operate in two modes:

  • incremental, intended to run frequently
  • full, intended to run the first time and then infrequently

In both modes, a lister will:

  • for new origins
    • Create the origin
    • Create associated entities
    • Schedule the recurring task in the scheduler database and link it to the origin
  • for disappeared origins
    • Mark the origin as inactive
    • Unschedule the recurring task in the scheduler database
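
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the origin table and the scheduler are replaced by in-memory dictionaries, the function names diff_origins and apply_listing are hypothetical, creation of associated entities is left out, and the listing is assumed to cover the same scope as the set of known active origins (otherwise the "disappeared" set would be overestimated).

```python
from typing import Dict, Iterable, List, Set, Tuple


def diff_origins(known: Set[str], listed: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (new, disappeared) origin URLs, each sorted for stable output."""
    seen = set(listed)
    return sorted(seen - known), sorted(known - seen)


def apply_listing(active: Dict[str, bool], tasks: Dict[str, str], listed: Iterable[str]) -> None:
    """Update in-memory stand-ins for the origin table and the scheduler."""
    known = {url for url, is_active in active.items() if is_active}
    new, disappeared = diff_origins(known, listed)
    for url in new:
        active[url] = True               # create (or reactivate) the origin
        tasks[url] = f"recurring:{url}"  # schedule the recurring task, linked to the origin
    for url in disappeared:
        active[url] = False              # mark the origin as inactive
        tasks.pop(url, None)             # unschedule the recurring task


# Hypothetical data: origin "a" disappears, origin "c" is new.
active = {"https://example.org/a": True, "https://example.org/b": True}
tasks = {url: f"recurring:{url}" for url in active}
apply_listing(active, tasks, ["https://example.org/b", "https://example.org/c"])
```

After this run, the origin ending in /a is kept but flagged inactive and has no scheduled task, while /b and /c are active and scheduled.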

Listers can use ad hoc storage facilities (e.g. their own database) to keep state between incremental runs.
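
One way to picture such state is a small table holding, per lister instance, the last position reached. The sketch below uses Python's sqlite3 module with an in-memory database; the lister_state table, the last_id column, the function names, and the assumption that the source exposes monotonically increasing integer ids are all hypothetical and chosen only for illustration.

```python
import sqlite3
from typing import List, Optional


def load_last_id(db: sqlite3.Connection, instance: str) -> Optional[int]:
    """Return the last id seen by this instance, or None if there is no state yet."""
    row = db.execute(
        "SELECT last_id FROM lister_state WHERE instance = ?", (instance,)
    ).fetchone()
    return row[0] if row else None


def save_last_id(db: sqlite3.Connection, instance: str, last_id: int) -> None:
    """Persist the last id seen so the next incremental run can resume from it."""
    db.execute(
        "INSERT OR REPLACE INTO lister_state (instance, last_id) VALUES (?, ?)",
        (instance, last_id),
    )
    db.commit()


def run(db: sqlite3.Connection, instance: str, source: List[int], incremental: bool) -> List[int]:
    """List ids from the source: everything in full mode, only unseen ids in incremental mode."""
    start = load_last_id(db, instance) if incremental else None
    listed = [i for i in source if start is None or i > start]
    if listed:
        save_last_id(db, instance, max(listed))
    return listed


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lister_state (instance TEXT PRIMARY KEY, last_id INTEGER)")
first = run(db, "demo", [1, 2, 3], incremental=False)        # full run, lists everything
second = run(db, "demo", [1, 2, 3, 4, 5], incremental=True)  # resumes after the saved id
```

Here the full run ignores any saved state and the incremental run only lists ids greater than the saved one; a real lister would likely store whatever pagination token its source provides.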