Implementation follows the notes in the linked task (up to the hedgedoc linked from that task).
The downstream loader adaptations needed to ingest the lister's output are tracked in the following diffs:
- D8581: Create ContentLoader(BaseLoader) to deal with ListedOrigins with "file" visit_type
- D8584: Create DirectoryLoader(BaseLoader) to deal with "integrity" field (with or without version)

The lister was also run through docker to iron out papercuts [1] [2] [3].
[1] guix
swh-lister_1 | [2022-10-03 08:05:49,405: INFO/ForkPoolWorker-1] Task swh.lister.nixguix.tasks.NixGuixListerTask[f58096ad-af9f-42fa-bc29-e4791f1a24e3] succeeded in 557.3408025280223s: {'pages': 21483, 'origins': 18936}
[2] P1467
[3] nixpkgs
swh-lister_1 | [2022-10-03 15:36:38,225: INFO/ForkPoolWorker-1] Task swh.lister.nixguix.tasks.NixGuixListerTask[b442f750-797d-4df8-af0e-a5426a669462] succeeded in 177.8664992809645s: {'pages': 31285, 'origins': 31218}
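
For readers unfamiliar with the "integrity" field mentioned for D8584, here is a minimal sketch of how such a value could be checked against downloaded bytes. It assumes the field uses the SRI-style `<algo>-<base64 digest>` layout; that layout and the helper names `parse_integrity`, `matches_integrity` and `make_integrity` are illustrative assumptions, not code from this diff or from the loaders.

```python
import base64
import hashlib
from typing import Tuple


def parse_integrity(integrity: str) -> Tuple[str, bytes]:
    """Split an SRI-style string ('sha256-<base64>') into (algo, raw digest).

    Assumption: the "integrity" field follows this SRI-like layout.
    """
    algo, sep, b64_digest = integrity.partition("-")
    if not sep or algo not in hashlib.algorithms_guaranteed:
        raise ValueError(f"unsupported integrity value: {integrity!r}")
    # validate=True rejects characters outside the base64 alphabet
    return algo, base64.b64decode(b64_digest, validate=True)


def matches_integrity(data: bytes, integrity: str) -> bool:
    """Return True if hashing `data` yields the digest encoded in `integrity`."""
    algo, expected = parse_integrity(integrity)
    return hashlib.new(algo, data).digest() == expected


def make_integrity(data: bytes, algo: str = "sha256") -> str:
    """Build an SRI-style string for `data` (hypothetical helper for the demo)."""
    digest = hashlib.new(algo, data).digest()
    return f"{algo}-{base64.b64encode(digest).decode('ascii')}"


if __name__ == "__main__":
    payload = b"example tarball bytes"
    value = make_integrity(payload)
    print(value.split("-", 1)[0], matches_integrity(payload, value))
```

A loader-side check along these lines would reject an artifact whose bytes do not match the listed digest; how the actual loaders handle the field is defined in the diffs above, not here.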
Related to T3781