Add arch lister module (origins from archives).

Description

After a first attempt with D7812, this one uses a different strategy to
retrieve origins.

Fetch and extract "core.files.tar.gz", "extra.files.tar.gz" and "community.files.tar.gz" from archives.archlinux.org. This step ensures that we have a list of "official" packages.
Parse metadata from each 'desc' file to build origin URLs.
Scrape the origin URL to get artifact metadata that lists all versions of a package.
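
The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the code of this revision: it assumes the usual pacman database layout, in which each package directory in the archive holds a 'desc' file made of '%KEY%' headers followed by value lines. The function names and the ORIGIN_URL_TEMPLATE placeholder are hypothetical; the real URL pattern is whatever the lister defines.

```python
import io
import tarfile
from typing import Dict, Iterator, List

# Hypothetical placeholder, NOT the lister's real URL pattern.
ORIGIN_URL_TEMPLATE = "https://example.org/packages/{repo}/{arch}/{name}"


def iter_desc_files(archive_bytes: bytes) -> Iterator[str]:
    """Yield the text of every 'desc' member of a *.files.tar.gz archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar:
            # Keep only regular files whose base name is exactly 'desc'.
            if member.isfile() and member.name.rsplit("/", 1)[-1] == "desc":
                handle = tar.extractfile(member)
                if handle is not None:
                    yield handle.read().decode("utf-8")


def parse_desc(text: str) -> Dict[str, List[str]]:
    """Parse '%KEY%' sections into a mapping of key -> list of value lines."""
    fields: Dict[str, List[str]] = {}
    key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            key = None  # a blank line closes the current section
        elif len(line) > 2 and line.startswith("%") and line.endswith("%"):
            key = line[1:-1]
            fields[key] = []
        elif key is not None:
            fields[key].append(line)
    return fields


def build_origin_url(
    fields: Dict[str, List[str]],
    repo: str,
    template: str = ORIGIN_URL_TEMPLATE,
) -> str:
    """Fill the (assumed) URL template from the NAME and ARCH fields."""
    return template.format(
        repo=repo, arch=fields["ARCH"][0], name=fields["NAME"][0]
    )
```

Given the downloaded bytes of, say, "core.files.tar.gz", one would loop over iter_desc_files(data), pass each text to parse_desc, and call build_origin_url(fields, "core") to obtain the page to scrape for versions.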

It also fetches and extracts unofficial 'arm' packages from archlinuxarm.org, but in this case we cannot get all versions of an arm package.

Related to T4233

Details

Provenance
franckbret Authored on May 25 2022, 2:43 PM
franckbret Pushed on Jun 17 2022, 9:23 AM
Differential Revision
D7894: Add arch lister module (origins from archives).
Parents
rDLS263db667d09c: Adapt maven lister to list canonical gh urls if any
Branches
Unknown
Tags
Unknown
Tasks
T4233: Ingest Arch Linux
Build Status
Buildable 29916
Build 46764: test-and-build (Jenkins)