
rubygems: Use gems database dump to improve listing output

This commit no longer exists in the repository. It may have been part of a branch which was deleted.

Description

rubygems: Use gems database dump to improve listing output

Instead of using an undocumented rubygems HTTP endpoint that only
gives us the names of the gems, use the daily PostgreSQL
dump of the rubygems.org database.

This makes it possible to list not only all gems but also all versions of
each gem and their release artifacts. For each release artifact, the
following information is extracted: version, download URL, sha256 checksum,
release date, plus some extra metadata.
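
As an illustration only, the per-artifact information could be held in a small record like the one below. The class name, the field names, the helper function and the download URL layout are assumptions made for this sketch, not the lister's actual code; the gem name, version, checksum and date are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReleaseArtifact:
    """One release artifact of a gem (hypothetical record layout)."""

    version: str
    url: str
    sha256: str
    release_date: datetime


def build_artifact(
    gem_name: str, version: str, sha256: str, release_date: datetime
) -> ReleaseArtifact:
    # Assumed download URL layout; check it against rubygems.org before use.
    url = f"https://rubygems.org/downloads/{gem_name}-{version}.gem"
    return ReleaseArtifact(
        version=version, url=url, sha256=sha256, release_date=release_date
    )


# Placeholder values, not taken from the real database dump.
artifact = build_artifact(
    "example", "1.0.0", "0" * 64, datetime(2022, 1, 1, tzinfo=timezone.utc)
)
```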

The lister now sets the list of artifacts and the list of metadata as extra
loader arguments when sending a listed origin to the scheduler database.
A last_update date is also computed, which should ensure that loading tasks
for rubygems are scheduled only when new releases have become available
since the last load.
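
A minimal sketch of the last_update idea, assuming it is the most recent release date among a gem's artifacts. Both function names and the scheduling rule are hypothetical and only illustrate the comparison; the real decision is made by the scheduler.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def compute_last_update(release_dates: Iterable[datetime]) -> Optional[datetime]:
    """Return the most recent release date, or None when there are no releases."""
    return max(release_dates, default=None)


def needs_loading(
    last_update: Optional[datetime], last_loaded: Optional[datetime]
) -> bool:
    """Assumed rule: load only if a release is newer than the last load."""
    if last_update is None:
        return False  # no release known, nothing to load
    if last_loaded is None:
        return True  # origin was never loaded
    return last_update > last_loaded


# Placeholder dates for illustration.
dates = [
    datetime(2022, 1, 1, tzinfo=timezone.utc),
    datetime(2022, 6, 1, tzinfo=timezone.utc),
]
last_update = compute_last_update(dates)
```

With these placeholder dates, an origin last loaded in March 2022 would be scheduled again, while one loaded in July 2022 would not.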

Note that the lister spawns a temporary PostgreSQL instance, so it
requires the initdb executable from a PostgreSQL server installation to be
available in the execution environment.
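
One way to check this requirement up front is sketched below. The helper name and the extra_dirs parameter are hypothetical; the sketch only looks the executable up and does not start a PostgreSQL instance.

```python
import os
import shutil
from typing import Iterable, Optional


def find_executable(name: str, extra_dirs: Iterable[str] = ()) -> Optional[str]:
    """Look up an executable on PATH, then in extra_dirs; None if not found."""
    # Drop empty PATH entries so the current directory is not searched implicitly.
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    dirs.extend(extra_dirs)
    return shutil.which(name, path=os.pathsep.join(dirs))


initdb_path = find_executable("initdb")
if initdb_path is None:
    print("initdb not found: a PostgreSQL server installation is required")
else:
    print(f"using initdb at {initdb_path}")
```

Some distributions install initdb outside PATH (for example under a versioned PostgreSQL directory), which is why the sketch accepts additional directories to search.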

Related to T1777

Details

Provenance
anlambert authored on Oct 6 2022, 5:51 PM
anlambert pushed on Oct 7 2022, 5:02 PM
Differential Revision
D8639: rubygems: Use gems database dump to improve listing output
Tasks
T1777: Rubygems Lister
Build Status
Buildable 32169
Build 50375: test-and-build (Jenkins)
