Software Heritage

npm: write metadata on revisions instead of snapshots.

This commit no longer exists in the repository. It may have been part of a branch which was deleted.

Description

npm: write metadata on revisions instead of snapshots.

Writing them on snapshots allowed us to store the raw metadata from the API,
but it caused a lot of duplication: after running for only a couple of months,
the metadata storage is already 700GB in size, mostly because of these objects
(e.g. 150k of them are over 1MB each).
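A rough model helps show why per-snapshot metadata grows so quickly. The sketch below assumes (this is not stated above) that every visit produces a new snapshot, that each visit happens right after one new version is published, and that each snapshot stores the whole document, including all earlier versions. Under those assumptions storage grows quadratically with the number of versions, while writing each version's metadata once, on its own revision, grows linearly. The function names and the 10 kB per-version size are hypothetical.

```python
def snapshot_storage(version_sizes):
    """Total bytes if every visit re-stores the metadata of all versions so far (assumed model)."""
    total = 0
    document_size = 0
    for size in version_sizes:
        document_size += size  # the document now also contains this new version
        total += document_size  # the whole document is written again for this snapshot
    return total


def revision_storage(version_sizes):
    """Total bytes if each version's metadata is written once, on its revision."""
    return sum(version_sizes)


# Hypothetical package: 100 versions with about 10 kB of metadata each.
sizes = [10_000] * 100
print(snapshot_storage(sizes))  # 50500000
print(revision_storage(sizes))  # 1000000
```

With 100 versions the snapshot-based layout stores about 50 times more bytes in this model, and the ratio keeps growing with each release.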

The metadata we wrote on snapshots consisted of:

  • a 'versions' dict, whose content is moved to revisions
  • a 'time' dict, with one timestamp per version, which is used as the date of revision objects
  • 'dist-tags', which is currently ignored, but should be converted to ALIAS branches in a future commit.
  • a '_rev' property, which is internal to npm, so it is not useful to archive
  • everything else can be recomputed from the metadata of the latest version.
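The split described in the list above can be sketched as follows. This is a minimal illustration, not the Software Heritage loader code: `split_packument` is a hypothetical helper, and it assumes the registry document has the usual `versions` and `time` keys, with ISO 8601 timestamps ending in `Z`. Each version's metadata is paired with its timestamp (the future revision date); `dist-tags`, `_rev` and the remaining top-level keys are simply not carried over.

```python
from datetime import datetime


def split_packument(packument):
    """Map each version to (revision_date, version_metadata); other top-level keys are dropped."""
    times = packument.get("time", {})
    result = {}
    for version, metadata in packument["versions"].items():
        timestamp = times.get(version)
        date = None
        if timestamp is not None:
            # fromisoformat() before Python 3.11 does not accept a trailing "Z"
            date = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        result[version] = (date, metadata)
    return result


# Hypothetical registry document for illustration.
packument = {
    "_rev": "3-abc",  # internal to npm: not archived
    "name": "example-pkg",
    "dist-tags": {"latest": "1.1.0"},  # ignored for now
    "versions": {
        "1.0.0": {"name": "example-pkg", "version": "1.0.0"},
        "1.1.0": {"name": "example-pkg", "version": "1.1.0"},
    },
    "time": {
        "created": "2020-01-01T00:00:00.000Z",
        "1.0.0": "2020-01-01T00:00:00.000Z",
        "1.1.0": "2020-06-01T12:30:00.000Z",
    },
}

per_revision = split_packument(packument)
print(per_revision["1.1.0"][0].isoformat())  # 2020-06-01T12:30:00+00:00
```

Returning `None` for a version with no timestamp is a design choice made for the sketch; a real loader might instead fall back to another date or skip the version.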

Details

Provenance
Authored by vlorentz on Oct 5 2020, 2:26 PM
Pushed by vlorentz on Oct 6 2020, 9:54 AM
Differential Revision
D4142: npm: write metadata on revisions instead of snapshots.
Build Status
Buildable 15912
Build 24488: test-and-build (Jenkins)
