
pypi: write metadata on revisions instead of snapshots.

This commit no longer exists in the repository. It may have been part of a branch which was deleted.

Description

pypi: write metadata on revisions instead of snapshots.

Writing the metadata on snapshots allowed us to store the raw metadata from the
API, but it causes a lot of duplication: after running for only a couple of
months, the metadata storage is already 700GB in size, mostly because of NPM
metadata, but also because of the PyPI metadata (e.g. many objects are over 1MB each).

The metadata we wrote on snapshots was made of:

  • intrinsic metadata that PyPI extracted from the last upload
  • info on each file (sdist or otherwise)

We don't need to archive the former this way, as it is intrinsic.
We keep loading the latter, but only for source files; extrinsic metadata
for binary files is discarded, as it is not useful.
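
The filtering described above could look roughly like the following. This is a minimal sketch, not the loader's actual code: the function name `source_file_metadata` and the sample entries are hypothetical, and it assumes each file entry carries a `packagetype` field set to `"sdist"` for source distributions, as in the file lists returned by the PyPI JSON API.

```python
from typing import Any, Dict, List


def source_file_metadata(files: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the per-file metadata entries that describe source files.

    Hypothetical helper: entries whose "packagetype" is not "sdist"
    (e.g. wheels and other binary distributions) are discarded.
    """
    return [entry for entry in files if entry.get("packagetype") == "sdist"]


# Made-up sample data shaped like a release's file list.
release_files = [
    {"filename": "example-1.0.tar.gz", "packagetype": "sdist", "size": 1024},
    {"filename": "example-1.0-py3-none-any.whl", "packagetype": "bdist_wheel", "size": 2048},
]

# Only the sdist entry remains; it would then be written as extrinsic
# metadata on the revision built from that source file.
kept = source_file_metadata(release_files)
```

Filtering on a declared type field rather than on the filename extension avoids guessing at archive formats, at the cost of trusting the upstream API's classification.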
