Software Heritage

swh.indexer.recompute: Add storage primary_key option for corruption check

Description

This no longer fetches the contents' metadata. The contents passed as
parameters must be self-contained and consistent with the
'primary_key' configuration option. Currently, that primary key is
'sha1'.

Unknown or corrupted contents are skipped.
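A minimal sketch of the skip rule, assuming the primary key names a hashlib algorithm ('sha1') and that blobs are looked up through a caller-supplied function. `check_contents` and `get_blob` are hypothetical names for illustration, not the swh.indexer API. A content counts as unknown when no blob is found for its key, and as corrupted when its blob does not hash back to that key.

```python
import hashlib
from typing import Callable, Dict, Iterable, Iterator, Optional

# Assumption: the configured primary key, per the commit message ('sha1').
PRIMARY_KEY = "sha1"


def check_contents(
    contents: Iterable[Dict[str, bytes]],
    get_blob: Callable[[bytes], Optional[bytes]],  # hypothetical blob lookup
    primary_key: str = PRIMARY_KEY,
) -> Iterator[Dict[str, bytes]]:
    """Yield only contents whose blob exists and matches its primary key."""
    for content in contents:
        key = content[primary_key]
        data = get_blob(key)
        if data is None:
            # Unknown content: no blob is stored under this key.
            continue
        if hashlib.new(primary_key, data).digest() != key:
            # Corrupted content: the blob does not hash to its key.
            continue
        yield content


# Usage: one valid, one corrupted, and one unknown content.
good_key = hashlib.sha1(b"hello").digest()
bad_key = hashlib.sha1(b"other").digest()
blobs = {good_key: b"hello", bad_key: b"tampered"}
contents = [{"sha1": good_key}, {"sha1": bad_key}, {"sha1": b"\x00" * 20}]
kept = list(check_contents(contents, blobs.get))
```

Only the first content survives: the second is skipped as corrupted, the third as unknown.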

Other improvements:

  • Improve docstrings
  • Use underscores (_) in option names, per the usual conventions
  • Add a batch_size option for retrieving the contents' blobs
  • Add a separate batch size for the content update operations
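The two batch sizes are independent: one bounds how many blobs are fetched per storage call, the other bounds how many updated records are sent per update call. A minimal sketch, assuming caller-supplied `fetch_blobs` and `update_contents` callables and using sha256 as an example recomputed checksum. All of these names, and the choice of sha256, are hypothetical, not the actual swh.indexer code.

```python
import hashlib
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Split an iterable into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def recompute(
    keys: Iterable[bytes],
    fetch_blobs: Callable[[List[bytes]], Iterable[Tuple[bytes, bytes]]],
    update_contents: Callable[[List[Dict[str, bytes]]], None],
    retrieve_batch_size: int,
    update_batch_size: int,
) -> None:
    # Lazily fetch blobs in batches and compute one update record per blob.
    def updates() -> Iterator[Dict[str, bytes]]:
        for key_batch in batched(keys, retrieve_batch_size):
            for key, data in fetch_blobs(key_batch):
                yield {"sha1": key, "sha256": hashlib.sha256(data).digest()}

    # Send the update records in batches of a different size.
    for update_batch in batched(updates(), update_batch_size):
        update_contents(update_batch)


# Usage: record the size of each storage call.
blobs = {hashlib.sha1(d).digest(): d for d in (b"a", b"b", b"c", b"d", b"e")}
fetch_sizes: List[int] = []
update_sizes: List[int] = []


def fetch_blobs(key_batch: List[bytes]) -> List[Tuple[bytes, bytes]]:
    fetch_sizes.append(len(key_batch))
    return [(k, blobs[k]) for k in key_batch]


def update_contents(batch: List[Dict[str, bytes]]) -> None:
    update_sizes.append(len(batch))


recompute(list(blobs), fetch_blobs, update_contents,
          retrieve_batch_size=2, update_batch_size=3)
```

Because both stages are generators, updates stream out as blobs arrive instead of accumulating in memory, and each batch size can be tuned to its own operation.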

Related T692

Details

Provenance
ardumont authored on Mar 3 2017, 12:57 PM
ardumont pushed on Mar 21 2017, 10:50 AM
Differential Revision
D186: Recompute class to trigger an add/update hash checksums in storage
Parents
rDCIDX51225e12cab2: swh.indexer.tasks: Open task to recompute checksums
Branches
Unknown
Tags
Unknown