ContentPartitionIndexer: Do not index the same content multiple times at once.

Description

self._index_contents was called repeatedly in a loop with the same arguments,
except for the set of hashes to exclude.

As a result, if there were N pages of hashes to exclude, each content was
indexed N times, and the first N-1 iterations did not even exclude all the
hashes they should have excluded.
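
The following is a minimal sketch of the problem and the fix, not the actual swh-indexer code. The names index_contents, index_per_page, and index_once are hypothetical, and the sketch assumes the exclusion set was accumulated page by page, which is one reading of the description above.

```python
from typing import Iterable, Iterator, List, Set


def index_contents(contents: Iterable[str], exclude: Set[str]) -> Iterator[str]:
    """Hypothetical stand-in for self._index_contents.

    Yields every content hash that is not in `exclude`.
    """
    for content in contents:
        if content not in exclude:
            yield content


def index_per_page(contents: List[str], pages: List[List[str]]) -> List[str]:
    """Assumed old behaviour: one full indexing pass per page of excluded hashes."""
    exclude: Set[str] = set()
    indexed: List[str] = []
    for page in pages:
        exclude.update(page)
        # Indexing runs inside the loop, so early passes see an incomplete
        # exclusion set and later passes re-index the same contents.
        indexed.extend(index_contents(contents, exclude))
    return indexed


def index_once(contents: List[str], pages: List[List[str]]) -> List[str]:
    """Assumed fixed behaviour: collect every page first, then index once."""
    exclude: Set[str] = set()
    for page in pages:
        exclude.update(page)
    return list(index_contents(contents, exclude))


contents = ["a", "b", "c", "d"]
pages = [["a"], ["b"]]  # N = 2 pages of hashes to exclude

print(index_per_page(contents, pages))
print(index_once(contents, pages))
```

With two pages, the per-page variant indexes "c" and "d" twice and also indexes "b" during the first pass, even though "b" should have been excluded. The single-pass variant indexes only "c" and "d", once each.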

Details

Provenance
vlorentz authored on Feb 1 2021, 2:41 PM
vlorentz pushed on Feb 1 2021, 3:02 PM
Differential Revision
D4982: ContentPartitionIndexer: Do not index the same content multiple times at once.
Parents
rDCIDX3baf8bb91978: Add a cli section in the doc
Branches
Unknown
Tags
Unknown
Build Status
Buildable 18917
Build 29314: test-and-build (Jenkins)