
Prevent erroneous HashCollisions by using the same ctime for all rows.

This commit no longer exists in the repository. It may have been part of a branch which was deleted.

Description

'swh_content_add' tries to avoid this issue with a DISTINCT clause
on the entire row, but the clause has no effect because the 'ctime'
cells of otherwise identical rows differ by a few microseconds.
This commit ensures all ctime values are exactly the same, so that
duplicate rows are filtered out.
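
The effect can be illustrated outside the database. The following is a minimal sketch, not the code from this commit: it mimics a whole-row DISTINCT in plain Python, and the hash values, the 'distinct' helper and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical row: the same content submitted twice in one batch.
# Columns: sha1, sha1_git, sha256, blake2s256, length, status.
HASHES = ("sha1-a", "sha1git-a", "sha256-a", "blake2s-a", 42, "visible")


def distinct(rows):
    """Mimic SELECT DISTINCT over whole rows, keeping first occurrences."""
    return list(dict.fromkeys(rows))


base = datetime(2020, 4, 8, 10, 30, tzinfo=timezone.utc)

# Before: each row gets its own ctime, a few microseconds apart,
# so the two rows are not equal and DISTINCT keeps both.
per_row = [HASHES + (base + timedelta(microseconds=i),) for i in range(2)]

# After: one ctime is computed once and shared by every row,
# so the rows are identical and DISTINCT collapses them.
shared = [HASHES + (base,) for _ in range(2)]

rows_before = distinct(per_row)
rows_after = distinct(shared)
print(len(rows_before), len(rows_after))
```

With per-row timestamps both copies survive deduplication; with a shared timestamp only one remains.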

An alternative would be to change 'swh_content_add' to do:

select distinct on (sha1, sha1_git, sha256, blake2s256, length, status) sha1, sha1_git, sha256, blake2s256, length, status, ctime from tmp_content

instead of:

select distinct sha1, sha1_git, sha256, blake2s256, length, status, ctime from tmp_content

but this is more verbose and there's no good reason to call 'now()' for
every row.
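
For comparison, the behaviour of the 'distinct on (...)' alternative can be sketched in plain Python as deduplication on the hash columns only, ignoring ctime. This is an illustration under assumptions: the 'distinct_on' helper and the sample rows are hypothetical, and unlike this sketch PostgreSQL does not guarantee which row of each group is kept unless an ORDER BY is given.

```python
from datetime import datetime, timedelta, timezone


def distinct_on(rows, key_len):
    """Keep one row per key, where the key is the first key_len columns.

    This sketch keeps the first row seen for each key.
    """
    first_by_key = {}
    for row in rows:
        first_by_key.setdefault(row[:key_len], row)
    return list(first_by_key.values())


base = datetime(2020, 4, 8, 10, 30, tzinfo=timezone.utc)

# Hypothetical rows: sha1, sha1_git, sha256, blake2s256, length, status, ctime.
key = ("sha1-a", "sha1git-a", "sha256-a", "blake2s-a", 42, "visible")
tmp_content = [
    key + (base,),
    key + (base + timedelta(microseconds=3),),  # same content, later ctime
]

# Deduplicate on the six non-ctime columns, as the DISTINCT ON query would.
deduplicated = distinct_on(tmp_content, key_len=6)
print(len(deduplicated))
```

Both rows share the same six key columns, so a single row is kept even though their ctime values differ.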

Details

Provenance
vlorentz authored on Apr 8 2020, 10:30 AM
vlorentz pushed on Apr 8 2020, 10:44 AM
Differential Revision
D2977: Prevent erroneous HashCollisions by using the same ctime for all rows.
Build Status
Buildable 11660
Build 17687: test-and-build
