Improve PostgreSQL storage scheme for the `with-path-denormalized` flavor

Description

The previous version stored arrays of strings representing tuples for the
denormalized relations (the dst and loc of each relation, respectively).
While that simplified the check for duplicates, it turned out to be very
inefficient in terms of disk usage. The new version has two distinct lists
of bigint (i.e. internal ids), one for dst and one for loc. To check for
duplicates, the two lists must be zipped and repeated tuples filtered out.
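
The duplicate check described above can be sketched in Python as follows.
This is only an illustration under stated assumptions, not the code from
this revision: the helper names `unique_pairs` and `split_pairs` are
hypothetical, and the two lists are assumed to be parallel (the i-th dst
id belongs with the i-th loc id).

```python
from typing import List, Sequence, Tuple


def unique_pairs(dst_ids: Sequence[int], loc_ids: Sequence[int]) -> List[Tuple[int, int]]:
    """Zip two parallel id lists and drop repeated (dst, loc) tuples.

    Hypothetical helper: the first occurrence of each tuple is kept,
    in its original order.
    """
    if len(dst_ids) != len(loc_ids):
        raise ValueError("dst and loc lists must have the same length")
    # dict keys are unique and preserve insertion order, so this removes
    # repeated tuples while keeping the first occurrence of each.
    return list(dict.fromkeys(zip(dst_ids, loc_ids)))


def split_pairs(pairs: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Turn a list of (dst, loc) tuples back into two parallel lists."""
    dst_ids = [dst for dst, _ in pairs]
    loc_ids = [loc for _, loc in pairs]
    return dst_ids, loc_ids
```

Note that a tuple counts as repeated only when both ids match, so the same
dst id may still appear several times with different loc ids.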

Details

Provenance
aeviso authored on Oct 14 2021, 12:03 PM
aeviso pushed on Nov 24 2021, 1:45 PM
Differential Revision
D6473: Improve PostgreSQL storage scheme for the `with-path-denormalized` flavor
Parents
rDPROV584845d3715e: Add support to filter files a minimum size
Branches
Unknown
Tags
Unknown
Build Status
Buildable 25155
Build 39307: test-and-build