Software Heritage

loader: Yield only modified objects in process_cvs_changesets

Description

Previously, after each revision replay, all files and directories of the
CVS repository being loaded were collected and sent to the storage.
This is a real bottleneck for loading performance, as it delegates to the
storage filtering proxy the job of selecting which objects are new and must be archived.

As we know exactly which paths have been modified in a CVS
revision, it is preferable to do that filtering on the loader side and
send only the modified objects to the storage, instead of the whole set of
contents and directories from the reconstructed filesystem.
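The idea can be illustrated with a minimal sketch. It assumes the reconstructed filesystem is modeled as a plain dict mapping file paths to content bytes, with paths standing in for the archived objects. `all_objects` and `modified_objects` are hypothetical names used only for illustration and are not the loader's actual functions. The key observation is that when a file changes, its content object and the listing of every ancestor directory change, while everything else in the tree is already archived.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Set, Tuple

# Hypothetical model of the reconstructed CVS filesystem:
# file path (relative to the repository root) -> file content.
FileTree = Dict[str, bytes]


def all_objects(tree: FileTree) -> Tuple[Set[str], Set[str]]:
    """Previous approach: collect every file and every directory of the tree."""
    files = set(tree)
    dirs: Set[str] = set()
    for path in tree:
        # PurePosixPath("a/b/c").parents yields "a/b", "a" and "." (the root).
        dirs.update(str(parent) for parent in PurePosixPath(path).parents)
    return files, dirs


def modified_objects(
    tree: FileTree, modified_paths: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Collect only the files touched by a revision and their ancestor directories."""
    files: Set[str] = set()
    dirs: Set[str] = set()
    for path in modified_paths:
        # Added or modified files produce new content; deleted files do not.
        if path in tree:
            files.add(path)
        # The listing of every ancestor directory changes, including on deletion.
        # Assumption: directories emptied by a deletion are not handled here.
        dirs.update(str(parent) for parent in PurePosixPath(path).parents)
    return files, dirs
```

For a tree containing `README`, `src/main.c`, `src/util/str.c` and `doc/guide.txt`, a revision that only touches `src/util/str.c` yields one file and three directories (`src/util`, `src`, `.`), whereas the previous approach yields all four files and all four directories at every revision, regardless of how small the change is.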

This should greatly improve loading performance for large repositories
and also reduce the loader's memory consumption.

Details

Provenance
anlambert authored on Oct 13 2022, 6:00 PM
anlambert pushed on Oct 17 2022, 7:26 PM
Differential Revision
D8682: Improve CVS loader performances
Parents
rDLDCVSb976aa6a1f80: loader: Reconstruct repo filesystem incrementally at each revision
Branches
Unknown
Tags
Unknown
References
tag: v0.5.0
Build Status
Buildable 32346
Build 50661: test-and-build (Jenkins)