This diff contains two commits that should greatly improve performance
when loading large CVS repositories.

loader: Reconstruct repo filesystem incrementally at each revision

Instead of creating a from_disk.Directory instance after each replayed CVS revision by recursively scanning all directories of the repository, keep a single instance as a class member and synchronize it with the reconstructed filesystem after each revision replay.

This should improve loader performance, especially when dealing with large repositories.
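
To illustrate the idea, here is a minimal sketch of an in-memory tree that is updated one path at a time instead of being rebuilt by a full scan. It is not the loader's code and does not use the from_disk.Directory API: the `IncrementalTree` class, its nested-dict representation, and the pruning of empty directories are all assumptions made for this example.

```python
import os


class IncrementalTree:
    """Hypothetical in-memory mirror of a checkout, updated path by path."""

    def __init__(self):
        # Directories are nested dicts, files are stored as bytes.
        self.root = {}

    def update(self, root_dir, rel_path):
        """Synchronize one '/'-separated path with its on-disk state."""
        parts = rel_path.split("/")
        full_path = os.path.join(root_dir, *parts)
        if os.path.isfile(full_path):
            self._add(parts, full_path)
        else:
            self._remove(parts)

    def _add(self, parts, full_path):
        node = self.root
        for name in parts[:-1]:
            # Create missing parent directories (or replace a stale file).
            if not isinstance(node.get(name), dict):
                node[name] = {}
            node = node[name]
        with open(full_path, "rb") as f:
            node[parts[-1]] = f.read()

    def _remove(self, parts):
        parents = []
        node = self.root
        for name in parts[:-1]:
            if not isinstance(node.get(name), dict):
                return  # path was never tracked, nothing to do
            parents.append((node, name))
            node = node[name]
        node.pop(parts[-1], None)
        # Assumption for this sketch: directories left empty are pruned.
        while parents:
            parent, name = parents.pop()
            if parent[name]:
                break
            del parent[name]
```

After a revision is replayed, only the paths it touched need a call to `update`, so the cost is proportional to the size of the changeset rather than to the size of the repository.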

loader: Yield only modified objects in process_cvs_changesets

Previously, after each revision replay, all files and directories of the CVS repository being loaded were collected and sent to the storage. This is a real bottleneck for loading performance, as it delegates the filtering of new objects to archive to the storage filtering proxy.

As we know exactly which paths have been modified in a CVS revision, do that filtering on the loader side and send only the modified objects to the storage, instead of the whole set of contents and directories from the reconstructed filesystem.

This should greatly improve loading performance for large repositories and also reduce loader memory consumption.
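
The loader-side filtering can be sketched as follows. This is an illustration only, not the actual process_cvs_changesets code: the `modified_objects` and `lookup` helpers and the nested-dict tree (directories as dicts, files as bytes) are hypothetical. The sketch assumes that a modified file changes every directory above it, so those directories are yielded too, each at most once.

```python
def lookup(tree, parts):
    """Return the node at the given path components, or None if absent."""
    node = tree
    for name in parts:
        if not isinstance(node, dict) or name not in node:
            return None
        node = node[name]
    return node


def modified_objects(tree, modified_paths):
    """Yield ("content", path) and ("directory", path) for touched objects.

    The root directory is identified by the empty path "".
    """
    seen_dirs = set()
    for path in sorted(modified_paths):
        parts = path.split("/")
        node = lookup(tree, parts)
        # A deleted file has no content left to send.
        if node is not None and not isinstance(node, dict):
            yield ("content", path)
        # Walk from the parent directory up to the root.
        for depth in range(len(parts) - 1, -1, -1):
            dir_path = "/".join(parts[:depth])
            if dir_path in seen_dirs:
                break  # this directory and its ancestors were already yielded
            if isinstance(lookup(tree, parts[:depth]), dict):
                seen_dirs.add(dir_path)
                yield ("directory", dir_path)
```

For example, with a tree holding `src/a.c`, `src/b.c` and `README`, a changeset touching only `src/a.c` yields that content plus the `src` and root directories, and leaves `src/b.c` and `README` out.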