Add diff features for class from_disk.Directory
Closed, Migrated

Description

The from_disk.Directory class can be used to generate, from an on-disk directory, all the objects that can be sent to the Software Heritage archive.
It can also be dynamically altered when paths are modified or removed, to keep the model synchronized with the on-disk data.
It is notably used by the subversion loader to reconstruct a repository filesystem when replaying its revisions.

Recent changes in the subversion loader (D6950) call for supporting diff features, that is, computing the list of
added, modified or removed paths between two directories, in the from_disk.Directory class,
as they might be useful for other use cases.

There are two possible ways to implement such features:

  • Add new methods in the class: one to indicate that we want to keep track of all the added, modified or removed paths in the directory since the call of the method, and a second one to retrieve that list of paths
  • Add a function that computes the diff between two from_disk.Directory objects, maybe by adapting and reusing the implementation that can be found in swh/storage/algos/diff.py
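
To illustrate the second option, here is a minimal sketch of a two-directory diff. It does not use the swh.model API: each directory is assumed to be flattened into a plain dict mapping a path to a content hash, and the names diff_directories and DirectoryDiff are hypothetical, introduced only for this example.

```python
from typing import Dict, List, NamedTuple


class DirectoryDiff(NamedTuple):
    """Hypothetical result type: sorted paths grouped by kind of change."""
    added: List[str]
    modified: List[str]
    removed: List[str]


def diff_directories(old: Dict[str, str], new: Dict[str, str]) -> DirectoryDiff:
    """Compare two flattened directories given as {path: hash} mappings.

    Assumption: a path is "modified" when it exists on both sides
    with a different hash.
    """
    # Paths only present in the new directory
    added = sorted(path for path in new if path not in old)
    # Paths only present in the old directory
    removed = sorted(path for path in old if path not in new)
    # Paths present on both sides whose hash changed
    modified = sorted(
        path for path in new if path in old and old[path] != new[path]
    )
    return DirectoryDiff(added=added, modified=modified, removed=removed)


if __name__ == "__main__":
    old = {"README": "a1", "src/main.py": "b2", "src/util.py": "c3"}
    new = {"README": "a1", "src/main.py": "ff", "docs/index.rst": "d4"}
    print(diff_directories(old, new))
```

A real implementation would more likely walk the two Merkle trees and skip any subtree whose hashes are equal, rather than flattening both directories first.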

Event Timeline

anlambert claimed this task.

Closing this as invalid, as there already exists a method named collect in the merkle.MerkleNode class (the base class of from_disk.Directory) that does exactly what is detailed in the task description.
Nevertheless, that method could be improved to give more flexibility in client code (T4633).