Clean up the swh.model.git API
Closed, Migrated

Description

The API exposed by the swh.model.git module has grown organically, rather than being designed to do something well contained and documented.

It doesn't return objects that are compatible with the swh.model.identifiers or the swh.storage API. This makes bugs like T685 possible, because conversion steps are done in several different places, possibly in incompatible ways.

Plus, some of the features seem ad-hoc and confusing (for instance, the feature to ignore empty directories seems to remove them??).

Finally, the name of the module is misleading: while the current Software Heritage API somewhat aligns with git (we already differ in some ways, for instance we can handle empty directories), this is not a long-term feature and is bound to evolve.

As far as I can tell, the feature we want to expose is building the Software Heritage data structures from disk, in a shape ready to be sent to storage.

  • compute Software Heritage contents from disk
  • compute Software Heritage directory structures from disk, possibly ignoring some of the subdirectories (for instance, ignoring VCS directories or empty directories)
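To make the two items above concrete, here is a minimal sketch of what such an API could look like. It is not the existing swh.model.git code: the function names, the dict layout and the default ignore list are all hypothetical, and the only thing borrowed from git is the blob hashing scheme (SHA-1 over a "blob <length>\0" header followed by the data).

```python
import hashlib
import os


def content_from_bytes(data: bytes) -> dict:
    """Build a content-like dict from raw bytes (field names are illustrative)."""
    # git hashes a blob as sha1(b"blob <length>\0" + data)
    header = b"blob %d\0" % len(data)
    return {
        "type": "content",
        "length": len(data),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha1_git": hashlib.sha1(header + data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def content_from_file(path: str) -> dict:
    """Read a file from disk and compute its content dict."""
    with open(path, "rb") as f:
        return content_from_bytes(f.read())


def directory_from_disk(path, ignore_names=frozenset({".git", ".svn", ".hg"})):
    """Walk `path` and return a nested directory dict.

    Entries whose name is in `ignore_names` are skipped at every level.
    Symlinks to directories and special files are left out of this sketch.
    """
    entries = {}
    for name in sorted(os.listdir(path)):
        if name in ignore_names:
            continue
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            entries[name] = directory_from_disk(full, ignore_names)
        elif os.path.isfile(full):
            entries[name] = content_from_file(full)
    return {"type": "directory", "entries": entries}
```

A real implementation would also need to compute directory identifiers and handle symlinks and permissions; the sketch only shows the shape of "contents and directories from disk, with some subdirectories ignored".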

Is there anything else this module does that needs to be an exposed API?

Event Timeline

"ignore empty directories seems to remove them??"

Yes, I forgot to mention that in the docstring.
To be fair to myself, I did mention it in the code, though, since I consider this ugly as well.

It was related to the svn loader (svn checks out and exports empty directories).
That module started there.

Anyway, it's due to an edge case in the ignore-empty-directories feature: a directory whose only entry is an empty directory.
Once that empty directory is removed, the parent becomes empty in turn and needs to be ignored as well (because it should not have been there in the first place).
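The cascading effect can be illustrated with a small sketch. This is not the code the svn loader used; the nested-dict representation (a dict is a directory, anything else is a file) and the function name are assumptions made only for the example.

```python
def prune_empty_dirs(tree: dict) -> dict:
    """Return a copy of `tree` without empty directories.

    `tree` maps names to either a nested dict (a directory) or any other
    value (a file). Children are pruned first, so a directory that only
    held empty directories becomes empty itself and is dropped too.
    """
    pruned = {}
    for name, node in tree.items():
        if isinstance(node, dict):
            child = prune_empty_dirs(node)
            if not child:
                continue  # empty once its own children are pruned: drop it
            pruned[name] = child
        else:
            pruned[name] = node
    return pruned
```

With this bottom-up order, a chain such as a/b/c where c is empty disappears entirely, which is the behaviour described above: removing c leaves b empty, and removing b leaves a empty.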

I need to recheck, but I think it's no longer used (it was used for the svn loader's first implementation).