The API exposed by the swh.model.git module has grown organically, rather than being designed to do something well contained and documented.
It doesn't return objects that are compatible with the swh.model.identifiers or swh.storage APIs, so conversion steps end up being done in several different places, possibly in mutually incompatible ways, which makes bugs like T685 possible.
Plus, some of the features seem ad hoc and confusing (for instance, the feature to ignore empty directories seems to actually remove them?).
Finally, the name of the module is misleading: while the current Software Heritage data model somewhat aligns with git (we already differ in some ways, for instance we can handle empty directories), that alignment is not a long-term feature and is bound to evolve.
As far as I can tell, the feature we want to expose is building the Software Heritage data structures from disk, in a shape ready to be sent to storage:
- compute Software Heritage contents from disk
- compute Software Heritage directory structures from disk, possibly ignoring some of the subdirectories (for instance, ignoring VCS directories or empty directories)
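To make the two items above concrete, here is a minimal sketch of what such an API could look like. The names `content_from_file`, `directory_from_disk` and `ignore_dir` are hypothetical, not the existing swh.model API, and the returned dicts are only an assumed shape, not the actual swh.storage schema. The only thing taken as given is the git-style blob hash (SHA-1 over a `blob <length>\0` header plus the data).

```python
import hashlib
import os


def content_from_file(path):
    """Hypothetical helper: hash one file into a content-like dict."""
    with open(path, "rb") as f:
        data = f.read()
    # Git-style blob hash: SHA-1 over a "blob <length>\0" header plus the data.
    header = b"blob %d\0" % len(data)
    return {
        "length": len(data),
        "sha1": hashlib.sha1(data).digest(),
        "sha256": hashlib.sha256(data).digest(),
        "sha1_git": hashlib.sha1(header + data).digest(),
    }


def directory_from_disk(path, ignore_dir=lambda dirpath, name: False):
    """Hypothetical helper: build a nested dict for `path`.

    `ignore_dir(dirpath, name)` is an assumed predicate; subdirectories
    for which it returns True are skipped entirely. Symlinks and special
    files are left out of this sketch.
    """
    entries = {}
    with os.scandir(path) as it:
        for entry in sorted(it, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                if ignore_dir(entry.path, entry.name):
                    continue
                entries[entry.name] = {
                    "type": "dir",
                    "entries": directory_from_disk(entry.path, ignore_dir),
                }
            elif entry.is_file(follow_symlinks=False):
                entries[entry.name] = {"type": "file", **content_from_file(entry.path)}
    return entries
```

With this shape, ignoring VCS directories or empty directories is just a matter of the predicate passed in, e.g. `ignore_dir=lambda p, n: n == ".git" or not os.listdir(p)`, rather than a special-cased flag.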
Is there anything else this module does that needs to be an exposed API?