The git loader can now discover submodules while loading a repository.
The process works as follows:
- Before sending a new directory to the storage for archival, check whether its entries include a .gitmodules file; if so, add the tuple (directory_id, content_sha1git) to a global set.
- During the post_load operation, process each discovered .gitmodules file as follows:
  - retrieve the content metadata to get the sha1 checksum of the file
  - retrieve the .gitmodules content bytes from the objstorage using that sha1
  - parse the .gitmodules file content
  - for each submodule definition:
    - get the git commit id associated with the submodule path
    - check whether that git commit has been archived by SWH
    - if not, add the submodule repository URL to a set
- For each submodule detected as not archived or partially archived, create a one-shot git loading task with high priority in the scheduler database.
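
The parsing and filtering steps above can be sketched roughly as follows. This is an illustration only, not the loader's actual code: it parses .gitmodules with the standard library's configparser, and the names parse_gitmodules, unarchived_submodule_urls, commit_for_path and is_archived are hypothetical. The two callables stand in for the directory lookup and the SWH archive check, whose real APIs are not shown here.

```python
import configparser
import re
from typing import Callable, Dict, Optional, Set

# Section headers in .gitmodules look like: [submodule "name"]
_SECTION_RE = re.compile(r'^submodule "(?P<name>.+)"$')


def parse_gitmodules(content: bytes) -> Dict[str, Dict[str, str]]:
    """Return {submodule name: {"path": ..., "url": ...}} (hypothetical helper)."""
    # Strip indentation so configparser never treats a line as a continuation.
    lines = content.decode("utf-8", errors="replace").splitlines()
    text = "\n".join(line.strip() for line in lines)
    # interpolation=None: URLs may contain '%'; strict=False: tolerate duplicates.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    submodules: Dict[str, Dict[str, str]] = {}
    for section in parser.sections():
        match = _SECTION_RE.match(section)
        if not match:
            continue
        if parser.has_option(section, "path") and parser.has_option(section, "url"):
            submodules[match.group("name")] = {
                "path": parser.get(section, "path"),
                "url": parser.get(section, "url"),
            }
    return submodules


def unarchived_submodule_urls(
    content: bytes,
    commit_for_path: Callable[[str], Optional[str]],  # hypothetical lookup
    is_archived: Callable[[str], bool],  # hypothetical archive check
) -> Set[str]:
    """Collect URLs of submodules whose pinned commit is not archived yet."""
    urls: Set[str] = set()
    for submodule in parse_gitmodules(content).values():
        commit_id = commit_for_path(submodule["path"])
        if commit_id is None:
            # Assumption: no commit recorded at this path, so skip the entry.
            continue
        if not is_archived(commit_id):
            urls.add(submodule["url"])
    return urls
```

Each URL in the returned set would then be the candidate for a one-shot loading task; the scheduler call itself is left out because its interface is not described here.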