We have retrieved the git repositories but have not ingested them yet.
This task is about the actual ingestion using our loader-git.
Note:
- Like every other mirror/backup, they are stored at /srv/storage/space/mirrors/, under a dedicated root directory 'code.google.com' (in uffizi).
- /srv/storage/space/mirrors/code.google.com/sources/INDEX.filesystem lists all of googlecode's repositories on disk.
Requirements:
- filter the git repositories. For now we only have INDEX.filesystem, which lists all googlecode repositories regardless of type (git, svn or hg). A project.json in the same folder as each archive contains a 'repoType' field whose value is 'git', 'svn', or 'hg'.
- as we did for the googlecode svn repositories, we need to reconstruct their url: https://<project-name>.googlecode.com/
- all git repositories are archive files (mostly zip). So we either need to uncompress every archive beforehand or, as with the googlecode svn loader, let the worker first uncompress the archive in a temporary directory and then load the git repository.
- finally, generate a full_mapping.txt (mirroring the one from gitorious) listing <origin_url> <path-to-git-repository-tree-or-archive> for each repository.
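
The requirements above could be sketched roughly as follows. This is only an illustration, not the actual loader code. It assumes that INDEX.filesystem holds one archive path per line and that project.json exposes the project name under a "name" key; neither is confirmed here. The helper names (origin_url, git_archives, write_mapping, extract_archive) are hypothetical, and the extraction helper only handles zip archives.

```python
import json
import zipfile
import tempfile
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def origin_url(project_name: str) -> str:
    """Rebuild the origin url as https://<project-name>.googlecode.com/."""
    return f"https://{project_name}.googlecode.com/"


def git_archives(index_lines: Iterable[str]) -> Iterator[Tuple[str, Path]]:
    """Yield (origin_url, archive_path) for archives whose repoType is 'git'.

    Assumptions: one archive path per index line, and a "name" key
    in the project.json sitting next to the archive.
    """
    for line in index_lines:
        line = line.strip()
        if not line:
            continue
        archive = Path(line)
        try:
            project = json.loads((archive.parent / "project.json").read_text())
        except (OSError, ValueError):
            continue  # missing or unreadable project.json: skip, do not guess
        if not isinstance(project, dict) or project.get("repoType") != "git":
            continue  # svn and hg repositories are out of scope for this task
        name = project.get("name")  # assumed key, see lead-in
        if name:
            yield origin_url(name), archive


def write_mapping(entries: Iterable[Tuple[str, Path]], output: Path) -> int:
    """Write '<origin_url> <path>' lines to output and return the line count."""
    count = 0
    with output.open("w") as mapping:
        for url, path in entries:
            mapping.write(f"{url} {path}\n")
            count += 1
    return count


def extract_archive(archive: Path, workdir: Path) -> Path:
    """Uncompress a zip archive into workdir (worker-side option)."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(workdir)
    return workdir
```

A worker following the second option would call extract_archive inside a tempfile.TemporaryDirectory() context, hand the extracted tree to the git loader, and let the context manager clean up afterwards. Producing the mapping then amounts to opening INDEX.filesystem and passing the file object to git_archives, whose output goes straight into write_mapping.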