Page MenuHomeSoftware Heritage

loader.git: Improve GitLoaderFromArchive
ClosedPublic

Authored by anlambert on Dec 11 2018, 1:54 PM.

Details

Summary

While working on improving swh-web tests, I noticed a couple of issues with
GitLoaderFromArchive. That diff implements the following:

  • guess project name from archive name instead of path
  • add support for tar.gz tarballs
  • handle more tarball layouts
  • simplify test

Diff Detail

Repository
rDLDG Git loader
Branch
git-loader-from-archive-improvements
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 2971
Build 3794: tox-on-jenkinsJenkins
Build 3793: arc lint + arc unit

Event Timeline

swh/loader/git/tests/test_loader.py
128

Maybe rename to remove the zip notion?

135–136

Can you explicit the boolean value with its name?

I think it's to inhibit the tarball uncompression step, not completely sure though.

swh/loader/git/utils.py
47

Is that a production behavior or a test one?

anlambert added inline comments.
swh/loader/git/tests/test_loader.py
128

proposal: BaseGitLoaderFromArchiveTest

135–136

sure!

swh/loader/git/utils.py
47

The purpose here is to handle tarballs with an unexpected layout.

Ideally, when encoutering a tarball named testrepo.tgz, we expect the following layout in it:

testrepo/
├── .git
│   ├── branches
│   ├── COMMIT_EDITMSG
│   ├── config
│   ├── description
│   ├── HEAD
 ...

But some times we can have layout like this:

.
├── .git
│   ├── branches
│   ├── COMMIT_EDITMSG
│   ├── config
│   ├── description
│   ├── HEAD
 ...

or like this:

other-repo-name/
├── .git
│   ├── branches
│   ├── COMMIT_EDITMSG
│   ├── config
│   ├── description
│   ├── HEAD
 ...

The code below handles those cases. I can also remove it and indicate the expected layout in the docstring.

Update: Address some ardumont comments

ardumont added inline comments.
swh/loader/git/tests/test_loader.py
128

sounds good!

swh/loader/git/utils.py
47

Ideally, when encoutering a tarball named testrepo.tgz, we expect the following layout in it:

Ah yes, i think it was for the gitorious/googlecode git archives ingestion...

The code below handles those cases. I can also remove it and indicate the expected layout in the docstring.

Or Keep your code adaptation and move your comment and your 3 samples on the docstring ;)

This revision is now accepted and ready to land.Dec 11 2018, 2:53 PM
This revision was automatically updated to reflect the committed changes.