Page MenuHomeSoftware Heritage

loader.git: Improve GitLoaderFromArchive
ClosedPublic

Authored by anlambert on Dec 11 2018, 1:54 PM.

Details

Summary

While working on improving swh-web tests, I noticed a couple of issues with
GitLoaderFromArchive. That diff implements the following:

  • guess project name from archive name instead of path
  • add support for tar.gz tarballs
  • handle more tarball layouts
  • simplify test

Diff Detail

Repository
rDLDG Git loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

anlambert created this revision.Dec 11 2018, 1:54 PM
ardumont added inline comments.Dec 11 2018, 2:24 PM
swh/loader/git/tests/test_loader.py
128

Maybe rename to remove the zip notion?

135–136

Can you explicit the boolean value with its name?

I think it's to inhibit the tarball uncompression step, not completely sure though.

ardumont added inline comments.Dec 11 2018, 2:31 PM
swh/loader/git/utils.py
48

Is that a production behavior or a test one?

anlambert marked 3 inline comments as done.Dec 11 2018, 2:42 PM
anlambert added inline comments.
swh/loader/git/tests/test_loader.py
128

proposal: BaseGitLoaderFromArchiveTest

135–136

sure!

swh/loader/git/utils.py
48

The purpose here is to handle tarballs with an unexpected layout.

Ideally, when encoutering a tarball named testrepo.tgz, we expect the following layout in it:

testrepo/
├── .git
│   ├── branches
│   ├── COMMIT_EDITMSG
│   ├── config
│   ├── description
│   ├── HEAD
 ...

But some times we can have layout like this:

.
├── .git
│   ├── branches
│   ├── COMMIT_EDITMSG
│   ├── config
│   ├── description
│   ├── HEAD
 ...

or like this:

other-repo-name/
├── .git
│   ├── branches
│   ├── COMMIT_EDITMSG
│   ├── config
│   ├── description
│   ├── HEAD
 ...

The code below handles those cases. I can also remove it and indicate the expected layout in the docstring.

anlambert updated this revision to Diff 2545.Dec 11 2018, 2:44 PM

Update: Address some ardumont comments

ardumont accepted this revision.Dec 11 2018, 2:53 PM
ardumont added inline comments.
swh/loader/git/tests/test_loader.py
128

sounds good!

swh/loader/git/utils.py
48

Ideally, when encoutering a tarball named testrepo.tgz, we expect the following layout in it:

Ah yes, i think it was for the gitorious/googlecode git archives ingestion...

The code below handles those cases. I can also remove it and indicate the expected layout in the docstring.

Or Keep your code adaptation and move your comment and your 3 samples on the docstring ;)

This revision is now accepted and ready to land.Dec 11 2018, 2:53 PM
anlambert updated this revision to Diff 2547.Dec 11 2018, 3:11 PM

Update loader docstring

ardumont accepted this revision.Dec 11 2018, 3:29 PM
This revision was automatically updated to reflect the committed changes.