vault: add a git fast-import cooker
ClosedPublic

Authored by seirl on Mar 10 2017, 6:06 PM.

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 754
Build 1013: Software Heritage Python tests
Build 1012: arc lint + arc unit

Event Timeline

fix timestamp utc offset

swh/storage/vault/cookers/revision_git.py
40

I'm using this to debug while this diff is a WIP; I'll remove it later.

swh/storage/vault/cookers/__init__.py
1

If this was created in 2016, this should be a range 2016-2017.

swh/storage/vault/cookers/revision_git.py
13

Documenting what this class does would be great.

Proposal: Generates git fast-import's mixed command/data stream as revision bundle.

30

Maybe add this as a TODO to find those more easily in the future.

37

A nit: you can drop the list wrapping around the fastexport_log call.

40

ok

54

I don't get what that means, can you please add some docstrings?

54

Ok, I think I got it.
It's computing the file/blob commands that git fast-import will replay into a new, empty git repository.

Still some docstrings would be awesome.
Maybe renaming it to _compute_blob_commands or something would be better?
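
For reference, a minimal sketch of what a single blob command looks like in the fast-import stream. The helper name `blob_command` and the way marks are assigned are assumptions for illustration, not the code in this diff; only the `blob` / `mark` / `data` layout comes from the git fast-import format.

```python
def blob_command(mark: int, data: bytes) -> bytes:
    """Serialize one fast-import `blob` command (hypothetical helper).

    `mark` is an arbitrary positive integer that later commands can use
    to refer back to this blob as `:<mark>`.
    """
    # `data <count>` announces the exact byte length of the payload,
    # so the payload itself needs no escaping.
    return b"blob\nmark :%d\ndata %d\n%s\n" % (mark, len(data), data)


stream = blob_command(1, b"hello\n")
```

The resulting bytes could be piped into `git fast-import` run inside a freshly initialised repository.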

58

Same here: _compute_directory_commands?

73

_toposort, as this is a private function?

78

Out of curiosity, why do you need the list wrapping around remaining.items()?

86

And this is computing the revision/commit commands that git fast-import will replay into a new, empty git repository.

Maybe _compute_revision_commands as the method name?
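
A rough sketch of a commit command, including the raw date format with its UTC offset. The names `format_git_date` and `commit_command`, the default ref, and the assumption that the offset is available in minutes are all hypothetical; file-change (`M`) lines and merge parents are left out to keep it short.

```python
def format_git_date(seconds: int, offset_minutes: int) -> bytes:
    """Render git's raw date format: `<epoch seconds> <+|->HHMM`."""
    sign = b"+" if offset_minutes >= 0 else b"-"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return b"%d %s%02d%02d" % (seconds, sign, hours, minutes)


def commit_command(mark, ident, date, message, parent_mark=None,
                   ref=b"refs/heads/master"):
    """Serialize one fast-import `commit` command (hypothetical helper).

    `ident` is a `Name <email>` byte string, reused here for both
    author and committer purely for brevity.
    """
    out = b"commit %s\nmark :%d\n" % (ref, mark)
    out += b"author %s %s\n" % (ident, date)
    out += b"committer %s %s\n" % (ident, date)
    out += b"data %d\n%s\n" % (len(message), message)
    if parent_mark is not None:
        # Parents are referenced by the mark of an earlier commit command.
        out += b"from :%d\n" % parent_mark
    return out


date = format_git_date(1489169160, 60)
cmd = commit_command(2, b"A <a@example.org>", date, b"msg\n", parent_mark=1)
```

Because a commit refers to its parent by mark, parents have to be emitted before their children, which is where the topological sort comes in.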

89

_compute_file_commands?

115

_compute_revision_command?

swh/storage/vault/cookers/revision_git.py
54

I don't get what that means, can you please add some docstrings?

Thanks for the comments, I was already adding some docstrings and I'll change the function names like you suggest.

I'm also pretty sure that some of these should be factored out (toposort and most of the functions that work on a log), maybe into swh-model, because they'll also be needed to do the snapshot export. I haven't figured out exactly how yet, so I put everything here to be able to work on it.

swh/storage/vault/cookers/__init__.py
1

It wasn't. Well, it was a refactor of something that I then completely rewrote. Anyway, this content is not even copyrightable.

swh/storage/vault/cookers/revision_git.py
30

I'm going to refactor the cook() functions in the base class.

rebase onto refactoring diff

requirements.txt
1–6

I need to :sort that

swh/storage/vault/cookers/revision_git.py
78

Because I'm modifying remaining inside the loop.

Removing it gives:
RuntimeError: dictionary changed size during iteration
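
A minimal sketch of the pattern, assuming the sort works on a dict mapping each node to its parents (the function body is illustrative, not the code in this diff). Iterating over `list(remaining.items())` takes a snapshot, so deleting entries from `remaining` inside the loop is safe.

```python
def toposort(parents):
    """Order nodes so that every parent comes before its children.

    `parents` maps each node to a list of its parent nodes. Parents that
    are not keys of `parents` are treated as already available.
    """
    remaining = dict(parents)
    emitted = set()
    ordered = []
    while remaining:
        progressed = False
        # list(...) copies the items first; iterating remaining.items()
        # directly would raise RuntimeError once an entry is deleted.
        for node, node_parents in list(remaining.items()):
            if all(p in emitted or p not in parents for p in node_parents):
                ordered.append(node)
                emitted.add(node)
                del remaining[node]
                progressed = True
        if not progressed:
            raise ValueError("cycle detected in parent graph")
    return ordered


order = toposort({"c": ["b"], "b": ["a"], "a": []})
```

An alternative would be to collect the ready nodes in a separate list and delete them after the loop, which avoids the copy at the cost of a second pass.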

swh/storage/vault/cookers/revision_git.py
78

Yes! Thanks.

rebase, add docstrings and rename functions

seirl marked 7 inline comments as done.

remove useless list()

seirl marked 12 inline comments as done. Mar 13 2017, 4:58 PM

The last nitpick I see is that we should use content for blob and revision for commit in the swh methods.
You did use directory instead of tree :D

In D190#3885, @ardumont wrote:

The last nitpick I see is that we should use content for blob and revision for commit in the swh methods.
You did use directory instead of tree :D

I'm not sure about that. For instance, we're not creating "revision commands" but "commit commands from a revision". We don't compute "content commands", but "blob commands from the contents in a directory". I tried to stay consistent (what's on git side uses git terminology, and what's on swh side uses swh terminology) but if you see specific examples where this is not respected I can change them.

Yes, fair enough. I was asking myself that later on.

This revision is now accepted and ready to land. Mar 13 2017, 6:11 PM
This revision was automatically updated to reflect the committed changes.