diff --git a/README b/README
index c8150b7..d34b0eb 100644
--- a/README
+++ b/README
@@ -1,62 +1,67 @@
 SWH-loader-dir
 ==============

 # TL;DR

 WIP -> Main entry point is scratch/walking.py for the moment

 The Software Heritage Loader Dir is a tool and a library to walk a local Directory and inject into the SWH dataset all unknown contained files.

 Git sha1 computation
 --------------------

 ### commit/revision sha1 git

 git commit/revision computation:

     commit `size`\0
     tree `sha1-git-tree-and-subtree-in-plain-hex-string`
-    parent `commit-parent`
-    author `author-name` <`author-email`> `author-date-ts` `author-date-offset`
-    committer `committer-name` <`committer-email`> `committer-date-ts` `committer-date-offset`
+    (parent `commit-parent`)
+    author `name` <`email`> `date-ts` `date-offset`
+    committer `name` <`email`> `date-ts` `date-offset`

     `commit-message`
+

 Notes:
-- no newline
+- () denotes optional entry. Indeed, first commit does not contain any parent commit.
+- empty line at the end of the commit message
+- timestamp example: 1444054085
 - date offset for example: +0200

 ### directory/tree sha1 git

 git directory/tree computation:

     tree `tree-size`\0
-    \0
-    \0
+    \0... \0...

 Notes:
 - no newline separator between tree entries
-- tree content size is the length of the content
-- What's the entry order? byte order on the filename
-  -> `git mk-tree` does not need to presort the input -> git mktree creates a tree and store in a git file storage -> no go
+- no empty newline at the end of the tree entries
+- tree content header size is the length of the content
+- The tree entries are ordered according to bytes in their properties.

-permissions:
+Possible permissions are:
 - 100644 - file
 - 40000 - directory
 - 100755 - executable file
 - 120000 - symbolink link
-- 160000 - git link
+- 160000 - git link (relative to submodule)

 ### content/file sha1 git

 content computation:

     blob `blob-size`\0
     `blob-content`

+Notes:
+- no newline at the end of the blob content
+
 Compress with DEFLATE and compute sha1
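
The three layouts described in this README change can be sketched in Python as follows. This is a minimal illustration, not the loader's actual code: every function name here is hypothetical. It relies on two points about git's behaviour that the README does not state explicitly: the sha1 is computed over the uncompressed `type size\0payload` bytes (DEFLATE only applies to the bytes written to storage), and tree entries are sorted by name bytes, with directory names compared as if they ended in `/`.

```python
import hashlib
import zlib


def git_object_sha1(obj_type, payload):
    # Header is "<type> <payload length in bytes>" followed by a NUL byte.
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\0"
    # The hash covers the uncompressed header + payload.
    return hashlib.sha1(header + payload).hexdigest()


def blob_sha1(content):
    # content/file: the raw bytes, with nothing appended.
    return git_object_sha1("blob", content)


def tree_sha1(entries):
    # entries: iterable of (mode, name, sha1_hex), mode and name as bytes,
    # e.g. (b"100644", b"README", "e69de2..."). Hypothetical input shape.
    def sort_key(entry):
        mode, name, _ = entry
        # Directories (mode 40000) sort as if their name ended with "/".
        return name + b"/" if mode == b"40000" else name

    # Each entry is "<mode> <name>\0" followed by the 20 raw sha1 bytes,
    # with no separator between entries and no trailing newline.
    payload = b"".join(
        mode + b" " + name + b"\0" + bytes.fromhex(sha1_hex)
        for mode, name, sha1_hex in sorted(entries, key=sort_key)
    )
    return git_object_sha1("tree", payload)


def commit_sha1(tree_hex, author, committer, message, parents=()):
    # author/committer: "name <email> date-ts date-offset" strings.
    # parents is empty for a first commit; the caller supplies the
    # message including its final newline.
    lines = [f"tree {tree_hex}"]
    lines += [f"parent {parent}" for parent in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return git_object_sha1("commit", "\n".join(lines).encode("utf-8"))


def loose_object_bytes(obj_type, payload):
    # What would be written to storage: the same bytes, DEFLATE-compressed.
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\0"
    return zlib.compress(header + payload)
```

For an empty file, `blob_sha1(b"")` gives `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the same value `git hash-object` reports for an empty blob. This is a quick way to check that the header layout is right before moving on to trees and commits.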