diff --git a/README b/README
index 1f23b69..c8150b7 100644
--- a/README
+++ b/README
@@ -1,63 +1,62 @@
 SWH-loader-dir
 ==============
 
 # TL;DR
 
 WIP -> Main entry point is scratch/walking.py for the moment
 
 The Software Heritage Loader Dir is a tool and a library to walk a local
 Directory and inject into the SWH dataset all unknown contained files.
 
 Git sha1 computation
 --------------------
 
 ### commit/revision sha1 git
 
 commit/revision computation:
 
     commit `size`\0
     tree `sha1-git-tree-and-subtree-in-plain-hex-string`
     parent `commit-parent`
     author `author-name` <`author-email`> `author-date-ts` `author-date-offset`
     committer `committer-name` <`committer-email`> `committer-date-ts` `committer-date-offset`
 
     `commit-message`
 
 Notes:
 - no newline
 - date offset for example: +0200
 
 ### directory/tree sha1 git
 
 directory/tree computation:
 
     tree `tree-size`\0
-    <perm> <filename>\0<sha1> <- 28 bytes
-    <perm> <filename>\0<sha1> <- 28 bytes
+    <perm> <filename>\0<sha1>
+    <perm> <filename>\0<sha1>
 
 Notes:
-- permission is length 6 + space + \0 -> so 8 bytes
-- sha1 is 20 bytes so one row is 28 bytes.
-- tree-size = 28 * number of rows
-- no newline separator
-- What's the entry order? byte
+- no newline separator between tree entries
+- tree content size is the length of the content
+- What's the entry order? byte order on the filename
 -> `git mk-tree` does not need to presort the input
 -> git mktree creates a tree and store in a git file storage -> no go
 
 permissions:
 - 100644 - file
-- 040000 - directory
+- 40000 - directory
 - 100755 - executable file
 - 120000 - symbolink link
+- 160000 - git link
 
 ### content/file sha1 git
 
 content computation:
 
     blob `blob-size`\0
     `blob-content`
 
 Compress with DEFLATE and compute sha1
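
Review note: the blob and tree layouts described in this README can be checked with a small Python sketch. It is only an illustration, not code from the repository. The helper names (`git_object_sha1`, `blob_sha1`, `tree_sha1`) are hypothetical, and the entry ordering rule (byte order on the name, with directory names compared as if they ended in `/`) is stated here as an assumption about git's behaviour that refines the README's "byte order on the filename". Note also that git computes the sha1 over the uncompressed header and content; DEFLATE is applied only when the object is written to storage.

```python
import hashlib


def git_object_sha1(kind: str, payload: bytes) -> str:
    """Hash `<kind> <size>\\0<payload>`, the layout described in the README."""
    header = f"{kind} {len(payload)}".encode("ascii") + b"\0"
    # The digest covers the uncompressed bytes; no DEFLATE step is involved.
    return hashlib.sha1(header + payload).hexdigest()


def blob_sha1(content: bytes) -> str:
    """sha1 of a file's content as a git blob."""
    return git_object_sha1("blob", content)


def tree_sha1(entries) -> str:
    """sha1 of a git tree.

    `entries` is an iterable of (mode, name, hex_sha1) tuples, with `mode`
    and `name` given as bytes, e.g. (b"100644", b"README", "e69de2...").
    """
    def sort_key(entry):
        mode, name, _ = entry
        # Assumption: directories sort as if their name had a trailing "/".
        return name + b"/" if mode == b"40000" else name

    # Each entry is `<mode> <name>\0<20 raw sha1 bytes>`, with no separator.
    payload = b"".join(
        mode + b" " + name + b"\0" + bytes.fromhex(hex_sha1)
        for mode, name, hex_sha1 in sorted(entries, key=sort_key)
    )
    return git_object_sha1("tree", payload)


if __name__ == "__main__":
    empty = blob_sha1(b"")
    print(empty)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(tree_sha1([(b"100644", b"README", empty)]))
```

The tree size in the header is simply the length of the concatenated entries, which matches the "tree content size is the length of the content" note added by this change; the fixed 28-bytes-per-row rule it removes only holds when every name is empty. A commit could be hashed the same way by passing the commit text to `git_object_sha1("commit", ...)`.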