diff --git a/README b/README
index 71ca5e6..1f23b69 100644
--- a/README
+++ b/README
@@ -1,58 +1,63 @@
 SWH-loader-dir
 ==============
 
+# TL;DR
+
+WIP -> Main entry point is scratch/walking.py for the moment
+
+
 The Software Heritage Loader Dir is a tool and a library to walk a local
 Directory and inject into the SWH dataset all unknown contained files.
 
 Git sha1 computation
 --------------------
 
 ### commit/revision sha1 git
 
 commit/revision computation:
 
     commit `size`\0
     tree `sha1-git-tree-and-subtree-in-plain-hex-string`
     parent `commit-parent`
     author `author-name` <`author-email`> `author-date-ts` `author-date-offset`
     committer `committer-name` <`committer-email`> `committer-date-ts` `committer-date-offset`
 
     `commit-message`
 
 Notes:
 - no newline
 - date offset for example: +0200
 
 ### directory/tree sha1 git
 
 directory/tree computation:
 
     tree `tree-size`\0
     \0 <- 28 bytes
     \0 <- 28 bytes
 
 Notes:
 - permission is length 6 + space + \0 -> so 8 bytes
 - sha1 is 20 bytes so one row is 28 bytes.
 - tree-size = 28 * number of rows
 - no newline separator
 - What's the entry order? byte
   -> `git mk-tree` does not need to presort the input
   -> git mktree creates a tree and store in a git file storage -> no go
 
 permissions:
 - 100644 - file
 - 040000 - directory
 - 100755 - executable file
 - 120000 - symbolink link
 
 ### content/file sha1 git
 
 content computation:
 
     blob `blob-size`\0
     `blob-content`
 
 Compress with DEFLATE and compute sha1
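
The blob and tree layouts described in the README can be checked with a small sketch. This is not code from swh-loader-dir: every function name below is hypothetical. It follows git's own object format as I understand it, which differs from the notes above in a few places. The object id is the SHA-1 of the uncompressed `<type> <size>\0<payload>` bytes, and DEFLATE (zlib) is applied only when the object is written to storage. A tree row is `<mode> <name>\0<20-byte sha1>`, so its length depends on the entry name. Rows are sorted by name bytes, with directories compared as if their name ended in `/`, and a directory mode is written as `40000`, without the leading zero.

```python
import hashlib
import zlib


def git_object_sha1(obj_type: str, payload: bytes) -> str:
    # Hash the header "<type> <size>" + NUL + payload, all uncompressed.
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def blob_sha1(content: bytes) -> str:
    # A file's id depends only on its bytes, not on its name or mode.
    return git_object_sha1("blob", content)


def tree_payload(entries) -> bytes:
    # entries: iterable of (mode, name, sha1_hex), e.g. ("100644", "a", "e69d...").
    rows = []
    for mode, name, sha1_hex in entries:
        mode_bytes = mode.lstrip("0").encode("ascii")  # "040000" -> b"40000"
        name_bytes = name.encode("utf-8")
        # Assumption taken from git's behaviour: directories sort as "name/".
        key = name_bytes + b"/" if mode_bytes == b"40000" else name_bytes
        row = mode_bytes + b" " + name_bytes + b"\x00" + bytes.fromhex(sha1_hex)
        rows.append((key, row))
    # No separator between rows: the fixed 20-byte sha1 ends each one.
    return b"".join(row for _, row in sorted(rows))


def tree_sha1(entries) -> str:
    return git_object_sha1("tree", tree_payload(entries))


def loose_object_bytes(obj_type: str, payload: bytes) -> bytes:
    # What would be stored on disk: the same header + payload, zlib-compressed.
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return zlib.compress(header + payload)


if __name__ == "__main__":
    print(blob_sha1(b"hello\n"))
    print(tree_sha1([]))
```

The blob result can be compared against `git hash-object <file>`, and a tree result against `git write-tree` in a scratch repository.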