diff --git a/README b/README
index d34b0eb..9a23a8f 100644
--- a/README
+++ b/README
@@ -1,67 +1,102 @@
 SWH-loader-dir
 ==============
 
-# TL;DR
-
-WIP -> Main entry point is scratch/walking.py for the moment
-
-
 The Software Heritage Loader Dir is a tool and a library to walk a local
 Directory and inject into the SWH dataset all unknown contained files.
 
+Configuration sample
+--------------------
+
+loader-dir.ini:
+
+    [main]
+
+    dir_path = /path/to/some/directory
+
+    # synthetic origin
+    origin_url = origin-url
+    branch = master
+    authority_id = 1
+    validity = 2015-01-01 00:00:00+00
+
+    # synthetic revision
+    revision_author_name = swh author
+    revision_author_email = swh@inria.fr
+    revision_author_date = 1444054085
+    revision_author_offset = +0200
+    revision_committer_name = swh committer
+    revision_committer_email = swh@inria.fr
+    revision_committer_date = 1444054085
+    revision_committer_offset = +0200
+    revision_type = tar
+    revision_message = synthetic revision message
+
+    # synthetic release
+    release_name = v0.0.1
+    release_date = 1444054085
+    release_offset = +0200
+    release_author_name = swh author
+    release_author_email = swh@inria.fr
+    release_comment = synthetic release
+
+Run
+---
+
+./bin/swh-loader-dir /path/to/loader-dir.ini
+
 Git sha1 computation
 --------------------
 
 ### commit/revision
 
 sha1 git commit/revision computation:
 
     commit `size`\0
     tree `sha1-git-tree-and-subtree-in-plain-hex-string`
     (parent `commit-parent`)
     author `name` <`email`> `date-ts` `date-offset`
     committer `name` <`email`> `date-ts` `date-offset`
 
     `commit-message`
 
 Notes:
 - () denotes optional entry. Indeed, first commit does not contain any parent commit.
 - empty line at the end of the commit message
 - timestamp example: 1444054085
 - date offset for example: +0200
 
 ### directory/tree
 
 sha1 git directory/tree computation:
 
     tree `tree-size`\0
     <file-perm> <file-name>\0<file-sha1-in-20-bytes-string>...<dir-perm> <dir-name>\0<dir-sha1-in-20-bytes-string>...
 
 Notes:
 - no newline separator between tree entries
 - no empty newline at the end of the tree entries
 - tree content header size is the length of the content
 - The tree entries are ordered by the bytes of their names (a directory name is compared as if it ended with `/`).
 
 Possible permissions are:
 - 100644 - file
 - 40000  - directory
 - 100755 - executable file
 - 120000 - symbolic link
 - 160000 - git link (relative to submodule)
 
 ### content/file
 
 sha1 git content computation:
 
     blob `blob-size`\0
     `blob-content`
 
 Notes:
 - no newline at the end of the blob content
 
 Compute the sha1 over these uncompressed bytes. Git compresses the object with DEFLATE only when storing it.
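
The three computations described above can be sketched in Python using only the standard `hashlib` module. This is a minimal illustration, not the loader's actual code: the function names (`git_object_sha1`, `blob_sha1`, `tree_sha1`, `commit_sha1`) and the tuple layout used for tree entries are assumptions made for the example.

```python
import hashlib


def git_object_sha1(obj_type, payload):
    # Header is "<type> <size>" followed by a NUL byte; the digest is taken
    # over header + payload, with no compression involved.
    header = "{} {}".format(obj_type, len(payload)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def blob_sha1(content):
    # content/file: blob `blob-size`\0`blob-content`
    return git_object_sha1("blob", content)


def tree_sha1(entries):
    # entries: iterable of (mode, name, hex_sha1) with mode and name as bytes,
    # e.g. (b"100644", b"README", "e69de29b...") (hypothetical layout).
    def sort_key(entry):
        mode, name, _ = entry
        # A directory name is compared as if it ended with "/".
        return name + b"/" if mode == b"40000" else name

    payload = b"".join(
        # Each entry is "<perm> <name>\0" followed by the 20 raw sha1 bytes.
        mode + b" " + name + b"\x00" + bytes.fromhex(hex_sha1)
        for mode, name, hex_sha1 in sorted(entries, key=sort_key)
    )
    return git_object_sha1("tree", payload)


def commit_sha1(tree_hex, author, committer, message, parents=()):
    # author and committer are full "name <email> date-ts date-offset" strings;
    # the caller is expected to end `message` with a newline.
    lines = ["tree " + tree_hex]
    lines += ["parent " + parent for parent in parents]
    lines += ["author " + author, "committer " + committer, "", message]
    return git_object_sha1("commit", "\n".join(lines).encode("utf-8"))
```

For instance, `blob_sha1(b"hello world\n")` yields `3b18e512dba79e4c8300dd08aeb37f8e728b8dad`, the same value `git hash-object` reports for a file with that content, and `tree_sha1([])` yields git's well-known empty-tree id `4b825dc642cb6eb9a060e54bf8d69288fbee4904`.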