docs/tutorial.md
archived objects looking up their SWHIDs below the `archive/` entry-point. To | archived objects looking up their SWHIDs below the `archive/` entry-point. To | ||||
mount the Software Heritage archive, use the `swh fs mount` command: | mount the Software Heritage archive, use the `swh fs mount` command: | ||||
$ mkdir swhfs | $ mkdir swhfs | ||||
$ swh fs mount swhfs/ # mount the archive | $ swh fs mount swhfs/ # mount the archive | ||||
$ ls -1F swhfs/ # list entry points | $ ls -1F swhfs/ # list entry points | ||||
archive/ # <- start browsing from here | archive/ # <- start browsing from here | ||||
meta/ | cache/ | ||||
origin/ | |||||
README | |||||
By default SwhFS daemonizes into background and logs to syslog; it can be kept | By default SwhFS daemonizes into background and logs to syslog; it can be kept | ||||
in foreground, logging to the console, by passing `-f/--foreground` to `mount`. | in foreground, logging to the console, by passing `-f/--foreground` to `mount`. | ||||
To unmount use `swh fs umount PATH`. Note that, since SwhFS is a *user-space* | To unmount use `swh fs umount PATH`. Note that, since SwhFS is a *user-space* | ||||
filesystem, mounting and unmounting it are not privileged operations, any user | filesystem, mounting and unmounting it are not privileged operations, any user | ||||
can do it. | can do it. | ||||
Show All 40 Lines | Here is a SwhFS Hello World: | ||||
int main(void) { | int main(void) { | ||||
printf("Hello, World!\n"); | printf("Hello, World!\n"); | ||||
} | } | ||||
Given the SWHID of a source code file, we can directly access it via the | Given the SWHID of a source code file, we can directly access it via the | ||||
filesystem. | filesystem. | ||||
Metadata about archived source code artifacts is also locally available. For | Metadata about archived source code artifacts is also locally available. For | ||||
each entry under `archive/` there is a matching JSON file under `meta/`, | each entry `archive/<SWHID>` there is a matching JSON file | ||||
corresponding to what the [Software Heritage Web API][webapi] will return. For | `archive/<SWHID>.json`, corresponding to what the [Software Heritage Web | ||||
example, here is what the Software Heritage archive knows about the above Hello | API][webapi] will return. For example, here is what the Software Heritage | ||||
World implementation: | archive knows about the above Hello World implementation: | ||||
$ jq meta/swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2.json | $ cat archive/swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2.json | ||||
{ | { | ||||
"length": 67, | "length": 67, | ||||
"status": "visible", | "status": "visible", | ||||
"checksums": { | "checksums": { | ||||
"sha256": "06dfb5d936f50b3cb80152aa053724e4a18417c35f745b66ab9571c25afd0f79", | "sha256": "06dfb5d936f50b3cb80152aa053724e4a18417c35f745b66ab9571c25afd0f79", | ||||
"sha1": "459ee8545e5ba6cb819ba41e6ea2f0011cedd728", | "sha1": "459ee8545e5ba6cb819ba41e6ea2f0011cedd728", | ||||
"blake2s256": "87e6ab9c92681e9a022a8f4679dcd9d9b841fe4146edcbc15329fc66d8c82b4f", | "blake2s256": "87e6ab9c92681e9a022a8f4679dcd9d9b841fe4146edcbc15329fc66d8c82b4f", | ||||
"sha1_git": "c839dea9e8e6f0528b468214348fee8669b305b2" | "sha1_git": "c839dea9e8e6f0528b468214348fee8669b305b2" | ||||
}, | }, | ||||
"data_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/raw/", | "data_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/raw/", | ||||
"filetype_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/filetype/", | "filetype_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/filetype/", | ||||
"language_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/language/", | "language_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/language/", | ||||
"license_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/license/" | "license_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/license/" | ||||
} | } | ||||
Note: JSON metadata files are indented by default when read, this can be changed | |||||
zack: Uhm, can it? The only mention of indentation i've found in the doc thus far is how much the… | |||||
in the configuration file (see {ref}`documentation <swh-fuse-config>`). | |||||
[webapi]: https://archive.softwareheritage.org/api/ | [webapi]: https://archive.softwareheritage.org/api/ | ||||
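The `sha1_git` checksum in the metadata above is the same hexadecimal hash that ends the file's SWHID. For content objects this is the Git blob hash: SHA-1 computed over the header `blob <size>\0` followed by the file bytes. As a minimal sketch, assuming a POSIX shell with `sha1sum` from GNU coreutils, the hypothetical helper `sha1_git` below (not part of SwhFS) recomputes it for any local file:

```shell
# Hypothetical helper, not part of SwhFS: print the Git-style blob hash
# ("sha1_git") of the file given as $1.
sha1_git() {
    file=$1
    # Size of the file in bytes, with any padding from wc stripped.
    size=$(wc -c < "$file" | tr -d ' ')
    # Hash the header "blob <size>\0" followed by the raw file contents.
    { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d ' ' -f 1
}
```

Running `sha1_git archive/swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2` on a mounted SwhFS should then print the hash part of that SWHID, which gives a quick integrity check of what was read from the archive.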
## Source code trees | ## Source code trees | ||||
In addition to individual source code files, we can also browse entire source | In addition to individual source code files, we can also browse entire source | ||||
code directories. Here is the historical Apollo 11 source code, where we can | code directories. Here is the historical Apollo 11 source code, where we can | ||||
find interesting comments about the antenna during landing: | find interesting comments about the antenna during landing: | ||||
$ cd archive/swh:1:dir:1fee702c7e6d14395bbf5ac3598e73bcbf97b030 | $ cd archive/swh:1:dir:1fee702c7e6d14395bbf5ac3598e73bcbf97b030 | ||||
$ ls | wc -l | $ ls | wc -l | ||||
127 | 127 | ||||
$ grep -i antenna THE_LUNAR_LANDING.s | cut -f 5 | $ grep -i antenna THE_LUNAR_LANDING.s | cut -f 5 | ||||
# IS THE LR ANTENNA IN POSITION 1 YET | # IS THE LR ANTENNA IN POSITION 1 YET | ||||
# BRANCH IF ANTENNA ALREADY IN POSITION 1 | # BRANCH IF ANTENNA ALREADY IN POSITION 1 | ||||
We can checkout the commit of a more modern code base, like jQuery, and count | We can checkout the commit of a more modern code base, like jQuery, and count | ||||
its JavaScript lines of code (SLOC): | its JavaScript lines of code (SLOC): | ||||
$ cd archive/swh:1:rev:9d76c0b163675505d1a901e5fe5249a2c55609bc | $ cd archive/swh:1:rev:9d76c0b163675505d1a901e5fe5249a2c55609bc | ||||
$ ls -F | $ ls -1F | ||||
history/ | history/ | ||||
meta.json@ | meta.json@ | ||||
parent@ | parent@ | ||||
parents/ | parents/ | ||||
root@ | root@ | ||||
$ find root/src/ -type f -name '*.js' | xargs cat | wc -l | $ find root/src/ -type f -name '*.js' | xargs cat | wc -l | ||||
10136 | 10136 | ||||
Show All 28 Lines | commits sharded by commit identifier and timestamp: | ||||
$ ls history/by-hash/00/ | head -n 5 | $ ls history/by-hash/00/ | head -n 5 | ||||
swh:1:rev:00a9c2e5f4c855382435cec6b3908eb9bd5a53b7 | swh:1:rev:00a9c2e5f4c855382435cec6b3908eb9bd5a53b7 | ||||
swh:1:rev:005040379d8b64aacbe54941d878efa6e86df1cc | swh:1:rev:005040379d8b64aacbe54941d878efa6e86df1cc | ||||
swh:1:rev:00cc67af23bf9cf2cdbaeaeee6ded76baf0292f0 | swh:1:rev:00cc67af23bf9cf2cdbaeaeee6ded76baf0292f0 | ||||
swh:1:rev:00575d4d8c7421c5119f181009374ff2e7736127 | swh:1:rev:00575d4d8c7421c5119f181009374ff2e7736127 | ||||
swh:1:rev:0019a463bdcb81dc6ba3434505a45774ca27f363 | swh:1:rev:0019a463bdcb81dc6ba3434505a45774ca27f363 | ||||
$ ls -F history/by-date/ | $ ls -1F history/by-date/ | ||||
2006/ | 2006/ | ||||
2007/ | 2007/ | ||||
2008/ | 2008/ | ||||
... | ... | ||||
2018/ | 2018/ | ||||
2019/ | 2019/ | ||||
2020/ | 2020/ | ||||
$ ls -f history/by-date/2020/03/16/ | $ ls -f history/by-date/2020/03/16/ | ||||
swh:1:rev:90fed4b453a5becdb7f173d9e3c1492390a1441f | swh:1:rev:90fed4b453a5becdb7f173d9e3c1492390a1441f | ||||
$ jq .date history/by-date/2020/03/16/*/meta.json | $ jq .date history/by-date/2020/03/16/*/meta.json | ||||
"2020-03-16T21:49:29+01:00" | "2020-03-16T21:49:29+01:00" | ||||
Note that to populate the `by-date` view metadata about all commits in the | Note that to populate the `by-date` view, metadata about all commits in the | ||||
history are needed. To avoid blocking on that, metadata are retrieved | history are needed. To avoid blocking on that, metadata are retrieved | ||||
asynchronously, populating the view incrementally. The hidden `by-date/.status` | asynchronously, populating the view incrementally. The hidden `by-date/.status` | ||||
file provides a progress report and is removed upon completion. | file provides a progress report and is removed upon completion. | ||||
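Since the `.status` file disappears once the view is complete, a script that needs the full `by-date` listing can simply wait for it to vanish. The following is a minimal sketch under that assumption; `wait_for_by_date` is a hypothetical helper name, not something SwhFS provides, and the polling interval is arbitrary:

```shell
# Hypothetical helper, not part of SwhFS: block until a by-date/ view
# is fully populated.
# $1: path to a by-date/ directory
# $2: polling interval in seconds (defaults to 1)
wait_for_by_date() {
    dir=$1
    interval=${2:-1}
    # The hidden .status file exists only while population is in progress.
    while [ -e "$dir/.status" ]; do
        sleep "$interval"
    done
}
```

For example, `wait_for_by_date history/by-date && ls history/by-date/` would list the years only after all commit metadata has been retrieved. The sketch returns immediately if `.status` is absent, so it assumes the directory has already been accessed once and population has started.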
## Repository snapshots and branches | ## Repository snapshots and branches | ||||
Snapshot objects keep track of where each branch and release (or "tag") pointed | Snapshot objects keep track of where each branch and release (or "tag") pointed | ||||
Show All 14 Lines | which uses historical Unix releases as branch names: | ||||
$ jq .message,.date meta.json | $ jq .message,.date meta.json | ||||
"Bell 32V release\nSnapshot of the completed development branch\n\nSynthesized-from: 32v\n" | "Bell 32V release\nSnapshot of the completed development branch\n\nSynthesized-from: 32v\n" | ||||
"1979-05-02T23:26:55-05:00" | "1979-05-02T23:26:55-05:00" | ||||
$ grep core root/usr/src/games/fortune.c | $ grep core root/usr/src/games/fortune.c | ||||
printf("Memory fault -- core dumped\n"); | printf("Memory fault -- core dumped\n"); | ||||
We can check that two of the available branches correspond to historical Bell | We can check that two of the available branches correspond to historical Bell | ||||
Labs UNIX releases. And We can dig into the `fortune` implementation of | Labs UNIX releases. And we can dig into the `fortune` implementation of | ||||
[UNIX/32V](https://en.wikipedia.org/wiki/UNIX/32V) instantly, without having to | [UNIX/32V](https://en.wikipedia.org/wiki/UNIX/32V) instantly, without having to | ||||
clone a 1.6 GiB repository first. | clone a 1.6 GiB repository first. | ||||
## Origin search | ## Origin search | ||||
Origins can be accessed via the `origin/` top-level directory using their | Origins can be accessed via the `origin/` top-level directory using their | ||||
**encoded** URL (the percent-encoding mechanism described in [RFC | **encoded** URL (the percent-encoding mechanism described in [RFC | ||||
Uhm, can it? The only mention of indentation I've found in the doc thus far is how much indentation is used, not the fact that it is optional. We should fix the inconsistency.