docs/tutorial.md
- This file was added.
# Software Heritage virtual filesystem (SwhFS) --- Tutorial

## Installation

The Software Heritage virtual filesystem (SwhFS) is available from PyPI
as [swh.fuse](https://pypi.org/project/swh.fuse/). It can be installed from
there with `pip`:

    $ pip install swh.fuse
## Mount

SwhFS is controlled by the `swh fuse` command-line interface (CLI).

To mount the [Software Heritage][swh] [archive][archive], use the
`swh fuse mount` sub-command:

    $ mkdir swhfs
    $ swh fuse mount swhfs/
    $ ls -1F swhfs/
    archive/
    meta/
    $

[swh]: https://www.softwareheritage.org/
[archive]: https://archive.softwareheritage.org/

Once done, you can unmount SwhFS using `swh fuse umount PATH`.
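When scripting around a mount point, it can help to pair the mount and unmount steps so that the unmount is not forgotten. The following Python sketch wraps the two commands shown above in a context manager. It assumes that the `swh` executable is on `PATH` and that `swh fuse mount` returns once the mount is in place, as the shell session above suggests. The names `swhfs_mounted`, `mount_command`, and `umount_command`, as well as the injectable `run` parameter, are hypothetical helpers and not part of `swh.fuse`.

```python
import subprocess
from contextlib import contextmanager
from pathlib import Path


def mount_command(mountpoint):
    """Argument list for the `swh fuse mount` sub-command."""
    return ["swh", "fuse", "mount", str(mountpoint)]


def umount_command(mountpoint):
    """Argument list for `swh fuse umount PATH`."""
    return ["swh", "fuse", "umount", str(mountpoint)]


@contextmanager
def swhfs_mounted(mountpoint, run=subprocess.run):
    """Mount SwhFS on entry and unmount it on exit (hypothetical helper).

    `run` is injectable so the sketch can be exercised without the
    `swh` executable being installed.
    """
    path = Path(mountpoint)
    path.mkdir(parents=True, exist_ok=True)  # same role as `mkdir swhfs`
    run(mount_command(path), check=True)
    try:
        yield path
    finally:
        # Always attempt to unmount, even if the body raised.
        run(umount_command(path), check=True)
```

A script would then use `with swhfs_mounted("swhfs") as root:` and work on `root / "archive"` and `root / "meta"` inside the block.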
## Lazy loading

Once mounted, the archive can be navigated as if it were locally available
on disk. Archived objects are referenced
by [Software Heritage persistent identifiers][swhid] (SWHIDs). They are loaded
on demand from the archive and lazily populate the `archive/` directory of the
SwhFS mount point.

[swhid]: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html
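Since every entry under `archive/` is named after a SWHID, it can be useful to split an identifier into its parts. The sketch below is a minimal illustration based on the core syntax described in the SWHID documentation linked above (`swh:<version>:<type>:<40 hex digits>`). It ignores optional qualifiers, and `parse_swhid`, `Swhid`, and `OBJECT_TYPES` are names made up for this example, not the `swh.model` API.

```python
import re
from typing import NamedTuple

# Object type codes of core SWHIDs (version 1 of the scheme).
OBJECT_TYPES = {
    "cnt": "content",
    "dir": "directory",
    "rev": "revision",
    "rel": "release",
    "snp": "snapshot",
}

# scheme ":" version ":" type ":" 40 lowercase hexadecimal digits
SWHID_RE = re.compile(r"swh:(\d+):([a-z]{3}):([0-9a-f]{40})")


class Swhid(NamedTuple):
    version: int
    object_type: str
    object_id: str


def parse_swhid(text):
    """Parse a core SWHID (no qualifiers) or raise ValueError."""
    match = SWHID_RE.fullmatch(text)
    if match is None or match.group(2) not in OBJECT_TYPES:
        raise ValueError(f"not a core SWHID: {text!r}")
    return Swhid(int(match.group(1)), match.group(2), match.group(3))
```

The `object_type` field tells whether an entry is expected to behave like a file (`cnt`) or like a directory (`dir`) once it is accessed.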
For instance, `swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2` is the SWHID
of a well-known tiny C program, hence:

    $ cd swhfs/
    $ ls -l archive/
    total 0
    $ cat archive/swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2
    #include <stdio.h>
    int main(void) {
        printf("Hello, World!\n");
    }
    $ ls -l archive/
    total 0
    -r--r--r-- 1 zack zack 67 Oct 18 09:26 swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2
There is now a (virtual) regular file under `archive/`, whose content is a C
source file. The `meta/` directory under the SwhFS mount point contains metadata
about all retrieved objects, matching what
the [Software Heritage Web API][webapi] returns:

    $ cat meta/swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2.json | jq
    {
      "length": 67,
      "status": "visible",
      "checksums": {
        "sha256": "06dfb5d936f50b3cb80152aa053724e4a18417c35f745b66ab9571c25afd0f79",
        "sha1": "459ee8545e5ba6cb819ba41e6ea2f0011cedd728",
        "blake2s256": "87e6ab9c92681e9a022a8f4679dcd9d9b841fe4146edcbc15329fc66d8c82b4f",
        "sha1_git": "c839dea9e8e6f0528b468214348fee8669b305b2"
      },
      "data_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/raw/",
      "filetype_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/filetype/",
      "language_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/language/",
      "license_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/license/"
    }

[webapi]: https://archive.softwareheritage.org/api/
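Note that the `sha1_git` checksum in the metadata is the same hexadecimal string as the one embedded in the SWHID. For content objects, this value is the SHA1 that Git computes for a blob, that is, the hash of the raw bytes prefixed by a `blob <length>\0` header. The following Python sketch shows how the file, its SWHID, and its metadata could be cross-checked. The function names `git_blob_sha1` and `check_content` are hypothetical, and the sketch assumes the metadata has the shape of the JSON document above.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """SHA1 of the bytes prefixed by a Git blob header ("sha1_git")."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def check_content(swhid: str, data: bytes, metadata: dict) -> bool:
    """Return True if a content SWHID, its bytes, and its metadata agree."""
    object_id = swhid.rsplit(":", 1)[-1]  # hexadecimal part of the SWHID
    return (
        metadata["length"] == len(data)
        and metadata["checksums"]["sha1_git"] == object_id
        and git_blob_sha1(data) == object_id
    )
```

On a mounted SwhFS, `data` would be the bytes read from `archive/<SWHID>` and `metadata` the result of `json.load` on `meta/<SWHID>.json`.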
## Source code trees

Let's dive into some source code:

    $ cd archive/swh:1:dir:c6f07c2173a458d098de45d4c459a8f1916d900f/
    $ ls -1F
    code/
    common/
    COPYING.txt
    lcc/
    libs/
    q3asm/
    q3map/
    q3radiant/
    README.txt
    ui/
    $ head -n 1 README.txt COPYING.txt
    ==> README.txt <==
    Quake III Arena GPL source release

    ==> COPYING.txt <==
    GNU GENERAL PUBLIC LICENSE
That's right, the directory SWHID in the above example references the
original [GPL source code release of Quake III Arena][quake], from 2005. We can
check how many lines of code it contained at the time:

    $ sloccount .
    [...]
    Totals grouped by language (dominant language first):
    ansic:       262772 (80.48%)
    cpp:          48938 (14.99%)
    objc:          6563 (2.01%)
    perl:          6320 (1.94%)
    asm:           1362 (0.42%)
    sh:             375 (0.11%)
    yacc:           185 (0.06%)
    [...]
    Total Physical Source Lines of Code (SLOC) = 326,515

and search for, err, interesting patterns in the code:

    $ rgrep -C 2 'what the f'
    code/game/q_math.c- y = number;
    code/game/q_math.c- i = * ( long * ) &y; // evil floating point bit level hacking
    code/game/q_math.c: i = 0x5f3759df - ( i >> 1 ); // what the fuck?
    code/game/q_math.c- y = * ( float * ) &i;
    code/game/q_math.c- y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration

[quake]: https://en.wikipedia.org/wiki/Quake_III_Arena#Game_engine
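Because a mounted source tree behaves like ordinary files and directories, the same kind of exploration can be done from a program using only standard file APIs. The Python sketch below walks a directory, counts lines per file extension, and searches for a regular expression. It is only a rough stand-in for the tools above: it counts raw lines rather than SLOC (no comment or blank-line filtering), and the function names are made up for this example.

```python
import os
import re
from collections import Counter


def count_lines_by_extension(root):
    """Count raw lines per file extension under `root`."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1] or "(none)"
            # Binary mode avoids decoding errors on non-text files.
            with open(os.path.join(dirpath, name), "rb") as f:
                counts[ext] += sum(1 for _ in f)
    return counts


def search(root, pattern):
    """Yield (relative path, line number, line) for each matching line."""
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if regex.search(line):
                        yield os.path.relpath(path, root), lineno, line.rstrip("\n")
```

Run from inside the directory of the example above, `count_lines_by_extension(".")` and `list(search(".", "0x5f3759df"))` would trigger the same lazy loading as `sloccount` and `rgrep` do, since every file that is read has to be fetched from the archive first.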