diff --git a/docs/index.rst b/docs/index.rst
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -10,4 +10,5 @@
    cli
    configuration
    Design notes
+   Tutorial
    API reference
diff --git a/docs/tutorial.md b/docs/tutorial.md
new file mode 100644
--- /dev/null
+++ b/docs/tutorial.md
@@ -0,0 +1,137 @@
+# Software Heritage virtual filesystem (SwhFS) --- Tutorial
+
+
+## Installation
+
+The Software Heritage virtual filesystem (SwhFS) is available from PyPI
+as [swh.fuse](https://pypi.org/project/swh.fuse/). It can be installed from
+there with `pip`:
+
+    $ pip install swh.fuse
+
+
+## Mount
+
+SwhFS is controlled by the `swh fuse` command-line interface (CLI).
+
+To mount the [Software Heritage][swh] [archive][archive], use the `swh fuse
+mount` sub-command:
+
+    $ mkdir swhfs
+    $ swh fuse mount swhfs/
+    $ ls -1F swhfs/
+    archive/
+    meta/
+    $
+
+[swh]: https://www.softwareheritage.org/
+[archive]: https://archive.softwareheritage.org/
+
+Once done, you can unmount SwhFS using `swh fuse umount PATH`.
+
+
+## Lazy loading
+
+Once mounted, the archive can be navigated as if it were locally available
+on disk. Archived objects are referenced
+by [Software Heritage persistent identifiers][swhid] (SWHIDs). They are loaded
+on demand from the archive and lazily populate the `archive/` directory of the
+SwhFS mount point.
+
+[swhid]: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html
+
+For instance, `swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2` is the SWHID
+of a well-known tiny C program, hence:
+
+    $ cd swhfs/
+    $ ls -l archive/
+    total 0
+
+    $ cat archive/swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2
+    #include <stdio.h>
+
+    int main(void) {
+        printf("Hello, World!\n");
+    }
+
+    $ ls -l archive/
+    total 0
+    -r--r--r-- 1 zack zack 67 Oct 18 09:26 swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2
+
+There is now a (virtual) regular file under `archive/`, whose content is a C
+source file. The `meta/` directory under the SwhFS mount point contains metadata
+about all retrieved objects, corresponding to what
+the [Software Heritage Web API][webapi] returns:
+
+    $ cat meta/swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2.json | jq
+    {
+      "length": 67,
+      "status": "visible",
+      "checksums": {
+        "sha256": "06dfb5d936f50b3cb80152aa053724e4a18417c35f745b66ab9571c25afd0f79",
+        "sha1": "459ee8545e5ba6cb819ba41e6ea2f0011cedd728",
+        "blake2s256": "87e6ab9c92681e9a022a8f4679dcd9d9b841fe4146edcbc15329fc66d8c82b4f",
+        "sha1_git": "c839dea9e8e6f0528b468214348fee8669b305b2"
+      },
+      "data_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/raw/",
+      "filetype_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/filetype/",
+      "language_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/language/",
+      "license_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/license/"
+    }
+
+[webapi]: https://archive.softwareheritage.org/api/
+
+
+## Source code trees
+
+Let's dive into some source code:
+
+    $ cd archive/swh:1:dir:c6f07c2173a458d098de45d4c459a8f1916d900f/
+    $ ls -1F
+    code/
+    common/
+    COPYING.txt
+    lcc/
+    libs/
+    q3asm/
+    q3map/
+    q3radiant/
+    README.txt
+    ui/
+
+    $ head -n 1 README.txt COPYING.txt
+    ==> README.txt <==
+    Quake III Arena GPL source release
+
+    ==> COPYING.txt <==
+    GNU GENERAL PUBLIC LICENSE
+
+That's right, the directory SWHID in the above example references the
+original [GPL source code release of Quake III Arena][quake], from 2005. We can
+check how many lines of code it contained at the time:
+
+    $ sloccount .
+    [...]
+
+    Totals grouped by language (dominant language first):
+    ansic:       262772 (80.48%)
+    cpp:          48938 (14.99%)
+    objc:          6563 (2.01%)
+    perl:          6320 (1.94%)
+    asm:           1362 (0.42%)
+    sh:             375 (0.11%)
+    yacc:           185 (0.06%)
+
+    [...]
+    Total Physical Source Lines of Code (SLOC) = 326,515
+
+and search for, err, interesting patterns in the code:
+
+    $ rgrep -C 2 'what the f'
+    code/game/q_math.c- y = number;
+    code/game/q_math.c- i = * ( long * ) &y; // evil floating point bit level hacking
+    code/game/q_math.c: i = 0x5f3759df - ( i >> 1 ); // what the fuck?
+    code/game/q_math.c- y = * ( float * ) &i;
+    code/game/q_math.c- y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
+
+[quake]: https://en.wikipedia.org/wiki/Quake_III_Arena#Game_engine
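
Because a mounted directory behaves like an ordinary tree on disk, the line-count step in the "Source code trees" section can also be scripted. The following is a minimal Python sketch, not part of SwhFS: `count_lines_by_extension` is a hypothetical helper name, it relies only on the standard library, and it counts physical lines per file extension rather than the language-aware SLOC figures that `sloccount` reports.

```python
import os
from collections import Counter


def count_lines_by_extension(root):
    """Walk `root` and return a Counter mapping file extension to line count.

    Hypothetical helper for illustration; it knows nothing about SwhFS and
    works on any directory tree.
    """
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1] or "(none)"
            try:
                # Read as bytes so files in unknown encodings do not raise.
                with open(path, "rb") as handle:
                    totals[ext] += sum(1 for _ in handle)
            except OSError:
                # Skip entries that cannot be read (broken links, I/O errors).
                continue
    return totals
```

Pointed at the Quake III Arena directory from the tutorial, for example `count_lines_by_extension("archive/swh:1:dir:c6f07c2173a458d098de45d4c459a8f1916d900f")`, it would walk the lazily loaded tree like any other directory. The resulting numbers are not expected to match the `sloccount` totals, because blank lines and comments are included and files are grouped by extension rather than by detected language.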