diff --git a/docs/configuration.rst b/docs/configuration.rst
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -14,36 +14,44 @@
 well as explicitly overridden on the :ref:`command line ` via the
 ``-C/--config-file`` flag.

-The following sub-sections and fields can be used within the `swh > fuse`
+The following sub-sections and fields can be used within the ``swh > fuse``
 stanza:

 - ``cache``:

   - ``metadata``: where to store the metadata cache, must have either a
-    ``in-memory`` boolean entry or a ``path`` string entry (with the
-    corresponding disk path)
+    ``in-memory`` boolean entry set to true or a ``path`` string entry (with the
+    corresponding disk path).
   - ``blob``: where to store the blob cache, same entries as the ``metadata``
-    cache
+    cache.
+  - ``direntry``: how much memory should be used by the direntry cache,
+    specified using a ``maxram`` entry (either as a percentage of available RAM,
+    or with disk storage unit suffixes: ``B``, ``KB``, ``MB``, ``GB``).

 - ``web-api``:

   - ``url``: archive API URL
   - ``auth-token``: authentication token used with the API URL

+- ``json-indent``: number of spaces used to print JSON metadata files (setting
+  it to ``null`` disables indentation).
+
 If no configuration is given, default values are:

 - ``cache``: all cache files are stored in ``$XDG_CACHE_HOME/swh/fuse/`` (or
-  ``~/.cache/swh/fuse`` if ``XDG_CACHE_HOME`` is not set)
-- ``web-api``: default URL is ,
-  with no authentication token
+  ``~/.cache/swh/fuse`` if ``XDG_CACHE_HOME`` is not set). The direntry cache
+  will use at most 10% of available RAM.
+- ``web-api``: URL is https://archive.softwareheritage.org/api/1/, with no
+  authentication token
+- ``json-indent``: 2 spaces.

 Example
 -------

 Here is a full ``~/.config/swh/global.yml`` example, showcasing different cache
-storage strategies (in-memory for metadata and on-disk for blob), using the
-default Web API service:
+storage strategies (in-memory for metadata, on-disk for blob, 20% RAM for
+direntry), using the default Web API service:

 .. code:: yaml

@@ -54,9 +62,11 @@
          in-memory: true
        blob:
          path: "/path/to/cache/blob.sqlite"
+      direntry:
+        maxram: 20%
      web-api:
        url: "https://archive.softwareheritage.org/api/1/"
-      auth-token: null
+      auth-token: eyJhbGciOiJIUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJhMTMxYTQ1My1hM2IyLTQwMTUtO...


 Logging
diff --git a/docs/design.md b/docs/design.md
--- a/docs/design.md
+++ b/docs/design.md
@@ -63,11 +63,13 @@
 - `cache/`: on-disk representation of locally cached objects and metadata. Via
   this directory you can browse cached data and selectively remove them from the
-  cache, freeing disk space. (See `swh fs clean` in the {ref}`CLI
-  ` to completely empty the cache). The directory is populated
-  with symlinks to: all artifacts, identified by their SWHIDs and sharded by the
-  first two character of their object id, the metadata identified by a
-  `SWHID.json` entry, and the `origin/` directory.
+  cache, freeing disk space. (See `swh fs clean` in the {ref}`CLI `
+  to completely empty the cache). The directory is populated with symlinks to:
+  all artifacts, identified by their SWHIDs and sharded by the first two
+  character of their object id, the metadata identified by a `SWHID.json` entry,
+  and the `origin/` directory.
+
+- `README`: file explaining briefly what is SwhFS.

 ## File system representation

@@ -176,7 +178,8 @@
 We assume that no cache *invalidation* is necessary, due to intrinsic properties
 of the Software Heritage archive, such as integrity verification and append-only
 archive changes. To clean the caches one can just remove the corresponding files
-from disk.
+from disk, or using a more fine-grained strategy, navigate the `cache/`
+top-level directory and `rm ` to purge specific artifacts.

 ### Metadata cache

diff --git a/docs/tutorial.md b/docs/tutorial.md
--- a/docs/tutorial.md
+++ b/docs/tutorial.md
@@ -24,7 +24,9 @@
     $ ls -1F swhfs/  # list entry points
     archive/  # <- start browsing from here
-    meta/
+    cache/
+    origin/
+    README

 By default SwhFS daemonizes into background and logs to syslog; it can be kept
 in foreground, logging to the console, by passing `-f/--foreground` to `mount`.
@@ -81,12 +83,12 @@
 filesystem.

 Metadata about archived source code artifacts is also locally available. For
-each entry under `archive/` there is a matching JSON file under `meta/`,
-corresponding to what the [Software Heritage Web API][webapi] will return. For
-example, here is what the Software Heritage archive knows about the above Hello
-World implementation:
+each entry `archive/` there is a matching JSON file
+`archive/.json`, corresponding to what the [Software Heritage Web
+API][webapi] will return. For example, here is what the Software Heritage
+archive knows about the above Hello World implementation:

-    $ jq meta/swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2.json
+    $ cat archive/swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2.json
     {
       "length": 67,
       "status": "visible",
@@ -102,6 +104,9 @@
       "license_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:c839dea9e8e6f0528b468214348fee8669b305b2/license/"
     }

+Note: JSON metadata files are indented by default when read, this can be changed
+in the configuration file (see {ref}`documentation `).
+
 [webapi]: https://archive.softwareheritage.org/api/


@@ -126,7 +131,7 @@
     $ cd archive/swh:1:rev:9d76c0b163675505d1a901e5fe5249a2c55609bc
-    $ ls -F
+    $ ls -1F
     history/
     meta.json@
     parent@
@@ -171,7 +176,7 @@
     swh:1:rev:00575d4d8c7421c5119f181009374ff2e7736127
     swh:1:rev:0019a463bdcb81dc6ba3434505a45774ca27f363

-    $ ls -F history/by-date/
+    $ ls -1F history/by-date/
     2006/
     2007/
     2008/
@@ -186,7 +191,7 @@
     $ jq .date history/by-date/2020/03/16/*/meta.json
     "2020-03-16T21:49:29+01:00"

-Note that to populate the `by-date` view metadata about all commits in the
+Note that to populate the `by-date` view, metadata about all commits in the
 history are needed. To avoid blocking on that, metadata are retrieved
 asynchronously, populating the view incrementally. The hidden `by-date/.status`
 file provides a progress report and is removed upon completion.
@@ -217,7 +222,7 @@
     printf("Memory fault -- core dumped\n");

 We can check that two of the available branches correspond to historical Bell
-Labs UNIX releases. And We can dig into the `fortune` implementation of
+Labs UNIX releases. And we can dig into the `fortune` implementation of
 [UNIX/32V](https://en.wikipedia.org/wiki/UNIX/32V) instantly, without having to
 clone a 1.6 GiB repository first.