diff --git a/docs/api.rst b/docs/api.rst index 535920b..7d3d5b3 100644 --- a/docs/api.rst +++ b/docs/api.rst @@ -1,172 +1,177 @@ -Software Heritage Vault API Reference -===================================== - -Software source code **objects**---e.g., individual source code files, -tarballs, commits, tagged releases, etc.---are stored in the Software -Heritage (SWH) Archive in fully deduplicated form. That allows direct -access to individual artifacts but require some preparation, usually in -the form of collecting and assembling multiple artifacts in a single -**bundle**, when fast access to a set of related artifacts (e.g., the -snapshot of a VCS repository, the archive corresponding to a Git commit, -or a specific software release as a zip archive) is required. - -The **Software Heritage Vault** is a cache of pre-built source code -bundles which are assembled opportunistically retrieving objects from -the Software Heritage Archive, can be accessed efficiently, and might be -garbage collected after a long period of non-use. - -API ---- - -All URLs below are meant to be mounted at API root, which is currently -at https://archive.softwareheritage.org/api/1/. Unless otherwise stated, -all API endpoints respond on HTTP GET method. +.. _vault-api-ref: + +Vault API Reference +=================== + +Software source code **objects**---e.g., individual files, directories, +commits, tagged releases, etc.---are stored in the Software Heritage (SWH) +Archive in fully deduplicated form. That allows direct access to individual +artifacts, but requires some preparation ("cooking") when fast access to a large +set of related objects (e.g., an entire repository) is required. + +The **Software Heritage Vault** takes care of that preparation by +asynchronously assembling **bundles** of related source code objects, caching, +and garbage collecting them as needed. + +The Vault is accessible via a RESTful API documented below. 
+ +All endpoints are mounted at API root, which is currently at +https://archive.softwareheritage.org/api/1/. + +Unless otherwise stated, API endpoints respond to HTTP GET method. + Object identification --------------------- -The vault stores bundles corresponding to different kinds of objects. -The following object kinds are supported: +The vault stores bundles corresponding to different kinds of objects (see +:ref:`data-model`). The following object kinds are currently supported by the +Vault: - directories - revisions -- repository snapshots (not available yet) +- snapshots -The URL fragment ``:objectkind/:objectid`` is used throughout the vault -API to identify vault objects. The syntax and meaning of ``:objectid`` for -the different object kinds is detailed below. +The URL fragment ``:objectkind/:objectid`` is used throughout the vault API to +identify vault objects. The syntax and meaning of ``:objectid`` for the +different object kinds is detailed below. + +Optionally, a third parameter, ``:format``, can be used (when multiple formats +are supported) to specify the format of the resulting bundle when needed. The +URL fragment becomes then ``:objectkind/:objectid/:format``. -Optionally, a third parameter, ``:format``, can sometimes be used to -specify the format of the resulting bundle when needed. The URL fragment -becomes ``:objectkind/:objectid/:format``. Directories ~~~~~~~~~~~ - object kind: ``directory`` -- URL fragment: ``directory/:sha1git`` +- URL fragment: ``directory/:dir_id`` -where ``:sha1git`` is the directory ID in the SWH data model. +where ``:dir_id`` is a :py:func:`directory identifier +`. -Currently, the only format available for a directory export is a -gzip-compressed tarball. You can extract the resulting bundle using: +The only format available for a directory export is a gzip-compressed +tarball. You can extract the resulting bundle using: .. 
code:: shell - tar xvf bundle.tar.gz + tar xaf bundle.tar.gz + Revisions ~~~~~~~~~ - object kind: ``revision`` -- URL fragment: ``revision/:sha1git/:format`` +- URL fragment: ``revision/:rev_id/:format`` -where ``:sha1git`` is the revision ID in the SWH data model, and -``:format`` is the export format. +where ``:rev_id`` is a :py:func:`revision identifier +` and ``:format`` is the export +format. -Currently, the only format available for a revision export is -``gitfast``: a gzip-compressed git fast-export, according to the format -documented in ``git-fast-import(1)``. You can extract the resulting +The only format available for a revision export is ``gitfast``: a +gzip-compressed `git fast-export +`_. You can extract the resulting bundle using: .. code:: shell git init zcat bundle.gitfast.gz | git fast-import git checkout HEAD + Repository snapshots ~~~~~~~~~~~~~~~~~~~~ -**[NOT YET AVAILABLE]** +.. TODO +**(NOT AVAILABLE YET)** - object kind: ``snapshot`` -- URL fragment: ``snapshot/:sha1git`` +- URL fragment: ``snapshot/:snp_id`` + +where ``:snp_id`` is a :py:func:`snapshot identifier +`. -where ``:sha1git`` is the snapshot ID in the SWH data model. (**TODO** -repository snapshots don't exist yet as first-class citizens in the SWH -data model; see References below.) Cooking and status checking --------------------------- -Bundles in the vault might be ready for retrieval or not. When they are -not, they will need to be **cooked** before they can be retrieved. A -cooked bundle will remain around until it expires; at that point it will -need to be cooked again before it can be retrieved. Cooking is -idempotent, and a no-op in between a previous cooking operation and -expiration. +Vault bundles might be ready for retrieval or not. When they are not, they will +need to be **cooked** before they can be retrieved. A cooked bundle will remain +around until it expires; after expiration, it will need to be cooked again +before it can be retrieved. 
Cooking is idempotent, and a no-op in between a +previous cooking operation and expiration. .. http:post:: /vault/:objectkind/:objectid/:format .. http:get:: /vault/:objectkind/:objectid/:format **Request body**: optionally, an ``email`` POST parameter containing an e-mail to notify when the bundle cooking has ended. **Allowed HTTP Methods:** - :http:method:`post` to **request** a bundle cooking - :http:method:`get` to check the progress and status of the cooking - :http:method:`head` - :http:method:`options` **Response:** :statuscode 200: bundle available for cooking, status of the cooking :statuscode 400: malformed identifier hash or format :statuscode 404: unavailable bundle or object not found .. sourcecode:: http HTTP/1.1 200 OK Content-Type: application/json { "id": 42, - "fetch_url": "/api/1/vault/directory//raw/", - "obj_id": "", + "fetch_url": "/api/1/vault/directory/:dir_id/raw/", + "obj_id": ":dir_id", "obj_type": "directory", "progress_message": "Creating tarball...", "status": "pending" } After a cooking request has been started, all subsequent GET and POST requests to the cooking URL return some JSON data containing information about the progress of the bundle creation. The JSON contains the following keys: - - ``id``: the ID of the cooking request - - - ``fetch_url``: the URL that can be used for the retrieval of the - bundle + - ``id``: the ID of the cooking request - - ``obj_type``: an internal identifier uniquely representing the object - kind and the format of the required bundle. + - ``fetch_url``: the URL that can be used for the retrieval of the bundle - - ``obj_id``: the identifier of the requested bundle + - ``obj_type``: an internal identifier uniquely representing the object + kind and the format of the required bundle. 
- - ``status``: one of the following values: + - ``obj_id``: the identifier of the requested bundle - - ``new``: the bundle request was created - - ``pending``: the bundle is being cooked - - ``done``: the bundle has been cooked and is ready for retrieval - - ``failed``: the bundle cooking failed and can be retried + - ``status``: one of the following values: - - ``progress_message``: a string describing the current progress of the - cooking. If the cooking failed, ``progress_message`` will contain the - reason of the failure. + - ``new``: the bundle request was created + - ``pending``: the bundle is being cooked + - ``done``: the bundle has been cooked and is ready for retrieval + - ``failed``: the bundle cooking failed and can be retried + - ``progress_message``: a string describing the current progress of the + cooking. If the cooking failed, ``progress_message`` will contain the + reason of the failure. Retrieval --------- Retrieve a specific bundle from the vault with: .. http:get:: /vault/:objectkind/:objectid/:format/raw + Where ``:format`` is optional, depending on the object kind. + **Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options` **Response**: :statuscode 200: bundle available; response body is the bundle. :statuscode 404: unavailable bundle; client should request its cooking. diff --git a/docs/getting-started.rst b/docs/getting-started.rst index ecebc95..9239d96 100644 --- a/docs/getting-started.rst +++ b/docs/getting-started.rst @@ -1,59 +1,62 @@ +.. _vault-primer: + Getting started =============== The Vault is a service in charge of reconstructing parts of the archive as self-contained bundles, that can then be imported locally, for instance in a Git repository. This is basically where you can do a ``git clone`` of a repository stored in Software Heritage. 
The Vault is asynchronous : you first need to do a request to prepare the bundle you need, and then a second request to fetch the bundle once the Vault has finished to reconstitute the bundle. -Example: getting a directory ----------------------------- +Example: retrieving a directory +------------------------------- First, ask the Vault to prepare your bundle: .. code:: shell - curl -X POST https://archive.softwareheritage.org/1/vault/directory// + curl -X POST https://archive.softwareheritage.org/api/1/vault/directory/:dir_id/ -This request and all subsequent requests to this endpoint will return -some JSON data containing information about the progress of the bundle -creation: +where ``:dir_id`` is a :py:func:`directory identifier +`. This initial request and all +subsequent requests to this endpoint will return some JSON data containing +information about the progress of bundle creation: .. code:: json { "id": 42, - "fetch_url": "/api/1/vault/directory//raw/", - "obj_id": "", + "fetch_url": "/api/1/vault/directory/:dir_id/raw/", + "obj_id": ":dir_id", "obj_type": "directory", "progress_message": "Creating tarball...", "status": "pending" } Once the status is ``done``, you can fetch the bundle at the address given in the ``fetch_url`` field. .. code:: shell - curl https://archive.softwareheritage.org/1/vault/directory//raw + curl -o bundle.tar.gz https://archive.softwareheritage.org/api/1/vault/directory/:dir_id/raw + tar xaf bundle.tar.gz E-mail notifications -------------------- -You can also ask to be notified by e-mail once the bundle you requested -is ready, by simply giving an ``email`` POST parameter: +You can also ask to be notified by e-mail once the bundle you requested is +ready, by giving an ``email`` POST parameter: .. 
code:: shell curl -X POST -d 'email=example@example.com' \ - https://archive.softwareheritage.org/1/vault/directory// + https://archive.softwareheritage.org/api/1/vault/directory/:dir_id/ API reference ~~~~~~~~~~~~~ -For a more exhaustive overview of the Vault API, you can take a look at -the `Vault API Reference <./api.html>`__ +For a more exhaustive overview of the Vault API, see the :ref:`vault-api-ref`.
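
The asynchronous workflow documented in this change (a POST to request cooking, then GETs on the same URL until the status is ``done`` or ``failed``) could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not part of the patch or of any Software Heritage client: the helper names ``cooking_url``, ``http_json`` and ``cook_and_wait`` are hypothetical, and the polling interval and poll limit are arbitrary choices. Only the API root, the URL layout and the JSON keys come from the documentation above.

```python
import json
import time
import urllib.request

# API root as stated in the API reference in this diff.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def cooking_url(obj_kind, obj_id, fmt=None):
    """Build the cooking/status URL: /vault/:objectkind/:objectid[/:format]/."""
    parts = ["vault", obj_kind, obj_id]
    if fmt:
        parts.append(fmt)
    return API_ROOT + "/" + "/".join(parts) + "/"


def http_json(method, url):
    """Hypothetical helper: send a request and decode the JSON response."""
    # An empty body on POST makes urllib send an explicit Content-Length: 0.
    data = b"" if method == "POST" else None
    req = urllib.request.Request(url, data=data, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def cook_and_wait(obj_kind, obj_id, fmt=None, request=http_json,
                  poll_seconds=10.0, max_polls=60):
    """POST a cooking request, then GET its status until it settles.

    Returns the final status document once ``status`` is ``done``.
    Raises RuntimeError if cooking failed, TimeoutError if the poll
    limit (an arbitrary assumption) is reached first.
    """
    url = cooking_url(obj_kind, obj_id, fmt)
    info = request("POST", url)
    polls = 0
    while info["status"] not in ("done", "failed"):
        if polls >= max_polls:
            raise TimeoutError("bundle not ready after %d polls" % max_polls)
        time.sleep(poll_seconds)
        info = request("GET", url)
        polls += 1
    if info["status"] == "failed":
        # On failure, progress_message carries the reason.
        raise RuntimeError("cooking failed: " + info["progress_message"])
    return info
```

The ``request`` parameter is injectable so the polling logic can be exercised without network access; once ``cook_and_wait`` returns, the bundle can be downloaded from the ``fetch_url`` field of the returned document.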