D5784.id20681.diff

diff --git a/docs/getting-started/getting_started_with_the_swh_api.rst b/docs/getting-started/getting_started_with_the_swh_api.rst
new file mode 100644
--- /dev/null
+++ b/docs/getting-started/getting_started_with_the_swh_api.rst
@@ -0,0 +1,672 @@
+Getting Started with the Software Heritage API
+==============================================
+
+Introduction
+------------
+
+About Software Heritage
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The `Software Heritage project <https://www.softwareheritage.org>`__ was
+started in 2015 with a rather impressive goal and purpose:
+
+ Software Heritage is an ambitious initiative that aims at collecting,
+ organizing, preserving and sharing all the source code publicly
+ available in the world.
+
+Yes, you read that right: all source code available in the world. This
+requires building an equally impressive infrastructure to hold the huge
+amount of information involved, making the archive available to the public
+through a `nice web interface <https://archive.softwareheritage.org/>`__,
+and even offering a `well-documented
+API <https://docs.softwareheritage.org/devel/swh-web/>`__ to access it
+seamlessly. For the record, there are also `various datasets
+available <https://docs.softwareheritage.org/devel/swh-dataset/graph/dataset.html>`__
+for download, with detailed instructions on how to set them up. And
+yes, it is huge: the full graph generated from the archive (metadata
+only, content is not included) has more than 20 billion nodes and weighs
+1.2 TB. The overall size of the archive is in the hundreds of terabytes.
+
+This article presents, and demonstrates the use of, the `Software
+Heritage API <https://archive.softwareheritage.org/api/1/>`__ to query
+basic information about archived content and fetch the content of a
+software project.
+
+Terms and Concepts
+~~~~~~~~~~~~~~~~~~
+
+For this walkthrough we need to define the following terms and concepts:
+
+- The repositories analysed by Software Heritage are registered as
+  **origins**. Examples of origins are
+  https://bitbucket.org/anthroweb/apache.git and
+  https://github.com/apache/ant; other types of sources (Debian source
+  packages, npmjs, PyPI, CRAN, etc.) are origins too.
+- Each analysis of a repository creates a **snapshot**. A snapshot
+  describes the state of the repository at the time of analysis and
+  provides links to its content. For a git repository, for example, the
+  snapshot links to the list of branches, which themselves link to
+  revisions and content.
+- **Revisions** are consistent sets of directories and files
+  representing the repository at a given time, like a baseline. They
+  can be conceptually mapped to commits in Subversion, to git
+  references, or to source package versions in Debian or npmjs
+  repositories.
+- Each revision is linked to a **directory**, which itself links to other
+  directories and **files** (aka blobs).
+
+A full list of terms is provided in the `Software Heritage
+doc <https://wiki.softwareheritage.org/index.php?title=Glossary>`__.
+
+Preliminary steps
+-----------------
+
+System requirements
+~~~~~~~~~~~~~~~~~~~
+
+This article uses Python 3.x on the client side, and the ``requests``
+Python module to send HTTP requests. Note however that any language
+that can issue HTTP requests (GET, POST) can access the API and
+could be used. First, let’s make sure we have the correct Python
+version and module installed:
+
+::
+
+ (gs_env) boris@castalia:gs$ python -V
+ Python 3.7.3
+ (gs_env) boris@castalia:notebooks$ pip install requests
+ Requirement already satisfied: requests in ./gs_env/lib/python3.7/site-packages (2.25.1)
+ Requirement already satisfied: certifi>=2017.4.17 in ./gs_env/lib/python3.7/site-packages (from requests) (2020.12.5)
+ Requirement already satisfied: chardet<5,>=3.0.2 in ./gs_env/lib/python3.7/site-packages (from requests) (4.0.0)
+ Requirement already satisfied: idna<3,>=2.5 in ./gs_env/lib/python3.7/site-packages (from requests) (2.10)
+ Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./gs_env/lib/python3.7/site-packages (from requests) (1.26.4)
+ (gs_env) boris@castalia:gs$
+
+Initialise the script
+---------------------
+
+We need to import a few modules and utilities to play with the Software
+Heritage API, namely ``json`` and the aforementioned ``requests``
+modules. We also define a utility function to pretty-print json data
+easily:
+
+.. code:: ipython3
+
+    import json
+    import requests
+
+    # Utility to pretty-print JSON.
+    def jprint(obj):
+        # Serialise the Python object as indented JSON with sorted keys.
+        print(json.dumps(obj, sort_keys=True, indent=4))
+
+
+The syntax described in the `API
+documentation <https://archive.softwareheritage.org/api/1/>`__ is rather
+straightforward. Since we want to query the main Software
+Heritage server, we will use ``https://archive.softwareheritage.org/``
+as the base URL. All API calls are built according to the same
+syntax:
+
+::
+
+ https://archive.softwareheritage.org/api/1/<end/point>
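+
+The same pattern can be captured in a small helper. This is only a sketch:
+``API_ROOT`` and ``api_url`` are names made up for this article, not part of
+any Software Heritage client library.
+

```python
# Hypothetical helper (not part of any Software Heritage client library):
# build a full API URL from an endpoint path.
API_ROOT = "https://archive.softwareheritage.org/api/1/"


def api_url(endpoint):
    """Return the full URL for an endpoint path such as 'stat/counters'."""
    # Strip surrounding slashes so that '/stat/counters/' is accepted too,
    # then add the trailing slash used by the API URLs in this article.
    return API_ROOT + endpoint.strip("/") + "/"


print(api_url("stat/counters"))
print(api_url("origin/https://github.com/borisbaldassari/alambic/get"))
```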
+
+Request basic Information
+-------------------------
+
+We want to get some basic information about the main server activity and
+content. The ``stat`` endpoint provides a summary of the main indexes and
+some statistics about the archive. We can send a GET request for the main
+counters of the archive using the ``counters`` path, as described in the
+`endpoint
+documentation <https://archive.softwareheritage.org/api/1/stat/counters/>`__:
+
+``/api/1/stat/counters/``
+
+This API endpoint returns the following information:
+
+- **content** is the total number of blobs (files) in the archive.
+- **directory** is the total number of directories in the archive.
+- **origin** is the number of distinct origins (repositories) fetched by
+  the archive bots.
+- **origin_visits** is the total number of visits across all origins.
+- **person** is the number of persons (e.g. committers, authors)
+  referenced in the archive.
+- **release** is the number of tags retrieved in the archive.
+- **revision** is the number of revisions stored in the archive.
+- **skipped_content** is the number of objects which could not be
+  imported in the archive.
+- **snapshot** is the number of snapshots stored in the archive.
+
+Note that we use the default JSON format for the output. We could get
+YAML instead by setting the ``Accept`` request header to
+``application/yaml``.
+
+.. code:: ipython3
+
+ resp = requests.get("https://archive.softwareheritage.org/api/1/stat/counters/")
+ counters = resp.json()
+ jprint(counters)
+
+
+.. parsed-literal::
+
+    {
+        "content": 10049535736,
+        "directory": 8390591308,
+        "origin": 156388918,
+        "person": 42263568,
+        "release": 17218891,
+        "revision": 2109783249
+    }
+
+
+There are just over 10 billion blobs (aka files) in the archive and more
+than 8 billion directories already, for about 156 million origins analysed.
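+
+Coming back to the output format: asking for YAML instead of JSON only
+changes the request headers. The sketch below assumes that the format is
+negotiated through the standard HTTP ``Accept`` header; ``counters_request``
+and ``MEDIA_TYPES`` are illustrative names, and the block only builds the
+request so that it can be run offline.
+

```python
# Assumption: the API picks its output format from the standard HTTP
# Accept header; the article only states that "application/yaml" is used.
COUNTERS_URL = "https://archive.softwareheritage.org/api/1/stat/counters/"
MEDIA_TYPES = {"json": "application/json", "yaml": "application/yaml"}


def counters_request(fmt="json"):
    """Return the (url, headers) pair for a counters request in `fmt`."""
    if fmt not in MEDIA_TYPES:
        raise ValueError("unsupported format: {}".format(fmt))
    return COUNTERS_URL, {"Accept": MEDIA_TYPES[fmt]}


url, headers = counters_request("yaml")
print(url, headers)
```

+With network access, ``requests.get(url, headers=headers).text`` would then
+hold the counters as a YAML document.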
+
+Now, what about a specific repository? Let’s say we want to find out whether
+`alambic <https://alambic.io>`__ (an open-source data provider and
+analysis system for software development) has already been analysed by
+the archive’s bots.
+
+Search the archive
+------------------
+
+Search for a keyword
+~~~~~~~~~~~~~~~~~~~~
+
+The easiest way to look for a keyword in the repositories analysed by
+the archive is to use the ``search`` feature of the ``origin`` endpoint.
+Documentation for the endpoint is
+`here <https://archive.softwareheritage.org/api/1/origin/search/doc/>`__
+and the complete syntax is:
+
+::
+
+    /api/1/origin/search/<keyword>/
+
+The server returns a JSON array of objects, each with the following
+attributes:
+
+- **origin_visits_url** is a URL that points to the API page
+  listing all visits (bot fetches) to this repository.
+- **url** is the URL of the origin, or repository, itself.
+
+A (truncated) example of a result from this endpoint is shown below:
+
+::
+
+    [
+        {
+            "origin_visits_url": "https://archive.softwareheritage.org/api/1/origin/https://github.com/borisbaldassari/alambic/visits/",
+            "url": "https://github.com/borisbaldassari/alambic"
+        }
+        ...
+    ]
+
+As an example we will look for instances of *alambic* in the archive’s
+analysed repositories:
+
+.. code:: ipython3
+
+    resp = requests.get("https://archive.softwareheritage.org/api/1/origin/search/alambic/")
+    origins = resp.json()
+    print("We found", len(origins), "entries.")
+    # Print a sample of the results.
+    for origin in origins[1:10]:
+        print('- ', origin['url'])
+
+
+.. parsed-literal::
+
+ We found 52 entries.
+ - https://github.com/royal-alambic-club/sauron
+ - https://github.com/scamberlin/alambic
+ - https://github.com/WebTales/alambic-connector-mongodb
+ - https://github.com/WebTales/alambic
+ - https://github.com/AssoAlambic/alambic-website
+ - https://bitbucket.org/nayoub/alambic.git
+ - https://github.com/Alexandru-Dobre/alambic-connector-rest
+ - https://github.com/WebTales/alambic-connector-diffbot
+ - https://github.com/WebTales/alambic-connector-firebase
+
+
+There are obviously many projects and repositories that embed the word
+alambic, and we will need to be a bit more specific if we are to
+identify the origin actually related to the alambic project.
+
+If we want to know more about a specific origin, we can simply use the
+``url`` attribute (or any known URL) as an entry for any of the
+``origin`` endpoints.
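+
+One way to be more specific is to keep only the origins whose repository
+name is exactly ``alambic``. The following is a minimal sketch using a few of
+the URLs listed above as sample data; ``origins_named`` is a hypothetical
+helper, not an API feature.
+

```python
from urllib.parse import urlparse


def origins_named(origins, name):
    """Keep only origins whose last URL path segment is exactly `name`."""
    matches = []
    for origin in origins:
        path = urlparse(origin["url"]).path.rstrip("/")
        last = path.rsplit("/", 1)[-1]
        # Treat 'alambic.git' and 'alambic' as the same repository name.
        if last.endswith(".git"):
            last = last[:-len(".git")]
        if last == name:
            matches.append(origin)
    return matches


# Sample taken from the search output shown above.
sample = [
    {"url": "https://github.com/royal-alambic-club/sauron"},
    {"url": "https://github.com/scamberlin/alambic"},
    {"url": "https://github.com/WebTales/alambic-connector-mongodb"},
    {"url": "https://bitbucket.org/nayoub/alambic.git"},
]
for origin in origins_named(sample, "alambic"):
    print("-", origin["url"])
```

+Several forks or namesakes can still match, so the final choice remains a
+manual one.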
+
+Search for a specific origin
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Now say that we want to query the database for the specific repository
+of Alambic, to know what information has been registered by the archive.
+The API endpoint can be found `in the swh-web
+documentation <https://archive.softwareheritage.org/api/1/origin/doc/>`__,
+and has the following syntax:
+
+``/api/1/origin/<origin_url>/get/``
+
+This returns the same type of JSON object as the ``search`` command
+seen previously:
+
+- **origin_visits_url** is a URL that points to the API page
+  listing all visits (bot fetches) to this repository.
+- **url** is the URL of the origin, or repository, itself.
+
+We know that Alambic is hosted at
+‘https://github.com/borisbaldassari/alambic/’, so the API call will look
+like this:
+
+``/api/1/origin/https://github.com/borisbaldassari/alambic/get/``
+
+.. code:: ipython3
+
+ resp = requests.get("https://archive.softwareheritage.org/api/1/origin/https://github.com/borisbaldassari/alambic/get/")
+ found = resp.json()
+ jprint(found)
+
+
+.. parsed-literal::
+
+    {
+        "origin_visits_url": "https://archive.softwareheritage.org/api/1/origin/https://github.com/borisbaldassari/alambic/visits/",
+        "url": "https://github.com/borisbaldassari/alambic"
+    }
+
+
+Get visits information
+~~~~~~~~~~~~~~~~~~~~~~
+
+We can use the ``origin_visits_url`` attribute to know more about when
+the repository was analysed by the archive bots. The API endpoint is
+fully documented on the `Software Heritage doc
+site <https://archive.softwareheritage.org/api/1/origin/visits/doc/>`__,
+and has the following syntax:
+
+``/api/1/origin/<origin_url>/visits/``
+
+We will use the same query as before about the main Alambic repository.
+
+.. code:: ipython3
+
+    resp = requests.get("https://archive.softwareheritage.org/api/1/origin/https://github.com/borisbaldassari/alambic/visits/")
+    found = resp.json()
+    length = len(found)
+    print("Number of visits found: {}.".format(length))
+    print("With dates:")
+    for visit in found:
+        print("-", visit['visit'], visit['date'])
+    print("\nExample of a single visit entry:")
+    jprint(found[0])
+
+
+.. parsed-literal::
+
+    Number of visits found: 5.
+    With dates:
+    - 5 2021-01-01T19:35:41.308336+00:00
+    - 4 2020-02-06T10:41:45.700641+00:00
+    - 3 2019-09-01T22:38:12.056537+00:00
+    - 2 2019-06-16T04:52:18.162914+00:00
+    - 1 2019-01-30T07:19:20.799217+00:00
+
+    Example of a single visit entry:
+    {
+        "date": "2021-01-01T19:35:41.308336+00:00",
+        "metadata": {},
+        "origin": "https://github.com/borisbaldassari/alambic",
+        "origin_visit_url": "https://archive.softwareheritage.org/api/1/origin/https://github.com/borisbaldassari/alambic/visit/5/",
+        "snapshot": "6436d2c9b06cf9bd9efb0b4e463c3fe6b868eadc",
+        "snapshot_url": "https://archive.softwareheritage.org/api/1/snapshot/6436d2c9b06cf9bd9efb0b4e463c3fe6b868eadc/",
+        "status": "full",
+        "type": "git",
+        "visit": 5
+    }
+
+
+Get the content
+---------------
+
+As defined at the beginning, a snapshot is a capture of the repository
+at a given time with links to all branches, commits and associated
+content. In this example we will work on the snapshot ID of the latest
+visit to Alambic, as returned by the previous command we executed.
+
+.. code:: ipython3
+
+ # Store snapshot id
+ snapshot = found[0]['snapshot']
+ print("Snapshot is {}.".format(snapshot))
+
+
+.. parsed-literal::
+
+ Snapshot is 6436d2c9b06cf9bd9efb0b4e463c3fe6b868eadc.
+
+
+Note that the latest visit to the repository can also be directly
+retrieved using the `dedicated
+endpoint <https://archive.softwareheritage.org/api/1/origin/visit/latest/doc/>`__
+``/api/1/origin/<origin_url>/visit/latest/``.
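+
+Taking ``found[0]`` as we did above assumes that the newest visit comes
+first in the list and that it produced a snapshot. A slightly more defensive
+sketch selects the visit explicitly from the list; the function name and the
+visit 6 entry are hypothetical and only serve the illustration.
+

```python
def latest_visit_with_snapshot(visits):
    """Return the highest-numbered visit that has a snapshot, or None."""
    with_snapshot = [v for v in visits if v.get("snapshot")]
    if not with_snapshot:
        return None
    return max(with_snapshot, key=lambda v: v["visit"])


visits = [
    # Hypothetical later visit that produced no snapshot.
    {"visit": 6, "snapshot": None},
    # Visit 5 as returned by the visits endpoint above.
    {"visit": 5, "snapshot": "6436d2c9b06cf9bd9efb0b4e463c3fe6b868eadc"},
]
latest = latest_visit_with_snapshot(visits)
print(latest["visit"], latest["snapshot"])
```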
+
+Get the snapshot
+~~~~~~~~~~~~~~~~
+
+We now want to retrieve the content of the project at this snapshot. For
+that purpose there is the ``snapshot`` endpoint, and its documentation
+is `provided
+here <https://archive.softwareheritage.org/api/1/snapshot/doc/>`__. The
+complete syntax is:
+
+``/api/1/snapshot/<snapshot_id>/``
+
+The snapshot endpoint returns in the ``branches`` attribute a list of
+**revisions** (aka commits or branch refs in a git context), which
+themselves point to the set of directories and files in the branch at
+the time of analysis. Let’s follow this chain of links, starting with
+the snapshot’s list of revisions (branches):
+
+.. code:: ipython3
+
+ snapshotr = requests.get("https://archive.softwareheritage.org/api/1/snapshot/{}/".format(snapshot))
+ snapshotj = snapshotr.json()
+ jprint(snapshotj)
+
+
+.. parsed-literal::
+
+    {
+        "branches": {
+            "HEAD": {
+                "target": "refs/heads/master",
+                "target_type": "alias",
+                "target_url": "https://archive.softwareheritage.org/api/1/revision/6dd0504b43b4459d52e9f13f71a91cc0fc445a19/"
+            },
+            "refs/heads/devel": {
+                "target": "e298b8c5692b18928013a68e41fd185419515075",
+                "target_type": "revision",
+                "target_url": "https://archive.softwareheritage.org/api/1/revision/e298b8c5692b18928013a68e41fd185419515075/"
+            },
+            "refs/heads/features/cr152_anonymise_data": {
+                "target": "ba3e0dcbfa0cb212a7186e9e62efb6dafe7fe162",
+                "target_type": "revision",
+                "target_url": "https://archive.softwareheritage.org/api/1/revision/ba3e0dcbfa0cb212a7186e9e62efb6dafe7fe162/"
+            },
+            "refs/heads/features/cr164_github_project": {
+                "target": "0005abb080e4c67a97533ee923e9d28142877752",
+                "target_type": "revision",
+                "target_url": "https://archive.softwareheritage.org/api/1/revision/0005abb080e4c67a97533ee923e9d28142877752/"
+            },
+            "refs/heads/features/cr165_github_its": {
+                "target": "0005abb080e4c67a97533ee923e9d28142877752",
+                "target_type": "revision",
+                "target_url": "https://archive.softwareheritage.org/api/1/revision/0005abb080e4c67a97533ee923e9d28142877752/"
+            },
+            "refs/heads/features/cr89_gitlabwizard": {
+                "target": "b941fd5f93a6cfc2349358b891e47d0fffe0ed2d",
+                "target_type": "revision",
+                "target_url": "https://archive.softwareheritage.org/api/1/revision/b941fd5f93a6cfc2349358b891e47d0fffe0ed2d/"
+            },
+            "refs/heads/master": {
+                "target": "6dd0504b43b4459d52e9f13f71a91cc0fc445a19",
+                "target_type": "revision",
+                "target_url": "https://archive.softwareheritage.org/api/1/revision/6dd0504b43b4459d52e9f13f71a91cc0fc445a19/"
+            }
+        },
+        "id": "6436d2c9b06cf9bd9efb0b4e463c3fe6b868eadc",
+        "next_branch": null
+    }
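+
+Note that ``HEAD`` is an *alias*: its ``target`` is the name of another
+branch rather than a revision ID. The sketch below follows such aliases
+until it reaches a concrete target, using two entries of the output above as
+sample data; ``resolve_branch`` is a hypothetical helper.
+

```python
def resolve_branch(branches, name):
    """Follow alias branches; return (branch_name, branch) of the real target."""
    seen = set()
    while True:
        if name in seen:
            raise ValueError("alias loop detected at {}".format(name))
        seen.add(name)
        branch = branches[name]
        if branch["target_type"] != "alias":
            return name, branch
        # For an alias, 'target' holds the name of the branch it points to.
        name = branch["target"]


# Two entries copied from the snapshot shown above.
branches = {
    "HEAD": {"target": "refs/heads/master", "target_type": "alias"},
    "refs/heads/master": {
        "target": "6dd0504b43b4459d52e9f13f71a91cc0fc445a19",
        "target_type": "revision",
    },
}
name, branch = resolve_branch(branches, "HEAD")
print(name, branch["target"])
```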
+
+
+Get the root directory
+~~~~~~~~~~~~~~~~~~~~~~
+
+The revision associated with the branch can be retrieved by following the
+corresponding link in the ``target_url`` attribute. We will follow the
+``refs/heads/master`` branch and get the associated revision object. In
+this case (a git repository) the revision is equivalent to a branch ref
+or commit, with an ID and message.
+
+.. code:: ipython3
+
+    master_url = snapshotj['branches']['refs/heads/master']['target_url']
+    masterr = requests.get(master_url)
+    masterj = masterr.json()
+    # Print the ID of the revision itself (not the ID of the snapshot).
+    print('Revision ID is', masterj['id'])
+    jprint(masterj)
+
+
+.. parsed-literal::
+
+    Revision ID is 6dd0504b43b4459d52e9f13f71a91cc0fc445a19
+    {
+        "author": {
+            "email": "boris.baldassari@gmail.com",
+            "fullname": "Boris Baldassari <boris.baldassari@gmail.com>",
+            "name": "Boris Baldassari"
+        },
+        "committer": {
+            "email": "boris.baldassari@gmail.com",
+            "fullname": "Boris Baldassari <boris.baldassari@gmail.com>",
+            "name": "Boris Baldassari"
+        },
+        "committer_date": "2020-11-01T12:55:13+01:00",
+        "date": "2020-11-01T12:55:13+01:00",
+        "directory": "fd9fe3477db3b9b7dea63509832b3fa99bdd7eb8",
+        "directory_url": "https://archive.softwareheritage.org/api/1/directory/fd9fe3477db3b9b7dea63509832b3fa99bdd7eb8/",
+        "extra_headers": [],
+        "history_url": "https://archive.softwareheritage.org/api/1/revision/6dd0504b43b4459d52e9f13f71a91cc0fc445a19/log/",
+        "id": "6dd0504b43b4459d52e9f13f71a91cc0fc445a19",
+        "merge": false,
+        "message": "#163 Fix dygraphs zero padding in forums plugin.\n",
+        "metadata": {},
+        "parents": [
+            {
+                "id": "a4a2d8925c1cc43612602ac28e4ca9a31728b151",
+                "url": "https://archive.softwareheritage.org/api/1/revision/a4a2d8925c1cc43612602ac28e4ca9a31728b151/"
+            }
+        ],
+        "synthetic": false,
+        "type": "git",
+        "url": "https://archive.softwareheritage.org/api/1/revision/6dd0504b43b4459d52e9f13f71a91cc0fc445a19/"
+    }
+
+
+The revision is associated with the root directory of the project. We can
+list all files and directories at the root by requesting more
+information from the ``directory_url`` attribute. The endpoint is
+documented
+`here <https://archive.softwareheritage.org/api/1/directory/doc/>`__ and
+has the following syntax:
+
+``/api/1/directory/<directory_id>/``
+
+The structure of the response is an **array of files and directories**.
+**Files** are represented like this:
+
+::
+
+    {
+        "checksums": {
+            "sha1": "5973b582bfaeffa71c924e3fe7150620230391d8",
+            "sha1_git": "a6c4d5ebfdf88b3b1a65996f6c438c01bf60740b",
+            "sha256": "8761f1e1fd96fc4c86ad343a7c19ecd51c0bde4d7055b3315c3975b31ec61bbc"
+        },
+        "dir_id": "3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200",
+        "length": 101,
+        "name": ".dockerignore",
+        "perms": 33188,
+        "status": "visible",
+        "target": "a6c4d5ebfdf88b3b1a65996f6c438c01bf60740b",
+        "target_url": "https://archive.softwareheritage.org/api/1/content/sha1_git:a6c4d5ebfdf88b3b1a65996f6c438c01bf60740b/",
+        "type": "file"
+    }
+
+And **directories** are represented with:
+
+::
+
+    {
+        "dir_id": "3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200",
+        "length": null,
+        "name": "doc",
+        "perms": 16384,
+        "target": "316468df4988351911992ecbf1866f1c1f575c23",
+        "target_url": "https://archive.softwareheritage.org/api/1/directory/316468df4988351911992ecbf1866f1c1f575c23/",
+        "type": "dir"
+    }
+
+We will print the list of files and directories located at the root of
+the repository at the time of analysis:
+
+.. code:: ipython3
+
+    root_url = masterj['directory_url']
+    rootr = requests.get(root_url)
+    rootj = rootr.json()
+    for f in rootj:
+        print('-', f['name'])
+    # jprint(rootj)
+
+
+.. parsed-literal::
+
+ - .dockerignore
+ - .env
+ - .gitignore
+ - CODE_OF_CONDUCT.html
+ - CODE_OF_CONDUCT.md
+ - LICENCE.html
+ - LICENCE.md
+ - Readme.md
+ - doc
+ - docker
+ - docker-compose.run.yml
+ - docker-compose.test.yml
+ - dockercfg.encrypted
+ - mojo
+ - resources
+
+
+We could follow the links down to the leaves in order to rebuild
+the project structure and download all files individually to recreate the
+project locally. However, the archive can do it for us: it provides a
+feature to download the content of a whole project in one step, called
+**cooking**. The feature is described in the `swh-vault
+documentation <https://docs.softwareheritage.org/devel/swh-vault/api.html#cooking-and-status-checking>`__.
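+
+For comparison, the manual approach could be sketched as a recursive walk
+over the directory listings. The ``walk`` function is hypothetical, and the
+listings below are stand-ins keyed by made-up URLs so that the block runs
+offline; only the ``name``, ``type`` and ``target_url`` attributes shown
+earlier are used.
+

```python
def walk(directory_url, fetch_json, prefix=""):
    """Yield (path, entry) for every file below a directory, depth first."""
    for entry in fetch_json(directory_url):
        path = prefix + entry["name"]
        if entry["type"] == "dir":
            # Recurse into sub-directories through their own API URL.
            yield from walk(entry["target_url"], fetch_json, path + "/")
        elif entry["type"] == "file":
            yield path, entry
        # Entries of any other type are ignored in this sketch.


# Stand-in listings: the URLs are placeholders, not real API URLs.
fake_listings = {
    "dir:root": [
        {"name": ".dockerignore", "type": "file", "target_url": "content:1"},
        {"name": "doc", "type": "dir", "target_url": "dir:doc"},
    ],
    "dir:doc": [
        {"name": "Readme.md", "type": "file", "target_url": "content:2"},
    ],
}
files = [path for path, _ in walk("dir:root", fake_listings.__getitem__)]
print(files)
```

+Against the real API, ``fetch_json`` could be
+``lambda url: requests.get(url).json()``. Each directory then costs one HTTP
+request, which is one reason to prefer cooking for whole projects.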
+
+Download content of a project
+-----------------------------
+
+When we ask the archive to cook a directory for us, it starts an
+asynchronous job that recursively fetches the directories and files of the
+project, following the graph down to the leaves (files) and exporting the
+result as a tar.gz file. This procedure is handled by the `swh-vault
+component <https://docs.softwareheritage.org/devel/swh-vault/getting-started.html>`__,
+and it is fully automatic.
+
+Order the meal
+~~~~~~~~~~~~~~
+
+A cooking job can be invoked for revisions, directories or snapshots
+(soon). It is initiated with a POST request on the ``vault/<type>/``
+endpoint; for a directory, the complete syntax is:
+
+``/api/1/vault/directory/<directory_id>/``
+
+The first POST request initiates the cooking, and subsequent GET
+requests can fetch the job result and download the archive. See the
+`Software Heritage
+documentation <https://docs.softwareheritage.org/devel/swh-vault/getting-started.html#example-retrieving-a-directory>`__
+on this, with useful examples. The API endpoint is documented
+`here <https://archive.softwareheritage.org/api/1/vault/directory/doc/>`__.
+
+In this example we will fetch the content of the root directory that we
+previously identified.
+
+.. code:: ipython3
+
+ mealr = requests.post("https://archive.softwareheritage.org/api/1/vault/directory/3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/")
+ mealj = mealr.json()
+ jprint(mealj)
+
+
+.. parsed-literal::
+
+    {
+        "fetch_url": "https://archive.softwareheritage.org/api/1/vault/directory/3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/raw/",
+        "id": 379321799,
+        "obj_id": "3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200",
+        "obj_type": "directory",
+        "progress_message": null,
+        "status": "done"
+    }
+
+
+Ask if it’s ready
+~~~~~~~~~~~~~~~~~
+
+We can use a GET request on the same URL to get information about the
+process status:
+
+.. code:: ipython3
+
+ statusr = requests.get("https://archive.softwareheritage.org/api/1/vault/directory/3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/")
+ statusj = statusr.json()
+ jprint(statusj)
+
+
+.. parsed-literal::
+
+    {
+        "fetch_url": "https://archive.softwareheritage.org/api/1/vault/directory/3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/raw/",
+        "id": 379321799,
+        "obj_id": "3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200",
+        "obj_type": "directory",
+        "progress_message": null,
+        "status": "done"
+    }
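+
+Here the job was already ``done``. Since cooking is asynchronous, a request
+may also report that the job is still running, in which case the client has
+to ask again later. The polling loop below is a sketch under assumptions:
+the ``pending`` and ``failed`` status values, the function name and the
+delays are not taken from the output above, and the responses are simulated
+so that the block runs offline.
+

```python
import time


def wait_for_cooking(status_url, fetch_json, interval=10, max_tries=30,
                     sleep=time.sleep):
    """Poll `status_url` until the job is done and return its fetch_url."""
    for _ in range(max_tries):
        status = fetch_json(status_url)
        if status["status"] == "done":
            return status["fetch_url"]
        # Assumption: a failed job reports the status "failed".
        if status["status"] == "failed":
            raise RuntimeError(
                "cooking failed: {}".format(status.get("progress_message")))
        sleep(interval)
    raise TimeoutError("cooking not finished after {} tries".format(max_tries))


STATUS_URL = ("https://archive.softwareheritage.org/api/1/vault/directory/"
              "3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/")
# Simulated answers: one unfinished poll (assumed status), then the result.
responses = iter([
    {"status": "pending"},
    {"status": "done", "fetch_url": STATUS_URL + "raw/"},
])
fetch_url = wait_for_cooking(STATUS_URL, lambda url: next(responses),
                             sleep=lambda seconds: None)
print(fetch_url)
```

+Against the real API, ``fetch_json`` could be
+``lambda url: requests.get(url).json()``.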
+
+
+Get the plate
+~~~~~~~~~~~~~
+
+Once the processing is finished (it can take up to a few minutes) the
+tar.gz archive can be downloaded through the ``fetch_url`` link and
+extracted with ``tar``:
+
+::
+
+ boris@castalia:downloads$ curl https://archive.softwareheritage.org/api/1/vault/directory/3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/raw/ -o myarchive.tar.gz
+ % Total % Received % Xferd Average Speed Time Time Time Current
+ Dload Upload Total Spent Left Speed
+ 100 9555k 100 9555k 0 0 1459k 0 0:00:06 0:00:06 --:--:-- 1717k
+ boris@castalia:downloads$ ls
+ myarchive.tar.gz
+ boris@castalia:downloads$ tar xzvf myarchive.tar.gz
+ 3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/
+ 3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/.dockerignore
+ 3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/.env
+ 3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/.gitignore
+ 3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/CODE_OF_CONDUCT.html
+ 3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/CODE_OF_CONDUCT.md
+ 3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/LICENCE.html
+ 3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/LICENCE.md
+ 3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/Readme.md
+ 3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/doc/
+ 3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/doc/Readme.md
+ 3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/doc/config
+ [SNIP]
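+
+The same extraction can be done from Python with the standard ``tarfile``
+module. In this sketch ``extract_tar_gz`` is a made-up helper; it is
+exercised on a tiny archive built in memory (with a hypothetical
+``demo/Readme.md`` member) so that it runs without network access.
+

```python
import io
import os
import tarfile
import tempfile


def extract_tar_gz(data, dest):
    """Extract gzip-compressed tar bytes into `dest`; return member names."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        names = archive.getnames()
        # Use the safer "data" extraction filter where Python provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest, **kwargs)
    return names


# Build a small stand-in archive in memory (hypothetical content).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    payload = b"# Alambic\n"
    info = tarfile.TarInfo("demo/Readme.md")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

with tempfile.TemporaryDirectory() as dest:
    names = extract_tar_gz(buffer.getvalue(), dest)
    extracted = os.path.exists(os.path.join(dest, "demo", "Readme.md"))
    print(names, extracted)
```

+With network access, ``extract_tar_gz(requests.get(fetch_url).content,
+"alambic")`` would unpack the cooked directory, at the cost of holding the
+whole archive in memory.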
+
+Conclusion
+----------
+
+In this article, we learned **how to explore and use the Software
+Heritage archive through its API**: searching for a repository,
+inspecting its visits and snapshots, and downloading the content of a
+specific directory.
+There is a lot more to the Archive and its API than what we have seen,
+and all features are generously documented on the `Software Heritage web
+site <https://archive.softwareheritage.org/api/>`__.
+
+
+
diff --git a/docs/getting-started/index.rst b/docs/getting-started/index.rst
--- a/docs/getting-started/index.rst
+++ b/docs/getting-started/index.rst
@@ -11,3 +11,5 @@
../getting-started
../developer-setup
using-docker
+ getting_started_with_the_swh_api
+
