diff --git a/docs/architecture/overview.rst b/docs/architecture/overview.rst --- a/docs/architecture/overview.rst +++ b/docs/architecture/overview.rst @@ -129,15 +129,14 @@ First of all, the archive website and API, also known as :ref:`swh-web `, is the main entry point of the archive. -This is the component that serves https://archive.softwareheritage.org/, -which is the window into the entire archive, as it provides access to it -through a web browser or the HTTP API. +This is the component that serves https://archive.softwareheritage.org/, which is the +window into the entire archive, as it provides access to it through a web browser or the +HTTP API. -It does so by querying most of the internal APIs of |swh|: -the Data Storage (to display source code repositories and their content), -the Scheduler (to allow manual scheduling of loader tasks through the -`Save Code Now `_ feature), -and many of the other services we will see below. +It does so by querying most of the internal APIs of |swh|: the Data Storage (to display +source code repositories and their content), the Scheduler (to allow manual scheduling +of loader tasks through the :swh_web:`Save Code Now ` feature), and many of the +other services we will see below. Internal data mining ^^^^^^^^^^^^^^^^^^^^ @@ -196,10 +195,10 @@ Counters ^^^^^^^^ -The `archive's landing page `_ features -counts of the total number of files/directories/revisions/... in the archive. -Perhaps surprisingly, counting unique objects at |swh|'s scale is hard, -and a performance bottleneck when implemented purely in the Storage's SQL database. +The :swh_web:`archive's landing page ` features counts of the total number of +files/directories/revisions/... in the archive. Perhaps surprisingly, counting unique +objects at |swh|'s scale is hard, and a performance bottleneck when implemented purely +in the Storage's SQL database. 
:ref:`swh-counters ` provides an alternative design to solve this issue, by reading new objects from the Journal and counting them using Redis_' HyperLogLog_ diff --git a/docs/faq/index.rst b/docs/faq/index.rst --- a/docs/faq/index.rst +++ b/docs/faq/index.rst @@ -221,7 +221,7 @@ Is there a page where I can see all the API endpoints? ------------------------------------------------------ -See the `API endpoint listing page`_. +See the :swh_web:`API endpoint listing page `. What are the usage limits for SWH APIs? --------------------------------------- @@ -231,7 +231,7 @@ * 120 for anonymous users * 1200 for authenticated users -It's described in the `rate limit documentation page`_. +It's described in the :swh_web:`rate limit documentation page `. .. It's temporarily here but it should be moved into its own sphinx instance at some point in the future. @@ -261,8 +261,6 @@ volume, that may need human intervention. -.. _API endpoint listing page: https://archive.softwareheritage.org/api/1/ -.. _rate limit documentation page: https://archive.softwareheritage.org/api/#rate-limiting .. _bug tracking system: https://forge.softwareheritage.org/ .. _contact form: https://www.softwareheritage.org/contact/ .. _contact us: https://www.softwareheritage.org/contact/ diff --git a/docs/getting-started/api.rst b/docs/getting-started/api.rst --- a/docs/getting-started/api.rst +++ b/docs/getting-started/api.rst @@ -15,21 +15,17 @@ organizing, preserving and sharing all the source code publicly available in the world. -Yes, you read it well: all source code available in the world. It implies to -build an equally impressive infrastructure to hold the huge amount of -information represented, make the archive available to the public -through a `nice web interface `__ -and even propose a :ref:`well-documented API ` to access it -seamlessly. For the records, there are also :ref:`various datasets -available ` for download, with detailed instructions -about how to set it up. 
And, yes it’s huge: the full graph generated -from the archive (with only metadata, content is not included) has more -than 20b nodes and weights 1.2TB. Overall size of the archive is in the -hundreds of TBs. - -This article presents, and demonstrates the use of, the `Software -Heritage API `__ to query -basic information about archived content and fetch the content of a +Yes, all source code available in the world. This requires building an equally impressive +infrastructure to hold the huge amount of information represented, make the archive +available to the public through a :swh_web:`nice web interface ` and even offer a +:ref:`well-documented API ` to access it seamlessly. For the record, there are +also :ref:`various datasets available ` for download, with detailed +instructions about how to set them up. And yes, it’s huge: the full graph generated from +the archive (with only metadata, content is not included) has more than 20 billion nodes and +weighs 1.2TB. The overall size of the archive is in the hundreds of TBs. + +This article presents, and demonstrates the use of, the :swh_web:`Software Heritage API +` to query basic information about archived content and fetch the content of a software project. Terms and Concepts @@ -91,24 +87,20 @@ print(json.dumps(obj, sort_keys=True, indent=4)) -The syntax mentioned in the `API -documentation `__ is rather -straightforward. Since we want to read it from the main Software -Heritage server, we will use ``https://archive.softwareheritage.org/`` -as the basename. 
All API calls will +be forged according to the same syntax: ``https://archive.softwareheritage.org/api/1/`` Request basic Information ------------------------- -We want to get some basic information about the main server activity and -content. The ``stat`` endpoint provides a summary of the main indexes and -some statistics about the archive. We can request a GET on the main -counters of the archive using the counters path, as described in the -`endpoint -documentation `__: +We want to get some basic information about the main server activity and content. The +``stat`` endpoint provides a summary of the main indexes and some statistics about the +archive. We can request a GET on the main counters of the archive using the counters +path, as described in the :swh_web:`endpoint documentation `: ``/api/1/stat/counters/`` @@ -164,11 +156,9 @@ Search for a keyword ^^^^^^^^^^^^^^^^^^^^ -The easiest way to look for a keyword in the repositories analysed by -the archive is to use the ``search`` feature of the ``origin`` endpoint. -Documentation for the endpoint is -`here `__ -and the complete syntax is: +The easiest way to look for a keyword in the repositories analysed by the archive is to +use the ``search`` feature of the ``origin`` endpoint. Documentation for the endpoint is +:swh_web:`here ` and the complete syntax is: ``/api/1/origin/search//`` @@ -226,11 +216,10 @@ Search for a specific origin ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Now say that we want to query the database for the specific repository -of Alambic, to know what information has been registered by the archive. -The API endpoint can be found `in the swh-web -documentation `__, -and has the following syntax: +Now say that we want to query the database for the specific repository of Alambic, to +know what information has been registered by the archive. The API endpoint can be found +:swh_web:`in the swh-web documentation `, and has the following +syntax: ``/api/1/origin//get/`` @@ -254,7 +243,7 @@ jprint(found) -.. 
parsed-literal:: +.. code:: { "origin_visits_url": "https://archive.softwareheritage.org/api/1/origin/https://github.com/borisbaldassari/alambic/visits/", @@ -265,11 +254,10 @@ Get visits information ^^^^^^^^^^^^^^^^^^^^^^ -We can use the ``origin_visits_url`` attribute to know more about when -the repository was analysed by the archive bots. The API endpoint is -fully documented on the `Software Heritage doc -site `__, -and has the following syntax: +We can use the ``origin_visits_url`` attribute to know more about when the repository +was analysed by the archive bots. The API endpoint is fully documented on the +:swh_web:`Software Heritage doc site `, and has the following +syntax: ``/api/1/origin//visits/`` @@ -288,7 +276,7 @@ jprint(found[0]) -.. parsed-literal:: +.. code:: Number of visits found: 5. With dates: @@ -327,32 +315,28 @@ print(f"Snapshot is {format(snapshot)}.") -.. parsed-literal:: +.. code:: Snapshot is 6436d2c9b06cf9bd9efb0b4e463c3fe6b868eadc. -Note that the latest visit to the repository can also be directly -retrieved using the `dedicated -endpoint `__ +Note that the latest visit to the repository can also be directly retrieved using the +:swh_web:`dedicated endpoint ` ``/api/1/origin/visit/latest/``. Get the snapshot ^^^^^^^^^^^^^^^^ -We want now to retrieve the content of the project at this snapshot. For -that purpose there is the ``snapshot`` endpoint, and its documentation -is `provided -here `__. The -complete syntax is: +We want now to retrieve the content of the project at this snapshot. For that purpose +there is the ``snapshot`` endpoint, and its documentation is :swh_web:`provided here +`. The complete syntax is: ``/api/1/snapshot//`` -The snapshot endpoint returns in the ``branches`` attribute a list of -**revisions** (aka commits in a git context), which -themselves point to the set of directories and files in the branch at -the time of analysis. 
Let’s follow this chain of links, starting with -the snapshot’s list of revisions (branches): +The snapshot endpoint returns in the ``branches`` attribute a list of **revisions** (aka +commits in a git context), which themselves point to the set of directories and files in +the branch at the time of analysis. Let’s follow this chain of links, starting with the +snapshot’s list of revisions (branches): .. code:: python @@ -361,7 +345,7 @@ jprint(snapshotj) -.. parsed-literal:: +.. code:: { "branches": { @@ -424,7 +408,7 @@ jprint(masterj) -.. parsed-literal:: +.. code:: Revision ID is 6436d2c9b06cf9bd9efb0b4e463c3fe6b868eadc { @@ -460,12 +444,10 @@ } -The revision references the root directory of the project. We can -list all files and directories at the root by requesting more -information from the ``directory_url`` attribute. The endpoint is -documented -`here `__ and -has the following syntax: +The revision references the root directory of the project. We can list all files and +directories at the root by requesting more information from the ``directory_url`` +attribute. The endpoint is documented :swh_web:`here ` and has the +following syntax: ``/api/1/directory//`` @@ -516,7 +498,7 @@ print(f"- {f['name']}.") -.. parsed-literal:: +.. code:: - .dockerignore - .env @@ -560,10 +542,10 @@ ``/api/1/vault/directory//`` -The first POST request initiates the cooking, and subsequent GET -requests can fetch the job result and download the archive. See the -`Software Heritage documentation ` on this, with useful -examples. The API endpoint is documented `here `__. +The first POST request initiates the cooking, and subsequent GET requests can fetch the +job result and download the archive. See the `Software Heritage documentation +` on this, with useful examples. The API endpoint is documented +:swh_web:`here `. In this example we will fetch the content of the root directory that we previously identified. @@ -575,7 +557,7 @@ jprint(mealj) -.. parsed-literal:: +.. 
code:: { "fetch_url": "https://archive.softwareheritage.org/api/1/vault/directory/3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/raw/", @@ -600,7 +582,7 @@ jprint(statusj) -.. parsed-literal:: +.. code:: { "fetch_url": "https://archive.softwareheritage.org/api/1/vault/directory/3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200/raw/", @@ -645,12 +627,11 @@ Conclusion ---------- -In this article, we learned **how to explore and use the Software -Heritage archive using its API**: searching for a repository, -identifying projects and downloading specific snapshots of a repository. -There is a lot more to the Archive and its API than what we have seen, -and all features are generously documented on the `Software Heritage web -site `__. +In this article, we learned **how to explore and use the Software Heritage archive using +its API**: searching for a repository, identifying projects and downloading specific +snapshots of a repository. There is a lot more to the Archive and its API than what we +have seen, and all features are generously documented on the :swh_web:`Software Heritage +web site `.
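Note on the endpoint syntax lines touched by this diff: several of them render with an empty slot (for example ``/api/1/origin//visits/``), so it may help to see where the argument goes. The sketch below is only an illustration, not code from the tutorial: the helper names (`api_url`, `origin_visits_url`, `snapshot_url`, `vault_directory_url`) are hypothetical, and the placement of the origin URL and of the identifiers is inferred from the example URLs that appear in the tutorial's output blocks. It builds URLs only and performs no network request.

```python
# Base of every API call, as stated in the tutorial text.
API_BASE = "https://archive.softwareheritage.org/api/1/"


def api_url(*parts: str) -> str:
    """Join path components onto the API base and end with a slash.

    Hypothetical helper: leading/trailing slashes are stripped from each
    component so that components can be passed with or without them.
    """
    return API_BASE + "/".join(p.strip("/") for p in parts) + "/"


def origin_visits_url(origin: str) -> str:
    # Assumption: the origin URL fills the empty slot of /api/1/origin//visits/.
    return api_url("origin", origin, "visits")


def snapshot_url(snapshot_id: str) -> str:
    # Assumption: the snapshot identifier fills the slot of /api/1/snapshot//.
    return api_url("snapshot", snapshot_id)


def vault_directory_url(directory_id: str) -> str:
    # Assumption: the directory identifier fills the slot of
    # /api/1/vault/directory//.
    return api_url("vault", "directory", directory_id)


if __name__ == "__main__":
    print(origin_visits_url("https://github.com/borisbaldassari/alambic"))
    print(snapshot_url("6436d2c9b06cf9bd9efb0b4e463c3fe6b868eadc"))
    print(vault_directory_url("3ee1366c6dd0b7f4ba9536e9bcc300236ac8f200"))
```

The first helper reproduces the ``origin_visits_url`` value shown in the tutorial output, and appending ``raw/`` to the vault URL gives the ``fetch_url`` shown there, which is the only evidence used for the slot positions.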