docs/api.rst
.. _swh-graph-api: | .. _swh-graph-api: | ||||
Graph RPC API | Graph Querying HTTP API | ||||
============= | ======================= | ||||
The Graph Querying API is a high-level HTTP API intended to run common, | |||||
relatively simple traversal queries on the compressed graph. | |||||
The client/server architecture lets it load the graph into memory only once | |||||
and then serve many different requests. However, its expressivity is limited; | |||||
more complex or resource-intensive queries should instead be written against the | |||||
:ref:`Low-level Java API <swh-graph-java-api>` and run as standalone | |||||
programs. | |||||
Terminology | Terminology | ||||
----------- | ----------- | ||||
This API uses the following notions: | This API uses the following notions: | ||||
- **Node**: a node in the :ref:`Software Heritage graph <data-model>`, | - **Node**: a node in the :ref:`Software Heritage graph <data-model>`, | ||||
Show All 35 Lines | |||||
- ``"dir:dir,dir:cnt"`` node types allowing edges from directories to | - ``"dir:dir,dir:cnt"`` node types allowing edges from directories to | ||||
directory nodes, or directories to content nodes. | directory nodes, or directories to content nodes. | ||||
- ``"rev:rev,dir:*"`` node types allowing edges from revisions to revision | - ``"rev:rev,dir:*"`` node types allowing edges from revisions to revision | ||||
nodes, or from directory nodes. | nodes, or from directory nodes. | ||||
- ``"*:rel"`` node types allowing all edges to releases. | - ``"*:rel"`` node types allowing all edges to releases. | ||||
- ``"cnt,snp"`` accepted node types returned in the query results. | - ``"cnt,snp"`` accepted node types returned in the query results. | ||||
Endpoints | |||||
--------- | |||||
Leaves | Leaves | ||||
------ | ~~~~~~ | ||||
.. http:get:: /graph/leaves/:src | .. http:get:: /graph/leaves/:src | ||||
Performs a graph traversal and returns the leaves of the subgraph rooted at | Performs a graph traversal and returns the leaves of the subgraph rooted at | ||||
the specified source node. | the specified source node. | ||||
:param string src: source node specified as a SWHID | :param string src: source node specified as a SWHID | ||||
Show All 26 Lines | .. sourcecode:: http | ||||
swh:1:cnt:540faad6b1e02e2db4f349a4845192db521ff2bd | swh:1:cnt:540faad6b1e02e2db4f349a4845192db521ff2bd | ||||
swh:1:cnt:630585fc6d34e5e121139e2aee0a64e83dc9aae6 | swh:1:cnt:630585fc6d34e5e121139e2aee0a64e83dc9aae6 | ||||
swh:1:cnt:f8634ced669f0a9155c8cab1b2621d57d778215e | swh:1:cnt:f8634ced669f0a9155c8cab1b2621d57d778215e | ||||
swh:1:cnt:ba6daa801ad3ea587904b1abe9161dceedb2e0bd | swh:1:cnt:ba6daa801ad3ea587904b1abe9161dceedb2e0bd | ||||
... | ... | ||||
Neighbors | Neighbors | ||||
--------- | ~~~~~~~~~ | ||||
.. http:get:: /graph/neighbors/:src | .. http:get:: /graph/neighbors/:src | ||||
Returns the direct neighbors of a node (linked with exactly one edge) in the graph. | Returns the direct neighbors of a node (linked with exactly one edge) in the graph. | ||||
:param string src: source node specified as a SWHID | :param string src: source node specified as a SWHID | ||||
:query string edges: edge types allowed to be listed as neighbors; default | :query string edges: edge types allowed to be listed as neighbors; default | ||||
Show All 24 Lines | .. sourcecode:: http | ||||
swh:1:rev:a31e58e129f73ab5b04016330b13ed51fde7a961 | swh:1:rev:a31e58e129f73ab5b04016330b13ed51fde7a961 | ||||
swh:1:dir:b5d2aa0746b70300ebbca82a8132af386cc5986d | swh:1:dir:b5d2aa0746b70300ebbca82a8132af386cc5986d | ||||
swh:1:rev:52c90f2d32bfa7d6eccd66a56c44ace1f78fbadd | swh:1:rev:52c90f2d32bfa7d6eccd66a56c44ace1f78fbadd | ||||
... | ... | ||||
Walk | Walk | ||||
---- | ~~~~ | ||||
.. | .. | ||||
.. http:get:: /graph/walk/:src/:dst | .. http:get:: /graph/walk/:src/:dst | ||||
Performs a graph traversal and returns the first found path from source to | Performs a graph traversal and returns the first found path from source to | ||||
destination (final destination node included). | destination (final destination node included). | ||||
:param string src: starting node specified as a SWHID | :param string src: starting node specified as a SWHID | ||||
▲ Show 20 Lines • Show All 91 Lines • ▼ Show 20 Lines | .. sourcecode:: http | ||||
HTTP/1.1 200 OK | HTTP/1.1 200 OK | ||||
swh:1:ori:98aa0e71f5c789b12673717a97f6e9fa20aa1161 | swh:1:ori:98aa0e71f5c789b12673717a97f6e9fa20aa1161 | ||||
swh:1:snp:a511810642b7795e725033febdd82075064ed863 | swh:1:snp:a511810642b7795e725033febdd82075064ed863 | ||||
Visit | Visit | ||||
----- | ~~~~~ | ||||
.. http:get:: /graph/visit/nodes/:src | .. http:get:: /graph/visit/nodes/:src | ||||
.. http:get:: /graph/visit/edges/:src | .. http:get:: /graph/visit/edges/:src | ||||
.. http:get:: /graph/visit/paths/:src | .. http:get:: /graph/visit/paths/:src | ||||
Performs a graph traversal and returns explored nodes, edges or paths (in | Performs a graph traversal and returns explored nodes, edges or paths (in | ||||
the order of the traversal). | the order of the traversal). | ||||
▲ Show 20 Lines • Show All 77 Lines • ▼ Show 20 Lines | .. sourcecode:: http | ||||
["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:cnt:2a0837708151d76edf28fdbb90dc3eabc676cff3"] | ["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:cnt:2a0837708151d76edf28fdbb90dc3eabc676cff3"] | ||||
["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:cnt:eaf025ad54b94b2fdda26af75594cfae3491ec75"] | ["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:cnt:eaf025ad54b94b2fdda26af75594cfae3491ec75"] | ||||
... | ... | ||||
["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:dir:2ebd4b96fa5665ff74f2b27ae41aecdc43af4463", "swh:1:cnt:1d3b6575fb7bf2a147d228e78ffd77ea193c3639"] | ["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:dir:2ebd4b96fa5665ff74f2b27ae41aecdc43af4463", "swh:1:cnt:1d3b6575fb7bf2a147d228e78ffd77ea193c3639"] | ||||
... | ... | ||||
Counting results | Counting results | ||||
---------------- | ~~~~~~~~~~~~~~~~ | ||||
The following method variants, with trailing ``/count`` added, behave like their | The following method variants, with trailing ``/count`` added, behave like their | ||||
already discussed counterparts but, instead of returning results, return the | already discussed counterparts but, instead of returning results, return the | ||||
*number* of results that would have been returned: | *number* of results that would have been returned: | ||||
.. http:get:: /graph/leaves/count/:src | .. http:get:: /graph/leaves/count/:src | ||||
Return the number of :http:get:`/graph/leaves/:src` results | Return the number of :http:get:`/graph/leaves/:src` results | ||||
.. http:get:: /graph/neighbors/count/:src | .. http:get:: /graph/neighbors/count/:src | ||||
Return the number of :http:get:`/graph/neighbors/:src` results | Return the number of :http:get:`/graph/neighbors/:src` results | ||||
.. http:get:: /graph/visit/nodes/count/:src | .. http:get:: /graph/visit/nodes/count/:src | ||||
Return the number of :http:get:`/graph/visit/nodes/:src` results | Return the number of :http:get:`/graph/visit/nodes/:src` results | ||||
Stats | Stats | ||||
----- | ~~~~~ | ||||
.. http:get:: /graph/stats | .. http:get:: /graph/stats | ||||
Returns statistics on the compressed graph. | Returns statistics on the compressed graph. | ||||
:statuscode 200: success | :statuscode 200: success | ||||
**Example** | **Example** | ||||
Show All 25 Lines | .. sourcecode:: http | ||||
"avg": 0.6107127825377487 | "avg": 0.6107127825377487 | ||||
}, | }, | ||||
"outdegree": { | "outdegree": { | ||||
"min": 0, | "min": 0, | ||||
"max": 1, | "max": 1, | ||||
"avg": 0.6107127825377487 | "avg": 0.6107127825377487 | ||||
} | } | ||||
} | } | ||||
Use-case examples | |||||
----------------- | |||||
JaredR26: For at least one of the below, I'd give a full example command with full output. Like the… | |||||
seirl: FYI, the ORC graph dataset now has an "id" column in the origin table, specifically to convert back from these sha1s to the URLs. It's now very similar to the other nodes, and it's already documented in the documentation of the dataset (which is the correct place to put this, imo). | |||||
seirl: Regarding the output, the above page has a ton of examples already. I just put this here to remove the outdated use-cases page, but it still feels a bit clumsy. Not sure what a better way to present this would be. | |||||
This section showcases how to leverage the endpoints of the HTTP API described | |||||
above for some common use-cases. | |||||
Browsing | |||||
~~~~~~~~ | |||||
The following use cases require traversing the *forward graph*. | |||||
- **ls**: given a directory node, list (non-recursively) all linked nodes of | |||||
type directory and content | |||||
Endpoint:: | |||||
/graph/neighbors/:DIR_ID?edges=dir:cnt,dir:dir | |||||
- **ls -R**: given a directory node, recursively list all linked nodes of type | |||||
directory and content | |||||
Endpoint:: | |||||
/graph/visit/paths/:DIR_ID?edges=dir:cnt,dir:dir | |||||
- **git log**: given a revision node, recursively list all linked nodes of type | |||||
revision | |||||
Endpoint:: | |||||
/graph/visit/nodes/:REV_ID?edges=rev:rev | |||||
Vault | |||||
~~~~~ | |||||
The following use cases require traversing the *forward graph*. | |||||
- **tarball** (same as *ls -R* above) | |||||
- **git bundle**: given a node, recursively list all linked nodes of any kind | |||||
Endpoint:: | |||||
/graph/visit/nodes/:NODE_ID?edges=* | |||||
Provenance | |||||
~~~~~~~~~~ | |||||
The following use cases require traversing the *backward (transposed) | |||||
graph*. | |||||
- **commit provenance**: given a content or directory node, return *a* commit | |||||
whose directory (recursively) contains it | |||||
Endpoint:: | |||||
/graph/walk/:NODE_ID/rev?direction=backward&edges=dir:dir,cnt:dir,dir:rev | |||||
- **complete commit provenance**: given a content or directory node, return | |||||
*all* commits whose directory (recursively) contains it | |||||
Endpoint:: | |||||
/graph/leaves/:NODE_ID?direction=backward&edges=dir:dir,cnt:dir,dir:rev | |||||
- **origin provenance**: given a content, directory, or commit node, return | |||||
*an* origin that has at least one snapshot that (recursively) contains it | |||||
Endpoint:: | |||||
/graph/walk/:NODE_ID/ori?direction=backward&edges=* | |||||
- **complete origin provenance**: given a content, directory, or commit node, | |||||
return *all* origins that have at least one snapshot that (recursively) | |||||
contains it | |||||
Endpoint:: | |||||
/graph/leaves/:NODE_ID?direction=backward&edges=* | |||||
Provenance statistics | |||||
~~~~~~~~~~~~~~~~~~~~~ | |||||
The following use cases require traversing the *backward (transposed) | |||||
graph*. | |||||
- **content popularity across commits**: count the number of commits (or | |||||
*commit popularity*) that link to a directory that (recursively) includes a | |||||
given content. | |||||
Endpoint:: | |||||
/graph/leaves/count/:NODE_ID?direction=backward&edges=cnt:dir,dir:dir,dir:rev | |||||
- **commit popularity across origins**: count the number of origins (or *origin | |||||
popularity*) that have a snapshot that (recursively) includes a given commit. | |||||
Endpoint:: | |||||
/graph/leaves/count/:NODE_ID?direction=backward&edges=* | |||||
The following use cases require traversing the *forward graph*. | |||||
- **revision size** (as number of contents) distribution: the number of contents | |||||
that are (recursively) reachable from a given revision. | |||||
Endpoint:: | |||||
/graph/leaves/count/:NODE_ID?edges=* | |||||
- **origin size** (as number of revisions) distribution: count the number of | |||||
revisions that are (recursively) reachable from a given origin. | |||||
Endpoint:: | |||||
/graph/leaves/count/:NODE_ID?edges=ori:snp,snp:rel,snp:rev,rel:rev,rev:rev |
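As a rough sketch of how the use-case endpoints listed above could be called from a script, the Python snippet below builds request URLs and parses the newline-separated SWHID responses shown in the endpoint examples. The base URL ``http://localhost:5009`` and the helper names ``graph_url``, ``parse_swhids`` and ``fetch_swhids`` are illustrative assumptions, not part of the documented API.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the graph server listens here; adjust to your deployment.
BASE_URL = "http://localhost:5009"


def graph_url(endpoint, src, base_url=BASE_URL, **query):
    """Build a request URL such as /graph/leaves/<src>?direction=backward."""
    # Keep ':' unescaped so the SWHID stays readable in the path.
    path = f"/graph/{endpoint}/{quote(src, safe=':')}"
    # Keep ':', ',' and '*' unescaped so edge filters match the documented form.
    qs = urlencode(query, safe=":,*")
    return base_url + path + ("?" + qs if qs else "")


def parse_swhids(body):
    """Split a newline-separated response body into a list of SWHIDs."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def fetch_swhids(url):
    """Hypothetical helper: perform the GET request and parse the result."""
    with urlopen(url) as resp:
        return parse_swhids(resp.read().decode("utf-8"))
```

For instance, ``graph_url("neighbors", dir_id, edges="dir:cnt,dir:dir")`` corresponds to the **ls** use case, and ``graph_url("leaves/count", node_id, direction="backward", edges="*")`` yields a counting variant in the ``/graph/leaves/count/:src`` form given in the Counting results section.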
For at least one of the below, I'd give a full example command with full output. Like the httpie example in the quickstart. Only needs to be shown once but can help prompt someone who is speeding through.
Also, something somewhere needs to mention the weirdness/problem of the compressed graph wanting the swh:1:ori:HASH identifier. Or does the RPC API convert URIs given into proper IDs? What does it spit back out if you traverse to an origin?
I can test it on my system but I don't know why it is not liking SWHIDs given in the HTTP request.
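On the swh:1:ori:HASH question: the reply above describes these hashes as sha1s that map back to origin URLs. A minimal sketch of deriving such an identifier on the client side, assuming the hash is the SHA-1 of the UTF-8 encoded origin URL (the function name ``origin_swhid`` is hypothetical, and the encoding is an assumption worth checking against the dataset documentation):

```python
import hashlib


def origin_swhid(url):
    """Return a swh:1:ori: identifier for an origin URL.

    Assumption: the identifier is the SHA-1 of the UTF-8 encoded URL.
    """
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"swh:1:ori:{digest}"
```

The result can then be passed as ``:src`` to the endpoints in the same way as any other SWHID.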