Differential D1460 Diff 5059 api/docs/api.rst

Changeset View

Standalone View

api/docs/api.rst

This file was added.

				Graph traversal API
				zackUnsubmitted Done Inline Actions Minor point: the fact it's compressed is an implementation detail. What we are defining here is a generic API to efficiently run graph algos. I think that most of them will be traversal graphs, so this is likely gonna be the "Graph traversal API", and can be further generalized later if need be. zack: Minor point: the fact it's compressed is an implementation detail. What we are defining here is…
				===================

				Visit
				-----

				.. http:get:: /graph/visit/:swh_id

				Performs a graph traversal and returns explored paths (in the order of the
				traversal).

				:param string swh_id: starting point for the traversal, represented as a
				`Software Heritage persitent identifier
				zackUnsubmitted Done Inline Actions minor: this deserve an hyperlink to the PID doc zack: minor: this deserve an hyperlink to the PID doc
				<https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#persistent-identifiers>`_
				zackUnsubmitted Done Inline Actions Various naming inconsistencies here: "file" should be "content" or "cnt" for short we are mixing short and long version of object types/names, e.g., "dir" and "rev" v. "snapshot" and "release" cc: @seirl zack: Various naming inconsistencies here: - "file" should be "content" or "cnt" for short - we are…

				zackUnsubmitted Done Inline Actions The notion of dataset will be gone soon: we have it now because we compressed different subgraphs separately, but we really want to compress the entire graph at once (as it's needed for cross-subgraph traversals). As a result all subgraph-specific aspects of the API should be generalized. A first proposal to do so is to replace all "SRC_to_DST" strings with "/SRC/DST" URL paths, e.g., "dir_to_rev" → "/dir/rev" fixing the implicit notion that the first is the source, the second the destination. They're not strictly speaking hierarchical, but it seems way nicer than `?from=SRC&to=DST`. Not sure if there are better/more idiomatic REST patterns for this. (It might be useful to look around and check other existing REST APIs for traversals in existing graph databases.) zack: The notion of dataset will be gone soon: we have it now because we compressed different…
				:query string edges: edges types the traversal can follow, expressed as a
				zackUnsubmitted Done Inline Actions I don't see how "colon-separated list of node types" is expressive enough. We need to specify a list of edge type, and each edge type is a pair of node types. So I think this should be something like "comma-separated list of src:dst strings", otherwise you can only specify one path type (possibly crossing multiple types of edges), but no more than that. zack: I don't see how "colon-separated list of node types" is expressive enough. We need to specify a…
				comma-separated list of ``src:dst`` strings (e.g.: ``"rev:rev,rev:dir"``).
				zackUnsubmitted Done Inline Actions this can be made more hierarchic to reduce duplication of information, e.g.: counts: { nodes: ...; edges: ...; }; indegree: { max: ...; min: ...; } etc. zack: this can be made more hierarchic to reduce duplication of information, e.g.: ``` counts: {…
				If no string is provided, the traversal will use all types of edges.
				:query string traversal: default to ``dfs``
				:query string direction: can be either ``forward`` or ``backward``, default
				to ``forward``

				:statuscode 200: no error
				:statuscode 400: an invalid query string has been provided
				:statuscode 404: requested starting node cannot be found in the graph

				.. sourcecode:: http

				HTTP/1.1 200 OK
				Content-Type: application/json

				{
				zackUnsubmitted Done Inline Actions {space,time,spacetime}-recursive are our use cases, but what the API is offering here are just endpoints to visit the graph. Additionally, the other use cases not listed yet (e.g., provenance lookups) are still visits, they just happen to traverse the graph in the opposite direction. Last, it seems to me that the only differentiating point between the various visits are the edges that we allow ourselves to cross during the visit. If we make that a parameter of the API, we essentially obtain an API with a single "visit" method, which is quite elegant. All things considered, how about a method like this: `/graph/visit/SRC_TYPE/DST_TYPE[?direction={forward,backward}]` with `direction` defaulting to `forward`. I realized this is able to deal only with the case where we allow ourselves to only traverse one type of edges. It could be generalized by an optional parameter to specify which additional edges we allow ourselves to traverse, e.g., as a list of "SRC_TYPE/DST_TYPE" strings. Once we have that, we might add shortcut methods for common use cases, e.g., visiting `rev→dir→{dir,cnt}` subgraphs. They will just be syntactical shortcuts that require no ad-hoc implementation. zack: {space,time,spacetime}-recursive are our use cases, but what the API is offering here are just…
				"paths": [
				[
				"swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2",
				...
				],
				[
				"swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505",
				...
				],
				...
				]
				}

				Stats
				vlorentzUnsubmitted Done Inline Actions It could be useful to also provide their hash. vlorentz: It could be useful to also provide their hash.
				zackUnsubmitted Done Inline Actions Yep, a list of paths will not do here. We need a list of node identifiers. A node identifier is currently a pair, <node type, sha1>. I'm not clear on where/how to include full paths. In general visits they can become very long, potentially up to the diameter of the subgraph explored by the visit. Maybe it should be an optional additional attribute, defaulting to false. zack: Yep, a list of paths will not do here. We need a list of node identifiers. A node identifier is…
				haltodeAuthorUnsubmitted Done Inline Actions Yes, by "path" I meant a path of hashes. I only have access to those informations from the graph datasets (.nodes.csv.gz and .edges.csv.gz). haltode: Yes, by "path" I meant a path of hashes. I only have access to those informations from the…
				-----

				.. http:get:: /graph/stats/:src_type/:dst_type

				Returns statistics on the compressed graph.

				:param string src_type: can either be ``dir``, ``ori``, ``rel``, ``rev``, or
				``snp``
				:param string dst_type: can either be ``cnt``, ``dir``, ``obj``, ``rev``, or
				``snp``

				:statuscode 200: no error
				:statuscode 404: requested subgraph cannot be found

				.. sourcecode:: http

				vlorentzUnsubmitted Done Inline Actions same vlorentz: same
				HTTP/1.1 200 OK
				Content-Type: application/json

				{
				"counts": {
				"nodes": 16222788,
				"edges": 9907464
				},
				"ratios": {
				"compression": 0.367,
				"bits_per_node": 5.846,
				"bits_per_edge": 9.573,
				"avg_locality": 270.369
				},
				"indegree": {
				vlorentzUnsubmitted Done Inline Actions What do you mean by "revision logs"? revision messages? vlorentz: What do you mean by "revision logs"? revision messages?
				vlorentzUnsubmitted Done Inline Actions (and if it's messages, then it must be bytes instead of strings; so you may want to use msgpack instead of JSON) vlorentz: (and if it's messages, then it must be bytes instead of strings; so you may want to use msgpack…
				zackUnsubmitted Done Inline Actions We want to have minimal information returned attached to the nodes here: so by default is should just be the revision identifiers. It should be up to the user to go fetch additional info. And, we can support additional parameters that allows the user to request including in the response additional fields (e.g., log messages in this case). zack: We want to have minimal information returned attached to the nodes here: so by default is…
				vlorentzUnsubmitted Done Inline Actions Okay. So it should say "revision ids" instead of "revision logs" vlorentz: Okay. So it should say "revision ids" instead of "revision logs"
				haltodeAuthorUnsubmitted Done Inline Actions Yep, once again I meant to use the revisions identifiers. haltode: Yep, once again I meant to use the revisions identifiers.
				"min": 0,
				"max": 12382,
				"avg": 0.6107127825377487
				},
				"outdegree": {
				"min": 0,
				"max": 1,
				"avg": 0.6107127825377487
				}
				}