diff --git a/PKG-INFO b/PKG-INFO
index 3551549..1e30f97 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,52 +1,52 @@
Metadata-Version: 2.1
Name: swh.graph
-Version: 2.1.0
+Version: 2.1.2
Summary: Software Heritage graph service
Home-page: https://forge.softwareheritage.org/diffusion/DGRPH
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-graph
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-graph/
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 3 - Alpha
Requires-Python: >=3.7
Description-Content-Type: text/x-rst
Provides-Extra: testing
License-File: LICENSE
License-File: AUTHORS
Software Heritage - graph service
=================================
Tooling and services, collectively known as ``swh-graph``, providing fast
access to the graph representation of the `Software Heritage
<https://www.softwareheritage.org/>`_
`archive <https://archive.softwareheritage.org/>`_. The service is in-memory,
based on a compressed representation of the Software Heritage Merkle DAG.
Bibliography
------------
In addition to accompanying technical documentation, ``swh-graph`` is also
described in the following scientific paper. If you publish results based on
``swh-graph``, please acknowledge it by citing the paper as follows:
.. note::
Paolo Boldi, Antoine Pietri, Sebastiano Vigna, Stefano Zacchiroli.
`Ultra-Large-Scale Repository Analysis via Graph Compression
`_. In proceedings of `SANER
2020 `_: The 27th IEEE International
Conference on Software Analysis, Evolution and Reengineering, pages
184-194. IEEE 2020.
Links: `preprint
`_,
`bibtex
`_.
diff --git a/docs/api.rst b/docs/api.rst
index 9d29c82..9c86a50 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -1,473 +1,477 @@
.. _swh-graph-api:
Graph Querying HTTP API
=======================
The Graph Querying API is a high-level HTTP API intended to run common,
relatively simple traversal queries on the compressed graph.
The client/server architecture allows it to load the graph in memory only
once and then serve multiple different requests. However, it is limited in
expressivity; more complex or resource-intensive queries should rather use the
:ref:`Low-level Java API ` to run them as standalone
programs.
Terminology
-----------
This API uses the following notions:
- **Node**: a node in the :ref:`Software Heritage graph `,
represented by a :ref:`SWHID `.
- **Node type**: the 3-letter specifier from the node SWHID (``cnt``, ``dir``,
``rel``, ``rev``, ``snp``, ``ori``), or ``*`` for all node types.
- **Edge type**: a pair ``src:dst`` where ``src`` and ``dst`` are either node
types, or ``*`` to denote all node types.
- **Edge restrictions**: a textual specification of which edges can be followed
during graph traversal. Either ``*`` to denote that all edges can be followed
or a comma separated list of edge types to allow following only those edges.
Note that when traversing the *backward* (i.e., transposed) graph, edge types
are reversed too. So, for instance, ``ori:snp`` makes sense when traversing
the forward graph, but is useless (due to the lack of matching edges in the
graph) when traversing the backward graph; conversely, ``snp:ori`` is useful
when traversing the backward graph, but not in the forward one. For the same
reason ``dir:dir`` allows following edges from parent directories to
sub-directories when traversing the forward graph, while the same restriction
allows following edges from sub-directories to parent directories when
traversing the backward graph.
- **Node restrictions**: a textual specification of which type of nodes can be
returned after a request. Either ``*`` to denote that all types of nodes can
be returned or a comma separated list of node types to allow returning only
those node types.
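As the note above suggests, an edge restriction written for the forward graph
can be mechanically reversed for use on the backward (transposed) graph. A
minimal Python sketch of this transformation (a hypothetical helper, not part
of the API):

```python
def reverse_edge_restriction(restriction: str) -> str:
    """Reverse each src:dst edge type, e.g. "ori:snp" -> "snp:ori".

    "*" (all edges allowed) is symmetric, so it is returned unchanged.
    """
    if restriction == "*":
        return restriction
    reversed_types = []
    for edge_type in restriction.split(","):
        src, dst = edge_type.split(":")
        reversed_types.append(f"{dst}:{src}")
    return ",".join(reversed_types)

# A forward restriction such as "dir:dir,dir:cnt" becomes "dir:dir,cnt:dir"
# when the same traversal is run on the backward graph.
```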
Examples
~~~~~~~~
- ``swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2`` the SWHID of a node of
type content containing the full text of the GPL3 license.
- ``swh:1:rev:f39d7d78b70e0f39facb1e4fab77ad3df5c52a35`` the SWHID of a node of
type revision corresponding to the commit in Linux that merged the
'x86/urgent' branch on 31 December 2017.
- ``"dir:dir,dir:cnt"`` node types allowing edges from directories to
directories nodes, or directories to contents nodes.
- ``"rev:rev,dir:*"`` node types allowing edges from revisions to revisions
nodes, or from directories nodes.
- ``"*:rel"`` node types allowing all edges to releases.
- ``"cnt,snp"`` accepted node types returned in the query results.
Endpoints
---------
Leaves
~~~~~~
.. http:get:: /graph/leaves/:src
Performs a graph traversal and returns the leaves of the subgraph rooted at
the specified source node.
:param string src: source node specified as a SWHID
:query string edges: edge types the traversal can follow; defaults to
``"*"``
:query string direction: direction in which graph edges will be followed;
can be either ``forward`` or ``backward``; defaults to ``forward``
:query integer max_edges: how many edges can be traversed during the visit;
defaults to 0 (not restricted)
:query string return_types: only return nodes matching these types;
defaults to ``"*"``
+ :query integer max_matching_nodes: how many results to return before stopping;
+ defaults to 0 (not restricted)
:statuscode 200: success
:statuscode 400: invalid query string provided
:statuscode 404: starting node cannot be found
**Example:**
.. sourcecode:: http
GET /graph/leaves/swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323 HTTP/1.1
Content-Type: text/plain
Transfer-Encoding: chunked
.. sourcecode:: http
HTTP/1.1 200 OK
swh:1:cnt:540faad6b1e02e2db4f349a4845192db521ff2bd
swh:1:cnt:630585fc6d34e5e121139e2aee0a64e83dc9aae6
swh:1:cnt:f8634ced669f0a9155c8cab1b2621d57d778215e
swh:1:cnt:ba6daa801ad3ea587904b1abe9161dceedb2e0bd
...
Neighbors
~~~~~~~~~
.. http:get:: /graph/neighbors/:src
Returns the direct neighbors of a node (those linked to it by exactly one edge) in the graph.
:param string src: source node specified as a SWHID
:query string edges: edge types allowed to be listed as neighbors; defaults
to ``"*"``
:query string direction: direction in which graph edges will be followed;
can be either ``forward`` or ``backward``; defaults to ``forward``
:query integer max_edges: how many edges can be traversed during the visit;
defaults to 0 (not restricted)
:query string return_types: only return nodes matching these types;
defaults to ``"*"``
:statuscode 200: success
:statuscode 400: invalid query string provided
:statuscode 404: starting node cannot be found
**Example:**
.. sourcecode:: http
GET /graph/neighbors/swh:1:rev:f39d7d78b70e0f39facb1e4fab77ad3df5c52a35 HTTP/1.1
Content-Type: text/plain
Transfer-Encoding: chunked
.. sourcecode:: http
HTTP/1.1 200 OK
swh:1:rev:a31e58e129f73ab5b04016330b13ed51fde7a961
swh:1:dir:b5d2aa0746b70300ebbca82a8132af386cc5986d
swh:1:rev:52c90f2d32bfa7d6eccd66a56c44ace1f78fbadd
...
Walk
~~~~
..
.. http:get:: /graph/walk/:src/:dst
Performs a graph traversal and returns the first found path from source to
destination (final destination node included).
:param string src: starting node specified as a SWHID
:param string dst: destination node, either as a node SWHID or a node
type. The traversal will stop at the first node encountered matching
the desired destination.
:query string edges: edge types the traversal can follow; defaults to
``"*"``
:query string traversal: traversal algorithm; can be either ``dfs`` or
``bfs``; defaults to ``dfs``
:query string direction: direction in which graph edges will be followed;
can be either ``forward`` or ``backward``; defaults to ``forward``
:query string return_types: only return nodes matching these types; defaults to ``"*"``
:statuscode 200: success
:statuscode 400: invalid query string provided
:statuscode 404: starting node cannot be found
**Example:**
.. sourcecode:: http
HTTP/1.1 200 OK
swh:1:rev:f39d7d78b70e0f39facb1e4fab77ad3df5c52a35
swh:1:rev:52c90f2d32bfa7d6eccd66a56c44ace1f78fbadd
swh:1:rev:cea92e843e40452c08ba313abc39f59efbb4c29c
swh:1:rev:8d517bdfb57154b8a11d7f1682ecc0f79abf8e02
...
Visit
~~~~~
.. http:get:: /graph/visit/nodes/:src
.. http:get:: /graph/visit/edges/:src
.. http:get:: /graph/visit/paths/:src
Performs a graph traversal and returns explored nodes, edges or paths (in
the order of the traversal).
:param string src: starting node specified as a SWHID
:query string edges: edge types the traversal can follow; defaults to
``"*"``
:query integer max_edges: how many edges can be traversed during the visit;
defaults to 0 (not restricted)
:query string return_types: only return nodes matching these types;
defaults to ``"*"``
+ :query integer max_matching_nodes: how many nodes to return/visit before stopping;
+ defaults to 0 (not restricted)
:statuscode 200: success
:statuscode 400: invalid query string provided
:statuscode 404: starting node cannot be found
**Example:**
.. sourcecode:: http
GET /graph/visit/nodes/swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc HTTP/1.1
Content-Type: text/plain
Transfer-Encoding: chunked
.. sourcecode:: http
HTTP/1.1 200 OK
swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc
swh:1:rev:cfab784723a6c2d33468c9ed8a566fd5e2abd8c9
swh:1:rev:53e5df0e7a6b7bd4919074c081a173655c0da164
swh:1:rev:f85647f14b8243532283eff3e08f4ee96c35945f
swh:1:rev:fe5f9ef854715fc59b9ec22f9878f11498cfcdbf
swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb
swh:1:cnt:c8cece50beae7a954f4ea27e3ae7bf941dc6d0c0
swh:1:dir:a358d0cf89821227d4c00b0ced5e0a8b3756b5db
swh:1:cnt:cc407b7e24dd300d2e1a77d8f04af89b3f962a51
swh:1:cnt:701bd0a63e11b3390a547ce8515d28c6bab8a201
...
**Example:**
.. sourcecode:: http
GET /graph/visit/edges/swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc HTTP/1.1
Content-Type: text/plain
Transfer-Encoding: chunked
.. sourcecode:: http
HTTP/1.1 200 OK
swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:61f92a7db95f5a6d1fcb94d2b897ed3797584d7b
swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:00e81c89c29ff3e58745fdaf7abb68daa1389e85
swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:7596fdc31c9aa00aed281ccb026a74cabf2383bb
swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:ec7a2341ac3d9d8b571bbdfb90a089d4e54dea56
swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:1c5b5eac61eda2454034a43eb124ab490885ef3a
swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:4dfa88ca55e04e8afe05e8543ddddee32dde7236
swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:d56ae79e43ff1b37534370911c8a78ec7f38d437
swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:19ba5d6203a040a39ecc4a77b165d3f097c1e662
swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:9c56102eefea23c95405533e1de23da4b873ecc4
swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:3f54e816b46c2e179cd164e17fea93b3013a9db4
...
**Example:**
.. sourcecode:: http
GET /graph/visit/paths/swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb HTTP/1.1
Content-Type: application/x-ndjson
Transfer-Encoding: chunked
.. sourcecode:: http
HTTP/1.1 200 OK
["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:cnt:acfb7cabd63b368a03a9df87670ece1488c8bce0"]
["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:cnt:2a0837708151d76edf28fdbb90dc3eabc676cff3"]
["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:cnt:eaf025ad54b94b2fdda26af75594cfae3491ec75"]
...
["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:dir:2ebd4b96fa5665ff74f2b27ae41aecdc43af4463", "swh:1:cnt:1d3b6575fb7bf2a147d228e78ffd77ea193c3639"]
...
Counting results
~~~~~~~~~~~~~~~~
The following method variants, with a ``count`` component added to their
path, behave like their already discussed counterparts but, instead of
returning results, return the *number* of results that would have been
returned:
.. http:get:: /graph/leaves/count/:src
Return the number of :http:get:`/graph/leaves/:src` results
.. http:get:: /graph/neighbors/count/:src
Return the number of :http:get:`/graph/neighbors/:src` results
.. http:get:: /graph/visit/nodes/count/:src
Return the number of :http:get:`/graph/visit/nodes/:src` results
Stats
~~~~~
.. http:get:: /graph/stats
Returns statistics on the compressed graph.
:statuscode 200: success
**Example**
.. sourcecode:: http
GET /graph/stats HTTP/1.1
Content-Type: application/json
.. sourcecode:: http
HTTP/1.1 200 OK
{
"counts": {
"nodes": 16222788,
"edges": 9907464
},
"ratios": {
"compression": 0.367,
"bits_per_node": 5.846,
"bits_per_edge": 9.573,
"avg_locality": 270.369
},
"indegree": {
"min": 0,
"max": 12382,
"avg": 0.6107127825377487
},
"outdegree": {
"min": 0,
"max": 1,
"avg": 0.6107127825377487
}
}
Use-case examples
-----------------
This section showcases how to leverage the endpoints of the HTTP API described
above for some common use-cases.
Browsing
~~~~~~~~
The following use cases require traversing the *forward graph*.
- **ls**: given a directory node, list (non-recursively) all linked nodes of
type directory and content
Endpoint::
/graph/neighbors/:DIR_ID?edges=dir:cnt,dir:dir
- **ls -R**: given a directory node, recursively list all linked nodes of type
directory and content
Endpoint::
/graph/visit/paths/:DIR_ID?edges=dir:cnt,dir:dir
- **git log**: given a revision node, recursively list all linked nodes of type
revision
Endpoint::
/graph/visit/nodes/:REV_ID?edges=rev:rev
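For instance, the *ls* query above can be issued from Python using only the
standard library (a minimal sketch; the server address and the ``ls_url`` /
``parse_response`` helper names are hypothetical):

```python
from urllib.request import urlopen

BASE_URL = "http://localhost:5009"  # hypothetical local swh-graph server


def ls_url(dir_swhid: str) -> str:
    """URL for a non-recursive listing: direct dir/cnt neighbors only."""
    return f"{BASE_URL}/graph/neighbors/{dir_swhid}?edges=dir:cnt,dir:dir"


def parse_response(text: str) -> list:
    """The server streams one SWHID per line of the response body."""
    return [line for line in text.splitlines() if line]


def ls(dir_swhid: str) -> list:
    """List the directory and content entries directly under a directory."""
    with urlopen(ls_url(dir_swhid)) as response:
        return parse_response(response.read().decode())
```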
Vault
~~~~~
The following use cases require traversing the *forward graph*.
- **tarball** (same as *ls -R* above)
- **git bundle**: given a node, recursively list all linked nodes of any kind
Endpoint::
/graph/visit/nodes/:NODE_ID?edges=*
Provenance
~~~~~~~~~~
The following use cases require traversing the *backward (transposed)
graph*.
- **commit provenance**: given a content or directory node, return *a* commit
whose directory (recursively) contains it
Endpoint::
/graph/walk/:NODE_ID/rev?direction=backward&edges=dir:dir,cnt:dir,dir:rev
- **complete commit provenance**: given a content or directory node, return
*all* commits whose directory (recursively) contains it
Endpoint::
/graph/leaves/:NODE_ID?direction=backward&edges=dir:dir,cnt:dir,dir:rev
- **origin provenance**: given a content, directory, or commit node, return
*an* origin that has at least one snapshot that (recursively) contains it
Endpoint::
/graph/walk/:NODE_ID/ori?direction=backward&edges=*
- **complete origin provenance**: given a content, directory, or commit node,
return *all* origins that have at least one snapshot that (recursively)
contains it
Endpoint::
/graph/leaves/:NODE_ID?direction=backward&edges=*
Provenance statistics
~~~~~~~~~~~~~~~~~~~~~
The following use cases require traversing the *backward (transposed)
graph*.
- **content popularity across commits**: count the number of commits (or
*commit popularity*) that link to a directory that (recursively) includes a
given content.
Endpoint::
/graph/leaves/count/:NODE_ID?direction=backward&edges=cnt:dir,dir:dir,dir:rev
- **commit popularity across origins**: count the number of origins (or *origin
popularity*) that have a snapshot that (recursively) includes a given commit.
Endpoint::
/graph/leaves/count/:NODE_ID?direction=backward&edges=*
The following use cases require traversing the *forward graph*.
- **revision size** (as number of contents) distribution: count the number of
contents that are (recursively) reachable from a given revision.
Endpoint::
/graph/leaves/count/:NODE_ID?edges=*
- **origin size** (as number of revisions) distribution: count the number of
revisions that are (recursively) reachable from a given origin.
Endpoint::
/graph/leaves/count/:NODE_ID?edges=ori:snp,snp:rel,snp:rev,rel:rev,rev:rev
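Across all of the above, the ``max_matching_nodes`` parameter added in this
patch caps how many results a traversal returns before stopping. Its
accounting (0 meaning unrestricted) can be modeled as follows (a Python
sketch mirroring the server-side countdown, not the actual implementation):

```python
class StopTraversal(Exception):
    """Raised to short-circuit a traversal once the limit is reached."""


def make_match_counter(max_matching_nodes: int):
    """Return a callable to invoke once per matched (returned) node.

    A limit of 0 means "not restricted", mirroring the API default.
    """
    remaining = max_matching_nodes if max_matching_nodes > 0 else -1

    def on_match():
        nonlocal remaining
        if remaining >= 0:
            remaining -= 1
            if remaining == 0:
                # We matched as many nodes as allowed; stop the traversal.
                raise StopTraversal()

    return on_match
```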
diff --git a/java/src/main/java/org/softwareheritage/graph/rpc/Traversal.java b/java/src/main/java/org/softwareheritage/graph/rpc/Traversal.java
index 394caf9..94e1acd 100644
--- a/java/src/main/java/org/softwareheritage/graph/rpc/Traversal.java
+++ b/java/src/main/java/org/softwareheritage/graph/rpc/Traversal.java
@@ -1,553 +1,551 @@
/*
* Copyright (c) 2022 The Software Heritage developers
* See the AUTHORS file at the top-level directory of this distribution
* License: GNU General Public License version 3, or any later version
* See top-level LICENSE file for more information
*/
package org.softwareheritage.graph.rpc;
import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator;
import it.unimi.dsi.big.webgraph.labelling.Label;
import org.softwareheritage.graph.*;
import java.util.*;
/** Traversal contains all the algorithms used for graph traversals */
public class Traversal {
/**
* Wrapper around g.successors(), only follows edges that are allowed by the given
* {@link AllowedEdges} object.
*/
private static ArcLabelledNodeIterator.LabelledArcIterator filterLabelledSuccessors(SwhUnidirectionalGraph g,
long nodeId, AllowedEdges allowedEdges) {
if (allowedEdges.restrictedTo == null) {
// All edges are allowed, bypass edge check
return g.labelledSuccessors(nodeId);
} else {
ArcLabelledNodeIterator.LabelledArcIterator allSuccessors = g.labelledSuccessors(nodeId);
return new ArcLabelledNodeIterator.LabelledArcIterator() {
@Override
public Label label() {
return allSuccessors.label();
}
@Override
public long nextLong() {
long neighbor;
while ((neighbor = allSuccessors.nextLong()) != -1) {
if (allowedEdges.isAllowed(g.getNodeType(nodeId), g.getNodeType(neighbor))) {
return neighbor;
}
}
return -1;
}
@Override
public long skip(final long n) {
long i = 0;
while (i < n && nextLong() != -1)
i++;
return i;
}
};
}
}
/** Helper class to check that a given node is "valid" for some given {@link NodeFilter} */
private static class NodeFilterChecker {
private final SwhUnidirectionalGraph g;
private final NodeFilter filter;
private final AllowedNodes allowedNodes;
private NodeFilterChecker(SwhUnidirectionalGraph graph, NodeFilter filter) {
this.g = graph;
this.filter = filter;
this.allowedNodes = new AllowedNodes(filter.hasTypes() ? filter.getTypes() : "*");
}
public boolean allowed(long nodeId) {
if (filter == null) {
return true;
}
if (!this.allowedNodes.isAllowed(g.getNodeType(nodeId))) {
return false;
}
return true;
}
}
/** Returns the unidirectional graph from a bidirectional graph and a {@link GraphDirection}. */
public static SwhUnidirectionalGraph getDirectedGraph(SwhBidirectionalGraph g, GraphDirection direction) {
switch (direction) {
case FORWARD:
return g.getForwardGraph();
case BACKWARD:
return g.getBackwardGraph();
/*
* TODO: add support for BOTH case BOTH: return new SwhUnidirectionalGraph(g.symmetrize(),
* g.getProperties());
*/
default :
throw new IllegalArgumentException("Unknown direction: " + direction);
}
}
/** Returns the opposite of a given {@link GraphDirection} (equivalent to a graph transposition). */
public static GraphDirection reverseDirection(GraphDirection direction) {
switch (direction) {
case FORWARD:
return GraphDirection.BACKWARD;
case BACKWARD:
return GraphDirection.FORWARD;
/*
* TODO: add support for BOTH case BOTH: return GraphDirection.BOTH;
*/
default :
throw new IllegalArgumentException("Unknown direction: " + direction);
}
}
/** Dummy exception to short-circuit and interrupt a graph traversal. */
static class StopTraversalException extends RuntimeException {
}
/** Generic BFS traversal algorithm. */
static class BFSVisitor {
/** The graph to traverse. */
protected final SwhUnidirectionalGraph g;
/** Depth of the node currently being visited */
protected long depth = 0;
/**
* Number of traversal successors (i.e., successors that will be considered by the traversal) of the
* node currently being visited
*/
protected long traversalSuccessors = 0;
/** Number of edges accessed since the beginning of the traversal */
protected long edgesAccessed = 0;
/**
* Map from a node ID to its parent node ID. The key set can be used as the set of all visited
* nodes.
*/
protected HashMap<Long, Long> parents = new HashMap<>();
/** Queue of nodes to visit (also called "frontier", "open set", "wavefront" etc.) */
protected ArrayDeque<Long> queue = new ArrayDeque<>();
/** If > 0, the maximum depth of the traversal. */
private long maxDepth = -1;
/** If > 0, the maximum number of edges to traverse. */
private long maxEdges = -1;
BFSVisitor(SwhUnidirectionalGraph g) {
this.g = g;
}
/** Add a new source node to the initial queue. */
public void addSource(long nodeId) {
queue.add(nodeId);
parents.put(nodeId, -1L);
}
/** Set the maximum depth of the traversal. */
public void setMaxDepth(long depth) {
maxDepth = depth;
}
/** Set the maximum number of edges to traverse. */
public void setMaxEdges(long edges) {
maxEdges = edges;
}
/** Set up the visit counters and depth sentinel. */
public void visitSetup() {
edgesAccessed = 0;
depth = 0;
queue.add(-1L); // depth sentinel
}
/** Perform the visit */
public void visit() {
visitSetup();
while (!queue.isEmpty()) {
visitStep();
}
}
/** Single "step" of a visit. Advance the frontier of exactly one node. */
public void visitStep() {
try {
assert !queue.isEmpty();
long curr = queue.poll();
if (curr == -1L) {
++depth;
if (!queue.isEmpty()) {
queue.add(-1L);
visitStep();
}
return;
}
if (maxDepth >= 0 && depth > maxDepth) {
throw new StopTraversalException();
}
edgesAccessed += g.outdegree(curr);
if (maxEdges >= 0 && edgesAccessed > maxEdges) {
throw new StopTraversalException();
}
visitNode(curr);
} catch (StopTraversalException e) {
// Traversal is over, clear the to-do queue.
queue.clear();
}
}
/**
* Get the successors of a node. Override this function if you want to filter which successors are
* considered during the traversal.
*/
protected ArcLabelledNodeIterator.LabelledArcIterator getSuccessors(long nodeId) {
return g.labelledSuccessors(nodeId);
}
/** Visit a node. Override to do additional processing on the node. */
protected void visitNode(long node) {
ArcLabelledNodeIterator.LabelledArcIterator it = getSuccessors(node);
traversalSuccessors = 0;
for (long succ; (succ = it.nextLong()) != -1;) {
traversalSuccessors++;
visitEdge(node, succ, it.label());
}
}
/** Visit an edge. Override to do additional processing on the edge. */
protected void visitEdge(long src, long dst, Label label) {
if (!parents.containsKey(dst)) {
queue.add(dst);
parents.put(dst, src);
}
}
}
/**
* SimpleTraversal is used by the Traverse endpoint. It extends BFSVisitor with additional
* processing, notably related to graph properties and filters.
*/
static class SimpleTraversal extends BFSVisitor {
private final NodeFilterChecker nodeReturnChecker;
private final AllowedEdges allowedEdges;
private final TraversalRequest request;
private final NodePropertyBuilder.NodeDataMask nodeDataMask;
private final NodeObserver nodeObserver;
private long remainingMatches;
private Node.Builder nodeBuilder;
SimpleTraversal(SwhBidirectionalGraph bidirectionalGraph, TraversalRequest request, NodeObserver nodeObserver) {
super(getDirectedGraph(bidirectionalGraph, request.getDirection()));
this.request = request;
this.nodeObserver = nodeObserver;
this.nodeReturnChecker = new NodeFilterChecker(g, request.getReturnNodes());
this.nodeDataMask = new NodePropertyBuilder.NodeDataMask(request.hasMask() ? request.getMask() : null);
this.allowedEdges = new AllowedEdges(request.hasEdges() ? request.getEdges() : "*");
request.getSrcList().forEach(srcSwhid -> {
long srcNodeId = g.getNodeId(new SWHID(srcSwhid));
addSource(srcNodeId);
});
if (request.hasMaxDepth()) {
setMaxDepth(request.getMaxDepth());
}
if (request.hasMaxEdges()) {
setMaxEdges(request.getMaxEdges());
}
if (request.hasMaxMatchingNodes() && request.getMaxMatchingNodes() > 0) {
this.remainingMatches = request.getMaxMatchingNodes();
} else {
this.remainingMatches = -1;
}
}
@Override
protected ArcLabelledNodeIterator.LabelledArcIterator getSuccessors(long nodeId) {
return filterLabelledSuccessors(g, nodeId, allowedEdges);
}
@Override
public void visitNode(long node) {
nodeBuilder = null;
if (nodeReturnChecker.allowed(node) && (!request.hasMinDepth() || depth >= request.getMinDepth())) {
nodeBuilder = Node.newBuilder();
NodePropertyBuilder.buildNodeProperties(g, nodeDataMask, nodeBuilder, node);
}
super.visitNode(node);
- boolean nodeMatchesConstraints = true;
-
- if (request.getReturnNodes().hasMinTraversalSuccessors()) {
- nodeMatchesConstraints &= traversalSuccessors >= request.getReturnNodes().getMinTraversalSuccessors();
+ if (request.getReturnNodes().hasMinTraversalSuccessors()
+ && traversalSuccessors < request.getReturnNodes().getMinTraversalSuccessors()) {
+ nodeBuilder = null;
}
- if (request.getReturnNodes().hasMaxTraversalSuccessors()) {
- nodeMatchesConstraints &= traversalSuccessors <= request.getReturnNodes().getMaxTraversalSuccessors();
+ if (request.getReturnNodes().hasMaxTraversalSuccessors()
+ && traversalSuccessors > request.getReturnNodes().getMaxTraversalSuccessors()) {
+ nodeBuilder = null;
}
- if (nodeMatchesConstraints) {
- if (nodeBuilder != null) {
- nodeObserver.onNext(nodeBuilder.build());
- }
+ if (nodeBuilder != null) {
+ nodeObserver.onNext(nodeBuilder.build());
if (remainingMatches >= 0) {
remainingMatches--;
if (remainingMatches == 0) {
// We matched as many nodes as allowed
throw new StopTraversalException();
}
}
}
}
@Override
protected void visitEdge(long src, long dst, Label label) {
super.visitEdge(src, dst, label);
NodePropertyBuilder.buildSuccessorProperties(g, nodeDataMask, nodeBuilder, src, dst, label);
}
}
/**
FindPathTo searches for a path from a source node to a node matching given criteria. It extends
* BFSVisitor with additional processing, and makes the traversal stop as soon as a node matching
* the given criteria is found.
*/
static class FindPathTo extends BFSVisitor {
private final AllowedEdges allowedEdges;
private final FindPathToRequest request;
private final NodePropertyBuilder.NodeDataMask nodeDataMask;
private final NodeFilterChecker targetChecker;
private Long targetNode = null;
FindPathTo(SwhBidirectionalGraph bidirectionalGraph, FindPathToRequest request) {
super(getDirectedGraph(bidirectionalGraph, request.getDirection()));
this.request = request;
this.targetChecker = new NodeFilterChecker(g, request.getTarget());
this.nodeDataMask = new NodePropertyBuilder.NodeDataMask(request.hasMask() ? request.getMask() : null);
this.allowedEdges = new AllowedEdges(request.hasEdges() ? request.getEdges() : "*");
if (request.hasMaxDepth()) {
setMaxDepth(request.getMaxDepth());
}
if (request.hasMaxEdges()) {
setMaxEdges(request.getMaxEdges());
}
request.getSrcList().forEach(srcSwhid -> {
long srcNodeId = g.getNodeId(new SWHID(srcSwhid));
addSource(srcNodeId);
});
}
@Override
protected ArcLabelledNodeIterator.LabelledArcIterator getSuccessors(long nodeId) {
return filterLabelledSuccessors(g, nodeId, allowedEdges);
}
@Override
public void visitNode(long node) {
if (targetChecker.allowed(node)) {
targetNode = node;
throw new StopTraversalException();
}
super.visitNode(node);
}
/**
* Once the visit has been performed and a matching node has been found, return the shortest path
* from the source set to that node. To do so, we need to backtrack the parents of the node until we
* find one of the source nodes (whose parent is -1).
*/
public Path getPath() {
if (targetNode == null) {
return null; // No path found.
}
/* Backtrack from targetNode to a source node */
long curNode = targetNode;
ArrayList<Long> path = new ArrayList<>();
while (curNode != -1) {
path.add(curNode);
curNode = parents.get(curNode);
}
Collections.reverse(path);
/* Enrich path with node properties */
Path.Builder pathBuilder = Path.newBuilder();
for (long nodeId : path) {
Node.Builder nodeBuilder = Node.newBuilder();
NodePropertyBuilder.buildNodeProperties(g, nodeDataMask, nodeBuilder, nodeId);
pathBuilder.addNode(nodeBuilder.build());
}
return pathBuilder.build();
}
}
/**
* FindPathBetween searches for a shortest path between a set of source nodes and a set of
* destination nodes.
*
* It does so by performing a *bidirectional breadth-first search*, i.e., two parallel breadth-first
* searches, one from the source set ("src-BFS") and one from the destination set ("dst-BFS"), until
* both searches find a common node that joins their visited sets. This node is called the "midpoint
* node". The path returned is the path src -> ... -> midpoint -> ... -> dst, which is always a
* shortest path between src and dst.
*
* The graph direction of both BFS can be configured separately. By default, the dst-BFS will use
* the graph in the opposite direction than the src-BFS (if direction = FORWARD, by default
* direction_reverse = BACKWARD, and vice-versa). The default behavior is thus to search for a
* shortest path between two nodes in a given direction. However, one can also specify FORWARD or
* BACKWARD for *both* the src-BFS and the dst-BFS. This will search for a common descendant or a
* common ancestor between the two sets, respectively. These will be the midpoints of the returned
* path.
*/
static class FindPathBetween extends BFSVisitor {
private final FindPathBetweenRequest request;
private final NodePropertyBuilder.NodeDataMask nodeDataMask;
private final AllowedEdges allowedEdgesSrc;
private final AllowedEdges allowedEdgesDst;
private final BFSVisitor srcVisitor;
private final BFSVisitor dstVisitor;
private Long middleNode = null;
FindPathBetween(SwhBidirectionalGraph bidirectionalGraph, FindPathBetweenRequest request) {
super(getDirectedGraph(bidirectionalGraph, request.getDirection()));
this.request = request;
this.nodeDataMask = new NodePropertyBuilder.NodeDataMask(request.hasMask() ? request.getMask() : null);
GraphDirection direction = request.getDirection();
// if direction_reverse is not specified, use the opposite direction of direction
GraphDirection directionReverse = request.hasDirectionReverse()
? request.getDirectionReverse()
: reverseDirection(request.getDirection());
SwhUnidirectionalGraph srcGraph = getDirectedGraph(bidirectionalGraph, direction);
SwhUnidirectionalGraph dstGraph = getDirectedGraph(bidirectionalGraph, directionReverse);
this.allowedEdgesSrc = new AllowedEdges(request.hasEdges() ? request.getEdges() : "*");
/*
* If edges_reverse is not specified: - If `edges` is not specified either, defaults to "*" - If
* direction == direction_reverse, defaults to `edges` - If direction != direction_reverse, defaults
* to the reverse of `edges` (e.g. "rev:dir" becomes "dir:rev").
*/
this.allowedEdgesDst = request.hasEdgesReverse()
? new AllowedEdges(request.getEdgesReverse())
: (request.hasEdges()
? (direction == directionReverse
? new AllowedEdges(request.getEdges())
: new AllowedEdges(request.getEdges()).reverse())
: new AllowedEdges("*"));
/*
* Source sub-visitor. Aborts as soon as it finds a node already visited by the destination
* sub-visitor.
*/
this.srcVisitor = new BFSVisitor(srcGraph) {
@Override
protected ArcLabelledNodeIterator.LabelledArcIterator getSuccessors(long nodeId) {
return filterLabelledSuccessors(g, nodeId, allowedEdgesSrc);
}
@Override
public void visitNode(long node) {
if (dstVisitor.parents.containsKey(node)) {
middleNode = node;
throw new StopTraversalException();
}
super.visitNode(node);
}
};
/*
* Destination sub-visitor. Aborts as soon as it finds a node already visited by the source
* sub-visitor.
*/
this.dstVisitor = new BFSVisitor(dstGraph) {
@Override
protected ArcLabelledNodeIterator.LabelledArcIterator getSuccessors(long nodeId) {
return filterLabelledSuccessors(g, nodeId, allowedEdgesDst);
}
@Override
public void visitNode(long node) {
if (srcVisitor.parents.containsKey(node)) {
middleNode = node;
throw new StopTraversalException();
}
super.visitNode(node);
}
};
if (request.hasMaxDepth()) {
this.srcVisitor.setMaxDepth(request.getMaxDepth());
this.dstVisitor.setMaxDepth(request.getMaxDepth());
}
if (request.hasMaxEdges()) {
this.srcVisitor.setMaxEdges(request.getMaxEdges());
this.dstVisitor.setMaxEdges(request.getMaxEdges());
}
request.getSrcList().forEach(srcSwhid -> {
long srcNodeId = g.getNodeId(new SWHID(srcSwhid));
srcVisitor.addSource(srcNodeId);
});
request.getDstList().forEach(srcSwhid -> {
long srcNodeId = g.getNodeId(new SWHID(srcSwhid));
dstVisitor.addSource(srcNodeId);
});
}
@Override
public void visit() {
/*
* Bidirectional BFS: maintain two sub-visitors, and alternately run a visit step in each of them.
*/
srcVisitor.visitSetup();
dstVisitor.visitSetup();
while (!srcVisitor.queue.isEmpty() || !dstVisitor.queue.isEmpty()) {
if (!srcVisitor.queue.isEmpty()) {
srcVisitor.visitStep();
}
if (!dstVisitor.queue.isEmpty()) {
dstVisitor.visitStep();
}
}
}
public Path getPath() {
if (middleNode == null) {
return null; // No path found.
}
Path.Builder pathBuilder = Path.newBuilder();
ArrayList<Long> path = new ArrayList<>();
/* First section of the path: src -> midpoint */
long curNode = middleNode;
while (curNode != -1) {
path.add(curNode);
curNode = srcVisitor.parents.get(curNode);
}
pathBuilder.setMidpointIndex(path.size() - 1);
Collections.reverse(path);
/* Second section of the path: midpoint -> dst */
curNode = dstVisitor.parents.get(middleNode);
while (curNode != -1) {
path.add(curNode);
curNode = dstVisitor.parents.get(curNode);
}
/* Enrich path with node properties */
for (long nodeId : path) {
Node.Builder nodeBuilder = Node.newBuilder();
NodePropertyBuilder.buildNodeProperties(g, nodeDataMask, nodeBuilder, nodeId);
pathBuilder.addNode(nodeBuilder.build());
}
return pathBuilder.build();
}
}
public interface NodeObserver {
void onNext(Node nodeId);
}
}
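The alternating visit above (two BFS sub-visitors expanding from src and dst, stopping at the first node seen by both, then stitching the path through `middleNode`) can be sketched in plain Python. This is an illustrative model only, with hypothetical `successors`/`predecessors` callables standing in for the labelled-arc iterators of the real service:

```python
from collections import deque

def find_path_between(successors, predecessors, src, dst):
    """Meeting-in-the-middle BFS: one frontier expands from src along
    forward edges, the other from dst along reversed edges; the search
    stops at the first node visited by both, and the full path is
    reassembled through that midpoint."""
    parents_src = {src: None}
    parents_dst = {dst: None}
    queue_src = deque([src])
    queue_dst = deque([dst])

    def step(queue, parents, other_parents, neighbors):
        # Expand one BFS level; return the middle node if the two
        # searches have met, else None.
        for _ in range(len(queue)):
            node = queue.popleft()
            if node in other_parents:
                return node
            for succ in neighbors(node):
                if succ not in parents:
                    parents[succ] = node
                    queue.append(succ)
        return None

    while queue_src or queue_dst:
        for queue, parents, other, neigh in (
            (queue_src, parents_src, parents_dst, successors),
            (queue_dst, parents_dst, parents_src, predecessors),
        ):
            if not queue:
                continue
            middle = step(queue, parents, other, neigh)
            if middle is None:
                continue
            # First half: src -> middle (walk up the src parent tree).
            left, cur = [], middle
            while cur is not None:
                left.append(cur)
                cur = parents_src.get(cur)
            left.reverse()
            # Second half: middle -> dst (walk up the dst parent tree).
            right, cur = [], parents_dst.get(middle)
            while cur is not None:
                right.append(cur)
                cur = parents_dst.get(cur)
            return left + right
    return None  # no path found
```

The Java version additionally filters successors through `AllowedEdges` and aborts via `StopTraversalException`; the control flow is otherwise the same.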
diff --git a/swh.graph.egg-info/PKG-INFO b/swh.graph.egg-info/PKG-INFO
index 3551549..1e30f97 100644
--- a/swh.graph.egg-info/PKG-INFO
+++ b/swh.graph.egg-info/PKG-INFO
@@ -1,52 +1,52 @@
Metadata-Version: 2.1
Name: swh.graph
-Version: 2.1.0
+Version: 2.1.2
Summary: Software Heritage graph service
Home-page: https://forge.softwareheritage.org/diffusion/DGRPH
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-graph
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-graph/
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 3 - Alpha
Requires-Python: >=3.7
Description-Content-Type: text/x-rst
Provides-Extra: testing
License-File: LICENSE
License-File: AUTHORS
Software Heritage - graph service
=================================
Tooling and services, collectively known as ``swh-graph``, providing fast
access to the graph representation of the `Software Heritage
<https://www.softwareheritage.org/>`_
`archive <https://archive.softwareheritage.org/>`_. The service is in-memory,
based on a compressed representation of the Software Heritage Merkle DAG.
Bibliography
------------
In addition to accompanying technical documentation, ``swh-graph`` is also
described in the following scientific paper. If you publish results based on
``swh-graph``, please acknowledge it by citing the paper as follows:
.. note::
Paolo Boldi, Antoine Pietri, Sebastiano Vigna, Stefano Zacchiroli.
`Ultra-Large-Scale Repository Analysis via Graph Compression
`_. In proceedings of `SANER
2020 `_: The 27th IEEE International
Conference on Software Analysis, Evolution and Reengineering, pages
184-194. IEEE 2020.
Links: `preprint
`_,
`bibtex
`_.
diff --git a/swh.graph.egg-info/SOURCES.txt b/swh.graph.egg-info/SOURCES.txt
index f96c89d..22bd435 100644
--- a/swh.graph.egg-info/SOURCES.txt
+++ b/swh.graph.egg-info/SOURCES.txt
@@ -1,259 +1,259 @@
.git-blame-ignore-revs
.gitignore
.pre-commit-config.yaml
AUTHORS
CODE_OF_CONDUCT.md
CONTRIBUTORS
LICENSE
MANIFEST.in
Makefile
Makefile.local
README.rst
conftest.py
mypy.ini
pyproject.toml
pytest.ini
requirements-swh.txt
requirements-test.txt
requirements.txt
setup.cfg
setup.py
tox.ini
docker/Dockerfile
docker/build.sh
docker/run.sh
docs/.gitignore
docs/Makefile
docs/Makefile.local
docs/README.rst
docs/api.rst
docs/cli.rst
docs/compression.rst
docs/conf.py
docs/docker.rst
docs/git2graph.md
docs/grpc-api.rst
docs/index.rst
docs/java-api.rst
docs/memory.rst
docs/quickstart.rst
docs/_static/.placeholder
docs/_templates/.placeholder
docs/images/.gitignore
docs/images/Makefile
docs/images/compression_steps.dot
java/.coding-style.xml
java/.gitignore
java/AUTHORS
java/LICENSE
java/README.md
java/pom.xml
java/.mvn/jvm.config
java/src/main/proto
java/src/main/java/org/softwareheritage/graph/AllowedEdges.java
java/src/main/java/org/softwareheritage/graph/AllowedNodes.java
java/src/main/java/org/softwareheritage/graph/SWHID.java
java/src/main/java/org/softwareheritage/graph/Subgraph.java
java/src/main/java/org/softwareheritage/graph/SwhBidirectionalGraph.java
java/src/main/java/org/softwareheritage/graph/SwhGraph.java
java/src/main/java/org/softwareheritage/graph/SwhGraphProperties.java
java/src/main/java/org/softwareheritage/graph/SwhType.java
java/src/main/java/org/softwareheritage/graph/SwhUnidirectionalGraph.java
java/src/main/java/org/softwareheritage/graph/compress/CSVEdgeDataset.java
java/src/main/java/org/softwareheritage/graph/compress/ComposePermutations.java
java/src/main/java/org/softwareheritage/graph/compress/ExtractNodes.java
java/src/main/java/org/softwareheritage/graph/compress/ExtractPersons.java
java/src/main/java/org/softwareheritage/graph/compress/GraphDataset.java
java/src/main/java/org/softwareheritage/graph/compress/LabelMapBuilder.java
java/src/main/java/org/softwareheritage/graph/compress/NodeMapBuilder.java
java/src/main/java/org/softwareheritage/graph/compress/ORCGraphDataset.java
java/src/main/java/org/softwareheritage/graph/compress/ScatteredArcsORCGraph.java
java/src/main/java/org/softwareheritage/graph/compress/WriteNodeProperties.java
java/src/main/java/org/softwareheritage/graph/experiments/forks/ForkCC.java
java/src/main/java/org/softwareheritage/graph/experiments/forks/ForkCliques.java
java/src/main/java/org/softwareheritage/graph/experiments/forks/ListEmptyOrigins.java
java/src/main/java/org/softwareheritage/graph/experiments/topology/AveragePaths.java
java/src/main/java/org/softwareheritage/graph/experiments/topology/ClusteringCoefficient.java
java/src/main/java/org/softwareheritage/graph/experiments/topology/ConnectedComponents.java
java/src/main/java/org/softwareheritage/graph/experiments/topology/InOutDegree.java
java/src/main/java/org/softwareheritage/graph/experiments/topology/SubdatasetSizeFunction.java
java/src/main/java/org/softwareheritage/graph/labels/DirEntry.java
java/src/main/java/org/softwareheritage/graph/labels/SwhLabel.java
java/src/main/java/org/softwareheritage/graph/maps/NodeIdMap.java
java/src/main/java/org/softwareheritage/graph/maps/NodeTypesMap.java
java/src/main/java/org/softwareheritage/graph/rpc/GraphServer.java
java/src/main/java/org/softwareheritage/graph/rpc/NodePropertyBuilder.java
java/src/main/java/org/softwareheritage/graph/rpc/Traversal.java
java/src/main/java/org/softwareheritage/graph/utils/DumpProperties.java
java/src/main/java/org/softwareheritage/graph/utils/ExportSubdataset.java
java/src/main/java/org/softwareheritage/graph/utils/FindEarliestRevision.java
java/src/main/java/org/softwareheritage/graph/utils/ForkJoinBigQuickSort2.java
java/src/main/java/org/softwareheritage/graph/utils/ForkJoinQuickSort3.java
java/src/main/java/org/softwareheritage/graph/utils/MPHTranslate.java
java/src/main/java/org/softwareheritage/graph/utils/ReadGraph.java
java/src/main/java/org/softwareheritage/graph/utils/ReadLabelledGraph.java
java/src/main/java/org/softwareheritage/graph/utils/Sort.java
java/src/test/java/org/softwareheritage/graph/AllowedEdgesTest.java
java/src/test/java/org/softwareheritage/graph/AllowedNodesTest.java
java/src/test/java/org/softwareheritage/graph/GraphTest.java
java/src/test/java/org/softwareheritage/graph/SubgraphTest.java
java/src/test/java/org/softwareheritage/graph/compress/ExtractNodesTest.java
java/src/test/java/org/softwareheritage/graph/compress/ExtractPersonsTest.java
java/src/test/java/org/softwareheritage/graph/rpc/CountEdgesTest.java
java/src/test/java/org/softwareheritage/graph/rpc/CountNodesTest.java
java/src/test/java/org/softwareheritage/graph/rpc/FindPathBetweenTest.java
java/src/test/java/org/softwareheritage/graph/rpc/FindPathToTest.java
java/src/test/java/org/softwareheritage/graph/rpc/GetNodeTest.java
java/src/test/java/org/softwareheritage/graph/rpc/StatsTest.java
java/src/test/java/org/softwareheritage/graph/rpc/TraversalServiceTest.java
java/src/test/java/org/softwareheritage/graph/rpc/TraverseLeavesTest.java
java/src/test/java/org/softwareheritage/graph/rpc/TraverseNeighborsTest.java
java/src/test/java/org/softwareheritage/graph/rpc/TraverseNodesPropertiesTest.java
java/src/test/java/org/softwareheritage/graph/rpc/TraverseNodesTest.java
java/src/test/java/org/softwareheritage/graph/utils/ForkJoinBigQuickSort2Test.java
java/src/test/java/org/softwareheritage/graph/utils/ForkJoinQuickSort3Test.java
-java/target/swh-graph-2.1.0.jar
+java/target/swh-graph-2.1.2.jar
proto/swhgraph.proto
reports/.gitignore
reports/benchmarks/Makefile
reports/benchmarks/benchmarks.tex
reports/experiments/Makefile
reports/experiments/experiments.tex
reports/linux_log/LinuxLog.java
reports/linux_log/Makefile
reports/linux_log/linux_log.tex
reports/node_mapping/Makefile
reports/node_mapping/NodeIdMapHaloDB.java
reports/node_mapping/NodeIdMapRocksDB.java
reports/node_mapping/node_mapping.tex
swh/__init__.py
swh.graph.egg-info/PKG-INFO
swh.graph.egg-info/SOURCES.txt
swh.graph.egg-info/dependency_links.txt
swh.graph.egg-info/entry_points.txt
swh.graph.egg-info/requires.txt
swh.graph.egg-info/top_level.txt
swh/graph/__init__.py
swh/graph/cli.py
swh/graph/client.py
swh/graph/config.py
swh/graph/grpc_server.py
swh/graph/http_client.py
swh/graph/http_naive_client.py
swh/graph/http_rpc_server.py
swh/graph/naive_client.py
swh/graph/py.typed
swh/graph/pytest_plugin.py
swh/graph/webgraph.py
swh/graph/grpc/swhgraph.proto
swh/graph/grpc/swhgraph_pb2.py
swh/graph/grpc/swhgraph_pb2.pyi
swh/graph/grpc/swhgraph_pb2_grpc.py
swh/graph/tests/__init__.py
swh/graph/tests/test_cli.py
swh/graph/tests/test_grpc.py
swh/graph/tests/test_http_client.py
swh/graph/tests/test_http_server_down.py
swh/graph/tests/dataset/generate_dataset.py
swh/graph/tests/dataset/compressed/example-labelled.labeloffsets
swh/graph/tests/dataset/compressed/example-labelled.labels
swh/graph/tests/dataset/compressed/example-labelled.properties
swh/graph/tests/dataset/compressed/example-transposed-labelled.labeloffsets
swh/graph/tests/dataset/compressed/example-transposed-labelled.labels
swh/graph/tests/dataset/compressed/example-transposed-labelled.properties
swh/graph/tests/dataset/compressed/example-transposed.graph
swh/graph/tests/dataset/compressed/example-transposed.obl
swh/graph/tests/dataset/compressed/example-transposed.offsets
swh/graph/tests/dataset/compressed/example-transposed.properties
swh/graph/tests/dataset/compressed/example.edges.count.txt
swh/graph/tests/dataset/compressed/example.edges.stats.txt
swh/graph/tests/dataset/compressed/example.graph
swh/graph/tests/dataset/compressed/example.indegree
swh/graph/tests/dataset/compressed/example.labels.count.txt
swh/graph/tests/dataset/compressed/example.labels.csv.zst
swh/graph/tests/dataset/compressed/example.labels.fcl.bytearray
swh/graph/tests/dataset/compressed/example.labels.fcl.pointers
swh/graph/tests/dataset/compressed/example.labels.fcl.properties
swh/graph/tests/dataset/compressed/example.labels.mph
swh/graph/tests/dataset/compressed/example.mph
swh/graph/tests/dataset/compressed/example.node2swhid.bin
swh/graph/tests/dataset/compressed/example.node2type.map
swh/graph/tests/dataset/compressed/example.nodes.count.txt
swh/graph/tests/dataset/compressed/example.nodes.csv.zst
swh/graph/tests/dataset/compressed/example.nodes.stats.txt
swh/graph/tests/dataset/compressed/example.obl
swh/graph/tests/dataset/compressed/example.offsets
swh/graph/tests/dataset/compressed/example.order
swh/graph/tests/dataset/compressed/example.outdegree
swh/graph/tests/dataset/compressed/example.persons.count.txt
swh/graph/tests/dataset/compressed/example.persons.csv.zst
swh/graph/tests/dataset/compressed/example.persons.mph
swh/graph/tests/dataset/compressed/example.properties
swh/graph/tests/dataset/compressed/example.property.author_id.bin
swh/graph/tests/dataset/compressed/example.property.author_timestamp.bin
swh/graph/tests/dataset/compressed/example.property.author_timestamp_offset.bin
swh/graph/tests/dataset/compressed/example.property.committer_id.bin
swh/graph/tests/dataset/compressed/example.property.committer_timestamp.bin
swh/graph/tests/dataset/compressed/example.property.committer_timestamp_offset.bin
swh/graph/tests/dataset/compressed/example.property.content.is_skipped.bin
swh/graph/tests/dataset/compressed/example.property.content.length.bin
swh/graph/tests/dataset/compressed/example.property.message.bin
swh/graph/tests/dataset/compressed/example.property.message.offset.bin
swh/graph/tests/dataset/compressed/example.property.tag_name.bin
swh/graph/tests/dataset/compressed/example.property.tag_name.offset.bin
swh/graph/tests/dataset/compressed/example.stats
swh/graph/tests/dataset/edges/content/graph-all.edges.csv.zst
swh/graph/tests/dataset/edges/content/graph-all.nodes.csv.zst
swh/graph/tests/dataset/edges/directory/graph-all.edges.csv.zst
swh/graph/tests/dataset/edges/directory/graph-all.nodes.csv.zst
swh/graph/tests/dataset/edges/origin/graph-all.edges.csv.zst
swh/graph/tests/dataset/edges/origin/graph-all.nodes.csv.zst
swh/graph/tests/dataset/edges/release/graph-all.edges.csv.zst
swh/graph/tests/dataset/edges/release/graph-all.nodes.csv.zst
swh/graph/tests/dataset/edges/revision/graph-all.edges.csv.zst
swh/graph/tests/dataset/edges/revision/graph-all.nodes.csv.zst
swh/graph/tests/dataset/edges/snapshot/graph-all.edges.csv.zst
swh/graph/tests/dataset/edges/snapshot/graph-all.nodes.csv.zst
swh/graph/tests/dataset/img/.gitignore
swh/graph/tests/dataset/img/Makefile
swh/graph/tests/dataset/img/example.dot
swh/graph/tests/dataset/orc/content/content-all.orc
swh/graph/tests/dataset/orc/directory/directory-all.orc
swh/graph/tests/dataset/orc/directory_entry/directory_entry-all.orc
swh/graph/tests/dataset/orc/origin/origin-all.orc
swh/graph/tests/dataset/orc/origin_visit/origin_visit-all.orc
swh/graph/tests/dataset/orc/origin_visit_status/origin_visit_status-all.orc
swh/graph/tests/dataset/orc/release/release-all.orc
swh/graph/tests/dataset/orc/revision/revision-all.orc
swh/graph/tests/dataset/orc/revision_extra_headers/revision_extra_headers-all.orc
swh/graph/tests/dataset/orc/revision_history/revision_history-all.orc
swh/graph/tests/dataset/orc/skipped_content/skipped_content-all.orc
swh/graph/tests/dataset/orc/snapshot/snapshot-all.orc
swh/graph/tests/dataset/orc/snapshot_branch/snapshot_branch-all.orc
tools/dir2graph
tools/swhid2int2int2swhid.sh
tools/git2graph/.gitignore
tools/git2graph/Makefile
tools/git2graph/README.md
tools/git2graph/git2graph.c
tools/git2graph/tests/edge-filters.bats
tools/git2graph/tests/full-graph.bats
tools/git2graph/tests/node-filters.bats
tools/git2graph/tests/repo_helper.bash
tools/git2graph/tests/data/sample-repo.tgz
tools/git2graph/tests/data/graphs/dir-nodes/edges.csv
tools/git2graph/tests/data/graphs/dir-nodes/nodes.csv
tools/git2graph/tests/data/graphs/from-dir-edges/edges.csv
tools/git2graph/tests/data/graphs/from-dir-edges/nodes.csv
tools/git2graph/tests/data/graphs/from-rel-edges/edges.csv
tools/git2graph/tests/data/graphs/from-rel-edges/nodes.csv
tools/git2graph/tests/data/graphs/fs-nodes/edges.csv
tools/git2graph/tests/data/graphs/fs-nodes/nodes.csv
tools/git2graph/tests/data/graphs/full/edges.csv
tools/git2graph/tests/data/graphs/full/nodes.csv
tools/git2graph/tests/data/graphs/rev-edges/edges.csv
tools/git2graph/tests/data/graphs/rev-edges/nodes.csv
tools/git2graph/tests/data/graphs/rev-nodes/edges.csv
tools/git2graph/tests/data/graphs/rev-nodes/nodes.csv
tools/git2graph/tests/data/graphs/to-rev-edges/edges.csv
tools/git2graph/tests/data/graphs/to-rev-edges/nodes.csv
\ No newline at end of file
diff --git a/swh/graph/cli.py b/swh/graph/cli.py
index 4882230..8afd692 100644
--- a/swh/graph/cli.py
+++ b/swh/graph/cli.py
@@ -1,247 +1,256 @@
-# Copyright (C) 2019-2020 The Software Heritage developers
+# Copyright (C) 2019-2022 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
+import logging
from pathlib import Path
+import shlex
from typing import TYPE_CHECKING, Any, Dict, Set, Tuple
# WARNING: do not import unnecessary things here to keep cli startup time under
# control
import click
from swh.core.cli import CONTEXT_SETTINGS, AliasedGroup
from swh.core.cli import swh as swh_cli_group
if TYPE_CHECKING:
from swh.graph.webgraph import CompressionStep # noqa
+logger = logging.getLogger(__name__)
+
class StepOption(click.ParamType):
"""click type for specifying a compression step on the CLI
parse either individual steps, specified as step names or integers, or step
ranges
"""
name = "compression step"
def convert(self, value, param, ctx): # type: (...) -> Set[CompressionStep]
from swh.graph.webgraph import COMP_SEQ, CompressionStep # noqa
steps: Set[CompressionStep] = set()
specs = value.split(",")
for spec in specs:
if "-" in spec: # step range
(raw_l, raw_r) = spec.split("-", maxsplit=1)
if raw_l == "": # no left endpoint
raw_l = COMP_SEQ[0].name
if raw_r == "": # no right endpoint
raw_r = COMP_SEQ[-1].name
l_step = self.convert(raw_l, param, ctx)
r_step = self.convert(raw_r, param, ctx)
if len(l_step) != 1 or len(r_step) != 1:
self.fail(f"invalid step specification: {value}, " f"see --help")
l_idx = l_step.pop()
r_idx = r_step.pop()
steps = steps.union(
set(CompressionStep(i) for i in range(l_idx.value, r_idx.value + 1))
)
else: # singleton step
try:
steps.add(CompressionStep(int(spec))) # integer step
except ValueError:
try:
steps.add(CompressionStep[spec.upper()]) # step name
except KeyError:
self.fail(
f"invalid step specification: {value}, " f"see --help"
)
return steps
class PathlibPath(click.Path):
"""A Click path argument that returns a pathlib Path, not a string"""
def convert(self, value, param, ctx):
return Path(super().convert(value, param, ctx))
DEFAULT_CONFIG: Dict[str, Tuple[str, Any]] = {"graph": ("dict", {})}
@swh_cli_group.group(name="graph", context_settings=CONTEXT_SETTINGS, cls=AliasedGroup)
@click.option(
"--config-file",
"-C",
default=None,
type=click.Path(
exists=True,
dir_okay=False,
),
help="YAML configuration file",
)
@click.pass_context
def graph_cli_group(ctx, config_file):
"""Software Heritage graph tools."""
from swh.core import config
ctx.ensure_object(dict)
conf = config.read(config_file, DEFAULT_CONFIG)
if "graph" not in conf:
raise ValueError(
'no "graph" stanza found in configuration file %s' % config_file
)
ctx.obj["config"] = conf
@graph_cli_group.command(name="rpc-serve")
@click.option(
"--host",
"-h",
default="0.0.0.0",
metavar="IP",
show_default=True,
help="host IP address to bind the server on",
)
@click.option(
"--port",
"-p",
default=5009,
type=click.INT,
metavar="PORT",
show_default=True,
help="port to bind the server on",
)
@click.option(
"--graph", "-g", required=True, metavar="GRAPH", help="compressed graph basename"
)
@click.pass_context
def serve(ctx, host, port, graph):
"""run the graph RPC service"""
import aiohttp.web
from swh.graph.http_rpc_server import make_app
config = ctx.obj["config"]
config.setdefault("graph", {})
config["graph"]["path"] = graph
app = make_app(config=config)
aiohttp.web.run_app(app, host=host, port=port)
@graph_cli_group.command(name="grpc-serve")
@click.option(
"--port",
"-p",
default=50091,
type=click.INT,
metavar="PORT",
show_default=True,
help=(
"port to bind the server on (note: host is not configurable "
"for now and will be 0.0.0.0)"
),
)
@click.option(
"--java-home",
"-j",
default=None,
metavar="JAVA_HOME",
help="absolute path to the Java Runtime Environment (JRE)",
)
@click.option(
"--graph", "-g", required=True, metavar="GRAPH", help="compressed graph basename"
)
@click.pass_context
def grpc_serve(ctx, port, java_home, graph):
"""start the graph GRPC service
This command uses execve to execute the java GRPC service.
"""
import os
from pathlib import Path
from swh.graph.grpc_server import build_grpc_server_cmdline
config = ctx.obj["config"]
config.setdefault("graph", {})
config["graph"]["path"] = graph
+ config["graph"]["port"] = port
+
+ logger.debug("Building gRPC server command line")
cmd, port = build_grpc_server_cmdline(**config["graph"])
java_bin = cmd[0]
if java_home is not None:
java_bin = str(Path(java_home) / "bin" / java_bin)
- print(f"Starting the GRPC server on 0.0.0.0:{port}")
+ # XXX: shlex.join() is in 3.8
+ # logger.info("Starting gRPC server: %s", shlex.join(cmd))
+ logger.info("Starting gRPC server: %s", " ".join(shlex.quote(x) for x in cmd))
os.execvp(java_bin, cmd)
@graph_cli_group.command()
@click.option(
"--input-dataset",
"-i",
required=True,
type=PathlibPath(),
help="graph dataset directory, in ORC format",
)
@click.option(
"--output-directory",
"-o",
required=True,
type=PathlibPath(),
help="directory where to store compressed graph",
)
@click.option(
"--graph-name",
"-g",
default="graph",
metavar="NAME",
help="name of the output graph (default: 'graph')",
)
@click.option(
"--steps",
"-s",
metavar="STEPS",
type=StepOption(),
help="run only these compression steps (default: all steps)",
)
@click.pass_context
def compress(ctx, input_dataset, output_directory, graph_name, steps):
"""Compress a graph using WebGraph
Input: a directory containing a graph dataset in ORC format
Output: a directory containing a WebGraph compressed graph
Compression steps are: (1) extract_nodes, (2) mph, (3) bv, (4) bfs, (5)
permute_bfs, (6) transpose_bfs, (7) simplify, (8) llp, (9) permute_llp,
(10) obl, (11) compose_orders, (12) stats, (13) transpose, (14)
transpose_obl, (15) maps, (16) extract_persons, (17) mph_persons, (18)
node_properties, (19) mph_labels, (20) fcl_labels, (21) edge_labels, (22)
edge_labels_obl, (23) edge_labels_transpose_obl, (24) clean_tmp.
Compression steps can be selected by name or number using --steps,
separating them with commas; step ranges (e.g., 3-9, 6-, etc.) are also
supported.
"""
from swh.graph import webgraph
try:
conf = ctx.obj["config"]["graph"]["compress"]
except KeyError:
conf = {} # use defaults
webgraph.compress(graph_name, input_dataset, output_directory, steps, conf)
def main():
return graph_cli_group(auto_envvar_prefix="SWH_GRAPH")
if __name__ == "__main__":
main()
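The step grammar accepted by `StepOption.convert` (step names, 1-based numbers, comma-separated lists, and ranges with optional endpoints such as `3-9` or `6-`) can be modeled standalone. The `STEPS` list below is a hypothetical abbreviation of the real `CompressionStep` sequence, used only for illustration:

```python
# Hypothetical, abbreviated step sequence; the real CLI uses the
# CompressionStep enum from swh.graph.webgraph.
STEPS = ["extract_nodes", "mph", "bv", "bfs", "permute_bfs"]

def parse_steps(value):
    """Expand a spec like "2,4-" into a set of step names, mirroring
    StepOption.convert: comma-separated items, each either a name, a
    1-based number, or a range whose endpoints default to the first
    and last step when omitted."""
    def one(spec):
        try:
            return {STEPS[int(spec) - 1]}  # integer step
        except ValueError:
            if spec.lower() in STEPS:      # step name
                return {spec.lower()}
            raise ValueError(f"invalid step specification: {spec}")

    steps = set()
    for spec in value.split(","):
        if "-" in spec:  # step range, possibly open-ended
            raw_l, raw_r = spec.split("-", maxsplit=1)
            l = STEPS.index(one(raw_l or STEPS[0]).pop())
            r = STEPS.index(one(raw_r or STEPS[-1]).pop())
            steps |= set(STEPS[l:r + 1])
        else:            # singleton step
            steps |= one(spec)
    return steps
```

For example, `parse_steps("2-4")` selects `mph`, `bv`, and `bfs`, and `parse_steps("4-")` runs from `bfs` to the end of the sequence.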
diff --git a/swh/graph/config.py b/swh/graph/config.py
index 12ac12c..92d4df2 100644
--- a/swh/graph/config.py
+++ b/swh/graph/config.py
@@ -1,117 +1,128 @@
-# Copyright (C) 2019 The Software Heritage developers
+# Copyright (C) 2019-2022 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import logging
from pathlib import Path
import sys
import psutil
+logger = logging.getLogger(__name__)
+
def find_graph_jar():
"""find swh-graph.jar, containing the Java part of swh-graph
look both in development directories and installed data (for in-production
deployments that fetched the JAR from PyPI)
"""
+ logger.debug("Looking for swh-graph JAR")
swh_graph_root = Path(__file__).parents[2]
try_paths = [
swh_graph_root / "java/target/",
Path(sys.prefix) / "share/swh-graph/",
Path(sys.prefix) / "local/share/swh-graph/",
]
for path in try_paths:
+ logger.debug("Looking for swh-graph JAR in %s", path)
glob = list(path.glob("swh-graph-*.jar"))
if glob:
if len(glob) > 1:
- logging.warning(
+ logger.warning(
"found multiple swh-graph JARs, " "arbitrarily picking one"
)
- logging.info("using swh-graph JAR: {0}".format(glob[0]))
+ logger.info("using swh-graph JAR: {0}".format(glob[0]))
return str(glob[0])
raise RuntimeError("swh-graph JAR not found. Have you run `make java`?")
def check_config(conf):
"""check configuration and propagate defaults"""
conf = conf.copy()
if "batch_size" not in conf:
# Use 0.1% of the RAM as a batch size:
# ~1 billion for big servers, ~10 million for small desktop machines
conf["batch_size"] = min(int(psutil.virtual_memory().total / 1000), 2**30 - 1)
+ logger.debug("batch_size not configured, defaulting to %s", conf["batch_size"])
if "llp_gammas" not in conf:
conf["llp_gammas"] = "-0,-1,-2,-3,-4"
+ logger.debug("llp_gammas not configured, defaulting to %s", conf["llp_gammas"])
if "max_ram" not in conf:
conf["max_ram"] = str(int(psutil.virtual_memory().total * 0.9))
+ logger.debug("max_ram not configured, defaulting to %s", conf["max_ram"])
if "java_tool_options" not in conf:
conf["java_tool_options"] = " ".join(
[
"-Xmx{max_ram}",
"-XX:PretenureSizeThreshold=512M",
"-XX:MaxNewSize=4G",
"-XX:+UseLargePages",
"-XX:+UseTransparentHugePages",
"-XX:+UseNUMA",
"-XX:+UseTLAB",
"-XX:+ResizeTLAB",
]
)
+ logger.debug(
+ "java_tool_options not provided, defaulting to %s",
+ conf["java_tool_options"],
+ )
conf["java_tool_options"] = conf["java_tool_options"].format(
max_ram=conf["max_ram"]
)
if "java" not in conf:
conf["java"] = "java"
if "classpath" not in conf:
conf["classpath"] = find_graph_jar()
return conf
def check_config_compress(config, graph_name, in_dir, out_dir):
"""check compression-specific configuration and initialize its execution
environment.
"""
conf = check_config(config)
conf["graph_name"] = graph_name
conf["in_dir"] = str(in_dir)
conf["out_dir"] = str(out_dir)
out_dir.mkdir(parents=True, exist_ok=True)
if "tmp_dir" not in conf:
tmp_dir = out_dir / "tmp"
conf["tmp_dir"] = str(tmp_dir)
else:
tmp_dir = Path(conf["tmp_dir"])
tmp_dir.mkdir(parents=True, exist_ok=True)
if "logback" not in conf:
logback_confpath = tmp_dir / "logback.xml"
with open(logback_confpath, "w") as conffile:
conffile.write(
"""
<configuration>
  <appender name="STDERR" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d %r %p [%t] %logger{1} - %m%n</pattern>
    </encoder>
    <target>System.err</target>
  </appender>
  <root level="debug">
    <appender-ref ref="STDERR"/>
  </root>
</configuration>
"""
)
conf["logback"] = str(logback_confpath)
conf["java_tool_options"] += " -Dlogback.configurationFile={logback}"
conf["java_tool_options"] += " -Djava.io.tmpdir={tmp_dir}"
conf["java_tool_options"] = conf["java_tool_options"].format(
logback=conf["logback"],
tmp_dir=conf["tmp_dir"],
)
return conf
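`check_config` derives its memory defaults from the machine's total RAM: the sort batch size is roughly 0.1% of RAM capped just under 2^30 entries, and the JVM heap cap is 90% of RAM. The helpers below sketch that arithmetic with an explicit `total_ram_bytes` argument instead of `psutil`, so the values are easy to verify in isolation:

```python
def default_batch_size(total_ram_bytes):
    """0.1% of total RAM, capped at 2**30 - 1, as in check_config:
    ~1 billion entries on big servers, ~10 million on small desktops."""
    return min(int(total_ram_bytes / 1000), 2**30 - 1)

def default_max_ram(total_ram_bytes):
    """90% of total RAM, as a string suitable for the -Xmx{max_ram}
    substitution in java_tool_options."""
    return str(int(total_ram_bytes * 0.9))
```

On a 16 GiB machine this yields a batch size of about 17 million entries; only machines with well over 1 TB of RAM hit the 2^30 - 1 cap.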
diff --git a/swh/graph/grpc_server.py b/swh/graph/grpc_server.py
index 5c129a5..a71e481 100644
--- a/swh/graph/grpc_server.py
+++ b/swh/graph/grpc_server.py
@@ -1,54 +1,59 @@
# Copyright (C) 2021-2022 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
"""
A simple tool to start the swh-graph GRPC server in Java.
"""
import logging
import shlex
import subprocess
import aiohttp.test_utils
import aiohttp.web
from swh.graph.config import check_config
+logger = logging.getLogger(__name__)
+
def build_grpc_server_cmdline(**config):
port = config.pop("port", None)
if port is None:
port = aiohttp.test_utils.unused_port()
+ logger.debug("Port not configured, using random port %s", port)
+ logger.debug("Checking configuration and populating default values")
config = check_config(config)
+ logger.debug("Configuration: %r", config)
cmd = [
"java",
"--class-path",
config["classpath"],
*config["java_tool_options"].split(),
"org.softwareheritage.graph.rpc.GraphServer",
"--port",
str(port),
str(config["path"]),
]
return cmd, port
def spawn_java_grpc_server(**config):
cmd, port = build_grpc_server_cmdline(**config)
print(cmd)
# XXX: shlex.join() is in 3.8
- # logging.info("Starting RPC server: %s", shlex.join(cmd))
- logging.info("Starting GRPC server: %s", " ".join(shlex.quote(x) for x in cmd))
+ # logger.info("Starting gRPC server: %s", shlex.join(cmd))
+ logger.info("Starting gRPC server: %s", " ".join(shlex.quote(x) for x in cmd))
server = subprocess.Popen(cmd)
return server, port
def stop_java_grpc_server(server: subprocess.Popen, timeout: int = 15):
server.terminate()
try:
server.wait(timeout=timeout)
except subprocess.TimeoutExpired:
- logging.warning("Server did not terminate, sending kill signal...")
+ logger.warning("Server did not terminate, sending kill signal...")
server.kill()
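The logging change above works around `shlex.join`, which only exists from Python 3.8 onward; on 3.7 the equivalent is joining individually quoted arguments, as a small sketch shows:

```python
import shlex

def join_cmd(cmd):
    """Pre-3.8 equivalent of shlex.join(): quote each argument so the
    logged command line can be safely copy-pasted back into a shell."""
    return " ".join(shlex.quote(x) for x in cmd)
```

Arguments containing spaces or shell metacharacters come out single-quoted, while plain tokens are left untouched.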
diff --git a/swh/graph/http_client.py b/swh/graph/http_client.py
index b204d73..d867e97 100644
--- a/swh/graph/http_client.py
+++ b/swh/graph/http_client.py
@@ -1,167 +1,180 @@
# Copyright (C) 2019-2020 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import json
from swh.core.api import RPCClient
class GraphAPIError(Exception):
"""Graph API Error"""
def __str__(self):
return """An unexpected error occurred
in the Graph backend: {}""".format(
self.args
)
class GraphArgumentException(Exception):
def __init__(self, *args, response=None):
super().__init__(*args)
self.response = response
class RemoteGraphClient(RPCClient):
"""Client to the Software Heritage Graph."""
def __init__(self, url, timeout=None):
super().__init__(api_exception=GraphAPIError, url=url, timeout=timeout)
def raw_verb_lines(self, verb, endpoint, **kwargs):
response = self.raw_verb(verb, endpoint, stream=True, **kwargs)
self.raise_for_status(response)
for line in response.iter_lines():
yield line.decode().lstrip("\n")
def get_lines(self, endpoint, **kwargs):
yield from self.raw_verb_lines("get", endpoint, **kwargs)
def raise_for_status(self, response) -> None:
if response.status_code // 100 == 4:
raise GraphArgumentException(
response.content.decode("ascii"), response=response
)
super().raise_for_status(response)
# Web API endpoints
def stats(self):
return self.get("stats")
def leaves(
self,
src,
edges="*",
direction="forward",
max_edges=0,
return_types="*",
max_matching_nodes=0,
):
return self.get_lines(
"leaves/{}".format(src),
params={
"edges": edges,
"direction": direction,
"max_edges": max_edges,
"return_types": return_types,
"max_matching_nodes": max_matching_nodes,
},
)
def neighbors(
self, src, edges="*", direction="forward", max_edges=0, return_types="*"
):
return self.get_lines(
"neighbors/{}".format(src),
params={
"edges": edges,
"direction": direction,
"max_edges": max_edges,
"return_types": return_types,
},
)
def visit_nodes(
- self, src, edges="*", direction="forward", max_edges=0, return_types="*"
+ self,
+ src,
+ edges="*",
+ direction="forward",
+ max_edges=0,
+ return_types="*",
+ max_matching_nodes=0,
):
return self.get_lines(
"visit/nodes/{}".format(src),
params={
"edges": edges,
"direction": direction,
"max_edges": max_edges,
"return_types": return_types,
+ "max_matching_nodes": max_matching_nodes,
},
)
def visit_edges(self, src, edges="*", direction="forward", max_edges=0):
for edge in self.get_lines(
"visit/edges/{}".format(src),
params={"edges": edges, "direction": direction, "max_edges": max_edges},
):
yield tuple(edge.split())
def visit_paths(self, src, edges="*", direction="forward", max_edges=0):
def decode_path_wrapper(it):
for e in it:
yield json.loads(e)
return decode_path_wrapper(
self.get_lines(
"visit/paths/{}".format(src),
params={"edges": edges, "direction": direction, "max_edges": max_edges},
)
)
def walk(
self, src, dst, edges="*", traversal="dfs", direction="forward", limit=None
):
endpoint = "walk/{}/{}"
return self.get_lines(
endpoint.format(src, dst),
params={
"edges": edges,
"traversal": traversal,
"direction": direction,
"limit": limit,
},
)
def random_walk(
self, src, dst, edges="*", direction="forward", limit=None, return_types="*"
):
endpoint = "randomwalk/{}/{}"
return self.get_lines(
endpoint.format(src, dst),
params={
"edges": edges,
"direction": direction,
"limit": limit,
"return_types": return_types,
},
)
def count_leaves(self, src, edges="*", direction="forward", max_matching_nodes=0):
return self.get(
"leaves/count/{}".format(src),
params={
"edges": edges,
"direction": direction,
"max_matching_nodes": max_matching_nodes,
},
)
def count_neighbors(self, src, edges="*", direction="forward"):
return self.get(
"neighbors/count/{}".format(src),
params={"edges": edges, "direction": direction},
)
- def count_visit_nodes(self, src, edges="*", direction="forward"):
+ def count_visit_nodes(
+ self, src, edges="*", direction="forward", max_matching_nodes=0
+ ):
return self.get(
"visit/nodes/count/{}".format(src),
- params={"edges": edges, "direction": direction},
+ params={
+ "edges": edges,
+ "direction": direction,
+ "max_matching_nodes": max_matching_nodes,
+ },
)
diff --git a/swh/graph/http_naive_client.py b/swh/graph/http_naive_client.py
index 43f0088..8cf8a99 100644
--- a/swh/graph/http_naive_client.py
+++ b/swh/graph/http_naive_client.py
@@ -1,412 +1,423 @@
# Copyright (C) 2021-2022 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import functools
import inspect
import itertools
import re
import statistics
from typing import (
Callable,
Dict,
Iterable,
Iterator,
List,
Optional,
Set,
Tuple,
TypeVar,
Union,
)
from swh.model.swhids import CoreSWHID, ExtendedSWHID, ValidationError
from .http_client import GraphArgumentException
_NODE_TYPES = "ori|snp|rel|rev|dir|cnt"
NODES_RE = re.compile(rf"(\*|{_NODE_TYPES})")
EDGES_RE = re.compile(rf"(\*|{_NODE_TYPES}):(\*|{_NODE_TYPES})")
T = TypeVar("T", bound=Callable)
SWHIDlike = Union[CoreSWHID, ExtendedSWHID, str]
def check_arguments(f: T) -> T:
"""Decorator for generic argument checking for methods of NaiveClient.
Checks ``src`` is a valid and known SWHID, and ``edges`` has the right format."""
signature = inspect.signature(f)
@functools.wraps(f)
def newf(*args, **kwargs):
__tracebackhide__ = True # for pytest
try:
bound_args = signature.bind(*args, **kwargs)
except TypeError as e:
# rethrow the exception from here so pytest doesn't flood the terminal
# with signature.bind's call stack.
raise TypeError(*e.args) from None
self = bound_args.arguments["self"]
src = bound_args.arguments.get("src")
if src:
self._check_swhid(src)
edges = bound_args.arguments.get("edges")
if edges:
if edges != "*" and not EDGES_RE.match(edges):
raise GraphArgumentException(f"invalid edge restriction: {edges}")
return_types = bound_args.arguments.get("return_types")
if return_types:
if not NODES_RE.match(return_types):
raise GraphArgumentException(
f"invalid return_types restriction: {return_types}"
)
return f(*args, **kwargs)
return newf # type: ignore
def filter_node_types(node_types: str, nodes: Iterable[str]) -> Iterator[str]:
if node_types == "*":
yield from nodes
else:
prefixes = tuple(f"swh:1:{type_}:" for type_ in node_types.split(","))
for node in nodes:
if node.startswith(prefixes):
yield node
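The node-type filter above is a small pure function that can be exercised standalone. A minimal sketch reproducing the same prefix-based filtering (the sample SWHIDs are illustrative):

```python
from typing import Iterable, Iterator

def filter_node_types(node_types: str, nodes: Iterable[str]) -> Iterator[str]:
    # "*" disables filtering; otherwise keep only nodes whose SWHID type
    # prefix matches one of the comma-separated types (e.g. "rev,cnt").
    if node_types == "*":
        yield from nodes
    else:
        prefixes = tuple(f"swh:1:{type_}:" for type_ in node_types.split(","))
        for node in nodes:
            if node.startswith(prefixes):
                yield node

nodes = [
    "swh:1:rev:" + "1" * 40,
    "swh:1:dir:" + "2" * 40,
    "swh:1:cnt:" + "3" * 40,
]
```

Note that `str.startswith` accepts a tuple of prefixes, so a single call covers every requested type.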
class NaiveClient:
"""An alternative implementation of the graph server, written in
pure Python and meant to stand in for it in other components' test cases;
it is constructed from a list of nodes and (directed) edges, both
represented as SWHIDs.
It is NOT meant to be efficient in any way; only to be a very simple
implementation that provides the same behavior.
>>> nodes = [
... "swh:1:rev:1111111111111111111111111111111111111111",
... "swh:1:rev:2222222222222222222222222222222222222222",
... "swh:1:rev:3333333333333333333333333333333333333333",
... ]
>>> edges = [
... (
... "swh:1:rev:1111111111111111111111111111111111111111",
... "swh:1:rev:2222222222222222222222222222222222222222",
... ),
... (
... "swh:1:rev:2222222222222222222222222222222222222222",
... "swh:1:rev:3333333333333333333333333333333333333333",
... ),
... ]
>>> c = NaiveClient(nodes=nodes, edges=edges)
>>> list(c.leaves("swh:1:rev:1111111111111111111111111111111111111111"))
['swh:1:rev:3333333333333333333333333333333333333333']
"""
def __init__(
self, *, nodes: List[SWHIDlike], edges: List[Tuple[SWHIDlike, SWHIDlike]]
):
self.graph = Graph(nodes, edges)
def _check_swhid(self, swhid):
try:
ExtendedSWHID.from_string(swhid)
except ValidationError as e:
raise GraphArgumentException(*e.args) from None
if swhid not in self.graph.nodes:
raise GraphArgumentException(f"SWHID not found: {swhid}")
def stats(self) -> Dict:
return {
"num_nodes": len(self.graph.nodes),
"num_edges": sum(map(len, self.graph.forward_edges.values())),
"compression_ratio": 1.0,
"bits_per_edge": 100.0,
"bits_per_node": 100.0,
"avg_locality": 0.0,
"indegree_min": min(map(len, self.graph.backward_edges.values())),
"indegree_max": max(map(len, self.graph.backward_edges.values())),
"indegree_avg": statistics.mean(
map(len, self.graph.backward_edges.values())
),
"outdegree_min": min(map(len, self.graph.forward_edges.values())),
"outdegree_max": max(map(len, self.graph.forward_edges.values())),
"outdegree_avg": statistics.mean(
map(len, self.graph.forward_edges.values())
),
}
@check_arguments
def leaves(
self,
src: str,
edges: str = "*",
direction: str = "forward",
max_edges: int = 0,
return_types: str = "*",
max_matching_nodes: int = 0,
) -> Iterator[str]:
# TODO: max_edges
leaves = filter_node_types(
return_types,
[
node
for node in self.graph.get_subgraph(src, edges, direction)
if not self.graph.get_filtered_neighbors(node, edges, direction)
],
)
if max_matching_nodes > 0:
leaves = itertools.islice(leaves, max_matching_nodes)
return leaves
@check_arguments
def neighbors(
self,
src: str,
edges: str = "*",
direction: str = "forward",
max_edges: int = 0,
return_types: str = "*",
) -> Iterator[str]:
# TODO: max_edges
yield from filter_node_types(
return_types, self.graph.get_filtered_neighbors(src, edges, direction)
)
@check_arguments
def visit_nodes(
self,
src: str,
edges: str = "*",
direction: str = "forward",
max_edges: int = 0,
return_types: str = "*",
+ max_matching_nodes: int = 0,
) -> Iterator[str]:
# TODO: max_edges
- yield from filter_node_types(
+ res = filter_node_types(
return_types, self.graph.get_subgraph(src, edges, direction)
)
+ if max_matching_nodes > 0:
+ res = itertools.islice(res, max_matching_nodes)
+ return res
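The `max_matching_nodes` handling introduced by this change follows a single convention: `0` means "no limit", and any positive value caps the result lazily with `itertools.islice`. A standalone sketch of that pattern (the `capped` helper is illustrative, not part of the module):

```python
import itertools
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def capped(it: Iterable[T], max_matching_nodes: int) -> Iterator[T]:
    # 0 leaves the iterator unlimited; a positive value truncates it lazily,
    # without materializing the underlying results first.
    if max_matching_nodes > 0:
        return iter(itertools.islice(it, max_matching_nodes))
    return iter(it)
```

Because `islice` is lazy, a capped traversal can stop early instead of computing the full result set and discarding the excess.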
@check_arguments
def visit_edges(
self, src: str, edges: str = "*", direction: str = "forward", max_edges: int = 0
) -> Iterator[Tuple[str, str]]:
if max_edges == 0:
max_edges = None # type: ignore
else:
max_edges -= 1
yield from list(self.graph.iter_edges_dfs(direction, edges, src))[:max_edges]
@check_arguments
def visit_paths(
self, src: str, edges: str = "*", direction: str = "forward", max_edges: int = 0
) -> Iterator[List[str]]:
# TODO: max_edges
for path in self.graph.iter_paths_dfs(direction, edges, src):
if path[-1] in self.leaves(src, edges, direction):
yield list(path)
@check_arguments
def walk(
self,
src: str,
dst: str,
edges: str = "*",
traversal: str = "dfs",
direction: str = "forward",
limit: Optional[int] = None,
) -> Iterator[str]:
# TODO: implement algo="bfs"
# TODO: limit
match_path: Callable[[str], bool]
if ":" in dst:
match_path = dst.__eq__
self._check_swhid(dst)
else:
match_path = lambda node: node.startswith(f"swh:1:{dst}:") # noqa
for path in self.graph.iter_paths_dfs(direction, edges, src):
if match_path(path[-1]):
if not limit:
# 0 or None
yield from path
elif limit > 0:
yield from path[0:limit]
else:
yield from path[limit:]
@check_arguments
def random_walk(
self,
src: str,
dst: str,
edges: str = "*",
direction: str = "forward",
limit: Optional[int] = None,
):
# TODO: limit
yield from self.walk(src, dst, edges, "dfs", direction, limit)
@check_arguments
def count_leaves(
self,
src: str,
edges: str = "*",
direction: str = "forward",
max_matching_nodes: int = 0,
) -> int:
return len(
list(
self.leaves(
src, edges, direction, max_matching_nodes=max_matching_nodes
)
)
)
@check_arguments
def count_neighbors(
self, src: str, edges: str = "*", direction: str = "forward"
) -> int:
return len(self.graph.get_filtered_neighbors(src, edges, direction))
@check_arguments
def count_visit_nodes(
- self, src: str, edges: str = "*", direction: str = "forward"
+ self,
+ src: str,
+ edges: str = "*",
+ direction: str = "forward",
+ max_matching_nodes: int = 0,
) -> int:
- return len(self.graph.get_subgraph(src, edges, direction))
+ res = len(self.graph.get_subgraph(src, edges, direction))
+ if max_matching_nodes > 0:
+ res = min(max_matching_nodes, res)
+ return res
class Graph:
def __init__(
self, nodes: List[SWHIDlike], edges: List[Tuple[SWHIDlike, SWHIDlike]]
):
self.nodes = [str(node) for node in nodes]
self.forward_edges: Dict[str, List[str]] = {}
self.backward_edges: Dict[str, List[str]] = {}
for node in nodes:
self.forward_edges[str(node)] = []
self.backward_edges[str(node)] = []
for (src, dst) in edges:
self.forward_edges[str(src)].append(str(dst))
self.backward_edges[str(dst)].append(str(src))
def get_filtered_neighbors(
self,
src: str,
edges_fmt: str,
direction: str,
) -> Set[str]:
if direction == "forward":
edges = self.forward_edges
elif direction == "backward":
edges = self.backward_edges
else:
raise GraphArgumentException(f"invalid direction: {direction}")
neighbors = edges.get(src, [])
if edges_fmt == "*":
return set(neighbors)
else:
filtered_neighbors: Set[str] = set()
for edges_fmt_item in edges_fmt.split(","):
(src_fmt, dst_fmt) = edges_fmt_item.split(":")
if src_fmt != "*" and not src.startswith(f"swh:1:{src_fmt}:"):
continue
if dst_fmt == "*":
filtered_neighbors.update(neighbors)
else:
prefix = f"swh:1:{dst_fmt}:"
filtered_neighbors.update(
n for n in neighbors if n.startswith(prefix)
)
return filtered_neighbors
def get_subgraph(self, src: str, edges_fmt: str, direction: str) -> Set[str]:
seen = set()
to_visit = {src}
while to_visit:
node = to_visit.pop()
seen.add(node)
neighbors = set(self.get_filtered_neighbors(node, edges_fmt, direction))
new_nodes = neighbors - seen
to_visit.update(new_nodes)
return seen
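`get_subgraph` computes the set of nodes reachable from `src` using a worklist of pending nodes; since it pops from an unordered set, traversal order is unspecified, which is fine because the result is a set. A self-contained sketch of the same worklist pattern over a plain adjacency dict (illustrative, not the module's API):

```python
from typing import Dict, List, Set

def reachable(adj: Dict[str, List[str]], src: str) -> Set[str]:
    # Worklist-based reachability: pop an arbitrary pending node, record it,
    # and enqueue its not-yet-seen neighbors until the worklist drains.
    seen: Set[str] = set()
    to_visit = {src}
    while to_visit:
        node = to_visit.pop()
        seen.add(node)
        to_visit.update(set(adj.get(node, [])) - seen)
    return seen
```

Re-adding a node already present in `to_visit` is harmless since both collections are sets; each node is recorded exactly once.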
def iter_paths_dfs(
self, direction: str, edges_fmt: str, src: str
) -> Iterator[Tuple[str, ...]]:
for (path, node) in DfsSubgraphIterator(self, direction, edges_fmt, src):
yield path + (node,)
def iter_edges_dfs(
self, direction: str, edges_fmt: str, src: str
) -> Iterator[Tuple[str, str]]:
for (path, node) in DfsSubgraphIterator(self, direction, edges_fmt, src):
if len(path) > 0:
yield (path[-1], node)
class SubgraphIterator(Iterator[Tuple[Tuple[str, ...], str]]):
def __init__(self, graph: Graph, direction: str, edges_fmt: str, src: str):
self.graph = graph
self.direction = direction
self.edges_fmt = edges_fmt
self.seen: Set[str] = set()
self.src = src
def more_work(self) -> bool:
raise NotImplementedError()
def pop(self) -> Tuple[Tuple[str, ...], str]:
raise NotImplementedError()
def push(self, new_path: Tuple[str, ...], neighbor: str) -> None:
raise NotImplementedError()
def __next__(self) -> Tuple[Tuple[str, ...], str]:
# Stores (path, next_node)
if not self.more_work():
raise StopIteration()
(path, node) = self.pop()
new_path = path + (node,)
if node not in self.seen:
neighbors = self.graph.get_filtered_neighbors(
node, self.edges_fmt, self.direction
)
# We want to visit the first neighbor first, and to_visit is a
# stack; so we push the neighbors in reversed() order to get the
# first one on top of the stack.
for neighbor in reversed(list(neighbors)):
self.push(new_path, neighbor)
self.seen.add(node)
return (path, node)
class DfsSubgraphIterator(SubgraphIterator):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.to_visit: List[Tuple[Tuple[str, ...], str]] = [((), self.src)]
def more_work(self) -> bool:
return bool(self.to_visit)
def pop(self) -> Tuple[Tuple[str, ...], str]:
return self.to_visit.pop()
def push(self, new_path: Tuple[str, ...], neighbor: str) -> None:
self.to_visit.append((new_path, neighbor))
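The iterator above implements DFS with an explicit stack; pushing neighbors in reversed order is what makes the first-listed neighbor come off the stack first. A compact standalone sketch of the same trick (the `dfs_order` name is illustrative):

```python
from typing import Dict, List

def dfs_order(adj: Dict[str, List[str]], src: str) -> List[str]:
    # Explicit-stack DFS; neighbors are pushed in reverse so that the
    # first-listed neighbor sits on top of the stack, hence is visited first.
    order: List[str] = []
    stack = [src]
    while stack:
        node = stack.pop()
        if node in order:
            continue
        order.append(node)
        for neighbor in reversed(adj.get(node, [])):
            stack.append(neighbor)
    return order
```

Without the `reversed()`, the traversal would still be a valid DFS, but siblings would be visited in the opposite of their listed order.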
diff --git a/swh/graph/tests/test_http_client.py b/swh/graph/tests/test_http_client.py
index 1878029..1b8cb6e 100644
--- a/swh/graph/tests/test_http_client.py
+++ b/swh/graph/tests/test_http_client.py
@@ -1,408 +1,450 @@
# Copyright (c) 2022 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import hashlib
import pytest
from pytest import raises
from swh.core.api import RemoteException
from swh.graph.http_client import GraphArgumentException
TEST_ORIGIN_ID = "swh:1:ori:{}".format(
hashlib.sha1(b"https://example.com/swh/graph").hexdigest()
)
def test_stats(graph_client):
stats = graph_client.stats()
assert stats["num_nodes"] == 21
assert stats["num_edges"] == 23
assert isinstance(stats["compression_ratio"], float)
assert isinstance(stats["bits_per_node"], float)
assert isinstance(stats["bits_per_edge"], float)
assert isinstance(stats["avg_locality"], float)
assert stats["indegree_min"] == 0
assert stats["indegree_max"] == 3
assert isinstance(stats["indegree_avg"], float)
assert stats["outdegree_min"] == 0
assert stats["outdegree_max"] == 3
assert isinstance(stats["outdegree_avg"], float)
def test_leaves(graph_client):
actual = list(graph_client.leaves(TEST_ORIGIN_ID))
expected = [
"swh:1:cnt:0000000000000000000000000000000000000001",
"swh:1:cnt:0000000000000000000000000000000000000004",
"swh:1:cnt:0000000000000000000000000000000000000005",
"swh:1:cnt:0000000000000000000000000000000000000007",
]
assert set(actual) == set(expected)
@pytest.mark.parametrize("max_matching_nodes", [0, 1, 2, 3, 4, 5, 10, 1 << 31])
def test_leaves_with_limit(graph_client, max_matching_nodes):
actual = list(
graph_client.leaves(TEST_ORIGIN_ID, max_matching_nodes=max_matching_nodes)
)
expected = [
"swh:1:cnt:0000000000000000000000000000000000000001",
"swh:1:cnt:0000000000000000000000000000000000000004",
"swh:1:cnt:0000000000000000000000000000000000000005",
"swh:1:cnt:0000000000000000000000000000000000000007",
]
if max_matching_nodes == 0:
assert set(actual) == set(expected)
else:
assert set(actual) <= set(expected)
assert len(actual) == min(4, max_matching_nodes)
def test_neighbors(graph_client):
actual = list(
graph_client.neighbors(
"swh:1:rev:0000000000000000000000000000000000000009", direction="backward"
)
)
expected = [
"swh:1:snp:0000000000000000000000000000000000000020",
"swh:1:rel:0000000000000000000000000000000000000010",
"swh:1:rev:0000000000000000000000000000000000000013",
]
assert set(actual) == set(expected)
def test_visit_nodes(graph_client):
actual = list(
graph_client.visit_nodes(
"swh:1:rel:0000000000000000000000000000000000000010",
edges="rel:rev,rev:rev",
)
)
expected = [
"swh:1:rel:0000000000000000000000000000000000000010",
"swh:1:rev:0000000000000000000000000000000000000009",
"swh:1:rev:0000000000000000000000000000000000000003",
]
assert set(actual) == set(expected)
+@pytest.mark.parametrize("max_matching_nodes", [0, 1, 2, 3, 4, 5, 10, 1 << 31])
+def test_visit_nodes_limit(graph_client, max_matching_nodes):
+ actual = list(
+ graph_client.visit_nodes(
+ "swh:1:rel:0000000000000000000000000000000000000010",
+ edges="rel:rev,rev:rev",
+ max_matching_nodes=max_matching_nodes,
+ )
+ )
+ expected = [
+ "swh:1:rel:0000000000000000000000000000000000000010",
+ "swh:1:rev:0000000000000000000000000000000000000009",
+ "swh:1:rev:0000000000000000000000000000000000000003",
+ ]
+ if max_matching_nodes == 0:
+ assert set(actual) == set(expected)
+ else:
+ assert set(actual) <= set(expected)
+ assert len(actual) == min(3, max_matching_nodes)
+
+
def test_visit_nodes_filtered(graph_client):
actual = list(
graph_client.visit_nodes(
"swh:1:rel:0000000000000000000000000000000000000010",
return_types="dir",
)
)
expected = [
"swh:1:dir:0000000000000000000000000000000000000002",
"swh:1:dir:0000000000000000000000000000000000000008",
"swh:1:dir:0000000000000000000000000000000000000006",
]
assert set(actual) == set(expected)
+@pytest.mark.parametrize("max_matching_nodes", [0, 1, 2, 3, 4, 5, 10, 1 << 31])
+def test_visit_nodes_filtered_limit(graph_client, max_matching_nodes):
+ actual = list(
+ graph_client.visit_nodes(
+ "swh:1:rel:0000000000000000000000000000000000000010",
+ return_types="dir",
+ max_matching_nodes=max_matching_nodes,
+ )
+ )
+ expected = [
+ "swh:1:dir:0000000000000000000000000000000000000002",
+ "swh:1:dir:0000000000000000000000000000000000000008",
+ "swh:1:dir:0000000000000000000000000000000000000006",
+ ]
+ if max_matching_nodes == 0:
+ assert set(actual) == set(expected)
+ else:
+ assert set(actual) <= set(expected)
+ assert len(actual) == min(3, max_matching_nodes)
+
+
def test_visit_nodes_filtered_star(graph_client):
actual = list(
graph_client.visit_nodes(
"swh:1:rel:0000000000000000000000000000000000000010",
return_types="*",
)
)
expected = [
"swh:1:rel:0000000000000000000000000000000000000010",
"swh:1:rev:0000000000000000000000000000000000000009",
"swh:1:rev:0000000000000000000000000000000000000003",
"swh:1:dir:0000000000000000000000000000000000000002",
"swh:1:cnt:0000000000000000000000000000000000000001",
"swh:1:dir:0000000000000000000000000000000000000008",
"swh:1:cnt:0000000000000000000000000000000000000007",
"swh:1:dir:0000000000000000000000000000000000000006",
"swh:1:cnt:0000000000000000000000000000000000000004",
"swh:1:cnt:0000000000000000000000000000000000000005",
]
assert set(actual) == set(expected)
def test_visit_edges(graph_client):
actual = list(
graph_client.visit_edges(
"swh:1:rel:0000000000000000000000000000000000000010",
edges="rel:rev,rev:rev,rev:dir",
)
)
expected = [
(
"swh:1:rel:0000000000000000000000000000000000000010",
"swh:1:rev:0000000000000000000000000000000000000009",
),
(
"swh:1:rev:0000000000000000000000000000000000000009",
"swh:1:rev:0000000000000000000000000000000000000003",
),
(
"swh:1:rev:0000000000000000000000000000000000000009",
"swh:1:dir:0000000000000000000000000000000000000008",
),
(
"swh:1:rev:0000000000000000000000000000000000000003",
"swh:1:dir:0000000000000000000000000000000000000002",
),
]
assert set(actual) == set(expected)
def test_visit_edges_limited(graph_client):
actual = list(
graph_client.visit_edges(
"swh:1:rel:0000000000000000000000000000000000000010",
max_edges=4,
edges="rel:rev,rev:rev,rev:dir",
)
)
expected = [
(
"swh:1:rel:0000000000000000000000000000000000000010",
"swh:1:rev:0000000000000000000000000000000000000009",
),
(
"swh:1:rev:0000000000000000000000000000000000000009",
"swh:1:rev:0000000000000000000000000000000000000003",
),
(
"swh:1:rev:0000000000000000000000000000000000000009",
"swh:1:dir:0000000000000000000000000000000000000008",
),
(
"swh:1:rev:0000000000000000000000000000000000000003",
"swh:1:dir:0000000000000000000000000000000000000002",
),
]
# As there are four valid answers (up to reordering), we cannot check for
# equality. Instead, we check that the client returned all but one of the edges.
assert set(actual).issubset(set(expected))
assert len(actual) == 3
def test_visit_edges_diamond_pattern(graph_client):
actual = list(
graph_client.visit_edges(
"swh:1:rev:0000000000000000000000000000000000000009",
edges="*",
)
)
expected = [
(
"swh:1:rev:0000000000000000000000000000000000000009",
"swh:1:rev:0000000000000000000000000000000000000003",
),
(
"swh:1:rev:0000000000000000000000000000000000000009",
"swh:1:dir:0000000000000000000000000000000000000008",
),
(
"swh:1:rev:0000000000000000000000000000000000000003",
"swh:1:dir:0000000000000000000000000000000000000002",
),
(
"swh:1:dir:0000000000000000000000000000000000000002",
"swh:1:cnt:0000000000000000000000000000000000000001",
),
(
"swh:1:dir:0000000000000000000000000000000000000008",
"swh:1:cnt:0000000000000000000000000000000000000001",
),
(
"swh:1:dir:0000000000000000000000000000000000000008",
"swh:1:cnt:0000000000000000000000000000000000000007",
),
(
"swh:1:dir:0000000000000000000000000000000000000008",
"swh:1:dir:0000000000000000000000000000000000000006",
),
(
"swh:1:dir:0000000000000000000000000000000000000006",
"swh:1:cnt:0000000000000000000000000000000000000004",
),
(
"swh:1:dir:0000000000000000000000000000000000000006",
"swh:1:cnt:0000000000000000000000000000000000000005",
),
]
assert set(actual) == set(expected)
@pytest.mark.skip(reason="currently disabled due to T1969")
def test_walk(graph_client):
args = ("swh:1:dir:0000000000000000000000000000000000000016", "rel")
kwargs = {
"edges": "dir:dir,dir:rev,rev:*",
"direction": "backward",
"traversal": "bfs",
}
actual = list(graph_client.walk(*args, **kwargs))
expected = [
"swh:1:dir:0000000000000000000000000000000000000016",
"swh:1:dir:0000000000000000000000000000000000000017",
"swh:1:rev:0000000000000000000000000000000000000018",
"swh:1:rel:0000000000000000000000000000000000000019",
]
assert set(actual) == set(expected)
kwargs2 = kwargs.copy()
kwargs2["limit"] = -1
actual = list(graph_client.walk(*args, **kwargs2))
expected = ["swh:1:rel:0000000000000000000000000000000000000019"]
assert set(actual) == set(expected)
kwargs2 = kwargs.copy()
kwargs2["limit"] = 2
actual = list(graph_client.walk(*args, **kwargs2))
expected = [
"swh:1:dir:0000000000000000000000000000000000000016",
"swh:1:dir:0000000000000000000000000000000000000017",
]
assert set(actual) == set(expected)
@pytest.mark.skip(reason="Random walk is deprecated")
def test_random_walk_dst_is_type(graph_client):
"""As the walk is random, we test a visit from a cnt node to a release
that is reachable along every path in the backward graph, and only check
the final node of the path (i.e., the release).
"""
args = ("swh:1:cnt:0000000000000000000000000000000000000015", "rel")
kwargs = {"direction": "backward"}
expected_root = "swh:1:rel:0000000000000000000000000000000000000019"
actual = list(graph_client.random_walk(*args, **kwargs))
assert len(actual) > 1 # no release directly links to a content
assert actual[0] == args[0]
assert actual[-1] == expected_root
kwargs2 = kwargs.copy()
kwargs2["limit"] = -1
actual = list(graph_client.random_walk(*args, **kwargs2))
assert actual == [expected_root]
kwargs2["limit"] = -2
actual = list(graph_client.random_walk(*args, **kwargs2))
assert len(actual) == 2
assert actual[-1] == expected_root
kwargs2["limit"] = 3
actual = list(graph_client.random_walk(*args, **kwargs2))
assert len(actual) == 3
@pytest.mark.skip(reason="Random walk is deprecated")
def test_random_walk_dst_is_node(graph_client):
"""Same as test_random_walk_dst_is_type, but we target the specific release
node instead of a type
"""
args = (
"swh:1:cnt:0000000000000000000000000000000000000015",
"swh:1:rel:0000000000000000000000000000000000000019",
)
kwargs = {"direction": "backward"}
expected_root = "swh:1:rel:0000000000000000000000000000000000000019"
actual = list(graph_client.random_walk(*args, **kwargs))
assert len(actual) > 1 # no origin directly links to a content
assert actual[0] == args[0]
assert actual[-1] == expected_root
kwargs2 = kwargs.copy()
kwargs2["limit"] = -1
actual = list(graph_client.random_walk(*args, **kwargs2))
assert actual == [expected_root]
kwargs2["limit"] = -2
actual = list(graph_client.random_walk(*args, **kwargs2))
assert len(actual) == 2
assert actual[-1] == expected_root
kwargs2["limit"] = 3
actual = list(graph_client.random_walk(*args, **kwargs2))
assert len(actual) == 3
def test_count(graph_client):
actual = graph_client.count_leaves(TEST_ORIGIN_ID)
assert actual == 4
actual = graph_client.count_visit_nodes(
"swh:1:rel:0000000000000000000000000000000000000010", edges="rel:rev,rev:rev"
)
assert actual == 3
actual = graph_client.count_neighbors(
"swh:1:rev:0000000000000000000000000000000000000009", direction="backward"
)
assert actual == 3
@pytest.mark.parametrize("max_matching_nodes", [0, 1, 2, 3, 4, 5, 10, 1 << 31])
def test_count_with_limit(graph_client, max_matching_nodes):
actual = graph_client.count_leaves(
TEST_ORIGIN_ID, max_matching_nodes=max_matching_nodes
)
if max_matching_nodes == 0:
assert actual == 4
else:
assert actual == min(4, max_matching_nodes)
def test_param_validation(graph_client):
with raises(GraphArgumentException) as exc_info: # SWHID not found
list(graph_client.leaves("swh:1:rel:00ffffffff000000000000000000000000000010"))
if exc_info.value.response:
assert exc_info.value.response.status_code == 404
with raises(GraphArgumentException) as exc_info: # malformed SWHID
list(
graph_client.neighbors("swh:1:rel:00ffffffff00000000zzzzzzz000000000000010")
)
if exc_info.value.response:
assert exc_info.value.response.status_code == 400
with raises(GraphArgumentException) as exc_info: # malformed edge specification
list(
graph_client.visit_nodes(
"swh:1:dir:0000000000000000000000000000000000000016",
edges="dir:notanodetype,dir:rev,rev:*",
direction="backward",
)
)
if exc_info.value.response:
assert exc_info.value.response.status_code == 400
with raises(GraphArgumentException) as exc_info: # malformed direction
list(
graph_client.visit_nodes(
"swh:1:dir:0000000000000000000000000000000000000016",
edges="dir:dir,dir:rev,rev:*",
direction="notadirection",
)
)
if exc_info.value.response:
assert exc_info.value.response.status_code == 400
@pytest.mark.skip(reason="currently disabled due to T1969")
def test_param_validation_walk(graph_client):
"""test validation of walk-specific parameters only"""
with raises(RemoteException) as exc_info: # malformed traversal order
list(
graph_client.walk(
"swh:1:dir:0000000000000000000000000000000000000016",
"rel",
edges="dir:dir,dir:rev,rev:*",
direction="backward",
traversal="notatraversalorder",
)
)
assert exc_info.value.response.status_code == 400