Graph service
Active · Public

Details

Description

Service providing fast access to the graph representation of the Software Heritage archive.

The service is in-memory, based on a compressed representation of the Software Heritage Merkle DAG.

Recent Activity

Thu, Jul 29

vlorentz moved T2431: Document how to export the graph edge dataset from sys-admin (docs/sysadm) to developers (docs/devel/) on the Documentation board.
Thu, Jul 29, 3:54 PM · Documentation, Graph service, Datasets
vlorentz moved T2431: Document how to export the graph edge dataset from archive-users (docs/user-guides/) to sys-admin (docs/sysadm) on the Documentation board.
Thu, Jul 29, 3:54 PM · Documentation, Graph service, Datasets
vlorentz added a comment to T2431: Document how to export the graph edge dataset.

It is now somewhat documented here: https://forge.softwareheritage.org/source/swh-environment/browse/master/docker/services/swh-graph/entrypoint.sh

Thu, Jul 29, 3:54 PM · Documentation, Graph service, Datasets
vlorentz closed T1936: integrate swh-graph into the docker environment as Resolved.
Thu, Jul 29, 3:51 PM · Docker environment, Graph service

Wed, Jul 28

vlorentz added revisions to T1936: integrate swh-graph into the docker environment: D6042: graph: don't run export+compression unless needed, and add documentation, D6038: journalprocessor: Fix deserialize_message raising EOFError on the last message of each assignment.
Wed, Jul 28, 4:44 PM · Docker environment, Graph service
ardumont added a comment to T1936: integrate swh-graph into the docker environment.

Plus that documents it a bit in an automatic manner, so *thumbs up*.

Wed, Jul 28, 10:37 AM · Docker environment, Graph service

Tue, Jul 27

vlorentz added a comment to T1936: integrate swh-graph into the docker environment.

I decided to make the swh-graph container create the compressed graph itself before starting. That's the easiest way to use it AND to implement it IMO.

Tue, Jul 27, 5:39 PM · Docker environment, Graph service
vlorentz claimed T1936: integrate swh-graph into the docker environment.
Tue, Jul 27, 5:37 PM · Docker environment, Graph service
vlorentz added a revision to T1936: integrate swh-graph into the docker environment: D6028: journalprocessor: Fix freeze on empty offset ranges..
Tue, Jul 27, 5:37 PM · Docker environment, Graph service
vlorentz added revisions to T1936: integrate swh-graph into the docker environment: D6029: Add swh-graph to the docker environment, D6030: server: Define make_app_from_configfile so it can be ran by gunicorn.
Tue, Jul 27, 5:37 PM · Docker environment, Graph service

Jul 2 2021

ardumont added a subtask for T3168: Proper deployment of swh-graph with debian package: T2100: Bootstrap Debian packaging for swh.graph.
Jul 2 2021, 3:42 PM · Graph service, Puppet recipes
ardumont added a parent task for T2100: Bootstrap Debian packaging for swh.graph: T3168: Proper deployment of swh-graph with debian package.
Jul 2 2021, 3:42 PM · Graph service

May 4 2021

vlorentz triaged T3305: Add tests for limit/max_edges in swh/graph/tests/test_api_client.py as Normal priority.
May 4 2021, 12:43 PM · Easy hack, Graph service

May 3 2021

zack renamed T3301: graph: add test for the "algo" parameter of walk() from swh-graph: No tests of the "algo" parameter of walk() to graph: add test for the "algo" parameter of walk().
May 3 2021, 6:55 PM · Easy hack, Graph service
vlorentz triaged T3302: Write docstrings for each method in swh/graph/backend.py as Normal priority.
May 3 2021, 5:34 PM · Graph service
vlorentz triaged T3301: graph: add test for the "algo" parameter of walk() as Normal priority.
May 3 2021, 5:16 PM · Easy hack, Graph service

Apr 23 2021

vlorentz assigned T2220: swh-graph in production to zack.
Apr 23 2021, 4:50 PM · meta-task, Roadmap 2021, Graph service

Apr 16 2021

olasd triaged T3259: Gracefully handle a client closing the connection in the middle of a response being streamed as Normal priority.
Apr 16 2021, 6:34 PM · Graph service

Apr 14 2021

zack closed T1968: existing graph endpoints should not return 404 upon missing arguments as Invalid.

Sure! My apologies @Hakimb, but it's thanks to your work that we realized what the right fate for this task was.

Apr 14 2021, 5:10 PM · Easy hack, Graph service
seirl added a comment to T2981: Graph API: add a (node type) result filters.

I just want to note something here that may not be clear from the initial task description. This filtering must happen *after* the visit, not during. We can already change *how* the graph is visited using the edges parameter; the goal of this task is to filter the result post-visit.

Apr 14 2021, 4:28 PM · Graph service
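
The distinction seirl draws in T2981 (restricting edges during the visit versus filtering results after it) can be sketched as follows. This is a minimal illustration, not the swh-graph implementation: the toy graph, the "type:id" node naming, and the helper names visit() and filter_results() are all assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

# Hypothetical toy graph; node names are "type:id" strings for illustration only.
GRAPH: Dict[str, List[str]] = {
    "rev:1": ["rev:2", "dir:1"],
    "rev:2": ["dir:2"],
    "dir:1": ["cnt:1"],
    "dir:2": ["cnt:2"],
}


def node_type(node: str) -> str:
    """Return the type prefix of a "type:id" node name."""
    return node.split(":", 1)[0]


def visit(
    graph: Dict[str, List[str]],
    src: str,
    edges: Optional[Set[Tuple[str, str]]] = None,
) -> Iterator[str]:
    """Depth-first visit from src.

    `edges` restricts *how* the graph is traversed: only edges whose
    (source type, destination type) pair is in the set are followed.
    """
    seen = {src}
    stack = [src]
    while stack:
        node = stack.pop()
        yield node
        for succ in graph.get(node, []):
            if edges is not None and (node_type(node), node_type(succ)) not in edges:
                continue  # edge type not allowed: do not traverse it
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)


def filter_results(nodes: Iterable[str], return_types: Set[str]) -> Iterator[str]:
    """Post-visit filter: keep only nodes of the requested types."""
    return (n for n in nodes if node_type(n) in return_types)


# During-visit restriction: following only rev->rev edges never reaches contents.
revisions_only = sorted(visit(GRAPH, "rev:1", edges={("rev", "rev")}))

# Post-visit filter: traverse everything, then keep only content nodes.
contents_only = sorted(filter_results(visit(GRAPH, "rev:1"), {"cnt"}))
```

In this sketch, no edge restriction alone could return "only contents", because reaching a content node requires traversing revision and directory nodes first; that is why the filter has to apply to the result rather than to the traversal.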
seirl added a comment to T1968: existing graph endpoints should not return 404 upon missing arguments.

Right, I suppose we can close the task then?

Apr 14 2021, 4:25 PM · Easy hack, Graph service
zack updated subscribers of T1968: existing graph endpoints should not return 404 upon missing arguments.

@seirl, @vlorentz: I see your point, and I agree. We should never have used /nested/paths for this API.
Maybe we should just reconsider this and, once @Hakimb is ready with a new traversal language proposal, we can map it to a better REST API that uses query parameters and deals properly with 4xx return codes.

Apr 14 2021, 4:15 PM · Easy hack, Graph service
zack added a comment to T2981: Graph API: add a (node type) result filters.
In T2981#63164, @Hakimb wrote:

questions:

1/ So for the "filter that applies to visits that return nodes one by one" part, we are talking about: neighbors, walk, visit/nodes only?

Apr 14 2021, 4:13 PM · Graph service
Hakimb added a comment to T2981: Graph API: add a (node type) result filters.

1/ So for the "filter that applies to visits that return nodes one by one" part, we are talking about: neighbors, walk, visit/nodes only?
2/ the filter is a query parameter I guess?

Apr 14 2021, 12:27 PM · Graph service

Apr 13 2021

seirl updated subscribers of T1968: existing graph endpoints should not return 404 upon missing arguments.

@zack We talked about this on IRC with @vlorentz, and I think this issue is invalid. We chose to have the source and destination nodes as part of the URI in the API. Semantically, it makes sense that accessing the path without these path fragments returns a 404: it's not a missing argument but an invalid path. If we had ?src= and &dst= arguments instead, then a 400 error would make sense, but in our case the semantics are really weird.

Apr 13 2021, 7:05 PM · Easy hack, Graph service
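
The argument made in T1968 can be illustrated with two tiny status-code resolvers. This is only a sketch under assumptions: the /graph/walk route shape, the node placeholders, and both function names are hypothetical and are not taken from the swh-graph server code.

```python
from urllib.parse import parse_qs, urlsplit


def status_path_style(url: str) -> int:
    """Nodes are path fragments: /graph/walk/<src>/<dst> (illustrative route).

    A URL without both fragments matches no route, so it is simply
    an unknown path: 404.
    """
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) == 4 and parts[:2] == ["graph", "walk"]:
        return 200
    return 404


def status_query_style(url: str) -> int:
    """Nodes are query parameters: /graph/walk?src=...&dst=... (illustrative).

    The path is valid on its own, so a missing parameter is a malformed
    request to an existing resource: 400.
    """
    split = urlsplit(url)
    if split.path.rstrip("/") != "/graph/walk":
        return 404
    params = parse_qs(split.query)
    if "src" not in params or "dst" not in params:
        return 400
    return 200
```

With path fragments there is no resource to complain to when a fragment is absent, which is why a 400 has no natural place in the first design.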
Hakimb added a revision to T3161: graph service: add anti-DoS limit on the number of edges traversed: D5501: add an anti-Dos limit for edges traversed as a query parameter.
Apr 13 2021, 5:36 PM · Graph service

Apr 8 2021

zack added a comment to T3161: graph service: add anti-DoS limit on the number of edges traversed.

OK, so @Hakimb: go for no default value. If the query param is not passed, the visit will not stop before the end. If it's given, the visit will stop once the limit is reached. Call the query param ?max_edges. You will find that the Java code already keeps track of the number of edges traversed, so you should just need to compare with that.

Apr 8 2021, 2:44 PM · Graph service
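
The behaviour agreed on in T3161 (no default, and the visit itself stops once the limit is reached) can be sketched as a breadth-first visit with an edge counter. This is a Python illustration under assumptions, not the Java backend: the toy graph and the function name visit_nodes() are hypothetical; only the max_edges name comes from the discussion.

```python
from collections import deque
from typing import Dict, Iterator, List, Optional

# Hypothetical toy graph used only for this example.
EXAMPLE_GRAPH: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}


def visit_nodes(
    graph: Dict[str, List[str]],
    src: str,
    max_edges: Optional[int] = None,
) -> Iterator[str]:
    """Breadth-first visit yielding each node when it is first discovered.

    With max_edges=None (the default) the visit runs to completion.
    Otherwise the traversal itself stops as soon as max_edges edges
    have been traversed, rather than merely truncating the output.
    """
    seen = {src}
    queue = deque([src])
    edges_traversed = 0
    yield src
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if max_edges is not None and edges_traversed >= max_edges:
                return  # limit reached: abandon the visit entirely
            edges_traversed += 1
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
                yield succ
```

Counting traversed edges rather than returned results matters because an edge that leads to an already-seen node still costs work, even though it produces no output.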
vlorentz added a comment to T3161: graph service: add anti-DoS limit on the number of edges traversed.

Should there be a default value for it or not? (We want this to be consistent with swh-storage.)

Apr 8 2021, 2:41 PM · Graph service
zack added a comment to T3161: graph service: add anti-DoS limit on the number of edges traversed.

To complement what @vlorentz mentioned, we should actually stop the visit once the maximum number of edges has been reached, because it is continuing the visit (no matter how many results are returned afterwards) that can DoS the swh-graph backend.

Apr 8 2021, 2:24 PM · Graph service
vlorentz added a comment to T3161: graph service: add anti-DoS limit on the number of edges traversed.
  1. I don't think you need to reproduce a DoS, just make sure we don't return more results than we should
  2. yes, a query parameter would be good
Apr 8 2021, 2:09 PM · Graph service
Hakimb added a comment to T3161: graph service: add anti-DoS limit on the number of edges traversed.

I have two questions to make sure I understand correctly:

Apr 8 2021, 1:17 PM · Graph service

Apr 7 2021

seirl closed T3178: document how to export the graph dataset automatically, a subtask of T1847: fully automate export of the graph dataset, as Invalid.
Apr 7 2021, 3:03 PM · Graph service, Datasets
seirl added a subtask for T1847: fully automate export of the graph dataset: T2431: Document how to export the graph edge dataset.
Apr 7 2021, 3:03 PM · Graph service, Datasets
seirl added a parent task for T2431: Document how to export the graph edge dataset: T1847: fully automate export of the graph dataset.
Apr 7 2021, 3:03 PM · Documentation, Graph service, Datasets

Apr 6 2021

Hakimb added a revision to T1968: existing graph endpoints should not return 404 upon missing arguments: D5411: return a 400 error when accessing endpoints without the arguments.
Apr 6 2021, 11:31 AM · Easy hack, Graph service

Mar 26 2021

zack reopened T1847: fully automate export of the graph dataset as "Open".

reopening, as ideally we'd like to have run the entire ORC export once to completion before closing

Mar 26 2021, 12:25 PM · Graph service, Datasets
seirl closed T1847: fully automate export of the graph dataset as Resolved.

The ORC exporter is done, and it's likely that we won't provide CSV exports in the future, or we'll generate them from the ORC format.

Mar 26 2021, 12:04 PM · Graph service, Datasets

Mar 23 2021

seirl assigned T3168: Proper deployment of swh-graph with debian package to olasd.
Mar 23 2021, 12:24 PM · Graph service, Puppet recipes
zack updated the task description for T3168: Proper deployment of swh-graph with debian package.
Mar 23 2021, 12:24 PM · Graph service, Puppet recipes
zack added a project to T3168: Proper deployment of swh-graph with debian package: Graph service.
Mar 23 2021, 12:23 PM · Graph service, Puppet recipes

Mar 22 2021

zack renamed T3161: graph service: add anti-DoS limit on the number of edges traversed from graph service: add limit on the number of edges traversed to graph service: add anti-DoS limit on the number of edges traversed.
Mar 22 2021, 9:43 AM · Graph service
zack added a subtask for T2220: swh-graph in production: T3161: graph service: add anti-DoS limit on the number of edges traversed.
Mar 22 2021, 9:43 AM · meta-task, Roadmap 2021, Graph service
zack added a parent task for T3161: graph service: add anti-DoS limit on the number of edges traversed: T2220: swh-graph in production.
Mar 22 2021, 9:43 AM · Graph service
vlorentz added a comment to T3161: graph service: add anti-DoS limit on the number of edges traversed.

To be consistent with swh-storage, swh-graph should take a limit as query parameter, but not have a hardcoded upper bound for that limit. Instead, swh-web provides that upper bound.

Mar 22 2021, 9:14 AM · Graph service
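
The split of responsibilities vlorentz describes (the backend accepts a limit but imposes no cap, while the web frontend provides the upper bound) might look like the sketch below. Everything here is an assumption for illustration: the constant FRONTEND_MAX_LIMIT, its value, and both function names are hypothetical and do not come from swh-graph or swh-web.

```python
from typing import Optional

# Hypothetical upper bound enforced by the public-facing frontend only.
FRONTEND_MAX_LIMIT = 1000


def backend_effective_limit(limit: Optional[int]) -> Optional[int]:
    """Backend side: validate the limit but never cap it.

    None means "no limit"; any non-negative value is passed through as-is.
    """
    if limit is not None and limit < 0:
        raise ValueError("limit must be non-negative")
    return limit


def frontend_effective_limit(requested: Optional[int]) -> int:
    """Frontend side: always send the backend a bounded limit."""
    if requested is None:
        return FRONTEND_MAX_LIMIT
    return min(requested, FRONTEND_MAX_LIMIT)
```

Keeping the cap out of the backend leaves trusted internal clients free to run unbounded visits, while anonymous traffic arriving through the frontend is always bounded.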
zack triaged T3161: graph service: add anti-DoS limit on the number of edges traversed as Normal priority.
Mar 22 2021, 9:12 AM · Graph service
zack closed T2113: swh-graph: add support to optionally resolve ori PIDs to origin URLs as Wontfix.

Now that this is (optionally) done by swh-web, I don't think we want to implement it in swh-graph too.

Mar 22 2021, 8:56 AM · Graph service

Mar 19 2021

rdicosmo moved T2220: swh-graph in production from Backlog to Work in progress on the Roadmap 2021 board.
Mar 19 2021, 12:45 PM · meta-task, Roadmap 2021, Graph service

Mar 16 2021

Kaustuv942 added a comment to T1968: existing graph endpoints should not return 404 upon missing arguments.

Oh, all right.

Mar 16 2021, 8:44 PM · Easy hack, Graph service
zack placed T1968: existing graph endpoints should not return 404 upon missing arguments up for grabs.

Dear @Kaustuv942, sure, patches welcome. We do not use task claiming for non-regular contributors, though; just submit a patch when you have one.

Mar 16 2021, 6:59 PM · Easy hack, Graph service
Kaustuv942 added a comment to T1968: existing graph endpoints should not return 404 upon missing arguments.

Hello @zack I want to complete this task.

Mar 16 2021, 5:59 PM · Easy hack, Graph service