
Compressed graph service
Active · Public

Details

Description

Service providing fast access to an in-memory compressed graph representation of the Software Heritage archive.

The service is based on a representation of the Software Heritage Merkle DAG, compressed using the WebGraph framework.

Recent Activity

Yesterday

seirl added a comment to T3259: Gracefully handle a client closing the connection in the middle of a response being streamed.

GRPC update: cancelling a GRPC stream works fine, but it doesn't seem like aiohttp is doing that when the HTTP stream is closed.

Thu, Jun 30, 6:41 PM · Compressed graph service
seirl closed T4316: Push of swh-graph to pypi is broken as Resolved.

We requested a larger quota here: https://github.com/pypa/pypi-support/issues/1998

Thu, Jun 30, 5:37 PM · System administration, Compressed graph service
seirl closed T2100: Bootstrap Debian packaging for swh.graph, a subtask of T3168: Proper deployment of swh-graph with debian package, as Wontfix.
Thu, Jun 30, 4:21 PM · Compressed graph service, Puppet recipes
seirl closed T2100: Bootstrap Debian packaging for swh.graph as Wontfix.

We are migrating away from Debian packages as a deployment tool, closing this as WONTFIX.

Thu, Jun 30, 4:21 PM · Compressed graph service
seirl closed T3168: Proper deployment of swh-graph with debian package as Wontfix.

We are migrating away from Debian packages as a deployment tool, closing this as WONTFIX.

Thu, Jun 30, 4:20 PM · Compressed graph service, Puppet recipes
seirl closed T2081: swh-graph: "Cannot open client FIFO" when answering HEAD requests as Resolved.

No longer happens after the GRPC migration:

Thu, Jun 30, 4:19 PM · Compressed graph service
seirl closed T2103: (Debian) package py4j, a subtask of T2100: Bootstrap Debian packaging for swh.graph, as Wontfix.
Thu, Jun 30, 4:16 PM · Compressed graph service
seirl closed T2103: (Debian) package py4j as Wontfix.

We removed the Py4J dependency by migrating to GRPC.

Thu, Jun 30, 4:16 PM · Compressed graph service
seirl closed T3301: graph: add test for the "algo" parameter of walk() as Wontfix.

We no longer support multiple algorithms for shortest path requests.

Thu, Jun 30, 4:16 PM · Easy hack, Compressed graph service
seirl closed T3623: Run swh-graph with gunicorn to support multiple/parallel requests as Resolved.

Obsoleted by the migration to GRPC. We now use GRPC's threading model, with a thread pool whose size is configurable by passing --threads to the Java service. By default, the value of nproc is used.
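
For illustration only, the "explicit --threads, else nproc" default described above can be sketched in Python. This is not the Java service's code: the function names resolve_threads and make_pool are hypothetical, and os.cpu_count() stands in for nproc (the two can differ when CPU affinity restricts the process).

```python
import argparse
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List


def resolve_threads(argv: List[str]) -> int:
    """Return the pool size: --threads if given, else the CPU count."""
    parser = argparse.ArgumentParser()
    # Hypothetical mirror of the --threads flag mentioned above.
    parser.add_argument("--threads", type=int, default=None)
    args = parser.parse_args(argv)
    if args.threads is not None:
        return args.threads
    # os.cpu_count() may return None, so fall back to a single thread.
    return os.cpu_count() or 1


def make_pool(argv: List[str]) -> ThreadPoolExecutor:
    """Build a worker pool sized by resolve_threads()."""
    return ThreadPoolExecutor(max_workers=resolve_threads(argv))
```

Calling resolve_threads(["--threads", "4"]) returns 4, while resolve_threads([]) falls back to the machine's CPU count.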

Thu, Jun 30, 4:12 PM · Compressed graph service, System administration
seirl closed T4113: Review border case of empty response for `visit_nodes` as Resolved.

Obsoleted by the migration to GRPC. We no longer create iterators on the decoded stream of a UNIX pipeline; we directly use GRPC stream iterators.

Thu, Jun 30, 4:11 PM · Compressed graph service
seirl closed T4115: Some unknown SWHID errors crash the graph server as Resolved.

Fixed by the migration to GRPC.

Thu, Jun 30, 4:08 PM · Compressed graph service
seirl closed T3793: Add copyright notices to all swh-graph Java files as Resolved.

Fixed in D8050

Thu, Jun 30, 2:28 PM · Compressed graph service

Wed, Jun 29

seirl added a revision to T3793: Add copyright notices to all swh-graph Java files: D8050: Add missing copyright notices to the entire Java codebase.
Wed, Jun 29, 3:07 PM · Compressed graph service

Tue, Jun 28

vsellier closed T4340: swh-graph timeouts as Wontfix.

It will be solved by D7890

Tue, Jun 28, 6:49 PM · Compressed graph service

Fri, Jun 24

vsellier added a comment to T4340: swh-graph timeouts.

It's confirmed that the issue seems to be in the Python part of the current implementation, so I'm eager to see D7890 land ;)

Fri, Jun 24, 10:13 AM · Compressed graph service

Thu, Jun 23

seirl added a revision to T3259: Gracefully handle a client closing the connection in the middle of a response being streamed: D7890: Migrate low-level RPC API from Py4J to GRPC.
Thu, Jun 23, 7:00 PM · Compressed graph service
seirl added a revision to T2103: (Debian) package py4j: D7890: Migrate low-level RPC API from Py4J to GRPC.
Thu, Jun 23, 7:00 PM · Compressed graph service
seirl added a revision to T4340: swh-graph timeouts: D7890: Migrate low-level RPC API from Py4J to GRPC.
Thu, Jun 23, 7:00 PM · Compressed graph service
seirl added a revision to T4115: Some unknown SWHID errors crash the graph server: D7890: Migrate low-level RPC API from Py4J to GRPC.
Thu, Jun 23, 6:54 PM · Compressed graph service

Wed, Jun 22

vsellier added a comment to T4340: swh-graph timeouts.

I reverse-engineered the py4j communication protocol, so the next time it hangs, we should be able to tell whether the issue is on the gateway server side or on the Python side:

  • Create a named pipe
mkfifo /tmp/test
chmod a+w /tmp/test
tail -F /tmp/test
  • Query the graph
ss -ltp | grep java
<get the port number>
telnet localhost <port number>
c
o0
get_handler
s/tmp/test
e
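
The telnet session above could also be scripted, which may help when the hang needs to be checked quickly. The following is a minimal sketch, assuming the gateway accepts the same newline-terminated lines as typed by hand; build_probe and send_probe are hypothetical helper names, and the port still has to be looked up with ss as shown above.

```python
import socket


def build_probe(fifo_path: str) -> bytes:
    """Build the same lines that were typed by hand in the telnet session."""
    lines = [
        "c",
        "o0",
        "get_handler",
        "s" + fifo_path,  # "s" immediately followed by the named pipe path
        "e",
    ]
    # Assumption: each line is terminated by a single "\n".
    return ("\n".join(lines) + "\n").encode("ascii")


def send_probe(port: int, fifo_path: str = "/tmp/test", timeout: float = 5.0) -> bytes:
    """Send the probe to localhost:port and return the first chunk of the reply."""
    with socket.create_connection(("localhost", port), timeout=timeout) as conn:
        conn.sendall(build_probe(fifo_path))
        try:
            return conn.recv(4096)
        except socket.timeout:
            # No reply within the timeout.
            return b""
```

With tail -F /tmp/test running in another terminal, calling send_probe with the port number found above replays the manual query. An empty return value means the gateway did not answer within the timeout.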
Wed, Jun 22, 2:07 PM · Compressed graph service

Mon, Jun 20

vlorentz added a comment to T4340: swh-graph timeouts.

Seems to be this issue: https://sentry.softwareheritage.org/share/issue/b0b9753142404227a6d7421eb014fdb3/ (if it helps...)

Mon, Jun 20, 2:57 PM · Compressed graph service
vsellier updated the task description for T4340: swh-graph timeouts.
Mon, Jun 20, 10:17 AM · Compressed graph service
vsellier updated the task description for T4340: swh-graph timeouts.
Mon, Jun 20, 10:16 AM · Compressed graph service
vsellier triaged T4340: swh-graph timeouts as High priority.
Mon, Jun 20, 10:16 AM · Compressed graph service

Thu, Jun 16

bchauvet added a comment to T2220: swh-graph in production.
  1. Graph status meeting
Thu, Jun 16, 5:40 PM · Roadmap 2022, meta-task, Roadmap 2021, Compressed graph service
olasd added a comment to T4250: Native hadoop libraries during graph compression.

So, as was mentioned during the irc discussion, one of the possible ways forward is to:

Thu, Jun 16, 2:32 PM · Compressed graph service

Thu, Jun 9

seirl added a comment to T4250: Native hadoop libraries during graph compression.

(4. doesn't work because libhadoop.so isn't packaged in maven)

Thu, Jun 9, 4:49 PM · Compressed graph service

Wed, Jun 8

zack added a comment to T4316: Push of swh-graph to pypi is broken.

For future reference, it looks like we are still "small" players as "big" packages go on PyPI: https://pypi.org/stats/ (e.g., tf-nightly is currently the largest package on PyPI and it weighs 427 GiB).
While it is still not nice to ship a big fat JAR in a PyPI package, our extension requests will likely be granted.

Wed, Jun 8, 4:03 PM · System administration, Compressed graph service
olasd added a comment to T4316: Push of swh-graph to pypi is broken.

We've asked for another bump at https://github.com/pypa/pypi-support/issues/1998.

Wed, Jun 8, 3:33 PM · System administration, Compressed graph service
vlorentz updated subscribers of T4316: Push of swh-graph to pypi is broken.

IIRC, we cannot reduce the size; and I think it is unreasonable to ask PyPI for a higher limit.

Wed, Jun 8, 2:45 PM · System administration, Compressed graph service
vsellier triaged T4316: Push of swh-graph to pypi is broken as High priority.
Wed, Jun 8, 2:28 PM · System administration, Compressed graph service

May 20 2022

seirl added a comment to T3855: Document the architecture of the Java code in swh-graph.

Fixed in D7839

May 20 2022, 7:41 PM · Compressed graph service
seirl closed T3855: Document the architecture of the Java code in swh-graph as Resolved.
May 20 2022, 7:41 PM · Compressed graph service
swh-public-ci added a comment to D7839: Documentation overhaul.

Build is green

May 20 2022, 7:31 PM · Compressed graph service
seirl closed D7839: Documentation overhaul.
May 20 2022, 7:25 PM · Compressed graph service
seirl updated the diff for D7839: Documentation overhaul.

Review fixes

May 20 2022, 7:24 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 20 2022, 7:23 PM · Compressed graph service

May 19 2022

JaredR26 added inline comments to D7839: Documentation overhaul.
May 19 2022, 8:49 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 19 2022, 4:10 PM · Compressed graph service
swh-public-ci added a comment to D7839: Documentation overhaul.

Build is green

May 19 2022, 3:56 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 19 2022, 3:54 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 19 2022, 3:53 PM · Compressed graph service
seirl updated the diff for D7839: Documentation overhaul.

Review fixes

May 19 2022, 3:50 PM · Compressed graph service
seirl added a comment to D7839: Documentation overhaul.

Thanks for all the reviews, I made a first pass with the easiest fixes.

May 19 2022, 3:48 PM · Compressed graph service

May 18 2022

zack resigned from D7839: Documentation overhaul.

Monumental documentation work, thanks!
I think this is generally great, and I've pointed out only some minor issues/suggestions here and there.

May 18 2022, 1:21 PM · Compressed graph service

May 17 2022

JaredR26 added inline comments to D7839: Documentation overhaul.
May 17 2022, 10:41 PM · Compressed graph service
JaredR26 added inline comments to D7839: Documentation overhaul.
May 17 2022, 10:20 PM · Compressed graph service
JaredR26 added inline comments to D7839: Documentation overhaul.
May 17 2022, 10:06 PM · Compressed graph service
JaredR26 added a comment to D7839: Documentation overhaul.

Still reviewing the rest but went through the quickstart. I wanted to submit these comments before it got too late over there.

May 17 2022, 8:11 PM · Compressed graph service