
Jun 23 2022

seirl added a revision to T3259: Gracefully handle a client closing the connection in the middle of a response being streamed: D7890: Migrate low-level RPC API from Py4J to GRPC.
Jun 23 2022, 7:00 PM · Compressed graph service
seirl added a revision to T2103: (Debian) package py4j: D7890: Migrate low-level RPC API from Py4J to GRPC.
Jun 23 2022, 7:00 PM · Compressed graph service
seirl added a revision to T4340: swh-graph timeouts: D7890: Migrate low-level RPC API from Py4J to GRPC.
Jun 23 2022, 7:00 PM · Compressed graph service
seirl added a revision to T4115: Some unknown SWHID errors crash the graph server: D7890: Migrate low-level RPC API from Py4J to GRPC.
Jun 23 2022, 6:54 PM · Compressed graph service

Jun 22 2022

vsellier added a comment to T4340: swh-graph timeouts.

I reverse-engineered the py4j communication protocol, so the next time it hangs we should be able to tell whether the issue is on the gateway server side or on the Python side:

  • Create a named pipe
mkfifo /tmp/test
chmod a+w /tmp/test
tail -F /tmp/test
  • Query the graph
ss -ltp | grep java
<get the port number>
telnet localhost <port number>
c
o0
get_handler
s/tmp/test
e
Jun 22 2022, 2:07 PM · Compressed graph service
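
The telnet session above can also be scripted, which may be handy when the hang has to be probed repeatedly. The following is a minimal sketch, not part of swh-graph: the function names `build_py4j_call` and `send_py4j_call` are hypothetical, and the reading of the lines (`c` starts a call, `o0` is the target object, an `s` prefix marks a string argument, `e` ends the command) is an assumption based on the session shown in the comment. It does not escape special characters in arguments.

```python
import socket
from typing import Iterable


def build_py4j_call(target_id: str, method: str, string_args: Iterable[str]) -> bytes:
    """Build the same lines that were typed by hand in the telnet session."""
    lines = ["c", target_id, method]
    # Assumption: each string argument is sent on its own line with an "s" prefix.
    lines.extend("s" + arg for arg in string_args)
    lines.append("e")  # end-of-command marker
    return ("\n".join(lines) + "\n").encode("utf-8")


def send_py4j_call(port: int, payload: bytes,
                   host: str = "localhost", timeout: float = 5.0) -> str:
    """Send the payload to the gateway port and return the first chunk of the reply.

    A socket timeout here suggests the gateway side did not answer in time.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        return sock.recv(4096).decode("utf-8", errors="replace")


# Example usage (the port is the one found with `ss -ltp | grep java`):
#   payload = build_py4j_call("o0", "get_handler", ["/tmp/test"])
#   print(send_py4j_call(<port number>, payload))
```

If the script gets a reply while the Python client is still stuck, that points at the Python side; a timeout points at the gateway server.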

Jun 20 2022

vlorentz added a comment to T4340: swh-graph timeouts.

Seems to be this issue: https://sentry.softwareheritage.org/share/issue/b0b9753142404227a6d7421eb014fdb3/ (if it helps...)

Jun 20 2022, 2:57 PM · Compressed graph service
vsellier updated the task description for T4340: swh-graph timeouts.
Jun 20 2022, 10:17 AM · Compressed graph service
vsellier updated the task description for T4340: swh-graph timeouts.
Jun 20 2022, 10:16 AM · Compressed graph service
vsellier triaged T4340: swh-graph timeouts as High priority.
Jun 20 2022, 10:16 AM · Compressed graph service

Jun 16 2022

bchauvet added a comment to T2220: swh-graph in production.
  1. Graph status meeting
Jun 16 2022, 5:40 PM · Roadmap 2022, meta-task, Roadmap 2021, Compressed graph service
olasd added a comment to T4250: Native hadoop libraries during graph compression.

So, as was mentioned during the IRC discussion, one of the possible ways forward is to:

Jun 16 2022, 2:32 PM · Compressed graph service

Jun 9 2022

seirl added a comment to T4250: Native hadoop libraries during graph compression.

(4. doesn't work because libhadoop.so isn't packaged in maven)

Jun 9 2022, 4:49 PM · Compressed graph service

Jun 8 2022

zack added a comment to T4316: Push of swh-graph to pypi is broken.

For future reference, it looks like we are still "small" players as "big" packages go on PyPI: https://pypi.org/stats/ (e.g., tf-nightly is currently the largest package on PyPI and it weighs 427 GiB).
While it is still not nice to ship a big fat JAR in a PyPI package, our extension requests will likely be granted.

Jun 8 2022, 4:03 PM · System administration, Compressed graph service
olasd added a comment to T4316: Push of swh-graph to pypi is broken.

We've asked for another bump at https://github.com/pypa/pypi-support/issues/1998.

Jun 8 2022, 3:33 PM · System administration, Compressed graph service
vlorentz updated subscribers of T4316: Push of swh-graph to pypi is broken.

IIRC, we cannot reduce the size; and I think it is unreasonable to ask PyPI for a higher limit.

Jun 8 2022, 2:45 PM · System administration, Compressed graph service
vsellier triaged T4316: Push of swh-graph to pypi is broken as High priority.
Jun 8 2022, 2:28 PM · System administration, Compressed graph service

May 20 2022

seirl added a comment to T3855: Document the architecture of the Java code in swh-graph.

Fixed in D7839

May 20 2022, 7:41 PM · Compressed graph service
seirl closed T3855: Document the architecture of the Java code in swh-graph as Resolved.
May 20 2022, 7:41 PM · Compressed graph service
swh-public-ci added a comment to D7839: Documentation overhaul.

Build is green

May 20 2022, 7:31 PM · Compressed graph service
seirl closed D7839: Documentation overhaul.
May 20 2022, 7:25 PM · Compressed graph service
seirl updated the diff for D7839: Documentation overhaul.

Review fixes

May 20 2022, 7:24 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 20 2022, 7:23 PM · Compressed graph service

May 19 2022

JaredR26 added inline comments to D7839: Documentation overhaul.
May 19 2022, 8:49 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 19 2022, 4:10 PM · Compressed graph service
swh-public-ci added a comment to D7839: Documentation overhaul.

Build is green

May 19 2022, 3:56 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 19 2022, 3:54 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 19 2022, 3:53 PM · Compressed graph service
seirl updated the diff for D7839: Documentation overhaul.

Review fixes

May 19 2022, 3:50 PM · Compressed graph service
seirl added a comment to D7839: Documentation overhaul.

Thanks for all the reviews, I made a first pass with the easiest fixes.

May 19 2022, 3:48 PM · Compressed graph service

May 18 2022

zack resigned from D7839: Documentation overhaul.

Monumental documentation work, thanks!
I think this is generally great, and I've pointed out only some minor issues/suggestions here and there.

May 18 2022, 1:21 PM · Compressed graph service

May 17 2022

JaredR26 added inline comments to D7839: Documentation overhaul.
May 17 2022, 10:41 PM · Compressed graph service
JaredR26 added inline comments to D7839: Documentation overhaul.
May 17 2022, 10:20 PM · Compressed graph service
JaredR26 added inline comments to D7839: Documentation overhaul.
May 17 2022, 10:06 PM · Compressed graph service
JaredR26 added a comment to D7839: Documentation overhaul.

Still reviewing the rest but went through the quickstart. I wanted to submit these comments before it got too late over there.

May 17 2022, 8:11 PM · Compressed graph service
swh-public-ci added a comment to D7839: Documentation overhaul.

Build is green

May 17 2022, 5:36 PM · Compressed graph service
seirl updated the diff for D7839: Documentation overhaul.

fix

May 17 2022, 5:29 PM · Compressed graph service
swh-public-ci added a comment to D7839: Documentation overhaul.

Build is green

May 17 2022, 5:14 PM · Compressed graph service
swh-public-ci added a comment to D7839: Documentation overhaul.

Build is green

May 17 2022, 5:09 PM · Compressed graph service
seirl updated the diff for D7839: Documentation overhaul.

typo

May 17 2022, 5:05 PM · Compressed graph service
seirl updated the diff for D7839: Documentation overhaul.

Review fixes

May 17 2022, 5:01 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 17 2022, 4:52 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 17 2022, 4:51 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 17 2022, 10:19 AM · Compressed graph service
vlorentz added a comment to D7839: Documentation overhaul.

very nice!

May 17 2022, 7:46 AM · Compressed graph service
seirl requested review of D7839: Documentation overhaul.
May 17 2022, 1:58 AM · Compressed graph service

May 16 2022

olasd added a comment to T4250: Native hadoop libraries during graph compression.

4.a. properly declare this in the maven dependencies of swh.graph
4.b. ensure the container image generation pipeline and container entrypoint script properly handle this extra dependency and argument

May 16 2022, 6:32 PM · Compressed graph service
seirl triaged T4250: Native hadoop libraries during graph compression as Normal priority.
May 16 2022, 6:29 PM · Compressed graph service

May 6 2022

seirl moved T3302: Write docstrings for each method in swh/graph/backend.py from Backlog to Deployed on the Compressed graph service board.
May 6 2022, 8:12 PM · Compressed graph service
seirl moved T4114: Add logging by default to swh-graph compression from Backlog to Deployed on the Compressed graph service board.
May 6 2022, 8:12 PM · Compressed graph service
seirl closed T4114: Add logging by default to swh-graph compression as Resolved by committing rDGRPH464fded2cd8b: Add logging to compression pipeline.
May 6 2022, 3:51 PM · Compressed graph service
seirl added a revision to T4114: Add logging by default to swh-graph compression: D7758: Add logging to compression pipeline.
May 6 2022, 3:18 PM · Compressed graph service

Apr 29 2022

seirl moved T1847: fully automate export of the graph dataset from Backlog to Deployed on the Compressed graph service board.
Apr 29 2022, 6:22 PM · Compressed graph service, Datasets
seirl moved T2431: Document how to export the graph edge dataset from Backlog to Deployed on the Compressed graph service board.
Apr 29 2022, 6:22 PM · Documentation, Compressed graph service, Datasets
seirl moved T3125: add revision timestamp to the compression timeline from Backlog to Deployed on the Compressed graph service board.
Apr 29 2022, 6:22 PM · Compressed graph service
seirl closed T3125: add revision timestamp to the compression timeline as Resolved.

We now store all the node properties in separate files.

Apr 29 2022, 6:22 PM · Compressed graph service
seirl closed T3125: add revision timestamp to the compression timeline, a subtask of T3126: API: add endpoint to find the earliest revision referencing a dir/cnt node, as Resolved.
Apr 29 2022, 6:22 PM · Compressed graph service
seirl moved T3768: Read compression input from ORC instead of the edges file from In progress to Deployed on the Compressed graph service board.
Apr 29 2022, 6:21 PM · Compressed graph service
seirl moved T2983: graph service: allow loading in memory only one direction of the graph from Implemented to Deployed on the Compressed graph service board.
Apr 29 2022, 6:21 PM · Compressed graph service
seirl closed T2431: Document how to export the graph edge dataset, a subtask of T1847: fully automate export of the graph dataset, as Resolved.
Apr 29 2022, 6:15 PM · Compressed graph service, Datasets
seirl closed T2431: Document how to export the graph edge dataset as Resolved.

Done here: D7693 and here: D7711

Apr 29 2022, 6:15 PM · Documentation, Compressed graph service, Datasets
seirl closed T1847: fully automate export of the graph dataset as Resolved.

Done!

Apr 29 2022, 5:57 PM · Compressed graph service, Datasets

Apr 9 2022

dhwajgupta updated subscribers of T3301: graph: add test for the "algo" parameter of walk().

Hi @vlorentz @zack, is this task still valid?

Apr 9 2022, 7:59 AM · Easy hack, Compressed graph service

Mar 31 2022

aeviso closed T4118: Method `visit_edges` from `NaiveClient` seems to be missing some results as Invalid.

I've realized the problem is not in graph.visit_edges but in the way I'm using it. I forgot the border case where the revision has no parents.

Mar 31 2022, 11:25 AM · Compressed graph service
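
One plausible way this kind of border case shows up can be illustrated on a toy graph. The sketch below is purely hypothetical: it is not the `NaiveClient` implementation and does not reproduce the reporter's code; `visit_edges` and `nodes_from_edges` here are local stand-ins. It shows that a node with no outgoing edges yields no edges at all, so code that rebuilds the node set from edges alone silently drops it.

```python
from collections import deque


def visit_edges(graph, start):
    """Yield (src, dst) for every edge reachable from start, breadth-first."""
    seen = {start}
    queue = deque([start])
    while queue:
        src = queue.popleft()
        for dst in graph.get(src, []):
            yield (src, dst)
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)


def nodes_from_edges(edges):
    """Collect every node mentioned in an iterable of (src, dst) pairs."""
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
    return nodes


# Toy history: rev2 has one parent (rev1); rev1 is a root with no parents.
toy_graph = {"rev2": ["rev1"], "rev1": []}

with_parent = nodes_from_edges(visit_edges(toy_graph, "rev2"))
root_only = nodes_from_edges(visit_edges(toy_graph, "rev1"))
# Including the start node explicitly covers the case with no parents.
root_fixed = {"rev1"} | nodes_from_edges(visit_edges(toy_graph, "rev1"))
```

Here `root_only` is empty even though the traversal itself is correct, which is the kind of result that can look like missing output.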

Mar 30 2022

seirl closed T3768: Read compression input from ORC instead of the edges file as Resolved.
Mar 30 2022, 4:49 PM · Compressed graph service
vlorentz added a comment to T4118: Method `visit_edges` from `NaiveClient` seems to be missing some results.

Could you provide a full example so I can reproduce the issue?

Mar 30 2022, 3:34 PM · Compressed graph service
vlorentz triaged T4118: Method `visit_edges` from `NaiveClient` seems to be missing some results as High priority.
Mar 30 2022, 3:22 PM · Compressed graph service
zack added a watcher for Compressed graph service: zack.
Mar 30 2022, 1:41 PM
zack renamed Compressed graph service from Graph service to Compressed graph service.
Mar 30 2022, 1:40 PM

Mar 29 2022

seirl triaged T4115: Some unknown SWHID errors crash the graph server as Normal priority.
Mar 29 2022, 4:05 PM · Compressed graph service
vsellier added a watcher for Compressed graph service: vsellier.
Mar 29 2022, 3:17 PM
seirl triaged T4114: Add logging by default to swh-graph compression as Normal priority.
Mar 29 2022, 3:16 PM · Compressed graph service
seirl triaged T4113: Review border case of empty response for `visit_nodes` as Normal priority.
Mar 29 2022, 3:16 PM · Compressed graph service

Mar 25 2022

bchauvet raised the priority of T2220: swh-graph in production from Normal to High.
Mar 25 2022, 5:29 PM · Roadmap 2022, meta-task, Roadmap 2021, Compressed graph service

Mar 23 2022

bchauvet added a project to T2220: swh-graph in production: Roadmap 2022.
Mar 23 2022, 5:07 PM · Roadmap 2022, meta-task, Roadmap 2021, Compressed graph service

Feb 21 2022

seirl moved T3831: Flaky test in swh-graph from In progress to Deployed on the Compressed graph service board.
Feb 21 2022, 12:56 PM · Compressed graph service
seirl moved T2981: Graph API: add a (node type) result filters from In progress to Deployed on the Compressed graph service board.
Feb 21 2022, 12:56 PM · Compressed graph service
seirl moved T2647: add LLP support to graph compression pipeline from In progress to Deployed on the Compressed graph service board.
Feb 21 2022, 12:56 PM · Compressed graph service
seirl moved T3739: swh-graph: Remove SWHID -> Node ID mapping, use MPH instead from Implemented to Deployed on the Compressed graph service board.
Feb 21 2022, 12:56 PM · Compressed graph service
seirl moved T3740: swh-graph: Translate node IDs on the Java side, not Python side from Implemented to Deployed on the Compressed graph service board.
Feb 21 2022, 12:56 PM · Compressed graph service
seirl moved T3161: graph service: add anti-DoS limit on the number of edges traversed from Implemented to Deployed on the Compressed graph service board.
Feb 21 2022, 12:55 PM · Compressed graph service

Feb 8 2022

vlorentz added a parent task for T2220: swh-graph in production: T887: Vault: "snapshot" cooker.
Feb 8 2022, 2:05 PM · Roadmap 2022, meta-task, Roadmap 2021, Compressed graph service

Jan 18 2022

seirl closed T2981: Graph API: add a (node type) result filters as Resolved by committing rDGRPH294128e0f96e: Use AllowedNodesTest to implement return type filtering.
Jan 18 2022, 1:26 PM · Compressed graph service

Jan 17 2022

seirl closed T2983: graph service: allow loading in memory only one direction of the graph as Resolved.
Jan 17 2022, 4:33 PM · Compressed graph service
seirl closed T3302: Write docstrings for each method in swh/graph/backend.py as Resolved.

Since D6676 the specialized methods of swh/graph/backend.py have all been removed and replaced by a generic proxy layer that calls all the methods in a completely transparent fashion, so this specific issue appears to be obsolete now.

Jan 17 2022, 4:09 PM · Compressed graph service
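
For readers unfamiliar with the pattern, a generic proxy layer of this kind can be sketched in a few lines of Python. This is only an illustration of the general technique, not the code introduced in D6676: `MethodProxy` and `FakeEntryPoint` are hypothetical names, and the real layer forwards calls to the Java side rather than to a local object.

```python
class MethodProxy:
    """Forward every unknown attribute to a wrapped object (hypothetical name)."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so self._target
        # itself is resolved through the instance dictionary as usual.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def call(*args, **kwargs):
            # A real proxy could add logging, conversion or error mapping here.
            return attr(*args, **kwargs)

        call.__name__ = name
        return call


class FakeEntryPoint:
    """Stand-in for the remote object; not part of swh-graph."""

    def stats(self):
        return {"nodes": 3}

    def neighbors(self, node):
        return [node + 1]


proxy = MethodProxy(FakeEntryPoint())
```

With such a layer there is no per-method wrapper left to document on the Python side, which matches the reason given for closing the task.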
seirl triaged T3855: Document the architecture of the Java code in swh-graph as Normal priority.
Jan 17 2022, 3:40 PM · Compressed graph service
vlorentz added a comment to T1971: Integrate swh-graph javadoc in swh-docs.

A simple cp command in the swh-docs Makefile can do the trick, but we need a generic way of adding files/folders to the swh-docs build directory.

Jan 17 2022, 2:59 PM · Documentation, Compressed graph service

Jan 15 2022

seirl changed the status of T2981: Graph API: add a (node type) result filters from Open to Work in Progress.
Jan 15 2022, 12:04 AM · Compressed graph service
seirl added a revision to T2981: Graph API: add a (node type) result filters: D6954: Use AllowedNodesTest to implement return type filtering.
Jan 15 2022, 12:03 AM · Compressed graph service

Jan 14 2022

seirl renamed T3832: Investigate Luigi as an ETL framework for the compression pipeline from Investigate Luigi as an ETR framework for the compression pipeline to Investigate Luigi as an ETL framework for the compression pipeline.
Jan 14 2022, 11:33 PM · Compressed graph service
seirl added a revision to T2983: graph service: allow loading in memory only one direction of the graph: D6953: Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph.
Jan 14 2022, 11:29 PM · Compressed graph service
seirl changed the status of T2983: graph service: allow loading in memory only one direction of the graph from Open to Work in Progress.
Jan 14 2022, 11:29 PM · Compressed graph service

Jan 13 2022

seirl closed T3831: Flaky test in swh-graph as Resolved by committing rDGRPHc2074adc3d2c: Increase retries for random walks from 5 to 10.
Jan 13 2022, 4:08 PM · Compressed graph service

Jan 12 2022

seirl closed T3161: graph service: add anti-DoS limit on the number of edges traversed, a subtask of T2220: swh-graph in production, as Resolved.
Jan 12 2022, 5:01 PM · Roadmap 2022, meta-task, Roadmap 2021, Compressed graph service
seirl closed T3161: graph service: add anti-DoS limit on the number of edges traversed as Resolved.
Jan 12 2022, 5:01 PM · Compressed graph service

Jan 10 2022

seirl added a comment to T3831: Flaky test in swh-graph.

No, we want to check that random_walk can reach its actual destination.

Jan 10 2022, 2:54 PM · Compressed graph service
anlambert added a comment to T3831: Flaky test in swh-graph.
In T3831#76627, @seirl wrote:

I made a temporary fix in D6893, it doesn't solve the underlying issue but greatly decreases the probability of it happening. I'm not quite sure what would be a proper test for this endpoint, but this is at least enough to fix this issue in particular.

Jan 10 2022, 2:47 PM · Compressed graph service

Jan 7 2022

seirl changed the status of T3831: Flaky test in swh-graph from Open to Work in Progress.
Jan 7 2022, 4:37 PM · Compressed graph service
seirl added a comment to T3831: Flaky test in swh-graph.

I made a temporary fix in D6893, it doesn't solve the underlying issue but greatly decreases the probability of it happening. I'm not quite sure what would be a proper test for this endpoint, but this is at least enough to fix this issue in particular.

Jan 7 2022, 4:36 PM · Compressed graph service
seirl added a revision to T3831: Flaky test in swh-graph: D6893: Increase retries for random walks from 5 to 10.
Jan 7 2022, 4:26 PM · Compressed graph service

Jan 4 2022

seirl triaged T3832: Investigate Luigi as an ETL framework for the compression pipeline as Normal priority.
Jan 4 2022, 2:17 PM · Compressed graph service