Jun 29 2022

seirl committed rDGRPH43a3e3302abe: Update webgraph-big to 3.7.0 (authored by seirl).
Jun 29 2022, 12:07 PM
seirl committed rDGRPH53b7d553ba06: GraphServer: Use nproc threads in threadpool by default (authored by seirl).
Jun 29 2022, 12:07 PM
seirl updated the diff for D7890: Migrate low-level RPC API from Py4J to GRPC.
  • SwhBidirectionalGraph: fix copy() not actually copying subgraphs
  • GetNode: make endpoint thread-safe with a lightweight copy of the graph
  • GraphServer: Use nproc threads in threadpool by default
  • Update webgraph-big to 3.7.0
Jun 29 2022, 12:07 PM

Jun 28 2022

seirl closed D8037: swhgraphshm: do not hardcode the graph path, use /latest symlink.
Jun 28 2022, 6:27 PM
seirl committed rSPSITEf64bcc100756: swhgraphshm: do not hardcode the graph path, use /latest symlink (authored by seirl).
Jun 28 2022, 6:26 PM
seirl updated the diff for D8037: swhgraphshm: do not hardcode the graph path, use /latest symlink.

Rebase

Jun 28 2022, 6:26 PM

Jun 24 2022

seirl requested review of D8037: swhgraphshm: do not hardcode the graph path, use /latest symlink.
Jun 24 2022, 8:30 PM
seirl updated the diff for D7890: Migrate low-level RPC API from Py4J to GRPC.
  • Remove old useless classes, including Traversal
  • Regenerate python protobuf documentation
  • style: typos and indent fixes
Jun 24 2022, 12:28 PM
seirl added inline comments to D7890: Migrate low-level RPC API from Py4J to GRPC.
Jun 24 2022, 12:03 PM
seirl added inline comments to D7890: Migrate low-level RPC API from Py4J to GRPC.
Jun 24 2022, 12:02 PM
seirl added inline comments to D7890: Migrate low-level RPC API from Py4J to GRPC.
Jun 24 2022, 11:38 AM

Jun 23 2022

seirl added a revision to T3259: Gracefully handle a client closing the connection in the middle of a response being streamed: D7890: Migrate low-level RPC API from Py4J to GRPC.
Jun 23 2022, 7:00 PM · Compressed graph service
seirl added a revision to T2103: (Debian) package py4j: D7890: Migrate low-level RPC API from Py4J to GRPC.
Jun 23 2022, 7:00 PM · Compressed graph service
seirl added a revision to T4340: swh-graph timeouts: D7890: Migrate low-level RPC API from Py4J to GRPC.
Jun 23 2022, 7:00 PM · Compressed graph service
seirl added tasks to D7890: Migrate low-level RPC API from Py4J to GRPC: Unknown Object (Maniphest Task), T4340: swh-graph timeouts, T2103: (Debian) package py4j, T3259: Gracefully handle a client closing the connection in the middle of a response being streamed.
Jun 23 2022, 7:00 PM
seirl added a revision to T4115: Some unknown SWHID errors crash the graph server: D7890: Migrate low-level RPC API from Py4J to GRPC.
Jun 23 2022, 6:54 PM · Compressed graph service
seirl added a task to D7890: Migrate low-level RPC API from Py4J to GRPC: T4115: Some unknown SWHID errors crash the graph server.
Jun 23 2022, 6:54 PM
seirl updated the diff for D7890: Migrate low-level RPC API from Py4J to GRPC.
  • Add inline docstrings to Java GRPC server
Jun 23 2022, 6:17 PM
seirl abandoned D2349: java/Traversal: add findCommonDescendant.

Will be replaced by the GRPC API

Jun 23 2022, 6:16 PM
seirl updated the diff for D7890: Migrate low-level RPC API from Py4J to GRPC.
  • doc: First full draft of GRPC API documentation
Jun 23 2022, 5:37 PM
seirl added inline comments to D7890: Migrate low-level RPC API from Py4J to GRPC.
Jun 23 2022, 12:41 PM
seirl updated the diff for D7890: Migrate low-level RPC API from Py4J to GRPC.
  • SwhGraphProperties: remove useless IOExceptions
  • SwhUnidirectionalGraph: make constructor with properties public
  • GRPC: initial commit with protobuf + java server
  • Add CheckSwhid() and Stats() RPC methods
  • RPC: temporarily use loadLabelled() because loadLabelledMapped() doesn't work yet
  • Java: add proto/ dir symlink
  • Python: migrate HTTP API to GRPC
  • Java tests: migrate to GRPC
  • Remove old Java HTTP server, replaced by GRPC
  • NodeIdMap: use new *MappedBigList classes, remove deprecated classes and MapFile
  • Remove now useless backend.py
  • Remove now useless dot.py
  • Reorganize Python files (RPC/HTTP server distinction)
  • flake: exclude swh/graph/rpc dir
  • requirements-test: add grpc-stubs
  • mypy.ini: ignore grpc generated files
  • Add rpc.StatsTest
  • proto: migrate to FieldMask to filter out fields
  • Add FindPathTo and FindPathBetween endpoints
  • Remove GraphDirection.BOTH (labelled iteration not supported yet)
  • Traversal: test traversals from multiple sources/to multiple dests
  • Traversal: test impossible paths
  • Traversal: check for invalid arguments
  • Traversal: simplify the StopTraversal logic
  • Remove CheckSwhid, use GetNode instead
  • Add tests for CountNodes/CountEdges
  • Traversal: add max depth and common ancestors tests
  • Traversal: add max edge tests
  • Document protobuf/grpc services and fields
  • More protobuf/grpc documentation, better field names
  • swhgraph.proto: small documentation fixes
  • doc: add GRPC page skeleton
Jun 23 2022, 12:40 PM

Jun 14 2022

seirl planned changes to D7890: Migrate low-level RPC API from Py4J to GRPC.

Still needs some work/documentation

Jun 14 2022, 7:30 PM
seirl requested review of D7890: Migrate low-level RPC API from Py4J to GRPC.
Jun 14 2022, 7:30 PM
seirl closed D7935: ORC: handle nullable columns/empty tables properly.

Merged

Jun 14 2022, 5:38 PM
seirl updated the diff for D7935: ORC: handle nullable columns/empty tables properly.

rebase

Jun 14 2022, 5:37 PM
seirl committed rDGRPH89975d2372a7: ORC: handle nullable columns properly (authored by seirl).
Jun 14 2022, 5:37 PM

Jun 9 2022

seirl committed rDGRPH3dfb46542771: python cli: fix compress documentation (authored by seirl).
Jun 9 2022, 7:37 PM
seirl added a comment to T4250: Native hadoop libraries during graph compression.

(4. doesn't work because libhadoop.so isn't packaged in maven)

Jun 9 2022, 4:49 PM · Compressed graph service

Jun 1 2022

seirl committed rDGRPHf5e658596c8f: SwhUnidirectionalGraph: fix copy() for labelled graphs (authored by seirl).
Jun 1 2022, 4:54 PM
seirl planned changes to D7936: [WIP] collabgraph: add tool to generate author collaboration graphs.
Jun 1 2022, 4:20 PM
seirl retitled D7936 from "collabgraph: add tool to generate author collaboration graphs" to "[WIP] collabgraph: add tool to generate author collaboration graphs".
Jun 1 2022, 3:59 PM
seirl added a comment to D7936: [WIP] collabgraph: add tool to generate author collaboration graphs.

This was meant to be a draft, but I couldn't find the button to make it so

Jun 1 2022, 3:59 PM
seirl requested review of D7936: [WIP] collabgraph: add tool to generate author collaboration graphs.
Jun 1 2022, 2:37 PM
seirl requested review of D7935: ORC: handle nullable columns/empty tables properly.
Jun 1 2022, 2:34 PM

May 23 2022

seirl committed rDDATASET5916d8b5fc97: docs: add missing refs, comment outdated schema (authored by seirl).
May 23 2022, 2:15 PM

May 20 2022

seirl added a comment to T3855: Document the architecture of the Java code in swh-graph.

Fixed in D7839

May 20 2022, 7:41 PM · Compressed graph service
seirl closed T3855: Document the architecture of the Java code in swh-graph as Resolved.
May 20 2022, 7:41 PM · Compressed graph service
seirl closed D7839: Documentation overhaul.
May 20 2022, 7:25 PM · Compressed graph service
seirl committed rDGRPHaaed82fc23e9: Documentation overhaul (authored by seirl).
May 20 2022, 7:25 PM
seirl updated the diff for D7839: Documentation overhaul.

Review fixes

May 20 2022, 7:24 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 20 2022, 7:23 PM · Compressed graph service

May 19 2022

seirl added inline comments to D7839: Documentation overhaul.
May 19 2022, 4:10 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 19 2022, 3:54 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 19 2022, 3:53 PM · Compressed graph service
seirl updated the diff for D7839: Documentation overhaul.

Review fixes

May 19 2022, 3:50 PM · Compressed graph service
seirl added a comment to D7839: Documentation overhaul.

Thanks for all the reviews, I made a first pass with the easiest fixes.

May 19 2022, 3:48 PM · Compressed graph service

May 17 2022

seirl updated the diff for D7839: Documentation overhaul.

fix

May 17 2022, 5:29 PM · Compressed graph service
seirl updated the diff for D7839: Documentation overhaul.

typo

May 17 2022, 5:05 PM · Compressed graph service
seirl updated the diff for D7839: Documentation overhaul.

Review fixes

May 17 2022, 5:01 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 17 2022, 4:52 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 17 2022, 4:51 PM · Compressed graph service
seirl added inline comments to D7839: Documentation overhaul.
May 17 2022, 10:19 AM · Compressed graph service
seirl requested review of D7839: Documentation overhaul.
May 17 2022, 1:58 AM · Compressed graph service

May 16 2022

seirl triaged T4250: Native hadoop libraries during graph compression as Normal priority.
May 16 2022, 6:29 PM · Compressed graph service

May 14 2022

seirl committed rDGRPH579f5a9e89be: requirements-test.txt: add missing pytest-asyncio (authored by seirl).
May 14 2022, 12:47 AM
seirl committed rDGRPH768090281f75: config: fix logging warning (warn() -> warning()) (authored by seirl).
May 14 2022, 12:45 AM
seirl committed rDGRPHa30bcc4272f6: cli: fix compress docstring (authored by seirl).
May 14 2022, 12:45 AM
seirl closed D7814: Remove dead code from the Python interface.
May 14 2022, 12:45 AM
seirl committed rDGRPHc5a5a76acb28: Remove dead code from the Python interface (authored by seirl).
May 14 2022, 12:45 AM

May 12 2022

seirl added a comment to D7814: Remove dead code from the Python interface.
In D7814#203336, @zack wrote:

I'm fine with this code cleanup, with one caveat: that we document/ship the systemd startup service (and its meaning, including some intuitions about the trade-offs you mention) somewhere, in replacement of the cachemount command.

May 12 2022, 11:38 PM

May 11 2022

seirl added a comment to D7814: Remove dead code from the Python interface.

@seirl: So, for future version deployments, nothing changes, right?

May 11 2022, 5:52 PM
seirl requested review of D7814: Remove dead code from the Python interface.
May 11 2022, 4:18 PM

May 10 2022

seirl committed rDGRPH6a7d2ad35e2c: Test dataset: recompress with new pipeline (authored by seirl).
May 10 2022, 5:22 PM
seirl committed rDGRPH53f9fc5c413c: Test dataset: update with new ORC naming scheme (authored by seirl).
May 10 2022, 5:10 PM
seirl committed rDGRPH1692076e603f: Compression: clean up temporary files, minor fixes (authored by seirl).
May 10 2022, 5:00 PM

May 8 2022

seirl edited P1361 (An Untitled Masterwork).
May 8 2022, 12:33 PM
seirl created P1361 (An Untitled Masterwork).
May 8 2022, 12:32 PM
seirl created P1360 (An Untitled Masterwork).
May 8 2022, 11:15 AM

May 6 2022

seirl moved T3302: Write docstrings for each method in swh/graph/backend.py from Backlog to Deployed on the Compressed graph service board.
May 6 2022, 8:12 PM · Compressed graph service
seirl moved T4114: Add logging by default to swh-graph compression from Backlog to Deployed on the Compressed graph service board.
May 6 2022, 8:12 PM · Compressed graph service
seirl closed D7760: Compress: use parallel ScatteredArcsORCGraph instead of converting to ASCII.
May 6 2022, 4:18 PM
seirl committed rDGRPHbd67a8896bba: Compress: use parallel ScatteredArcsORCGraph instead of converting to ASCII (authored by seirl).
May 6 2022, 4:18 PM
seirl requested review of D7760: Compress: use parallel ScatteredArcsORCGraph instead of converting to ASCII.
May 6 2022, 4:09 PM
seirl committed rDGRPH2b3ed4afcec9: generate_dataset.py: only rmtree() output directories if they exist (authored by seirl).
May 6 2022, 3:53 PM
seirl closed T4114: Add logging by default to swh-graph compression as Resolved by committing rDGRPH464fded2cd8b: Add logging to compression pipeline.
May 6 2022, 3:51 PM · Compressed graph service
seirl closed D7758: Add logging to compression pipeline.
May 6 2022, 3:51 PM
seirl committed rDGRPH464fded2cd8b: Add logging to compression pipeline (authored by seirl).
May 6 2022, 3:51 PM
seirl added a comment to D7758: Add logging to compression pipeline.

Thanks for the review!

May 6 2022, 3:50 PM
seirl updated the diff for D7758: Add logging to compression pipeline.

Address olasd's review

May 6 2022, 3:46 PM
seirl requested review of D7758: Add logging to compression pipeline.
May 6 2022, 3:22 PM
seirl added a revision to T4114: Add logging by default to swh-graph compression: D7758: Add logging to compression pipeline.
May 6 2022, 3:18 PM · Compressed graph service
seirl committed rDGRPH8b34fb1d4f65: Upgrade dsiutils/sux4j (authored by seirl).
May 6 2022, 12:30 PM
seirl committed rDGRPHb3128249f7a9: Add checks to prevent crashing on missing node properties (authored by seirl).
May 6 2022, 12:30 PM
seirl committed rDGRPH5ffb3f39f9fd: ExtractNodes: spawn many sort(1) in parallel to avoid locking, then a sort -m… (authored by seirl).
May 6 2022, 12:30 PM
seirl closed D7733: ExtractNodes: read ORC files in parallel.
May 6 2022, 12:30 PM
seirl committed rDGRPHfffc2cc6318b: ExtractNodes: read ORC files in parallel (authored by seirl).
May 6 2022, 12:30 PM

May 4 2022

seirl created P1358 stupid directories.
May 4 2022, 5:01 PM

May 3 2022

seirl updated the diff for D7733: ExtractNodes: read ORC files in parallel.

Fix sort buffer size argument (add b suffix)

May 3 2022, 10:08 PM
seirl updated the diff for D7733: ExtractNodes: read ORC files in parallel.

Compute sane default for RAM usage

May 3 2022, 9:38 PM
seirl requested review of D7733: ExtractNodes: read ORC files in parallel.
May 3 2022, 9:16 PM
seirl committed rDGRPH2998fc43f82c: Add ScatteredArcsORCGraph, which can read ORC edges in parallel (authored by seirl).
May 3 2022, 3:50 PM
seirl committed rDGRPHd5b3b0768d63: Read ORCGraphDataset files in parallel when called from a ForkJoinPool (authored by seirl).
May 3 2022, 3:50 PM
seirl committed rDGRPH6626f0a86a73: LabelMapBuilder: parallel read of ORC edges into batches (authored by seirl).
May 3 2022, 3:50 PM
seirl committed rDGRPH654ff3139662: LabelMapBuilder: use a parallelQuickSort to reestablish local order after heap… (authored by seirl).
May 3 2022, 3:49 PM
seirl committed rDGRPH597b13e66507: Add -m/--mapped option to graph reading tools (authored by seirl).
May 3 2022, 3:49 PM

May 1 2022

seirl closed T1848: refresh graph dataset export, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), as Resolved.
May 1 2022, 12:08 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage
seirl closed T1848: refresh graph dataset export as Resolved.

Now that there are both a columnar+compressed graph from 2021 and a columnar graph from 2022 that is pending compression, this task about "refreshing the export from January 2019" is resolved.

May 1 2022, 12:08 PM · Datasets

Apr 29 2022

seirl changed the status of T1848: refresh graph dataset export, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), from Open to Work in Progress.
Apr 29 2022, 6:23 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage
seirl changed the status of T1848: refresh graph dataset export from Open to Work in Progress.
Apr 29 2022, 6:23 PM · Datasets