Feb 2 2022

seirl updated the diff for D7021: Add graph dataset reading classes (orc+edges).

Rebase

Feb 2 2022, 4:56 PM
seirl updated the diff for D7021: Add graph dataset reading classes (orc+edges).

Update from review comments

Feb 2 2022, 4:55 PM

Feb 1 2022

seirl updated the task description for T3904: Integrate the data model chapter of seirl's thesis in the SWH documentation.
Feb 1 2022, 5:02 PM · Documentation
seirl triaged T3904: Integrate the data model chapter of seirl's thesis in the SWH documentation as Normal priority.
Feb 1 2022, 4:51 PM · Documentation

Jan 27 2022

seirl updated the diff for D7021: Add graph dataset reading classes (orc+edges).

Fix docstring

Jan 27 2022, 3:49 PM
seirl updated the diff for D7021: Add graph dataset reading classes (orc+edges).

Remove dead code

Jan 27 2022, 3:44 PM

Jan 26 2022

seirl updated the summary of D7021: Add graph dataset reading classes (orc+edges).
Jan 26 2022, 7:35 PM
seirl requested review of D7021: Add graph dataset reading classes (orc+edges).
Jan 26 2022, 7:21 PM
seirl closed D7038: ORC exporter: use ZST compression.
Jan 26 2022, 7:17 PM
seirl committed rDDATASET18c612bb9865: ORC exporter: use ZST compression (authored by seirl).
ORC exporter: use ZST compression
Jan 26 2022, 7:17 PM

Jan 25 2022

seirl requested review of D7038: ORC exporter: use ZST compression.
Jan 25 2022, 7:54 PM
seirl closed D7010: Add a command to generate a subdataset from a list of SWHIDs using S3.
Jan 25 2022, 7:48 PM
seirl committed rDDATASET027235d6d46d: Add a command to generate a subdataset from a list of SWHIDs using S3 (authored by seirl).
Add a command to generate a subdataset from a list of SWHIDs using S3
Jan 25 2022, 7:48 PM
seirl updated the diff for D7010: Add a command to generate a subdataset from a list of SWHIDs using S3.

Add special case for revisions, compress with ZST

Jan 25 2022, 7:48 PM

Jan 24 2022

seirl created T3885: Filter rows of size >32MB from dataset export.
Jan 24 2022, 9:18 PM · Datasets
seirl updated the diff for D7010: Add a command to generate a subdataset from a list of SWHIDs using S3.

database -> database to create

Jan 24 2022, 5:33 PM
seirl updated the diff for D7010: Add a command to generate a subdataset from a list of SWHIDs using S3.

Expand command description

Jan 24 2022, 5:32 PM

Jan 21 2022

seirl added inline comments to D7010: Add a command to generate a subdataset from a list of SWHIDs using S3.
Jan 21 2022, 4:36 PM
seirl requested review of D7010: Add a command to generate a subdataset from a list of SWHIDs using S3.
Jan 21 2022, 2:29 PM

Jan 20 2022

seirl accepted D6773: journalprocessor: Reuse the Kafka key instead of computing a new one.
Jan 20 2022, 4:32 PM

Jan 18 2022

seirl closed T2981: Graph API: add a (node type) result filters as Resolved by committing rDGRPH294128e0f96e: Use AllowedNodesTest to implement return type filtering.
Jan 18 2022, 1:26 PM · Compressed graph service
seirl closed D6954: Use AllowedNodesTest to implement return type filtering.
Jan 18 2022, 1:26 PM
seirl committed rDGRPH294128e0f96e: Use AllowedNodesTest to implement return type filtering (authored by seirl).
Use AllowedNodesTest to implement return type filtering
Jan 18 2022, 1:26 PM

Jan 17 2022

seirl updated the diff for D6954: Use AllowedNodesTest to implement return type filtering.

Rebase

Jan 17 2022, 4:35 PM
seirl closed T2983: graph service: allow loading in memory only one direction of the graph as Resolved.
Jan 17 2022, 4:33 PM · Compressed graph service
seirl closed D6953: Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph.
Jan 17 2022, 4:32 PM
seirl committed rDGRPH83fcf6bb8156: Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph (authored by seirl).
Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph
Jan 17 2022, 4:32 PM
seirl committed rDGRPHbb9b5fc59d8d: SwhBidirectionalGraph: add UML diagram of class hierarchy (authored by seirl).
SwhBidirectionalGraph: add UML diagram of class hierarchy
Jan 17 2022, 4:31 PM
seirl closed T3302: Write docstrings for each method in swh/graph/backend.py as Resolved.

Since D6676, the specialized methods of swh/graph/backend.py have all been removed and replaced by a generic proxy layer that forwards every method call transparently, so this specific issue appears to be obsolete now.
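
For readers unfamiliar with the pattern, here is a minimal sketch of what a generic, transparent proxy layer can look like in Python. It is not the code from D6676: the class names `JavaBackendStub` and `TransparentProxy`, and the methods `count_nodes` and `neighbors`, are hypothetical and chosen only for illustration.

```python
class JavaBackendStub:
    """Hypothetical stand-in for the object the proxy forwards to."""

    def count_nodes(self):
        return 3

    def neighbors(self, node):
        return [node + 1, node + 2]


class TransparentProxy:
    """Forwards any unknown attribute lookup to the wrapped object.

    No per-method wrapper is written: __getattr__ is only invoked when
    normal lookup fails, so every method of the target is reachable
    without being declared here.
    """

    def __init__(self, target):
        self._target = target
        self.calls = []  # names of forwarded calls, kept for inspection

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def forward(*args, **kwargs):
            self.calls.append(name)
            return attr(*args, **kwargs)

        return forward


proxy = TransparentProxy(JavaBackendStub())
```

With such a layer there is no longer one hand-written method per backend call, which is why per-method docstrings no longer have anything to attach to.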

Jan 17 2022, 4:09 PM · Compressed graph service
seirl added a comment to D6953: Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph.

Right, this is what T3855 will cover, but there is also a dependency on T1971 to figure out where exactly this will go (package summary vs. Sphinx).

Jan 17 2022, 3:58 PM
seirl added a comment to D6953: Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph.

I filed T3855 to document the architecture of the Java code of swh-graph so that it stays on our radar.

Jan 17 2022, 3:41 PM
seirl triaged T3855: Document the architecture of the Java code in swh-graph as Normal priority.
Jan 17 2022, 3:40 PM · Compressed graph service
seirl added a comment to D6953: Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph.

I agree in general, but I disagree for the specifics of this diff. This is a description of the design of a specific component, which could be copied in an architecture presentation if it makes sense, but is certainly not an architecture presentation in itself.

Jan 17 2022, 3:39 PM
seirl updated the diff for D6953: Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph.

SwhBidirectionalGraph: add UML diagram of class hierarchy

Jan 17 2022, 1:46 PM
seirl added a comment to D6953: Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph.

@vlorentz this is all written in the docstrings of the classes, which will be added to swh-docs once https://forge.softwareheritage.org/T1971 is implemented (which is going to be easier to do thanks to this diff :-))

Jan 17 2022, 1:42 PM

Jan 15 2022

seirl requested review of D6954: Use AllowedNodesTest to implement return type filtering.
Jan 15 2022, 12:05 AM
seirl changed the status of T2981: Graph API: add a (node type) result filters from Open to Work in Progress.
Jan 15 2022, 12:04 AM · Compressed graph service
seirl added a revision to T2981: Graph API: add a (node type) result filters: D6954: Use AllowedNodesTest to implement return type filtering.
Jan 15 2022, 12:03 AM · Compressed graph service

Jan 14 2022

seirl renamed T3832: Investigate Luigi as an ETL framework for the compression pipeline from Investigate Luigi as an ETR framework for the compression pipeline to Investigate Luigi as an ETL framework for the compression pipeline.
Jan 14 2022, 11:33 PM · Compressed graph service
seirl added a revision to T2983: graph service: allow loading in memory only one direction of the graph: D6953: Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph.
Jan 14 2022, 11:29 PM · Compressed graph service
seirl updated the summary of D6953: Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph.
Jan 14 2022, 11:29 PM
seirl changed the status of T2983: graph service: allow loading in memory only one direction of the graph from Open to Work in Progress.
Jan 14 2022, 11:29 PM · Compressed graph service
seirl updated the diff for D6953: Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph.

Fix more buggy Graph -> SwhBidirectionalGraph

Jan 14 2022, 11:28 PM
seirl updated the diff for D6953: Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph.

Remove Graph -> SwhBidirectionalGraph strings

Jan 14 2022, 11:24 PM
seirl requested review of D6953: Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph.
Jan 14 2022, 11:20 PM

Jan 13 2022

seirl committed rDGRPH64f35108dddd: Make random_walk tests deterministic (authored by seirl).
Make random_walk tests deterministic
Jan 13 2022, 4:08 PM
seirl closed T3831: Flaky test in swh-graph as Resolved by committing rDGRPHc2074adc3d2c: Increase retries for random walks from 5 to 10.
Jan 13 2022, 4:08 PM · Compressed graph service
seirl closed D6893: Increase retries for random walks from 5 to 10.
Jan 13 2022, 4:08 PM
seirl committed rDGRPHc2074adc3d2c: Increase retries for random walks from 5 to 10 (authored by seirl).
Increase retries for random walks from 5 to 10
Jan 13 2022, 4:08 PM
seirl added a comment to D6893: Increase retries for random walks from 5 to 10.

Here, I changed it so that all the paths go to the correct node.

Jan 13 2022, 3:52 PM
seirl updated the diff for D6893: Increase retries for random walks from 5 to 10.

Make random_walk tests deterministic

Jan 13 2022, 3:52 PM

Jan 12 2022

seirl closed T3161: graph service: add anti-DoS limit on the number of edges traversed, a subtask of T2220: swh-graph in production, as Resolved.
Jan 12 2022, 5:01 PM · Roadmap 2022, meta-task, Roadmap 2021, Compressed graph service
seirl closed T3161: graph service: add anti-DoS limit on the number of edges traversed as Resolved.
Jan 12 2022, 5:01 PM · Compressed graph service
seirl closed D5675: update the graph rpc api doc.
Jan 12 2022, 2:52 PM
seirl committed rDGRPH1a1bb59c1b8c: Document max_edges and return_types query parameters in RPC API (authored by seirl).
Document max_edges and return_types query parameters in RPC API
Jan 12 2022, 2:52 PM
seirl accepted D6919: api/graph: Handle query parameters that might be passed in graph_query.
Jan 12 2022, 2:50 PM
seirl updated the diff for D5675: update the graph rpc api doc.

Rebase

Jan 12 2022, 2:45 PM
seirl requested changes to D6919: api/graph: Handle query parameters that might be passed in graph_query.

Minor comment to avoid parsing the URL manually, otherwise LGTM.

Jan 12 2022, 2:17 PM
seirl accepted D6914: api/graph: Implement anti-DoS policies for graph visits.
Jan 12 2022, 2:08 PM
seirl closed D6892: Add max_edges argument to all the endpoints.
Jan 12 2022, 2:06 PM
seirl committed rDGRPH32d6b0ccf3b1: Add max_edges argument to all the endpoints (authored by seirl).
Add max_edges argument to all the endpoints
Jan 12 2022, 2:06 PM

Jan 11 2022

seirl requested changes to D6914: api/graph: Implement anti-DoS policies for graph visits.

I have just one security consideration, otherwise LGTM. Thanks!

Jan 11 2022, 3:15 PM

Jan 10 2022

seirl added a comment to D6892: Add max_edges argument to all the endpoints.

The coverage report does not seem to agree though ;)

Jan 10 2022, 7:50 PM
seirl added a comment to T3831: Flaky test in swh-graph.

No, we want to check that random_walk can reach its actual destination.

Jan 10 2022, 2:54 PM · Compressed graph service

Jan 7 2022

seirl triaged T3836: Define and implement an anti-DoS policy for graph visits using the max_edges parameter as High priority.
Jan 7 2022, 5:12 PM · Web app
seirl changed the status of T3831: Flaky test in swh-graph from Open to Work in Progress.
Jan 7 2022, 4:37 PM · Compressed graph service
seirl added a comment to T3831: Flaky test in swh-graph.

I made a temporary fix in D6893. It doesn't solve the underlying issue, but it greatly decreases the probability of it happening. I'm not quite sure what a proper test for this endpoint would be, but this is at least enough to fix this issue in particular.
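
As a rough illustration of why raising the retry count from 5 to 10 lowers the failure rate without removing it, the sketch below assumes that each random walk misses its destination independently with the same probability. Both the independence assumption and the value 0.5 are hypothetical and are not taken from the swh-graph test suite.

```python
def all_attempts_fail(p_single_failure: float, retries: int) -> float:
    """Probability that every one of `retries` independent walks fails.

    Assumes independent attempts with identical failure probability,
    which is a simplification made for this example only.
    """
    return p_single_failure ** retries


# Hypothetical per-walk failure probability of 0.5.
with_5_retries = all_attempts_fail(0.5, 5)
with_10_retries = all_attempts_fail(0.5, 10)
```

Under these assumptions the flaky outcome becomes rarer by a factor of 32, but its probability stays above zero, which matches the description of the change as a temporary fix.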

Jan 7 2022, 4:36 PM · Compressed graph service
seirl requested review of D5675: update the graph rpc api doc.
Jan 7 2022, 4:35 PM
seirl requested review of D6893: Increase retries for random walks from 5 to 10.
Jan 7 2022, 4:28 PM
seirl added a revision to T3831: Flaky test in swh-graph: D6893: Increase retries for random walks from 5 to 10.
Jan 7 2022, 4:26 PM · Compressed graph service
seirl requested review of D6892: Add max_edges argument to all the endpoints.
Jan 7 2022, 4:02 PM

Jan 4 2022

seirl triaged T3832: Investigate Luigi as an ETL framework for the compression pipeline as Normal priority.
Jan 4 2022, 2:17 PM · Compressed graph service

Dec 14 2021

seirl claimed T3161: graph service: add anti-DoS limit on the number of edges traversed.
Dec 14 2021, 1:32 PM · Compressed graph service

Dec 10 2021

seirl closed T2647: add LLP support to graph compression pipeline as Resolved by committing rDGRPH00112952614e: Add LLP compression to the WebGraph pipeline.
Dec 10 2021, 3:00 PM · Compressed graph service
seirl closed D4821: Add LLP compression to the WebGraph pipeline.
Dec 10 2021, 3:00 PM
seirl committed rDGRPH00112952614e: Add LLP compression to the WebGraph pipeline (authored by seirl).
Add LLP compression to the WebGraph pipeline
Dec 10 2021, 3:00 PM
seirl triaged T3794: Document swh-graph compression pipeline options as Normal priority.
Dec 10 2021, 3:00 PM · Compressed graph service
seirl triaged T3793: Add copyright notices to all swh-graph Java files as Low priority.
Dec 10 2021, 3:00 PM · Compressed graph service
seirl added a comment to D4821: Add LLP compression to the WebGraph pipeline.

Yes, this is implicitly all tested under the compression pipeline test.

Dec 10 2021, 2:59 PM

Dec 6 2021

seirl triaged T3768: Read compression input from ORC instead of the edges file as High priority.
Dec 6 2021, 11:05 AM · Compressed graph service
seirl created T3768: Read compression input from ORC instead of the edges file.
Dec 6 2021, 11:05 AM · Compressed graph service

Dec 4 2021

seirl updated the diff for D4821: Add LLP compression to the WebGraph pipeline.

Add simplify step, fix various review comments

Dec 4 2021, 2:07 AM
seirl committed rDGRPH58de681bd729: Update Maven dependencies (authored by seirl).
Update Maven dependencies
Dec 4 2021, 2:05 AM
seirl closed D6699: Stop writing swhid2node.bin maps.
Dec 4 2021, 1:30 AM
seirl committed rDGRPHd0dbfda9a775: Stop writing swhid2node.bin maps (authored by seirl).
Stop writing swhid2node.bin maps
Dec 4 2021, 1:30 AM

Dec 1 2021

seirl closed T3739: swh-graph: Remove SWHID -> Node ID mapping, use MPH instead as Resolved.
Dec 1 2021, 4:24 PM · Compressed graph service
seirl added a comment to D6699: Stop writing swhid2node.bin maps.

Why do the SortOutputHandler attributes need to become final?

Dec 1 2021, 3:00 PM

Nov 26 2021

seirl requested review of D6699: Stop writing swhid2node.bin maps.
Nov 26 2021, 5:36 PM
seirl closed T3740: swh-graph: Translate node IDs on the Java side, not Python side, a subtask of T3739: swh-graph: Remove SWHID -> Node ID mapping, use MPH instead, as Resolved.
Nov 26 2021, 5:33 PM · Compressed graph service
seirl closed T3740: swh-graph: Translate node IDs on the Java side, not Python side as Resolved.
Nov 26 2021, 5:33 PM · Compressed graph service
seirl closed D6676: Move SWHID<->node ID conversion in the Java backend.
Nov 26 2021, 5:05 PM
seirl committed rDGRPH0b33cff0d228: Move SWHID<->node ID conversion in the Java backend (authored by seirl).
Move SWHID<->node ID conversion in the Java backend
Nov 26 2021, 5:05 PM
seirl updated the diff for D6676: Move SWHID<->node ID conversion in the Java backend.

Fix src/dst inversion, add regression test

Nov 26 2021, 4:33 PM
seirl added inline comments to D6676: Move SWHID<->node ID conversion in the Java backend.
Nov 26 2021, 1:47 PM

Nov 25 2021

seirl committed rDGRPH32bab89d4448: BidirectionalImmutableGraph: implement outdegrees and predecessorBigArray… (authored by seirl).
BidirectionalImmutableGraph: implement outdegrees and predecessorBigArray…
Nov 25 2021, 5:00 PM
seirl committed rDGRPH5f5ae5dcc104: Add mvn/jvm.config to fix spotless not working with OpenJDK 16+ (authored by seirl).
Add mvn/jvm.config to fix spotless not working with OpenJDK 16+
Nov 25 2021, 4:04 PM
seirl committed rDGRPHbe6c986a5238: Move bidirectional graph logic into a separate ImmutableBidirectionalGraph class (authored by seirl).
Move bidirectional graph logic into a separate ImmutableBidirectionalGraph class
Nov 25 2021, 4:04 PM
seirl committed rDGRPH3cbcf625aa24: SubdatasetSizeFunction: collect more statistics (authored by seirl).
SubdatasetSizeFunction: collect more statistics
Nov 25 2021, 4:03 PM
seirl updated the task description for T3579: Meta-task: upgrade infrastructure to Debian Bullseye.
Nov 25 2021, 12:11 PM · System administration (Component upgrades)
seirl updated the task description for T3579: Meta-task: upgrade infrastructure to Debian Bullseye.
Nov 25 2021, 12:10 PM · System administration (Component upgrades)