Apr 29 2022

seirl moved T1847: fully automate export of the graph dataset from Backlog to Deployed on the Compressed graph service board.
Apr 29 2022, 6:22 PM · Compressed graph service, Datasets
seirl moved T2431: Document how to export the graph edge dataset from Backlog to Deployed on the Compressed graph service board.
Apr 29 2022, 6:22 PM · Documentation, Compressed graph service, Datasets
seirl moved T3125: add revision timestamp to the compression timeline from Backlog to Deployed on the Compressed graph service board.
Apr 29 2022, 6:22 PM · Compressed graph service
seirl closed T3125: add revision timestamp to the compression timeline as Resolved.

We now store all the node properties in separate files.

Apr 29 2022, 6:22 PM · Compressed graph service
seirl closed T3125: add revision timestamp to the compression timeline, a subtask of T3126: API: add endpoint to find the earliest revision referencing a dir/cnt node, as Resolved.
Apr 29 2022, 6:22 PM · Compressed graph service
seirl moved T3768: Read compression input from ORC instead of the edges file from In progress to Deployed on the Compressed graph service board.
Apr 29 2022, 6:21 PM · Compressed graph service
seirl moved T2983: graph service: allow loading in memory only one direction of the graph from Implemented to Deployed on the Compressed graph service board.
Apr 29 2022, 6:21 PM · Compressed graph service
seirl closed T3021: Investigate why reading the journal of the content table takes so long as Resolved.

Fixed in D7718

Apr 29 2022, 6:20 PM · Journal, Datasets
seirl closed T2431: Document how to export the graph edge dataset, a subtask of T1847: fully automate export of the graph dataset, as Resolved.
Apr 29 2022, 6:15 PM · Compressed graph service, Datasets
seirl closed T2431: Document how to export the graph edge dataset as Resolved.

Done in D7693 and D7711.

Apr 29 2022, 6:15 PM · Documentation, Compressed graph service, Datasets
seirl closed T1743: create a nice landing web page for exported dataset, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), as Resolved.
Apr 29 2022, 6:14 PM · Roadmap 2022, meta-task, Roadmap 2021, System administration, Object storage
seirl closed T1743: create a nice landing web page for exported dataset as Resolved.
Apr 29 2022, 6:14 PM · Datasets
seirl added a comment to T1743: create a nice landing web page for exported dataset.

Done. The page https://annex.softwareheritage.org/public/dataset/graph/ now contains a link to the detailed list of datasets: https://forge.softwareheritage.org/D7487

Apr 29 2022, 6:14 PM · Datasets
seirl closed T1847: fully automate export of the graph dataset, a subtask of T1848: refresh graph dataset export, as Resolved.
Apr 29 2022, 5:57 PM · Datasets
seirl closed T1847: fully automate export of the graph dataset as Resolved.

Done!

Apr 29 2022, 5:57 PM · Compressed graph service, Datasets
seirl closed T3329: document ORC format dataset availability as Resolved.

Fixed in D7487

Apr 29 2022, 5:56 PM · Datasets
seirl closed D7718: journalprocessor: re-enable subsharding per partition.
Apr 29 2022, 5:49 PM
seirl committed rDDATASETc2c2c21e081c: journalprocessor: re-enable subsharding per partition (authored by seirl).
Apr 29 2022, 5:49 PM
seirl updated the diff for D7718: journalprocessor: re-enable subsharding per partition.

rebase

Apr 29 2022, 5:48 PM
seirl requested review of D7718: journalprocessor: re-enable subsharding per partition.
Apr 29 2022, 5:40 PM
seirl closed D7711: docs: Document how to export subdatasets and document/publish datasets.
Apr 29 2022, 5:35 PM
seirl committed rDDATASETdb331f27e87f: docs: Document how to export subdatasets and document/publish datasets (authored by seirl).
Apr 29 2022, 5:35 PM
seirl created P1357 (An Untitled Masterwork).
Apr 29 2022, 3:11 PM

Apr 28 2022

seirl requested review of D7711: docs: Document how to export subdatasets and document/publish datasets.
Apr 28 2022, 5:41 PM
seirl closed D7693: docs: document graph dataset export.

Merged as 8a44a63f8f1bb95d2589e0d1c37318ee3edcf249

Apr 28 2022, 5:38 PM
seirl committed rDDATASET8a44a63f8f1b: docs: document graph dataset export (authored by seirl).
Apr 28 2022, 5:38 PM
seirl closed D7686: docs: remove PostgreSQL local setup.

Merged as d353e6dba0b01ebf6be569a6d3d94ce65e2b63e2

Apr 28 2022, 4:19 PM
seirl committed rDGRPH6e276d182043: LabelMapBuilder: use writeLongGamma() when writing offsets to avoid int… (authored by seirl).
Apr 28 2022, 12:20 AM

Apr 27 2022

seirl added a comment to T3021: Investigate why reading the journal of the content table takes so long.

Apr 27 2022, 2:58 PM · Journal, Datasets
seirl reopened T3021: Investigate why reading the journal of the content table takes so long as "Open".
Apr 27 2022, 2:57 PM · Journal, Datasets
seirl closed T3021: Investigate why reading the journal of the content table takes so long as Resolved.

No longer happens with a more recent stack

Apr 27 2022, 10:12 AM · Journal, Datasets

Apr 26 2022

seirl requested review of D7693: docs: document graph dataset export.
Apr 26 2022, 7:43 PM
seirl committed rDDATASET755c903bd3eb: docs: update Databricks tutorial (authored by seirl).
Apr 26 2022, 4:59 PM
seirl committed rDDATASET17995b90a296: docs: update Athena tutorial (authored by seirl).
Apr 26 2022, 4:59 PM
seirl committed rDDATASETd353e6dba0b0: docs: remove PostgreSQL local setup (authored by seirl).
Apr 26 2022, 4:48 PM
seirl requested review of D7686: docs: remove PostgreSQL local setup.
Apr 26 2022, 4:10 PM

Apr 25 2022

seirl committed rDGRPH2958ffcac141: Compression: only allocate up to 90% of physical memory with -Xmx to avoid OOMs (authored by seirl).
Apr 25 2022, 6:08 PM
seirl created P1349 (An Untitled Masterwork).
Apr 25 2022, 2:23 PM

Apr 22 2022

seirl committed rDGRPH9d7752e31572: LabelMapBuilder: remove inefficient sorting algorithms (authored by seirl).
Apr 22 2022, 8:50 PM
seirl committed rDGRPHdc9fcf70b077: tests/generate_dataset.py: fully overwrite old generated dataset (authored by seirl).
Apr 22 2022, 8:10 PM
seirl committed rDGRPHf789d879a76b: LabelMapBuilder: compute labels in both directions (authored by seirl).
Apr 22 2022, 8:10 PM
seirl committed rDGRPH829c27963a5b: mvn: upgrade spotless (authored by seirl).
Apr 22 2022, 8:10 PM

Apr 19 2022

seirl accepted D7593: Add labelled graph getter.
Apr 19 2022, 11:25 AM

Apr 14 2022

seirl closed D7585: relational exports: add ID field to origin table.
Apr 14 2022, 5:53 PM
seirl committed rDDATASET9f342d9994aa: relational exports: add ID field to origin table (authored by seirl).
Apr 14 2022, 5:53 PM
seirl requested review of D7585: relational exports: add ID field to origin table.
Apr 14 2022, 4:37 PM

Apr 12 2022

seirl created P1339 (An Untitled Masterwork).
Apr 12 2022, 5:33 PM
seirl created P1338 (An Untitled Masterwork).
Apr 12 2022, 5:27 PM
seirl closed D7558: journalprocessor: save final offsets to a text file.
Apr 12 2022, 5:16 PM
seirl committed rDDATASET075b3c3068fe: journalprocessor: save final offsets to a text file (authored by seirl).
Apr 12 2022, 5:16 PM
seirl requested review of D7558: journalprocessor: save final offsets to a text file.
Apr 12 2022, 4:32 PM

Apr 1 2022

seirl closed D7487: Docs: update dataset list with recent datasets.
Apr 1 2022, 8:06 PM
seirl committed rDDATASETa1dd91894055: Docs: update dataset list with recent datasets (authored by seirl).
Apr 1 2022, 8:06 PM
seirl updated the diff for D7487: Docs: update dataset list with recent datasets.

Thousands separators, document date inconsistency

Apr 1 2022, 8:05 PM
seirl requested review of D7487: Docs: update dataset list with recent datasets.
Apr 1 2022, 5:03 PM
seirl created P1327 (An Untitled Masterwork).
Apr 1 2022, 4:45 PM

Mar 31 2022

seirl accepted D7476: Keep each ORC table in a dedicated directory.
Mar 31 2022, 2:32 PM

Mar 30 2022

seirl committed rDGRPH6788bc25064c: config: add max value for batch_size (authored by seirl).
Mar 30 2022, 6:36 PM
seirl committed rDGRPH1b8316e879c1: LabelMapBuilder: implementation using quicksort + heap sort, more efficient… (authored by seirl).
Mar 30 2022, 6:36 PM
seirl committed rDGRPHfcbf62c74989: test dataset: fix file/directory permissions (authored by seirl).
Mar 30 2022, 6:36 PM
seirl closed D7475: LabelMapBuilder: implementation using java quicksort + heap sort.
Mar 30 2022, 6:36 PM
seirl committed rDGRPH5458fc565d26: Add implementation of a parallel quick sort for 3 zipped long arrays (authored by seirl).
Mar 30 2022, 6:36 PM
seirl added a comment to D7475: LabelMapBuilder: implementation using java quicksort + heap sort.

Going to merge this without review; it's more of a research thing at this point.

Mar 30 2022, 6:34 PM
seirl requested review of D7475: LabelMapBuilder: implementation using java quicksort + heap sort.
Mar 30 2022, 6:21 PM
seirl closed T3768: Read compression input from ORC instead of the edges file as Resolved.
Mar 30 2022, 4:49 PM · Compressed graph service

Mar 29 2022

seirl committed rDGRPH352d27f4b3f6: Add graph properties compressed from the ORC dataset (authored by seirl).
Mar 29 2022, 4:53 PM
seirl committed rDGRPH4187f8bd68bf: compression: add --batch-size to ScatteredArcsASCIIGraph (authored by seirl).
Mar 29 2022, 4:53 PM
seirl closed D7331: Add graph properties compressed from the ORC dataset.
Mar 29 2022, 4:53 PM
seirl committed rDGRPH3563007e9ade: Ignore typing in generate_dataset.py to avoid hard dependency to swh-dataset (authored by seirl).
Mar 29 2022, 4:53 PM
seirl updated the diff for D7331: Add graph properties compressed from the ORC dataset.
  • Ignore typing in generate_dataset.py to avoid hard dependency to swh-dataset
Mar 29 2022, 4:44 PM
seirl updated the diff for D7331: Add graph properties compressed from the ORC dataset.
  • Ignore typing in generate_dataset.py to avoid hard dependency to swh-dataset
Mar 29 2022, 4:35 PM
seirl triaged T4115: Some unknown SWHID errors crash the graph server as Normal priority.
Mar 29 2022, 4:05 PM · Compressed graph service
seirl updated the diff for D7331: Add graph properties compressed from the ORC dataset.

Rebase, fix writenodeproperties enum number

Mar 29 2022, 3:39 PM
seirl closed D7322: Exporters: add option to write in a deterministic location.
Mar 29 2022, 3:35 PM
seirl committed rDDATASET31081e4121e8: Exporters: add option to write in a deterministic location (authored by seirl).
Mar 29 2022, 3:35 PM
seirl updated the diff for D7322: Exporters: add option to write in a deterministic location.

Rebase

Mar 29 2022, 3:32 PM
seirl accepted D7380: Update JournalClientOffsetRanges for swh.journal 0.9.
Mar 29 2022, 3:20 PM
seirl triaged T4114: Add logging by default to swh-graph compression as Normal priority.
Mar 29 2022, 3:16 PM · Compressed graph service
seirl triaged T4113: Review border case of empty response for `visit_nodes` as Normal priority.
Mar 29 2022, 3:16 PM · Compressed graph service
seirl accepted D7379: Encode TimestampWithTimezone as (sec, usec, offset) in ORC file.
Mar 29 2022, 3:15 PM

Mar 18 2022

seirl requested review of D7331: Add graph properties compressed from the ORC dataset.
Mar 18 2022, 3:17 PM

Mar 16 2022

seirl updated the diff for D7322: Exporters: add option to write in a deterministic location.

rebase + fix review

Mar 16 2022, 7:52 PM
seirl added inline comments to D7322: Exporters: add option to write in a deterministic location.
Mar 16 2022, 2:00 PM

Mar 8 2022

seirl requested review of D7322: Exporters: add option to write in a deterministic location.
Mar 8 2022, 11:33 PM

Feb 23 2022

seirl created P1300 (An Untitled Masterwork).
Feb 23 2022, 4:32 PM

Feb 21 2022

seirl moved T3831: Flaky test in swh-graph from In progress to Deployed on the Compressed graph service board.
Feb 21 2022, 12:56 PM · Compressed graph service
seirl moved T2981: Graph API: add a (node type) result filters from In progress to Deployed on the Compressed graph service board.
Feb 21 2022, 12:56 PM · Compressed graph service
seirl moved T2647: add LLP support to graph compression pipeline from In progress to Deployed on the Compressed graph service board.
Feb 21 2022, 12:56 PM · Compressed graph service
seirl moved T3739: swh-graph: Remove SWHID -> Node ID mapping, use MPH instead from Implemented to Deployed on the Compressed graph service board.
Feb 21 2022, 12:56 PM · Compressed graph service
seirl moved T3740: swh-graph: Translate node IDs on the Java side, not Python side from Implemented to Deployed on the Compressed graph service board.
Feb 21 2022, 12:56 PM · Compressed graph service
seirl moved T3161: graph service: add anti-DoS limit on the number of edges traversed from Implemented to Deployed on the Compressed graph service board.
Feb 21 2022, 12:55 PM · Compressed graph service

Feb 9 2022

seirl created P1282 SwhGraph dumped properties.
Feb 9 2022, 4:05 PM
seirl updated the title for P1281 DumpProperties.java from untitled to DumpProperties.java.
Feb 9 2022, 3:58 PM
seirl updated the title for P1281 DumpProperties.java from Command-Line Input to untitled.
Feb 9 2022, 3:58 PM
seirl created P1281 DumpProperties.java.
Feb 9 2022, 3:58 PM

Feb 7 2022

seirl created P1279 DumpProperties.java.
Feb 7 2022, 5:56 PM

Feb 3 2022

seirl committed rDGRPH020cd71ef66d: Remove unused/buggy TopologicalSort (authored by seirl).
Feb 3 2022, 3:44 PM

Feb 2 2022

seirl committed rDGRPHb148c50fca1d: Reorganize compression-related classes in compress. package (authored by seirl).
Feb 2 2022, 6:17 PM
seirl closed D7021: Add graph dataset reading classes (orc+edges).
Feb 2 2022, 4:56 PM
seirl committed rDGRPH2d9529b20d4e: Add graph dataset reading classes (orc+edges) (authored by seirl).
Feb 2 2022, 4:56 PM