We now store all the node properties in separate files.
Apr 29 2022
seirl moved T1847: fully automate export of the graph dataset from Backlog to Deployed on the Compressed graph service board.
seirl moved T2431: Document how to export the graph edge dataset from Backlog to Deployed on the Compressed graph service board.
seirl moved T3125: add revision timestamp to the compression timeline from Backlog to Deployed on the Compressed graph service board.
seirl closed T3125: add revision timestamp to the compression timeline, a subtask of T3126: API: add endpoint to find the earliest revision referencing a dir/cnt node, as Resolved.
seirl moved T3768: Read compression input from ORC instead of the edges file from In progress to Deployed on the Compressed graph service board.
seirl moved T2983: graph service: allow loading in memory only one direction of the graph from Implemented to Deployed on the Compressed graph service board.
seirl closed T3021: Investigate why reading the journal of the content table takes so long as Resolved.
Fixed in D7718
seirl closed T2431: Document how to export the graph edge dataset, a subtask of T1847: fully automate export of the graph dataset, as Resolved.
seirl closed T1743: create a nice landing web page for exported dataset, a subtask of T3085: Complete and updated copy of the archive on S3 (objects+graph), as Resolved.
Done. The page https://annex.softwareheritage.org/public/dataset/graph/ now contains a link to the detailed list of datasets: https://forge.softwareheritage.org/D7487
seirl closed T1847: fully automate export of the graph dataset, a subtask of T1848: refresh graph dataset export, as Resolved.
Done!
Fixed in D7487
seirl committed rDDATASETc2c2c21e081c: journalprocessor: re-enable subsharding per partition (authored by seirl).
rebase
seirl committed rDDATASETdb331f27e87f: docs: Document how to export subdatasets and document/publish datasets (authored by seirl).
Apr 28 2022
seirl requested review of D7711: docs: Document how to export subdatasets and document/publish datasets.
Merged as 8a44a63f8f1bb95d2589e0d1c37318ee3edcf249
docs: document graph dataset export
Merged as d353e6dba0b01ebf6be569a6d3d94ce65e2b63e2
seirl committed rDGRPH6e276d182043: LabelMapBuilder: use writeLongGamma() when writing offsets to avoid int… (authored by seirl).
Apr 27 2022
seirl added a comment to T3021: Investigate why reading the journal of the content table takes so long.
seirl reopened T3021: Investigate why reading the journal of the content table takes so long as "Open".
seirl closed T3021: Investigate why reading the journal of the content table takes so long as Resolved.
No longer happens with a more recent stack
Apr 26 2022
docs: update Databricks tutorial
docs: update Athena tutorial
docs: remove PostgreSQL local setup
Apr 25 2022
seirl committed rDGRPH2958ffcac141: Compression: only allocate up to 90% of physical memory with -Xmx to avoid OOMs (authored by seirl).
Apr 22 2022
seirl committed rDGRPH9d7752e31572: LabelMapBuilder: remove inefficient sorting algorithms (authored by seirl).
seirl committed rDGRPHdc9fcf70b077: tests/generate_dataset.py: fully overwrite old generated dataset (authored by seirl).
seirl committed rDGRPHf789d879a76b: LabelMapBuilder: compute labels in both directions (authored by seirl).
mvn: upgrade spotless
Apr 19 2022
Apr 14 2022
seirl committed rDDATASET9f342d9994aa: relational exports: add ID field to origin table (authored by seirl).
Apr 12 2022
seirl committed rDDATASET075b3c3068fe: journalprocessor: save final offsets to a text file (authored by seirl).
Apr 1 2022
seirl committed rDDATASETa1dd91894055: Docs: update dataset list with recent datasets (authored by seirl).
Thousands separators, document date inconsistency
Mar 31 2022
Mar 30 2022
config: add max value for batch_size
seirl committed rDGRPH1b8316e879c1: LabelMapBuilder: implementation using quicksort + heap sort, more efficient… (authored by seirl).
seirl committed rDGRPHfcbf62c74989: test dataset: fix file/directory permissions (authored by seirl).
seirl committed rDGRPH5458fc565d26: Add implementation of a parallel quick sort for 3 zipped long arrays (authored by seirl).
Going to merge this without review, it's more of a research thing at this point
Mar 29 2022
seirl committed rDGRPH352d27f4b3f6: Add graph properties compressed from the ORC dataset (authored by seirl).
seirl committed rDGRPH4187f8bd68bf: compression: add --batch-size to ScatteredArcsASCIIGraph (authored by seirl).
seirl committed rDGRPH3563007e9ade: Ignore typing in generate_dataset.py to avoid hard dependency to swh-dataset (authored by seirl).
Rebase, fix writenodeproperties enum number
seirl committed rDDATASET31081e4121e8: Exporters: add option to write in a deterministic location (authored by seirl).
Rebase
Mar 18 2022
Mar 16 2022
rebase + fix review
Mar 8 2022
Feb 23 2022
Feb 21 2022
seirl moved T3831: Flaky test in swh-graph from In progress to Deployed on the Compressed graph service board.
seirl moved T2981: Graph API: add a (node type) result filters from In progress to Deployed on the Compressed graph service board.
seirl moved T2647: add LLP support to graph compression pipeline from In progress to Deployed on the Compressed graph service board.
seirl moved T3739: swh-graph: Remove SWHID -> Node ID mapping, use MPH instead from Implemented to Deployed on the Compressed graph service board.
seirl moved T3740: swh-graph: Translate node IDs on the Java side, not Python side from Implemented to Deployed on the Compressed graph service board.
seirl moved T3161: graph service: add anti-DoS limit on the number of edges traversed from Implemented to Deployed on the Compressed graph service board.
Feb 9 2022
Feb 7 2022
Feb 3 2022
Remove unused/buggy TopologicalSort
Feb 2 2022
seirl committed rDGRPHb148c50fca1d: Reorganize compression-related classes in compress. package (authored by seirl).
seirl committed rDGRPH2d9529b20d4e: Add graph dataset reading classes (orc+edges) (authored by seirl).