Depends on D8910.
Details
Details
- Reviewers
ardumont - Group Reviewers
Reviewers - Maniphest Tasks
- T4695: Provide a collaboration graph / dataset
- Commits
- rDGRPHdfd4c1dc3b22: ListOriginContributors: Ignore null author/committer in revisions/releases
Diff Detail
Diff Detail
- Repository
- rDGRPH Compressed graph representation
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Comment Actions
Build has FAILED
Patch application report for D8912 (id=32115)
Could not rebase; Attempt merge onto ec7f568b13...
Updating ec7f568..df18b5b Fast-forward conftest.py | 1 + .../graph/utils/ListOriginContributors.java | 151 +++++++ .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++ mypy.ini | 6 + requirements-luigi.txt | 2 + requirements-swh-luigi.txt | 2 +- requirements-swh.txt | 1 + swh/graph/luigi.py | 474 ++++++++++++++++++++- .../dataset/compressed/example-labelled.labelobl | Bin 0 -> 772 bytes .../compressed/example-labelled.labeloffsets | 3 +- .../dataset/compressed/example-labelled.labels | 2 +- .../dataset/compressed/example-labelled.properties | 2 +- .../example-transposed-labelled.labelobl | Bin 0 -> 772 bytes .../example-transposed-labelled.labeloffsets | 3 +- .../compressed/example-transposed-labelled.labels | 3 +- .../example-transposed-labelled.properties | 2 +- .../dataset/compressed/example-transposed.graph | 2 +- .../dataset/compressed/example-transposed.obl | Bin 772 -> 772 bytes .../dataset/compressed/example-transposed.offsets | 3 +- .../compressed/example-transposed.properties | 52 +-- .../dataset/compressed/example.edges.count.txt | 2 +- .../dataset/compressed/example.edges.stats.txt | 8 +- swh/graph/tests/dataset/compressed/example.graph | 2 +- .../tests/dataset/compressed/example.indegree | 5 +- .../dataset/compressed/example.labels.count.txt | 2 +- .../dataset/compressed/example.labels.csv.zst | Bin 115 -> 131 bytes .../compressed/example.labels.fcl.bytearray | Bin 110 -> 128 bytes .../dataset/compressed/example.labels.fcl.pointers | Bin 16 -> 24 bytes .../compressed/example.labels.fcl.properties | 2 +- .../tests/dataset/compressed/example.labels.mph | Bin 1521 -> 1529 bytes swh/graph/tests/dataset/compressed/example.mph | Bin 961 -> 961 bytes .../dataset/compressed/example.node2swhid.bin | Bin 462 -> 528 bytes .../tests/dataset/compressed/example.node2type.map | Bin 353 -> 361 bytes .../dataset/compressed/example.nodes.count.txt | 2 +- .../tests/dataset/compressed/example.nodes.csv.zst | Bin 150 -> 181 bytes .../dataset/compressed/example.nodes.stats.txt | 6 +- swh/graph/tests/dataset/compressed/example.obl | Bin 772 -> 772 bytes swh/graph/tests/dataset/compressed/example.offsets | 4 +- swh/graph/tests/dataset/compressed/example.order | Bin 168 -> 192 bytes .../tests/dataset/compressed/example.outdegree | 4 +- .../tests/dataset/compressed/example.persons.mph | Bin 961 -> 961 bytes .../tests/dataset/compressed/example.properties | 50 +-- .../compressed/example.property.author_id.bin | Bin 84 -> 2112 bytes .../example.property.author_timestamp.bin | Bin 168 -> 4224 bytes .../example.property.author_timestamp_offset.bin | Bin 42 -> 1056 bytes .../compressed/example.property.committer_id.bin | Bin 84 -> 2112 bytes .../example.property.committer_timestamp.bin | Bin 168 -> 4224 bytes ...example.property.committer_timestamp_offset.bin | Bin 42 -> 1056 bytes .../example.property.content.is_skipped.bin | Bin 85 -> 149 bytes .../compressed/example.property.content.length.bin | Bin 168 -> 4224 bytes .../compressed/example.property.message.bin | 2 + .../compressed/example.property.message.offset.bin | Bin 168 -> 4224 bytes .../compressed/example.property.tag_name.bin | 1 + .../example.property.tag_name.offset.bin | Bin 168 -> 4224 bytes swh/graph/tests/dataset/compressed/example.stats | 28 +- .../logs/example-1669888191558-extract_nodes.log | 31 ++ .../compressed/logs/example-1669888192235-mph.log | 15 + .../compressed/logs/example-1669888192705-bv.log | 35 ++ .../compressed/logs/example-1669888198778-bfs.log | 7 + .../logs/example-1669888199039-permute_bfs.log | 23 + .../logs/example-1669888199374-transpose_bfs.log | 19 + .../logs/example-1669888199720-simplify.log | 22 + .../compressed/logs/example-1669888199989-llp.log | 143 +++++++ .../logs/example-1669888200352-permute_llp.log | 23 + .../compressed/logs/example-1669888200692-obl.log | 4 + .../logs/example-1669888200927-compose_orders.log | 4 + .../logs/example-1669888201039-stats.log | 7 + .../logs/example-1669888201272-transpose.log | 19 + .../logs/example-1669888201615-transpose_obl.log | 4 + .../compressed/logs/example-1669888201853-maps.log | 18 + .../logs/example-1669888202131-extract_persons.log | 11 + .../logs/example-1669888202702-mph_persons.log | 15 + .../logs/example-1669888203136-node_properties.log | 36 ++ .../logs/example-1669888203831-mph_labels.log | 26 ++ .../logs/example-1669888204319-fcl_labels.log | 6 + .../logs/example-1669888204581-edge_labels.log | 39 ++ .../logs/example-1669888210521-edge_labels_obl.log | 4 + ...ple-1669888210788-edge_labels_transpose_obl.log | 4 + .../logs/example-1669888211035-clean_tmp.log | 3 + .../dataset/edges/origin/graph-all.edges.csv.zst | Bin 82 -> 109 bytes .../dataset/edges/origin/graph-all.nodes.csv.zst | Bin 64 -> 95 bytes .../dataset/edges/release/graph-all.edges.csv.zst | Bin 56 -> 73 bytes .../dataset/edges/release/graph-all.nodes.csv.zst | Bin 38 -> 42 bytes .../dataset/edges/snapshot/graph-all.edges.csv.zst | Bin 94 -> 128 bytes .../dataset/edges/snapshot/graph-all.nodes.csv.zst | Bin 33 -> 38 bytes swh/graph/tests/dataset/generate_dataset.py | 46 +- swh/graph/tests/dataset/img/example.dot | 13 +- .../tests/dataset/orc/content/content-all.orc | Bin 1240 -> 1226 bytes .../tests/dataset/orc/directory/directory-all.orc | Bin 578 -> 563 bytes .../orc/directory_entry/directory_entry-all.orc | Bin 1126 -> 1115 bytes swh/graph/tests/dataset/orc/origin/origin-all.orc | Bin 817 -> 935 bytes .../dataset/orc/origin_visit/origin_visit-all.orc | Bin 898 -> 924 bytes .../origin_visit_status-all.orc | Bin 1150 -> 1191 bytes .../tests/dataset/orc/release/release-all.orc | Bin 1361 -> 1407 bytes .../tests/dataset/orc/revision/revision-all.orc | Bin 1658 -> 1643 bytes .../revision_extra_headers-all.orc | Bin 253 -> 236 bytes .../orc/revision_history/revision_history-all.orc | Bin 700 -> 685 bytes .../orc/skipped_content/skipped_content-all.orc | Bin 1177 -> 1160 bytes .../tests/dataset/orc/snapshot/snapshot-all.orc | Bin 459 -> 456 bytes .../orc/snapshot_branch/snapshot_branch-all.orc | Bin 865 -> 921 bytes swh/graph/tests/test_cli.py | 4 +- swh/graph/tests/test_grpc.py | 7 +- swh/graph/tests/test_http_client.py | 18 +- swh/graph/tests/test_luigi.py | 4 +- swh/graph/tests/test_origin_contributors.py | 186 ++++++++ swh/graph/tests/test_toposort.py | 67 +++ 106 files changed, 1710 insertions(+), 114 deletions(-) create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/ListOriginContributors.java create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/TopoSort.java create mode 100644 swh/graph/tests/dataset/compressed/example-labelled.labelobl create mode 100644 swh/graph/tests/dataset/compressed/example-transposed-labelled.labelobl create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888191558-extract_nodes.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888192235-mph.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888192705-bv.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888198778-bfs.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888199039-permute_bfs.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888199374-transpose_bfs.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888199720-simplify.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888199989-llp.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888200352-permute_llp.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888200692-obl.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888200927-compose_orders.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888201039-stats.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888201272-transpose.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888201615-transpose_obl.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888201853-maps.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888202131-extract_persons.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888202702-mph_persons.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888203136-node_properties.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888203831-mph_labels.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888204319-fcl_labels.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888204581-edge_labels.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888210521-edge_labels_obl.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888210788-edge_labels_transpose_obl.log create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888211035-clean_tmp.log create mode 100644 swh/graph/tests/test_origin_contributors.py create mode 100644 swh/graph/tests/test_toposort.py
Changes applied before test
commit df18b5babe08fce8a7b3ea1f022062009c34b8a6 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Dec 1 11:35:43 2022 +0100 ListOriginContributors: Ignore null author/committer in revisions/releases commit 67abec7533eb586402a3b30ef3ce0c85f664f064 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Dec 1 11:27:35 2022 +0100 Regenerate the test dataset to include a release with no author This triggers a bug in ListOriginContributors, causing it to include "null" as a contributor. A future commit will fix this. commit 9972a08685c3d6e45119494ee6404c66a6374f26 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Dec 1 10:39:09 2022 +0100 Add ListOriginContributors This Java script (and related Luigi tasks) traverse the graph in topological order, building up the set of all contributors to a node and its ancestors, then dump the value of this set for every origin node they encounter. commit 39fefbfc108087b4b7f86c39312d1f94f06cc16a Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Nov 29 17:54:30 2022 +0100 Add Luigi task TopoSort and add a simple test commit 78b4d9016cfd5025811607c9f6069fea1b39eb23 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Mon Nov 28 16:02:56 2022 +0100 Improve comments commit 0a651262c32ff3bca6951323a2ab9fe5e5204f97 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Nov 24 16:15:04 2022 +0100 Add a sample of two ancestor with each node This allows readers to efficiently get ancestors of nodes with low indegree (ie. most revisions), as it avoids a random access / API call. commit 23f9256cd34f97bc3e6dd9eda51c07232f736e0f Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Nov 24 12:54:14 2022 +0100 revert multithreading, it's actually twice as slow as singlethread commit a62fa7f4b7c468ee7ef731986c7d7fc33c7f4042 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Nov 24 12:06:21 2022 +0100 tentative multithread DFS commit ab744a8ada1de4cb6a9d3d904406f9e40d74a3db Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Nov 24 11:49:32 2022 +0100 Implement a naive topological sort commit 550235e4e7a04f10e5c9869e5717b16ca5a2edf8 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Nov 29 17:01:45 2022 +0100 luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3
Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/301/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/301/console
Comment Actions
Build is green
Patch application report for D8912 (id=32117)
Could not rebase; Attempt merge onto ec7f568b13...
Updating ec7f568..172ee6d Fast-forward conftest.py | 1 + .../graph/utils/ListOriginContributors.java | 151 +++++++ .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++ mypy.ini | 6 + requirements-luigi.txt | 2 + requirements-swh-luigi.txt | 2 +- requirements-swh.txt | 1 + swh/graph/luigi.py | 474 ++++++++++++++++++++- .../dataset/compressed/example-labelled.labelobl | Bin 0 -> 772 bytes .../compressed/example-labelled.labeloffsets | 3 +- .../dataset/compressed/example-labelled.labels | 2 +- .../dataset/compressed/example-labelled.properties | 2 +- .../example-transposed-labelled.labelobl | Bin 0 -> 772 bytes .../example-transposed-labelled.labeloffsets | 3 +- .../compressed/example-transposed-labelled.labels | 3 +- .../example-transposed-labelled.properties | 2 +- .../dataset/compressed/example-transposed.graph | 2 +- .../dataset/compressed/example-transposed.obl | Bin 772 -> 772 bytes .../dataset/compressed/example-transposed.offsets | 3 +- .../compressed/example-transposed.properties | 52 +-- .../dataset/compressed/example.edges.count.txt | 2 +- .../dataset/compressed/example.edges.stats.txt | 8 +- swh/graph/tests/dataset/compressed/example.graph | 2 +- .../tests/dataset/compressed/example.indegree | 5 +- .../dataset/compressed/example.labels.count.txt | 2 +- .../dataset/compressed/example.labels.csv.zst | Bin 115 -> 131 bytes .../compressed/example.labels.fcl.bytearray | Bin 110 -> 128 bytes .../dataset/compressed/example.labels.fcl.pointers | Bin 16 -> 24 bytes .../compressed/example.labels.fcl.properties | 2 +- .../tests/dataset/compressed/example.labels.mph | Bin 1521 -> 1529 bytes swh/graph/tests/dataset/compressed/example.mph | Bin 961 -> 961 bytes .../dataset/compressed/example.node2swhid.bin | Bin 462 -> 528 bytes .../tests/dataset/compressed/example.node2type.map | Bin 353 -> 361 bytes .../dataset/compressed/example.nodes.count.txt | 2 +- .../tests/dataset/compressed/example.nodes.csv.zst | Bin 150 -> 181 bytes .../dataset/compressed/example.nodes.stats.txt | 6 +- swh/graph/tests/dataset/compressed/example.obl | Bin 772 -> 772 bytes swh/graph/tests/dataset/compressed/example.offsets | 4 +- swh/graph/tests/dataset/compressed/example.order | Bin 168 -> 192 bytes .../tests/dataset/compressed/example.outdegree | 4 +- .../tests/dataset/compressed/example.persons.mph | Bin 961 -> 961 bytes .../tests/dataset/compressed/example.properties | 50 +-- .../compressed/example.property.author_id.bin | Bin 84 -> 2112 bytes .../example.property.author_timestamp.bin | Bin 168 -> 4224 bytes .../example.property.author_timestamp_offset.bin | Bin 42 -> 1056 bytes .../compressed/example.property.committer_id.bin | Bin 84 -> 2112 bytes .../example.property.committer_timestamp.bin | Bin 168 -> 4224 bytes ...example.property.committer_timestamp_offset.bin | Bin 42 -> 1056 bytes .../example.property.content.is_skipped.bin | Bin 85 -> 149 bytes .../compressed/example.property.content.length.bin | Bin 168 -> 4224 bytes .../compressed/example.property.message.bin | 2 + .../compressed/example.property.message.offset.bin | Bin 168 -> 4224 bytes .../compressed/example.property.tag_name.bin | 1 + .../example.property.tag_name.offset.bin | Bin 168 -> 4224 bytes swh/graph/tests/dataset/compressed/example.stats | 28 +- .../dataset/edges/origin/graph-all.edges.csv.zst | Bin 82 -> 109 bytes .../dataset/edges/origin/graph-all.nodes.csv.zst | Bin 64 -> 95 bytes .../dataset/edges/release/graph-all.edges.csv.zst | Bin 56 -> 73 bytes .../dataset/edges/release/graph-all.nodes.csv.zst | Bin 38 -> 42 bytes .../dataset/edges/snapshot/graph-all.edges.csv.zst | Bin 94 -> 128 bytes .../dataset/edges/snapshot/graph-all.nodes.csv.zst | Bin 33 -> 38 bytes swh/graph/tests/dataset/generate_dataset.py | 46 +- swh/graph/tests/dataset/img/example.dot | 13 +- .../tests/dataset/orc/content/content-all.orc | Bin 1240 -> 1226 bytes .../tests/dataset/orc/directory/directory-all.orc | Bin 578 -> 563 bytes .../orc/directory_entry/directory_entry-all.orc | Bin 1126 -> 1115 bytes swh/graph/tests/dataset/orc/origin/origin-all.orc | Bin 817 -> 935 bytes .../dataset/orc/origin_visit/origin_visit-all.orc | Bin 898 -> 924 bytes .../origin_visit_status-all.orc | Bin 1150 -> 1191 bytes .../tests/dataset/orc/release/release-all.orc | Bin 1361 -> 1407 bytes .../tests/dataset/orc/revision/revision-all.orc | Bin 1658 -> 1643 bytes .../revision_extra_headers-all.orc | Bin 253 -> 236 bytes .../orc/revision_history/revision_history-all.orc | Bin 700 -> 685 bytes .../orc/skipped_content/skipped_content-all.orc | Bin 1177 -> 1160 bytes .../tests/dataset/orc/snapshot/snapshot-all.orc | Bin 459 -> 456 bytes .../orc/snapshot_branch/snapshot_branch-all.orc | Bin 865 -> 921 bytes swh/graph/tests/test_cli.py | 4 +- swh/graph/tests/test_grpc.py | 7 +- swh/graph/tests/test_http_client.py | 18 +- swh/graph/tests/test_luigi.py | 4 +- swh/graph/tests/test_origin_contributors.py | 186 ++++++++ swh/graph/tests/test_toposort.py | 67 +++ 82 files changed, 1192 insertions(+), 114 deletions(-) create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/ListOriginContributors.java create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/TopoSort.java create mode 100644 swh/graph/tests/dataset/compressed/example-labelled.labelobl create mode 100644 swh/graph/tests/dataset/compressed/example-transposed-labelled.labelobl create mode 100644 swh/graph/tests/test_origin_contributors.py create mode 100644 swh/graph/tests/test_toposort.py
Changes applied before test
commit 172ee6deae3102f904284533b657003daf8c0b21 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Dec 1 11:35:43 2022 +0100 ListOriginContributors: Ignore null author/committer in revisions/releases commit ee09b16376dde6a033a4b6147237cdcfec3f081c Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Dec 1 11:27:35 2022 +0100 Regenerate the test dataset to include a release with no author This triggers a bug in ListOriginContributors, causing it to include "null" as a contributor. A future commit will fix this. commit 9972a08685c3d6e45119494ee6404c66a6374f26 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Dec 1 10:39:09 2022 +0100 Add ListOriginContributors This Java script (and related Luigi tasks) traverse the graph in topological order, building up the set of all contributors to a node and its ancestors, then dump the value of this set for every origin node they encounter. commit 39fefbfc108087b4b7f86c39312d1f94f06cc16a Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Nov 29 17:54:30 2022 +0100 Add Luigi task TopoSort and add a simple test commit 78b4d9016cfd5025811607c9f6069fea1b39eb23 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Mon Nov 28 16:02:56 2022 +0100 Improve comments commit 0a651262c32ff3bca6951323a2ab9fe5e5204f97 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Nov 24 16:15:04 2022 +0100 Add a sample of two ancestor with each node This allows readers to efficiently get ancestors of nodes with low indegree (ie. most revisions), as it avoids a random access / API call. commit 23f9256cd34f97bc3e6dd9eda51c07232f736e0f Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Nov 24 12:54:14 2022 +0100 revert multithreading, it's actually twice as slow as singlethread commit a62fa7f4b7c468ee7ef731986c7d7fc33c7f4042 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Nov 24 12:06:21 2022 +0100 tentative multithread DFS commit ab744a8ada1de4cb6a9d3d904406f9e40d74a3db Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Nov 24 11:49:32 2022 +0100 Implement a naive topological sort commit 550235e4e7a04f10e5c9869e5717b16ca5a2edf8 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Nov 29 17:01:45 2022 +0100 luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3
See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/303/ for more details.
Comment Actions
Build was aborted
Patch application report for D8912 (id=32161)
Could not rebase; Attempt merge onto 0a8ae5de6f...
Updating 0a8ae5d..dfd4c1d Fast-forward conftest.py | 1 + .../graph/utils/ListOriginContributors.java | 151 +++++++ .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++ mypy.ini | 6 + requirements-luigi.txt | 2 + requirements-swh-luigi.txt | 2 +- requirements-swh.txt | 1 + swh/graph/luigi.py | 474 ++++++++++++++++++++- .../dataset/compressed/example-labelled.labelobl | Bin 0 -> 772 bytes .../compressed/example-labelled.labeloffsets | 3 +- .../dataset/compressed/example-labelled.labels | 2 +- .../dataset/compressed/example-labelled.properties | 2 +- .../example-transposed-labelled.labelobl | Bin 0 -> 772 bytes .../example-transposed-labelled.labeloffsets | 3 +- .../compressed/example-transposed-labelled.labels | 3 +- .../example-transposed-labelled.properties | 2 +- .../dataset/compressed/example-transposed.graph | 2 +- .../dataset/compressed/example-transposed.obl | Bin 772 -> 772 bytes .../dataset/compressed/example-transposed.offsets | 3 +- .../compressed/example-transposed.properties | 52 +-- .../dataset/compressed/example.edges.count.txt | 2 +- .../dataset/compressed/example.edges.stats.txt | 8 +- swh/graph/tests/dataset/compressed/example.graph | 2 +- .../tests/dataset/compressed/example.indegree | 5 +- .../dataset/compressed/example.labels.count.txt | 2 +- .../dataset/compressed/example.labels.csv.zst | Bin 115 -> 131 bytes .../compressed/example.labels.fcl.bytearray | Bin 110 -> 128 bytes .../dataset/compressed/example.labels.fcl.pointers | Bin 16 -> 24 bytes .../compressed/example.labels.fcl.properties | 2 +- .../tests/dataset/compressed/example.labels.mph | Bin 1521 -> 1529 bytes swh/graph/tests/dataset/compressed/example.mph | Bin 961 -> 961 bytes .../dataset/compressed/example.node2swhid.bin | Bin 462 -> 528 bytes .../tests/dataset/compressed/example.node2type.map | Bin 353 -> 361 bytes .../dataset/compressed/example.nodes.count.txt | 2 +- .../tests/dataset/compressed/example.nodes.csv.zst | Bin 150 -> 181 bytes .../dataset/compressed/example.nodes.stats.txt | 6 +- swh/graph/tests/dataset/compressed/example.obl | Bin 772 -> 772 bytes swh/graph/tests/dataset/compressed/example.offsets | 4 +- swh/graph/tests/dataset/compressed/example.order | Bin 168 -> 192 bytes .../tests/dataset/compressed/example.outdegree | 4 +- .../tests/dataset/compressed/example.persons.mph | Bin 961 -> 961 bytes .../tests/dataset/compressed/example.properties | 50 +-- .../compressed/example.property.author_id.bin | Bin 84 -> 2112 bytes .../example.property.author_timestamp.bin | Bin 168 -> 4224 bytes .../example.property.author_timestamp_offset.bin | Bin 42 -> 1056 bytes .../compressed/example.property.committer_id.bin | Bin 84 -> 2112 bytes .../example.property.committer_timestamp.bin | Bin 168 -> 4224 bytes ...example.property.committer_timestamp_offset.bin | Bin 42 -> 1056 bytes .../example.property.content.is_skipped.bin | Bin 85 -> 149 bytes .../compressed/example.property.content.length.bin | Bin 168 -> 4224 bytes .../compressed/example.property.message.bin | 2 + .../compressed/example.property.message.offset.bin | Bin 168 -> 4224 bytes .../compressed/example.property.tag_name.bin | 1 + .../example.property.tag_name.offset.bin | Bin 168 -> 4224 bytes swh/graph/tests/dataset/compressed/example.stats | 28 +- .../dataset/edges/origin/graph-all.edges.csv.zst | Bin 82 -> 109 bytes .../dataset/edges/origin/graph-all.nodes.csv.zst | Bin 64 -> 95 bytes .../dataset/edges/release/graph-all.edges.csv.zst | Bin 56 -> 73 bytes .../dataset/edges/release/graph-all.nodes.csv.zst | Bin 38 -> 42 bytes .../dataset/edges/snapshot/graph-all.edges.csv.zst | Bin 94 -> 128 bytes .../dataset/edges/snapshot/graph-all.nodes.csv.zst | Bin 33 -> 38 bytes swh/graph/tests/dataset/generate_dataset.py | 46 +- swh/graph/tests/dataset/img/example.dot | 13 +- .../tests/dataset/orc/content/content-all.orc | Bin 1240 -> 1226 bytes .../tests/dataset/orc/directory/directory-all.orc | Bin 578 -> 563 bytes .../orc/directory_entry/directory_entry-all.orc | Bin 1126 -> 1115 bytes swh/graph/tests/dataset/orc/origin/origin-all.orc | Bin 817 -> 935 bytes .../dataset/orc/origin_visit/origin_visit-all.orc | Bin 898 -> 924 bytes .../origin_visit_status-all.orc | Bin 1150 -> 1191 bytes .../tests/dataset/orc/release/release-all.orc | Bin 1361 -> 1407 bytes .../tests/dataset/orc/revision/revision-all.orc | Bin 1658 -> 1643 bytes .../revision_extra_headers-all.orc | Bin 253 -> 236 bytes .../orc/revision_history/revision_history-all.orc | Bin 700 -> 685 bytes .../orc/skipped_content/skipped_content-all.orc | Bin 1177 -> 1160 bytes .../tests/dataset/orc/snapshot/snapshot-all.orc | Bin 459 -> 456 bytes .../orc/snapshot_branch/snapshot_branch-all.orc | Bin 865 -> 921 bytes swh/graph/tests/test_cli.py | 4 +- swh/graph/tests/test_grpc.py | 7 +- swh/graph/tests/test_http_client.py | 18 +- swh/graph/tests/test_luigi.py | 4 +- swh/graph/tests/test_origin_contributors.py | 186 ++++++++ swh/graph/tests/test_toposort.py | 67 +++ 82 files changed, 1192 insertions(+), 114 deletions(-) create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/ListOriginContributors.java create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/TopoSort.java create mode 100644 swh/graph/tests/dataset/compressed/example-labelled.labelobl create mode 100644 swh/graph/tests/dataset/compressed/example-transposed-labelled.labelobl create mode 100644 swh/graph/tests/test_origin_contributors.py create mode 100644 swh/graph/tests/test_toposort.py
Changes applied before test
commit dfd4c1dc3b224477f9adb33c15f6c75bcdf78244 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Dec 1 11:35:43 2022 +0100 ListOriginContributors: Ignore null author/committer in revisions/releases commit 559d4068bfe1dd50d57062192c0e22664ada03c8 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Dec 1 11:27:35 2022 +0100 Regenerate the test dataset to include a release with no author This triggers a bug in ListOriginContributors, causing it to include "null" as a contributor. A future commit will fix this. commit f3235e3184850b074b2a332686911688aafcdd84 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Dec 1 10:39:09 2022 +0100 Add ListOriginContributors This Java script (and related Luigi tasks) traverse the graph in topological order, building up the set of all contributors to a node and its ancestors, then dump the value of this set for every origin node they encounter. commit ab2703efcb9ad93a3d959596ed7edef27d908164 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Nov 29 17:54:30 2022 +0100 Add Luigi task TopoSort and add a simple test commit 58f44785816bde0f6cdbf86e3ff6f1fbf385a487 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Mon Nov 28 16:02:56 2022 +0100 Improve comments commit 922894410b6e14f5a9eeec445d4a0b503df77a9e Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Nov 24 16:15:04 2022 +0100 Add a sample of two ancestor with each node This allows readers to efficiently get ancestors of nodes with low indegree (ie. most revisions), as it avoids a random access / API call. commit 7bee5d47a6eb49ac594f2d019222c176373a5248 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Nov 24 12:54:14 2022 +0100 revert multithreading, it's actually twice as slow as singlethread commit 30dad16a2365021bedf72df78d0753e125765016 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Nov 24 12:06:21 2022 +0100 tentative multithread DFS commit ed6636c26be869a7309581d0ec664488b4d69e9f Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Nov 24 11:49:32 2022 +0100 Implement a naive topological sort commit b8dc411ccd304597df96d7dd36158fb86e5239fd Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Nov 29 17:01:45 2022 +0100 luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3
Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/314/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/314/console