This Java program (and the related Luigi tasks) traverses the graph in
topological order, building up the set of all contributors to a
node and its ancestors, then dumps this set for every origin node
it encounters.
Depends on D8883.
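As an illustration of the approach described above, here is a minimal Python sketch. It is not the code from this diff: the function name, the dict-based graph layout, and the toy node names are all hypothetical, and it assumes edges point from a node to the nodes built on top of it, so that origins come last in topological order.

```python
from collections import deque


def list_origin_contributors(edges, authors, origins):
    """Return {origin: set of contributors} for a small in-memory DAG.

    edges:   dict mapping each node to the nodes built on top of it
             (hypothetical layout, chosen for this sketch only)
    authors: dict mapping a node to its contributor, where it has one
    origins: set of nodes for which the accumulated set is reported
    """
    # Count incoming edges so Kahn's algorithm can find the starting nodes.
    indegree = {node: 0 for node in edges}
    for successors in edges.values():
        for succ in successors:
            indegree[succ] = indegree.get(succ, 0) + 1

    contributors = {node: set() for node in indegree}
    ready = deque(sorted(n for n, deg in indegree.items() if deg == 0))
    result = {}

    while ready:
        node = ready.popleft()
        # By now every predecessor has already pushed its set into this node.
        if node in authors:
            contributors[node].add(authors[node])
        if node in origins:
            result[node] = contributors[node]
        for succ in edges.get(node, ()):
            contributors[succ] |= contributors[node]
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    return result


# Toy graph: three revisions, two snapshots, two origins (made-up names).
edges = {
    "rev1": ["rev2"],
    "rev2": ["snp1"],
    "rev3": ["snp1", "snp2"],
    "snp1": ["ori1"],
    "snp2": ["ori2"],
    "ori1": [],
    "ori2": [],
}
authors = {"rev1": "alice", "rev2": "bob", "rev3": "carol"}
result = list_origin_contributors(edges, authors, {"ori1", "ori2"})
```

Each node is visited only after all of its predecessors, so its set is complete by the time it is reported. On a graph of realistic size the sets would have to be released once a node's successors have been visited; the sketch leaves that out for brevity.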
Differential D8908
Add ListOriginContributors
Authored by vlorentz on Dec 1 2022, 10:40 AM.
Event Timeline

Comment Actions
Build has FAILED
Patch application report for D8908 (id=32109)
Could not rebase; Attempt merge onto ec7f568b13...
Updating ec7f568..603e24a
Fast-forward
 conftest.py | 1 +
 .../graph/utils/ListOriginContributors.java | 143 +++++++
 .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++
 mypy.ini | 3 +
 requirements-luigi.txt | 2 +
 requirements-swh-luigi.txt | 2 +-
 requirements-swh.txt | 1 +
 swh/graph/luigi.py | 468 ++++++++++++++++++++-
 swh/graph/tests/test_origin_contributors.py | 180 ++++++++
 swh/graph/tests/test_toposort.py | 59 +++
 10 files changed, 989 insertions(+), 4 deletions(-)
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/ListOriginContributors.java
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/TopoSort.java
 create mode 100644 swh/graph/tests/test_origin_contributors.py
 create mode 100644 swh/graph/tests/test_toposort.py

Changes applied before test

commit 603e24a498964309f1c42ac47fd3b8f3caa83405
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Dec 1 10:39:09 2022 +0100
    Add ListOriginContributors
    This Java script (and related Luigi tasks) traverse the graph in
    topological order, building up the set of all contributors to a
    node and its ancestors, then dump the value of this set for every
    origin node they encounter.

commit 39fefbfc108087b4b7f86c39312d1f94f06cc16a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Nov 29 17:54:30 2022 +0100
    Add Luigi task TopoSort and add a simple test

commit 78b4d9016cfd5025811607c9f6069fea1b39eb23
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Nov 28 16:02:56 2022 +0100
    Improve comments

commit 0a651262c32ff3bca6951323a2ab9fe5e5204f97
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 24 16:15:04 2022 +0100
    Add a sample of two ancestor with each node
    This allows readers to efficiently get ancestors of nodes with low
    indegree (ie. most revisions), as it avoids a random access / API call.

commit 23f9256cd34f97bc3e6dd9eda51c07232f736e0f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 24 12:54:14 2022 +0100
    revert multithreading, it's actually twice as slow as singlethread

commit a62fa7f4b7c468ee7ef731986c7d7fc33c7f4042
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 24 12:06:21 2022 +0100
    tentative multithread DFS

commit ab744a8ada1de4cb6a9d3d904406f9e40d74a3db
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 24 11:49:32 2022 +0100
    Implement a naive topological sort

commit 550235e4e7a04f10e5c9869e5717b16ca5a2edf8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Nov 29 17:01:45 2022 +0100
    luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/297/

Comment Actions
Build is green
Patch application report for D8908 (id=32111)
Could not rebase; Attempt merge onto ec7f568b13...
Updating ec7f568..36f6230
Fast-forward
 conftest.py | 1 +
 .../graph/utils/ListOriginContributors.java | 143 +++++++
 .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++
 mypy.ini | 6 +
 requirements-luigi.txt | 2 +
 requirements-swh-luigi.txt | 2 +-
 requirements-swh.txt | 1 +
 swh/graph/luigi.py | 468 ++++++++++++++++++++-
 swh/graph/tests/test_origin_contributors.py | 180 ++++++++
 swh/graph/tests/test_toposort.py | 59 +++
 10 files changed, 992 insertions(+), 4 deletions(-)

Changes applied before test

commit 36f62302756d56c3d61234e2d46d582fe5125853
    Add ListOriginContributors
commit 39fefbfc108087b4b7f86c39312d1f94f06cc16a
    Add Luigi task TopoSort and add a simple test
commit 78b4d9016cfd5025811607c9f6069fea1b39eb23
    Improve comments
commit 0a651262c32ff3bca6951323a2ab9fe5e5204f97
    Add a sample of two ancestor with each node
commit 23f9256cd34f97bc3e6dd9eda51c07232f736e0f
    revert multithreading, it's actually twice as slow as singlethread
commit a62fa7f4b7c468ee7ef731986c7d7fc33c7f4042
    tentative multithread DFS
commit ab744a8ada1de4cb6a9d3d904406f9e40d74a3db
    Implement a naive topological sort
commit 550235e4e7a04f10e5c9869e5717b16ca5a2edf8
    luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/298/ for more details.

Comment Actions
Build is green
Patch application report for D8908 (id=32112)
Could not rebase; Attempt merge onto ec7f568b13...
Updating ec7f568..9972a08
Fast-forward
 conftest.py | 1 +
 .../graph/utils/ListOriginContributors.java | 141 +++++++
 .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++
 mypy.ini | 6 +
 requirements-luigi.txt | 2 +
 requirements-swh-luigi.txt | 2 +-
 requirements-swh.txt | 1 +
 swh/graph/luigi.py | 468 ++++++++++++++++++++-
 swh/graph/tests/test_origin_contributors.py | 180 ++++++++
 swh/graph/tests/test_toposort.py | 59 +++
 10 files changed, 990 insertions(+), 4 deletions(-)

Changes applied before test

commit 9972a08685c3d6e45119494ee6404c66a6374f26
    Add ListOriginContributors
commit 39fefbfc108087b4b7f86c39312d1f94f06cc16a
    Add Luigi task TopoSort and add a simple test
commit 78b4d9016cfd5025811607c9f6069fea1b39eb23
    Improve comments
commit 0a651262c32ff3bca6951323a2ab9fe5e5204f97
    Add a sample of two ancestor with each node
commit 23f9256cd34f97bc3e6dd9eda51c07232f736e0f
    revert multithreading, it's actually twice as slow as singlethread
commit a62fa7f4b7c468ee7ef731986c7d7fc33c7f4042
    tentative multithread DFS
commit ab744a8ada1de4cb6a9d3d904406f9e40d74a3db
    Implement a naive topological sort
commit 550235e4e7a04f10e5c9869e5717b16ca5a2edf8
    luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/299/ for more details.

Comment Actions
LGTM, added a couple of nitpicks as inline comments.

Comment Actions
Build was aborted
Patch application report for D8908 (id=32159)
Could not rebase; Attempt merge onto 0a8ae5de6f...
Updating 0a8ae5d..f3235e3
Fast-forward
 conftest.py | 1 +
 .../graph/utils/ListOriginContributors.java | 141 +++++++
 .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++
 mypy.ini | 6 +
 requirements-luigi.txt | 2 +
 requirements-swh-luigi.txt | 2 +-
 requirements-swh.txt | 1 +
 swh/graph/luigi.py | 468 ++++++++++++++++++++-
 swh/graph/tests/test_origin_contributors.py | 180 ++++++++
 swh/graph/tests/test_toposort.py | 59 +++
 10 files changed, 990 insertions(+), 4 deletions(-)

Changes applied before test

commit f3235e3184850b074b2a332686911688aafcdd84
    Add ListOriginContributors
commit ab2703efcb9ad93a3d959596ed7edef27d908164
    Add Luigi task TopoSort and add a simple test
commit 58f44785816bde0f6cdbf86e3ff6f1fbf385a487
    Improve comments
commit 922894410b6e14f5a9eeec445d4a0b503df77a9e
    Add a sample of two ancestor with each node
commit 7bee5d47a6eb49ac594f2d019222c176373a5248
    revert multithreading, it's actually twice as slow as singlethread
commit 30dad16a2365021bedf72df78d0753e125765016
    tentative multithread DFS
commit ed6636c26be869a7309581d0ec664488b4d69e9f
    Implement a naive topological sort
commit b8dc411ccd304597df96d7dd36158fb86e5239fd
    luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/312/