This Java program (and its related Luigi tasks) traverses the graph in
topological order, building up the set of all contributors to each
node and its ancestors, then dumps the value of this set for every
origin node it encounters.
Depends on D8883.
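The summary above amounts to a single pass over the graph. The Java sketch below shows that pass on a tiny in-memory DAG, as an illustration only. The integer node ids, the `ancestors` adjacency lists, the one-author-per-node map, the set of origin ids and the class and method names are hypothetical: they are not the swh-graph API or the code of ListOriginContributors. Kahn's algorithm also stands in for whatever ordering the real TopoSort task produces. Every node is visited after all of its ancestors, so its contributor set is its own author plus the union of its ancestors' already-complete sets.

```java
import java.util.*;

/**
 * Hypothetical in-memory sketch: visit nodes so each comes after all of its
 * ancestors, accumulate contributor sets, and report the set of each origin.
 * Not the swh-graph API; node ids and inputs are illustrative assumptions.
 */
class OriginContributorsSketch {

    /** Kahn's algorithm: returns node ids so that every node follows all of its ancestors. */
    static List<Integer> topoSort(int n, List<List<Integer>> ancestors) {
        List<List<Integer>> descendants = new ArrayList<>();
        for (int i = 0; i < n; i++) descendants.add(new ArrayList<>());
        int[] pending = new int[n]; // number of ancestors not yet emitted
        for (int node = 0; node < n; node++) {
            for (int anc : ancestors.get(node)) {
                descendants.get(anc).add(node);
                pending[node]++;
            }
        }
        Deque<Integer> ready = new ArrayDeque<>();
        for (int node = 0; node < n; node++) if (pending[node] == 0) ready.add(node);
        List<Integer> order = new ArrayList<>(n);
        while (!ready.isEmpty()) {
            int node = ready.poll();
            order.add(node);
            for (int desc : descendants.get(node)) {
                if (--pending[desc] == 0) ready.add(desc);
            }
        }
        if (order.size() != n) throw new IllegalArgumentException("graph has a cycle");
        return order;
    }

    /** Maps each origin node to the contributors of itself and all of its ancestors. */
    static Map<Integer, Set<String>> originContributors(
            int n, List<List<Integer>> ancestors,
            Map<Integer, String> author, Set<Integer> origins) {
        List<Set<String>> contributors = new ArrayList<>(Collections.nCopies(n, null));
        Map<Integer, Set<String>> result = new TreeMap<>();
        for (int node : topoSort(n, ancestors)) {
            Set<String> set = new TreeSet<>();
            if (author.containsKey(node)) set.add(author.get(node));
            // All ancestors were visited earlier, so their sets are already complete.
            for (int anc : ancestors.get(node)) set.addAll(contributors.get(anc));
            contributors.set(node, set);
            if (origins.contains(node)) result.put(node, set);
        }
        return result;
    }

    public static void main(String[] args) {
        // 0, 1: revisions by alice and bob; 2: merge by carol with parents 0 and 1;
        // 3: an origin pointing at revision 2; 4: an origin pointing at revision 0.
        List<List<Integer>> ancestors = List.of(
                List.of(), List.of(), List.of(0, 1), List.of(2), List.of(0));
        Map<Integer, String> author = Map.of(0, "alice", 1, "bob", 2, "carol");
        Map<Integer, Set<String>> out =
                originContributors(5, ancestors, author, Set.of(3, 4));
        out.forEach((origin, people) -> System.out.println(origin + "\t" + people));
    }
}
```

Running `main` prints `3` with `[alice, bob, carol]` and then `4` with `[alice]` (tab-separated): origin 3 reaches the merge revision and both of its parents, origin 4 only alice's revision. The sketch keeps one set per node in memory for simplicity and makes no attempt at the memory savings a graph the size of Software Heritage's would likely need.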
Differential D8908
Add ListOriginContributors
Authored by vlorentz on Dec 1 2022, 10:40 AM.
Event Timeline

Build has FAILED

Patch application report for D8908 (id=32109)
Could not rebase; Attempt merge onto ec7f568b13...
Updating ec7f568..603e24a
Fast-forward
 conftest.py | 1 +
 .../graph/utils/ListOriginContributors.java | 143 +++++++
 .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++
 mypy.ini | 3 +
 requirements-luigi.txt | 2 +
 requirements-swh-luigi.txt | 2 +-
 requirements-swh.txt | 1 +
 swh/graph/luigi.py | 468 ++++++++++++++++++++-
 swh/graph/tests/test_origin_contributors.py | 180 ++++++++
 swh/graph/tests/test_toposort.py | 59 +++
 10 files changed, 989 insertions(+), 4 deletions(-)
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/ListOriginContributors.java
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/TopoSort.java
 create mode 100644 swh/graph/tests/test_origin_contributors.py
 create mode 100644 swh/graph/tests/test_toposort.py
Changes applied before test
commit 603e24a498964309f1c42ac47fd3b8f3caa83405
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Dec 1 10:39:09 2022 +0100
Add ListOriginContributors
This Java script (and related Luigi tasks) traverse the graph in
topological order, building up the set of all contributors to a
node and its ancestors, then dump the value of this set for every
origin node they encounter.
commit 39fefbfc108087b4b7f86c39312d1f94f06cc16a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Nov 29 17:54:30 2022 +0100
Add Luigi task TopoSort and add a simple test
commit 78b4d9016cfd5025811607c9f6069fea1b39eb23
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Nov 28 16:02:56 2022 +0100
Improve comments
commit 0a651262c32ff3bca6951323a2ab9fe5e5204f97
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 24 16:15:04 2022 +0100
Add a sample of two ancestor with each node
This allows readers to efficiently get ancestors of nodes with low indegree
(ie. most revisions), as it avoids a random access / API call.
commit 23f9256cd34f97bc3e6dd9eda51c07232f736e0f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 24 12:54:14 2022 +0100
revert multithreading, it's actually twice as slow as singlethread
commit a62fa7f4b7c468ee7ef731986c7d7fc33c7f4042
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 24 12:06:21 2022 +0100
tentative multithread DFS
commit ab744a8ada1de4cb6a9d3d904406f9e40d74a3db
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 24 11:49:32 2022 +0100
Implement a naive topological sort
commit 550235e4e7a04f10e5c9869e5717b16ca5a2edf8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Nov 29 17:01:45 2022 +0100
luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3
Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/297/

Build is green

Patch application report for D8908 (id=32111)
Could not rebase; Attempt merge onto ec7f568b13...
Updating ec7f568..36f6230
Fast-forward
 conftest.py | 1 +
 .../graph/utils/ListOriginContributors.java | 143 +++++++
 .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++
 mypy.ini | 6 +
 requirements-luigi.txt | 2 +
 requirements-swh-luigi.txt | 2 +-
 requirements-swh.txt | 1 +
 swh/graph/luigi.py | 468 ++++++++++++++++++++-
 swh/graph/tests/test_origin_contributors.py | 180 ++++++++
 swh/graph/tests/test_toposort.py | 59 +++
 10 files changed, 992 insertions(+), 4 deletions(-)
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/ListOriginContributors.java
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/TopoSort.java
 create mode 100644 swh/graph/tests/test_origin_contributors.py
 create mode 100644 swh/graph/tests/test_toposort.py
Changes applied before test
commit 36f62302756d56c3d61234e2d46d582fe5125853
See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/298/ for more details.

Build is green

Patch application report for D8908 (id=32112)
Could not rebase; Attempt merge onto ec7f568b13...
Updating ec7f568..9972a08
Fast-forward
 conftest.py | 1 +
 .../graph/utils/ListOriginContributors.java | 141 +++++++
 .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++
 mypy.ini | 6 +
 requirements-luigi.txt | 2 +
 requirements-swh-luigi.txt | 2 +-
 requirements-swh.txt | 1 +
 swh/graph/luigi.py | 468 ++++++++++++++++++++-
 swh/graph/tests/test_origin_contributors.py | 180 ++++++++
 swh/graph/tests/test_toposort.py | 59 +++
 10 files changed, 990 insertions(+), 4 deletions(-)
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/ListOriginContributors.java
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/TopoSort.java
 create mode 100644 swh/graph/tests/test_origin_contributors.py
 create mode 100644 swh/graph/tests/test_toposort.py
Changes applied before test
commit 9972a08685c3d6e45119494ee6404c66a6374f26
See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/299/ for more details.

LGTM, added a couple of nitpicks as inline comments.
Build was aborted

Patch application report for D8908 (id=32159)
Could not rebase; Attempt merge onto 0a8ae5de6f...
Updating 0a8ae5d..f3235e3
Fast-forward
 conftest.py | 1 +
 .../graph/utils/ListOriginContributors.java | 141 +++++++
 .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++
 mypy.ini | 6 +
 requirements-luigi.txt | 2 +
 requirements-swh-luigi.txt | 2 +-
 requirements-swh.txt | 1 +
 swh/graph/luigi.py | 468 ++++++++++++++++++++-
 swh/graph/tests/test_origin_contributors.py | 180 ++++++++
 swh/graph/tests/test_toposort.py | 59 +++
 10 files changed, 990 insertions(+), 4 deletions(-)
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/ListOriginContributors.java
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/TopoSort.java
 create mode 100644 swh/graph/tests/test_origin_contributors.py
 create mode 100644 swh/graph/tests/test_toposort.py
Changes applied before test
commit f3235e3184850b074b2a332686911688aafcdd84
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Dec 1 10:39:09 2022 +0100
Add ListOriginContributors
This Java script (and related Luigi tasks) traverse the graph in
topological order, building up the set of all contributors to a
node and its ancestors, then dump the value of this set for every
origin node they encounter.
commit ab2703efcb9ad93a3d959596ed7edef27d908164
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Nov 29 17:54:30 2022 +0100
Add Luigi task TopoSort and add a simple test
commit 58f44785816bde0f6cdbf86e3ff6f1fbf385a487
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Nov 28 16:02:56 2022 +0100
Improve comments
commit 922894410b6e14f5a9eeec445d4a0b503df77a9e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 24 16:15:04 2022 +0100
Add a sample of two ancestor with each node
This allows readers to efficiently get ancestors of nodes with low indegree
(ie. most revisions), as it avoids a random access / API call.
commit 7bee5d47a6eb49ac594f2d019222c176373a5248
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 24 12:54:14 2022 +0100
revert multithreading, it's actually twice as slow as singlethread
commit 30dad16a2365021bedf72df78d0753e125765016
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 24 12:06:21 2022 +0100
tentative multithread DFS
commit ed6636c26be869a7309581d0ec664488b4d69e9f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 24 11:49:32 2022 +0100
Implement a naive topological sort
commit b8dc411ccd304597df96d7dd36158fb86e5239fd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Nov 29 17:01:45 2022 +0100
luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3
Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/312/
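The commit log above mentions "a naive topological sort" and a per-node sample of two ancestors that lets readers of the output skip a random access / API call for low-indegree nodes. The following hypothetical Java sketch combines the two ideas using an iterative depth-first search (one standard way to write a topological sort, not necessarily the one in the commit), so deep revision histories cannot overflow the call stack. The `Row` layout (node, number of ancestors, at most two sampled ancestors) and all names are assumptions for illustration, not TopoSort's actual output format.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: DFS topological sort emitting a two-ancestor sample per node. */
class TopoSortSketch {

    /** One output row: a node, how many ancestors it has, and at most two of them. */
    static final class Row {
        final int node;
        final int ancestorCount;
        final List<Integer> sample;

        Row(int node, int ancestorCount, List<Integer> sample) {
            this.node = node;
            this.ancestorCount = ancestorCount;
            this.sample = sample;
        }

        @Override
        public String toString() {
            return node + "," + ancestorCount + "," + sample;
        }
    }

    /** Emits every node after all of its ancestors, using an explicit stack instead of recursion. */
    static List<Row> topoSort(int n, List<List<Integer>> ancestors) {
        byte[] state = new byte[n];      // 0 = unvisited, 1 = on the stack, 2 = emitted
        int[] nextAncestor = new int[n]; // index of the next ancestor to explore, per node
        List<Row> rows = new ArrayList<>(n);
        Deque<Integer> stack = new ArrayDeque<>();
        for (int root = 0; root < n; root++) {
            if (state[root] != 0) continue;
            state[root] = 1;
            stack.push(root);
            while (!stack.isEmpty()) {
                int node = stack.peek();
                List<Integer> ancs = ancestors.get(node);
                if (nextAncestor[node] < ancs.size()) {
                    int anc = ancs.get(nextAncestor[node]++);
                    if (state[anc] == 1) {
                        throw new IllegalArgumentException("cycle through node " + anc);
                    }
                    if (state[anc] == 0) {
                        state[anc] = 1;
                        stack.push(anc);
                    }
                } else {
                    // Every ancestor has been emitted, so this node can be emitted too.
                    stack.pop();
                    state[node] = 2;
                    List<Integer> sample = List.copyOf(ancs.subList(0, Math.min(2, ancs.size())));
                    rows.add(new Row(node, ancs.size(), sample));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Node 3 merges 1 and 2, which both descend from the root revision 0.
        List<List<Integer>> ancestors = List.of(List.of(), List.of(0), List.of(0), List.of(1, 2));
        for (Row row : topoSort(4, ancestors)) {
            System.out.println(row);
        }
    }
}
```

`main` prints the rows in the order 0, 1, 2, 3, ending with `3,2,[1, 2]`. Because that row's ancestor count is 2, its sample is the complete list; a reader would only need an extra lookup for nodes with more than two ancestors.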