
ListOriginContributors: Ignore null author/committer in revisions/releases
Closed, Public

Authored by vlorentz on Dec 1 2022, 11:36 AM.

Diff Detail

Repository: rDGRPH Compressed graph representation
Lint: Automatic diff as part of commit; lint not applicable.
Unit: Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D8912 (id=32115)

Could not rebase; Attempt merge onto ec7f568b13...

Updating ec7f568..df18b5b
Fast-forward
 conftest.py                                        |   1 +
 .../graph/utils/ListOriginContributors.java        | 151 +++++++
 .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++
 mypy.ini                                           |   6 +
 requirements-luigi.txt                             |   2 +
 requirements-swh-luigi.txt                         |   2 +-
 requirements-swh.txt                               |   1 +
 swh/graph/luigi.py                                 | 474 ++++++++++++++++++++-
 .../dataset/compressed/example-labelled.labelobl   | Bin 0 -> 772 bytes
 .../compressed/example-labelled.labeloffsets       |   3 +-
 .../dataset/compressed/example-labelled.labels     |   2 +-
 .../dataset/compressed/example-labelled.properties |   2 +-
 .../example-transposed-labelled.labelobl           | Bin 0 -> 772 bytes
 .../example-transposed-labelled.labeloffsets       |   3 +-
 .../compressed/example-transposed-labelled.labels  |   3 +-
 .../example-transposed-labelled.properties         |   2 +-
 .../dataset/compressed/example-transposed.graph    |   2 +-
 .../dataset/compressed/example-transposed.obl      | Bin 772 -> 772 bytes
 .../dataset/compressed/example-transposed.offsets  |   3 +-
 .../compressed/example-transposed.properties       |  52 +--
 .../dataset/compressed/example.edges.count.txt     |   2 +-
 .../dataset/compressed/example.edges.stats.txt     |   8 +-
 swh/graph/tests/dataset/compressed/example.graph   |   2 +-
 .../tests/dataset/compressed/example.indegree      |   5 +-
 .../dataset/compressed/example.labels.count.txt    |   2 +-
 .../dataset/compressed/example.labels.csv.zst      | Bin 115 -> 131 bytes
 .../compressed/example.labels.fcl.bytearray        | Bin 110 -> 128 bytes
 .../dataset/compressed/example.labels.fcl.pointers | Bin 16 -> 24 bytes
 .../compressed/example.labels.fcl.properties       |   2 +-
 .../tests/dataset/compressed/example.labels.mph    | Bin 1521 -> 1529 bytes
 swh/graph/tests/dataset/compressed/example.mph     | Bin 961 -> 961 bytes
 .../dataset/compressed/example.node2swhid.bin      | Bin 462 -> 528 bytes
 .../tests/dataset/compressed/example.node2type.map | Bin 353 -> 361 bytes
 .../dataset/compressed/example.nodes.count.txt     |   2 +-
 .../tests/dataset/compressed/example.nodes.csv.zst | Bin 150 -> 181 bytes
 .../dataset/compressed/example.nodes.stats.txt     |   6 +-
 swh/graph/tests/dataset/compressed/example.obl     | Bin 772 -> 772 bytes
 swh/graph/tests/dataset/compressed/example.offsets |   4 +-
 swh/graph/tests/dataset/compressed/example.order   | Bin 168 -> 192 bytes
 .../tests/dataset/compressed/example.outdegree     |   4 +-
 .../tests/dataset/compressed/example.persons.mph   | Bin 961 -> 961 bytes
 .../tests/dataset/compressed/example.properties    |  50 +--
 .../compressed/example.property.author_id.bin      | Bin 84 -> 2112 bytes
 .../example.property.author_timestamp.bin          | Bin 168 -> 4224 bytes
 .../example.property.author_timestamp_offset.bin   | Bin 42 -> 1056 bytes
 .../compressed/example.property.committer_id.bin   | Bin 84 -> 2112 bytes
 .../example.property.committer_timestamp.bin       | Bin 168 -> 4224 bytes
 ...example.property.committer_timestamp_offset.bin | Bin 42 -> 1056 bytes
 .../example.property.content.is_skipped.bin        | Bin 85 -> 149 bytes
 .../compressed/example.property.content.length.bin | Bin 168 -> 4224 bytes
 .../compressed/example.property.message.bin        |   2 +
 .../compressed/example.property.message.offset.bin | Bin 168 -> 4224 bytes
 .../compressed/example.property.tag_name.bin       |   1 +
 .../example.property.tag_name.offset.bin           | Bin 168 -> 4224 bytes
 swh/graph/tests/dataset/compressed/example.stats   |  28 +-
 .../logs/example-1669888191558-extract_nodes.log   |  31 ++
 .../compressed/logs/example-1669888192235-mph.log  |  15 +
 .../compressed/logs/example-1669888192705-bv.log   |  35 ++
 .../compressed/logs/example-1669888198778-bfs.log  |   7 +
 .../logs/example-1669888199039-permute_bfs.log     |  23 +
 .../logs/example-1669888199374-transpose_bfs.log   |  19 +
 .../logs/example-1669888199720-simplify.log        |  22 +
 .../compressed/logs/example-1669888199989-llp.log  | 143 +++++++
 .../logs/example-1669888200352-permute_llp.log     |  23 +
 .../compressed/logs/example-1669888200692-obl.log  |   4 +
 .../logs/example-1669888200927-compose_orders.log  |   4 +
 .../logs/example-1669888201039-stats.log           |   7 +
 .../logs/example-1669888201272-transpose.log       |  19 +
 .../logs/example-1669888201615-transpose_obl.log   |   4 +
 .../compressed/logs/example-1669888201853-maps.log |  18 +
 .../logs/example-1669888202131-extract_persons.log |  11 +
 .../logs/example-1669888202702-mph_persons.log     |  15 +
 .../logs/example-1669888203136-node_properties.log |  36 ++
 .../logs/example-1669888203831-mph_labels.log      |  26 ++
 .../logs/example-1669888204319-fcl_labels.log      |   6 +
 .../logs/example-1669888204581-edge_labels.log     |  39 ++
 .../logs/example-1669888210521-edge_labels_obl.log |   4 +
 ...ple-1669888210788-edge_labels_transpose_obl.log |   4 +
 .../logs/example-1669888211035-clean_tmp.log       |   3 +
 .../dataset/edges/origin/graph-all.edges.csv.zst   | Bin 82 -> 109 bytes
 .../dataset/edges/origin/graph-all.nodes.csv.zst   | Bin 64 -> 95 bytes
 .../dataset/edges/release/graph-all.edges.csv.zst  | Bin 56 -> 73 bytes
 .../dataset/edges/release/graph-all.nodes.csv.zst  | Bin 38 -> 42 bytes
 .../dataset/edges/snapshot/graph-all.edges.csv.zst | Bin 94 -> 128 bytes
 .../dataset/edges/snapshot/graph-all.nodes.csv.zst | Bin 33 -> 38 bytes
 swh/graph/tests/dataset/generate_dataset.py        |  46 +-
 swh/graph/tests/dataset/img/example.dot            |  13 +-
 .../tests/dataset/orc/content/content-all.orc      | Bin 1240 -> 1226 bytes
 .../tests/dataset/orc/directory/directory-all.orc  | Bin 578 -> 563 bytes
 .../orc/directory_entry/directory_entry-all.orc    | Bin 1126 -> 1115 bytes
 swh/graph/tests/dataset/orc/origin/origin-all.orc  | Bin 817 -> 935 bytes
 .../dataset/orc/origin_visit/origin_visit-all.orc  | Bin 898 -> 924 bytes
 .../origin_visit_status-all.orc                    | Bin 1150 -> 1191 bytes
 .../tests/dataset/orc/release/release-all.orc      | Bin 1361 -> 1407 bytes
 .../tests/dataset/orc/revision/revision-all.orc    | Bin 1658 -> 1643 bytes
 .../revision_extra_headers-all.orc                 | Bin 253 -> 236 bytes
 .../orc/revision_history/revision_history-all.orc  | Bin 700 -> 685 bytes
 .../orc/skipped_content/skipped_content-all.orc    | Bin 1177 -> 1160 bytes
 .../tests/dataset/orc/snapshot/snapshot-all.orc    | Bin 459 -> 456 bytes
 .../orc/snapshot_branch/snapshot_branch-all.orc    | Bin 865 -> 921 bytes
 swh/graph/tests/test_cli.py                        |   4 +-
 swh/graph/tests/test_grpc.py                       |   7 +-
 swh/graph/tests/test_http_client.py                |  18 +-
 swh/graph/tests/test_luigi.py                      |   4 +-
 swh/graph/tests/test_origin_contributors.py        | 186 ++++++++
 swh/graph/tests/test_toposort.py                   |  67 +++
 106 files changed, 1710 insertions(+), 114 deletions(-)
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/ListOriginContributors.java
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/TopoSort.java
 create mode 100644 swh/graph/tests/dataset/compressed/example-labelled.labelobl
 create mode 100644 swh/graph/tests/dataset/compressed/example-transposed-labelled.labelobl
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888191558-extract_nodes.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888192235-mph.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888192705-bv.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888198778-bfs.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888199039-permute_bfs.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888199374-transpose_bfs.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888199720-simplify.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888199989-llp.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888200352-permute_llp.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888200692-obl.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888200927-compose_orders.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888201039-stats.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888201272-transpose.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888201615-transpose_obl.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888201853-maps.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888202131-extract_persons.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888202702-mph_persons.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888203136-node_properties.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888203831-mph_labels.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888204319-fcl_labels.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888204581-edge_labels.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888210521-edge_labels_obl.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888210788-edge_labels_transpose_obl.log
 create mode 100644 swh/graph/tests/dataset/compressed/logs/example-1669888211035-clean_tmp.log
 create mode 100644 swh/graph/tests/test_origin_contributors.py
 create mode 100644 swh/graph/tests/test_toposort.py
Changes applied before test
commit df18b5babe08fce8a7b3ea1f022062009c34b8a6
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Dec 1 11:35:43 2022 +0100

    ListOriginContributors: Ignore null author/committer in revisions/releases

commit 67abec7533eb586402a3b30ef3ce0c85f664f064
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Dec 1 11:27:35 2022 +0100

    Regenerate the test dataset to include a release with no author
    
    This triggers a bug in ListOriginContributors, causing it to include
    "null" as a contributor.
    A future commit will fix this.

commit 9972a08685c3d6e45119494ee6404c66a6374f26
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Dec 1 10:39:09 2022 +0100

    Add ListOriginContributors
    
    This Java script (and related Luigi tasks) traverse the graph in
    topological order, building up the set of all contributors to a
    node and its ancestors, then dump the value of this set for every
    origin node they encounter.

commit 39fefbfc108087b4b7f86c39312d1f94f06cc16a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 29 17:54:30 2022 +0100

    Add Luigi task TopoSort and add a simple test

commit 78b4d9016cfd5025811607c9f6069fea1b39eb23
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 28 16:02:56 2022 +0100

    Improve comments

commit 0a651262c32ff3bca6951323a2ab9fe5e5204f97
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 24 16:15:04 2022 +0100

    Add a sample of two ancestor with each node
    
    This allows readers to efficiently get ancestors of nodes with low indegree
    (ie. most revisions), as it avoids a random access / API call.

commit 23f9256cd34f97bc3e6dd9eda51c07232f736e0f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 24 12:54:14 2022 +0100

    revert multithreading, it's actually twice as slow as singlethread

commit a62fa7f4b7c468ee7ef731986c7d7fc33c7f4042
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 24 12:06:21 2022 +0100

    tentative multithread DFS

commit ab744a8ada1de4cb6a9d3d904406f9e40d74a3db
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 24 11:49:32 2022 +0100

    Implement a naive topological sort

commit 550235e4e7a04f10e5c9869e5717b16ca5a2edf8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 29 17:01:45 2022 +0100

    luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/301/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/301/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Dec 1 2022, 11:42 AM)
Harbormaster failed remote builds in B33068: Diff 32115!

Build is green

Patch application report for D8912 (id=32117)

Could not rebase; Attempt merge onto ec7f568b13...

Updating ec7f568..172ee6d
Fast-forward
 conftest.py                                        |   1 +
 .../graph/utils/ListOriginContributors.java        | 151 +++++++
 .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++
 mypy.ini                                           |   6 +
 requirements-luigi.txt                             |   2 +
 requirements-swh-luigi.txt                         |   2 +-
 requirements-swh.txt                               |   1 +
 swh/graph/luigi.py                                 | 474 ++++++++++++++++++++-
 .../dataset/compressed/example-labelled.labelobl   | Bin 0 -> 772 bytes
 .../compressed/example-labelled.labeloffsets       |   3 +-
 .../dataset/compressed/example-labelled.labels     |   2 +-
 .../dataset/compressed/example-labelled.properties |   2 +-
 .../example-transposed-labelled.labelobl           | Bin 0 -> 772 bytes
 .../example-transposed-labelled.labeloffsets       |   3 +-
 .../compressed/example-transposed-labelled.labels  |   3 +-
 .../example-transposed-labelled.properties         |   2 +-
 .../dataset/compressed/example-transposed.graph    |   2 +-
 .../dataset/compressed/example-transposed.obl      | Bin 772 -> 772 bytes
 .../dataset/compressed/example-transposed.offsets  |   3 +-
 .../compressed/example-transposed.properties       |  52 +--
 .../dataset/compressed/example.edges.count.txt     |   2 +-
 .../dataset/compressed/example.edges.stats.txt     |   8 +-
 swh/graph/tests/dataset/compressed/example.graph   |   2 +-
 .../tests/dataset/compressed/example.indegree      |   5 +-
 .../dataset/compressed/example.labels.count.txt    |   2 +-
 .../dataset/compressed/example.labels.csv.zst      | Bin 115 -> 131 bytes
 .../compressed/example.labels.fcl.bytearray        | Bin 110 -> 128 bytes
 .../dataset/compressed/example.labels.fcl.pointers | Bin 16 -> 24 bytes
 .../compressed/example.labels.fcl.properties       |   2 +-
 .../tests/dataset/compressed/example.labels.mph    | Bin 1521 -> 1529 bytes
 swh/graph/tests/dataset/compressed/example.mph     | Bin 961 -> 961 bytes
 .../dataset/compressed/example.node2swhid.bin      | Bin 462 -> 528 bytes
 .../tests/dataset/compressed/example.node2type.map | Bin 353 -> 361 bytes
 .../dataset/compressed/example.nodes.count.txt     |   2 +-
 .../tests/dataset/compressed/example.nodes.csv.zst | Bin 150 -> 181 bytes
 .../dataset/compressed/example.nodes.stats.txt     |   6 +-
 swh/graph/tests/dataset/compressed/example.obl     | Bin 772 -> 772 bytes
 swh/graph/tests/dataset/compressed/example.offsets |   4 +-
 swh/graph/tests/dataset/compressed/example.order   | Bin 168 -> 192 bytes
 .../tests/dataset/compressed/example.outdegree     |   4 +-
 .../tests/dataset/compressed/example.persons.mph   | Bin 961 -> 961 bytes
 .../tests/dataset/compressed/example.properties    |  50 +--
 .../compressed/example.property.author_id.bin      | Bin 84 -> 2112 bytes
 .../example.property.author_timestamp.bin          | Bin 168 -> 4224 bytes
 .../example.property.author_timestamp_offset.bin   | Bin 42 -> 1056 bytes
 .../compressed/example.property.committer_id.bin   | Bin 84 -> 2112 bytes
 .../example.property.committer_timestamp.bin       | Bin 168 -> 4224 bytes
 ...example.property.committer_timestamp_offset.bin | Bin 42 -> 1056 bytes
 .../example.property.content.is_skipped.bin        | Bin 85 -> 149 bytes
 .../compressed/example.property.content.length.bin | Bin 168 -> 4224 bytes
 .../compressed/example.property.message.bin        |   2 +
 .../compressed/example.property.message.offset.bin | Bin 168 -> 4224 bytes
 .../compressed/example.property.tag_name.bin       |   1 +
 .../example.property.tag_name.offset.bin           | Bin 168 -> 4224 bytes
 swh/graph/tests/dataset/compressed/example.stats   |  28 +-
 .../dataset/edges/origin/graph-all.edges.csv.zst   | Bin 82 -> 109 bytes
 .../dataset/edges/origin/graph-all.nodes.csv.zst   | Bin 64 -> 95 bytes
 .../dataset/edges/release/graph-all.edges.csv.zst  | Bin 56 -> 73 bytes
 .../dataset/edges/release/graph-all.nodes.csv.zst  | Bin 38 -> 42 bytes
 .../dataset/edges/snapshot/graph-all.edges.csv.zst | Bin 94 -> 128 bytes
 .../dataset/edges/snapshot/graph-all.nodes.csv.zst | Bin 33 -> 38 bytes
 swh/graph/tests/dataset/generate_dataset.py        |  46 +-
 swh/graph/tests/dataset/img/example.dot            |  13 +-
 .../tests/dataset/orc/content/content-all.orc      | Bin 1240 -> 1226 bytes
 .../tests/dataset/orc/directory/directory-all.orc  | Bin 578 -> 563 bytes
 .../orc/directory_entry/directory_entry-all.orc    | Bin 1126 -> 1115 bytes
 swh/graph/tests/dataset/orc/origin/origin-all.orc  | Bin 817 -> 935 bytes
 .../dataset/orc/origin_visit/origin_visit-all.orc  | Bin 898 -> 924 bytes
 .../origin_visit_status-all.orc                    | Bin 1150 -> 1191 bytes
 .../tests/dataset/orc/release/release-all.orc      | Bin 1361 -> 1407 bytes
 .../tests/dataset/orc/revision/revision-all.orc    | Bin 1658 -> 1643 bytes
 .../revision_extra_headers-all.orc                 | Bin 253 -> 236 bytes
 .../orc/revision_history/revision_history-all.orc  | Bin 700 -> 685 bytes
 .../orc/skipped_content/skipped_content-all.orc    | Bin 1177 -> 1160 bytes
 .../tests/dataset/orc/snapshot/snapshot-all.orc    | Bin 459 -> 456 bytes
 .../orc/snapshot_branch/snapshot_branch-all.orc    | Bin 865 -> 921 bytes
 swh/graph/tests/test_cli.py                        |   4 +-
 swh/graph/tests/test_grpc.py                       |   7 +-
 swh/graph/tests/test_http_client.py                |  18 +-
 swh/graph/tests/test_luigi.py                      |   4 +-
 swh/graph/tests/test_origin_contributors.py        | 186 ++++++++
 swh/graph/tests/test_toposort.py                   |  67 +++
 82 files changed, 1192 insertions(+), 114 deletions(-)
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/ListOriginContributors.java
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/TopoSort.java
 create mode 100644 swh/graph/tests/dataset/compressed/example-labelled.labelobl
 create mode 100644 swh/graph/tests/dataset/compressed/example-transposed-labelled.labelobl
 create mode 100644 swh/graph/tests/test_origin_contributors.py
 create mode 100644 swh/graph/tests/test_toposort.py
Changes applied before test
commit 172ee6deae3102f904284533b657003daf8c0b21
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Dec 1 11:35:43 2022 +0100

    ListOriginContributors: Ignore null author/committer in revisions/releases

commit ee09b16376dde6a033a4b6147237cdcfec3f081c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Dec 1 11:27:35 2022 +0100

    Regenerate the test dataset to include a release with no author
    
    This triggers a bug in ListOriginContributors, causing it to include
    "null" as a contributor.
    A future commit will fix this.

commit 9972a08685c3d6e45119494ee6404c66a6374f26
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Dec 1 10:39:09 2022 +0100

    Add ListOriginContributors
    
    This Java script (and related Luigi tasks) traverse the graph in
    topological order, building up the set of all contributors to a
    node and its ancestors, then dump the value of this set for every
    origin node they encounter.

commit 39fefbfc108087b4b7f86c39312d1f94f06cc16a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 29 17:54:30 2022 +0100

    Add Luigi task TopoSort and add a simple test

commit 78b4d9016cfd5025811607c9f6069fea1b39eb23
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 28 16:02:56 2022 +0100

    Improve comments

commit 0a651262c32ff3bca6951323a2ab9fe5e5204f97
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 24 16:15:04 2022 +0100

    Add a sample of two ancestor with each node
    
    This allows readers to efficiently get ancestors of nodes with low indegree
    (ie. most revisions), as it avoids a random access / API call.

commit 23f9256cd34f97bc3e6dd9eda51c07232f736e0f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 24 12:54:14 2022 +0100

    revert multithreading, it's actually twice as slow as singlethread

commit a62fa7f4b7c468ee7ef731986c7d7fc33c7f4042
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 24 12:06:21 2022 +0100

    tentative multithread DFS

commit ab744a8ada1de4cb6a9d3d904406f9e40d74a3db
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 24 11:49:32 2022 +0100

    Implement a naive topological sort

commit 550235e4e7a04f10e5c9869e5717b16ca5a2edf8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 29 17:01:45 2022 +0100

    luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/303/ for more details.

This revision is now accepted and ready to land. (Dec 5 2022, 11:01 AM)

Build was aborted

Patch application report for D8912 (id=32161)

Could not rebase; Attempt merge onto 0a8ae5de6f...

Updating 0a8ae5d..dfd4c1d
Fast-forward
 conftest.py                                        |   1 +
 .../graph/utils/ListOriginContributors.java        | 151 +++++++
 .../org/softwareheritage/graph/utils/TopoSort.java | 134 ++++++
 mypy.ini                                           |   6 +
 requirements-luigi.txt                             |   2 +
 requirements-swh-luigi.txt                         |   2 +-
 requirements-swh.txt                               |   1 +
 swh/graph/luigi.py                                 | 474 ++++++++++++++++++++-
 .../dataset/compressed/example-labelled.labelobl   | Bin 0 -> 772 bytes
 .../compressed/example-labelled.labeloffsets       |   3 +-
 .../dataset/compressed/example-labelled.labels     |   2 +-
 .../dataset/compressed/example-labelled.properties |   2 +-
 .../example-transposed-labelled.labelobl           | Bin 0 -> 772 bytes
 .../example-transposed-labelled.labeloffsets       |   3 +-
 .../compressed/example-transposed-labelled.labels  |   3 +-
 .../example-transposed-labelled.properties         |   2 +-
 .../dataset/compressed/example-transposed.graph    |   2 +-
 .../dataset/compressed/example-transposed.obl      | Bin 772 -> 772 bytes
 .../dataset/compressed/example-transposed.offsets  |   3 +-
 .../compressed/example-transposed.properties       |  52 +--
 .../dataset/compressed/example.edges.count.txt     |   2 +-
 .../dataset/compressed/example.edges.stats.txt     |   8 +-
 swh/graph/tests/dataset/compressed/example.graph   |   2 +-
 .../tests/dataset/compressed/example.indegree      |   5 +-
 .../dataset/compressed/example.labels.count.txt    |   2 +-
 .../dataset/compressed/example.labels.csv.zst      | Bin 115 -> 131 bytes
 .../compressed/example.labels.fcl.bytearray        | Bin 110 -> 128 bytes
 .../dataset/compressed/example.labels.fcl.pointers | Bin 16 -> 24 bytes
 .../compressed/example.labels.fcl.properties       |   2 +-
 .../tests/dataset/compressed/example.labels.mph    | Bin 1521 -> 1529 bytes
 swh/graph/tests/dataset/compressed/example.mph     | Bin 961 -> 961 bytes
 .../dataset/compressed/example.node2swhid.bin      | Bin 462 -> 528 bytes
 .../tests/dataset/compressed/example.node2type.map | Bin 353 -> 361 bytes
 .../dataset/compressed/example.nodes.count.txt     |   2 +-
 .../tests/dataset/compressed/example.nodes.csv.zst | Bin 150 -> 181 bytes
 .../dataset/compressed/example.nodes.stats.txt     |   6 +-
 swh/graph/tests/dataset/compressed/example.obl     | Bin 772 -> 772 bytes
 swh/graph/tests/dataset/compressed/example.offsets |   4 +-
 swh/graph/tests/dataset/compressed/example.order   | Bin 168 -> 192 bytes
 .../tests/dataset/compressed/example.outdegree     |   4 +-
 .../tests/dataset/compressed/example.persons.mph   | Bin 961 -> 961 bytes
 .../tests/dataset/compressed/example.properties    |  50 +--
 .../compressed/example.property.author_id.bin      | Bin 84 -> 2112 bytes
 .../example.property.author_timestamp.bin          | Bin 168 -> 4224 bytes
 .../example.property.author_timestamp_offset.bin   | Bin 42 -> 1056 bytes
 .../compressed/example.property.committer_id.bin   | Bin 84 -> 2112 bytes
 .../example.property.committer_timestamp.bin       | Bin 168 -> 4224 bytes
 ...example.property.committer_timestamp_offset.bin | Bin 42 -> 1056 bytes
 .../example.property.content.is_skipped.bin        | Bin 85 -> 149 bytes
 .../compressed/example.property.content.length.bin | Bin 168 -> 4224 bytes
 .../compressed/example.property.message.bin        |   2 +
 .../compressed/example.property.message.offset.bin | Bin 168 -> 4224 bytes
 .../compressed/example.property.tag_name.bin       |   1 +
 .../example.property.tag_name.offset.bin           | Bin 168 -> 4224 bytes
 swh/graph/tests/dataset/compressed/example.stats   |  28 +-
 .../dataset/edges/origin/graph-all.edges.csv.zst   | Bin 82 -> 109 bytes
 .../dataset/edges/origin/graph-all.nodes.csv.zst   | Bin 64 -> 95 bytes
 .../dataset/edges/release/graph-all.edges.csv.zst  | Bin 56 -> 73 bytes
 .../dataset/edges/release/graph-all.nodes.csv.zst  | Bin 38 -> 42 bytes
 .../dataset/edges/snapshot/graph-all.edges.csv.zst | Bin 94 -> 128 bytes
 .../dataset/edges/snapshot/graph-all.nodes.csv.zst | Bin 33 -> 38 bytes
 swh/graph/tests/dataset/generate_dataset.py        |  46 +-
 swh/graph/tests/dataset/img/example.dot            |  13 +-
 .../tests/dataset/orc/content/content-all.orc      | Bin 1240 -> 1226 bytes
 .../tests/dataset/orc/directory/directory-all.orc  | Bin 578 -> 563 bytes
 .../orc/directory_entry/directory_entry-all.orc    | Bin 1126 -> 1115 bytes
 swh/graph/tests/dataset/orc/origin/origin-all.orc  | Bin 817 -> 935 bytes
 .../dataset/orc/origin_visit/origin_visit-all.orc  | Bin 898 -> 924 bytes
 .../origin_visit_status-all.orc                    | Bin 1150 -> 1191 bytes
 .../tests/dataset/orc/release/release-all.orc      | Bin 1361 -> 1407 bytes
 .../tests/dataset/orc/revision/revision-all.orc    | Bin 1658 -> 1643 bytes
 .../revision_extra_headers-all.orc                 | Bin 253 -> 236 bytes
 .../orc/revision_history/revision_history-all.orc  | Bin 700 -> 685 bytes
 .../orc/skipped_content/skipped_content-all.orc    | Bin 1177 -> 1160 bytes
 .../tests/dataset/orc/snapshot/snapshot-all.orc    | Bin 459 -> 456 bytes
 .../orc/snapshot_branch/snapshot_branch-all.orc    | Bin 865 -> 921 bytes
 swh/graph/tests/test_cli.py                        |   4 +-
 swh/graph/tests/test_grpc.py                       |   7 +-
 swh/graph/tests/test_http_client.py                |  18 +-
 swh/graph/tests/test_luigi.py                      |   4 +-
 swh/graph/tests/test_origin_contributors.py        | 186 ++++++++
 swh/graph/tests/test_toposort.py                   |  67 +++
 82 files changed, 1192 insertions(+), 114 deletions(-)
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/ListOriginContributors.java
 create mode 100644 java/src/main/java/org/softwareheritage/graph/utils/TopoSort.java
 create mode 100644 swh/graph/tests/dataset/compressed/example-labelled.labelobl
 create mode 100644 swh/graph/tests/dataset/compressed/example-transposed-labelled.labelobl
 create mode 100644 swh/graph/tests/test_origin_contributors.py
 create mode 100644 swh/graph/tests/test_toposort.py
Changes applied before test
commit dfd4c1dc3b224477f9adb33c15f6c75bcdf78244
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Dec 1 11:35:43 2022 +0100

    ListOriginContributors: Ignore null author/committer in revisions/releases

commit 559d4068bfe1dd50d57062192c0e22664ada03c8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Dec 1 11:27:35 2022 +0100

    Regenerate the test dataset to include a release with no author
    
    This triggers a bug in ListOriginContributors, causing it to include
    "null" as a contributor.
    A future commit will fix this.

commit f3235e3184850b074b2a332686911688aafcdd84
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Dec 1 10:39:09 2022 +0100

    Add ListOriginContributors
    
    This Java script (and related Luigi tasks) traverse the graph in
    topological order, building up the set of all contributors to a
    node and its ancestors, then dump the value of this set for every
    origin node they encounter.

commit ab2703efcb9ad93a3d959596ed7edef27d908164
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 29 17:54:30 2022 +0100

    Add Luigi task TopoSort and add a simple test

commit 58f44785816bde0f6cdbf86e3ff6f1fbf385a487
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 28 16:02:56 2022 +0100

    Improve comments

commit 922894410b6e14f5a9eeec445d4a0b503df77a9e
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 24 16:15:04 2022 +0100

    Add a sample of two ancestor with each node
    
    This allows readers to efficiently get ancestors of nodes with low indegree
    (ie. most revisions), as it avoids a random access / API call.

commit 7bee5d47a6eb49ac594f2d019222c176373a5248
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 24 12:54:14 2022 +0100

    revert multithreading, it's actually twice as slow as singlethread

commit 30dad16a2365021bedf72df78d0753e125765016
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 24 12:06:21 2022 +0100

    tentative multithread DFS

commit ed6636c26be869a7309581d0ec664488b4d69e9f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Thu Nov 24 11:49:32 2022 +0100

    Implement a naive topological sort

commit b8dc411ccd304597df96d7dd36158fb86e5239fd
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 29 17:01:45 2022 +0100

    luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/314/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/314/console

This revision was landed with ongoing or failed builds. (Dec 7 2022, 10:40 AM)
This revision was automatically updated to reflect the committed changes.
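
For readers who have not opened ListOriginContributors.java, here is a minimal Python sketch of the behaviour the commit messages describe: visit nodes in topological order, accumulate the contributors of each node and of everything visited before it, skip null authors/committers, and emit the set for origin nodes. It is not the landed Java code. The function name, the `predecessors`/`persons`/`origins` inputs and the node names are all hypothetical, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) stands in for the TopoSort task.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, Optional, Set, Tuple


def list_origin_contributors(
    predecessors: Dict[str, Iterable[str]],
    persons: Dict[str, Tuple[Optional[int], Optional[int]]],
    origins: Set[str],
) -> Dict[str, Set[int]]:
    """Return, for each origin, the person ids found on the nodes it depends on.

    Hypothetical inputs (assumptions, not the real data model):
    - predecessors: node -> nodes that must be visited before it
    - persons: revision/release node -> (author_id, committer_id), either may be None
    - origins: the nodes for which a result is emitted
    """
    contributors: Dict[str, Set[int]] = {}
    result: Dict[str, Set[int]] = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(predecessors).static_order():
        acc: Set[int] = set()
        for pred in predecessors.get(node, ()):
            acc |= contributors[pred]
        for person in persons.get(node, (None, None)):
            # The fix in this revision: a null author/committer is ignored
            # instead of being recorded as a contributor.
            if person is not None:
                acc.add(person)
        contributors[node] = acc
        if node in origins:
            result[node] = acc
    return result
```

A release without an author, like the one added to the regenerated test dataset, would be passed as `(None, None)` and contribute nothing of its own, while still forwarding the contributors of the revision it points to.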