
Add luigi task to compress the graph
Closed, Public

Authored by vlorentz on Nov 28 2022, 4:33 PM.

Diff Detail

Repository
rDGRPH Compressed graph representation
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz edited the summary of this revision.

Build has FAILED

Patch application report for D8891 (id=32045)

Rebasing onto 6572c43136...

Current branch diff-target is up to date.
Changes applied before test
commit 956dbc43624f441fc727a02ce4ae3a9639198d19
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 28 16:33:01 2022 +0100

    Add luigi task to compress the graph

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/282/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/282/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Nov 28 2022, 4:35 PM)
Harbormaster failed remote builds in B32994: Diff 32045!

Build has FAILED

Patch application report for D8891 (id=32047)

Could not rebase; Attempt merge onto 6572c43136...

Updating 6572c43..ae852b6
Fast-forward
 mypy.ini                                           |   3 +
 requirements-luigi.txt                             |   1 +
 requirements-swh-luigi.txt                         |   1 +
 swh/graph/luigi.py                                 | 198 +++++++++++++++++++++
 .../compressed/example-labelled.labeloffsets       |   0
 .../dataset/compressed/example-labelled.labels     |   0
 .../dataset/compressed/example-labelled.properties |   0
 .../example-transposed-labelled.labeloffsets       |   0
 .../compressed/example-transposed-labelled.labels  |   0
 .../example-transposed-labelled.properties         |   0
 .../dataset/compressed/example-transposed.graph    |   0
 .../dataset/compressed/example-transposed.obl      | Bin
 .../dataset/compressed/example-transposed.offsets  |   0
 .../compressed/example-transposed.properties       |   0
 .../dataset/compressed/example.edges.count.txt     |   0
 .../dataset/compressed/example.edges.stats.txt     |   0
 .../{ => data}/dataset/compressed/example.graph    |   0
 .../{ => data}/dataset/compressed/example.indegree |   0
 .../dataset/compressed/example.labels.count.txt    |   0
 .../dataset/compressed/example.labels.csv.zst      | Bin
 .../compressed/example.labels.fcl.bytearray        | Bin
 .../dataset/compressed/example.labels.fcl.pointers | Bin
 .../compressed/example.labels.fcl.properties       |   0
 .../dataset/compressed/example.labels.mph          | Bin
 .../{ => data}/dataset/compressed/example.mph      | Bin
 .../dataset/compressed/example.node2swhid.bin      | Bin
 .../dataset/compressed/example.node2type.map       | Bin
 .../dataset/compressed/example.nodes.count.txt     |   0
 .../dataset/compressed/example.nodes.csv.zst       | Bin
 .../dataset/compressed/example.nodes.stats.txt     |   0
 .../{ => data}/dataset/compressed/example.obl      | Bin
 .../{ => data}/dataset/compressed/example.offsets  |   0
 .../{ => data}/dataset/compressed/example.order    | Bin
 .../dataset/compressed/example.outdegree           |   0
 .../dataset/compressed/example.persons.count.txt   |   0
 .../dataset/compressed/example.persons.csv.zst     | Bin
 .../dataset/compressed/example.persons.mph         | Bin
 .../dataset/compressed/example.properties          |   0
 .../compressed/example.property.author_id.bin      | Bin
 .../example.property.author_timestamp.bin          | Bin
 .../example.property.author_timestamp_offset.bin   | Bin
 .../compressed/example.property.committer_id.bin   | Bin
 .../example.property.committer_timestamp.bin       | Bin
 ...example.property.committer_timestamp_offset.bin | Bin
 .../example.property.content.is_skipped.bin        | Bin
 .../compressed/example.property.content.length.bin | Bin
 .../compressed/example.property.message.bin        |   0
 .../compressed/example.property.message.offset.bin | Bin
 .../compressed/example.property.tag_name.bin       |   0
 .../example.property.tag_name.offset.bin           | Bin
 .../{ => data}/dataset/compressed/example.stats    |   0
 .../dataset/edges/content/graph-all.edges.csv.zst  | Bin
 .../dataset/edges/content/graph-all.nodes.csv.zst  | Bin
 .../edges/directory/graph-all.edges.csv.zst        | Bin
 .../edges/directory/graph-all.nodes.csv.zst        | Bin
 .../dataset/edges/origin/graph-all.edges.csv.zst   | Bin
 .../dataset/edges/origin/graph-all.nodes.csv.zst   | Bin
 .../dataset/edges/release/graph-all.edges.csv.zst  | Bin
 .../dataset/edges/release/graph-all.nodes.csv.zst  | Bin
 .../dataset/edges/revision/graph-all.edges.csv.zst | Bin
 .../dataset/edges/revision/graph-all.nodes.csv.zst | Bin
 .../dataset/edges/snapshot/graph-all.edges.csv.zst | Bin
 .../dataset/edges/snapshot/graph-all.nodes.csv.zst | Bin
 .../tests/{ => data}/dataset/generate_dataset.py   |   0
 swh/graph/tests/{ => data}/dataset/img/.gitignore  |   0
 swh/graph/tests/{ => data}/dataset/img/Makefile    |   0
 swh/graph/tests/{ => data}/dataset/img/example.dot |   0
 swh/graph/tests/data/dataset/meta/export.json      |  13 ++
 .../{ => data}/dataset/orc/content/content-all.orc | Bin
 .../dataset/orc/directory/directory-all.orc        | Bin
 .../orc/directory_entry/directory_entry-all.orc    | Bin
 .../{ => data}/dataset/orc/origin/origin-all.orc   | Bin
 .../dataset/orc/origin_visit/origin_visit-all.orc  | Bin
 .../origin_visit_status-all.orc                    | Bin
 .../{ => data}/dataset/orc/release/release-all.orc | Bin
 .../dataset/orc/revision/revision-all.orc          | Bin
 .../revision_extra_headers-all.orc                 | Bin
 .../orc/revision_history/revision_history-all.orc  | Bin
 .../orc/skipped_content/skipped_content-all.orc    | Bin
 .../dataset/orc/snapshot/snapshot-all.orc          | Bin
 .../orc/snapshot_branch/snapshot_branch-all.orc    | Bin
 swh/graph/tests/test_cli.py                        |   6 +-
 swh/graph/tests/test_luigi.py                      |  39 ++++
 83 files changed, 257 insertions(+), 4 deletions(-)
 create mode 100644 requirements-luigi.txt
 create mode 100644 requirements-swh-luigi.txt
 create mode 100644 swh/graph/luigi.py
 rename swh/graph/tests/{ => data}/dataset/compressed/example-labelled.labeloffsets (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example-labelled.labels (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example-labelled.properties (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example-transposed-labelled.labeloffsets (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example-transposed-labelled.labels (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example-transposed-labelled.properties (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example-transposed.graph (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example-transposed.obl (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example-transposed.offsets (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example-transposed.properties (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.edges.count.txt (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.edges.stats.txt (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.graph (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.indegree (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.labels.count.txt (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.labels.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.labels.fcl.bytearray (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.labels.fcl.pointers (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.labels.fcl.properties (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.labels.mph (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.mph (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.node2swhid.bin (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.node2type.map (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.nodes.count.txt (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.nodes.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.nodes.stats.txt (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.obl (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.offsets (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.order (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.outdegree (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.persons.count.txt (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.persons.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.persons.mph (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.properties (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.property.author_id.bin (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.property.author_timestamp.bin (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.property.author_timestamp_offset.bin (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.property.committer_id.bin (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.property.committer_timestamp.bin (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.property.committer_timestamp_offset.bin (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.property.content.is_skipped.bin (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.property.content.length.bin (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.property.message.bin (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.property.message.offset.bin (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.property.tag_name.bin (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.property.tag_name.offset.bin (100%)
 rename swh/graph/tests/{ => data}/dataset/compressed/example.stats (100%)
 rename swh/graph/tests/{ => data}/dataset/edges/content/graph-all.edges.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/edges/content/graph-all.nodes.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/edges/directory/graph-all.edges.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/edges/directory/graph-all.nodes.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/edges/origin/graph-all.edges.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/edges/origin/graph-all.nodes.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/edges/release/graph-all.edges.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/edges/release/graph-all.nodes.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/edges/revision/graph-all.edges.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/edges/revision/graph-all.nodes.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/edges/snapshot/graph-all.edges.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/edges/snapshot/graph-all.nodes.csv.zst (100%)
 rename swh/graph/tests/{ => data}/dataset/generate_dataset.py (100%)
 rename swh/graph/tests/{ => data}/dataset/img/.gitignore (100%)
 rename swh/graph/tests/{ => data}/dataset/img/Makefile (100%)
 rename swh/graph/tests/{ => data}/dataset/img/example.dot (100%)
 create mode 100644 swh/graph/tests/data/dataset/meta/export.json
 rename swh/graph/tests/{ => data}/dataset/orc/content/content-all.orc (100%)
 rename swh/graph/tests/{ => data}/dataset/orc/directory/directory-all.orc (100%)
 rename swh/graph/tests/{ => data}/dataset/orc/directory_entry/directory_entry-all.orc (100%)
 rename swh/graph/tests/{ => data}/dataset/orc/origin/origin-all.orc (100%)
 rename swh/graph/tests/{ => data}/dataset/orc/origin_visit/origin_visit-all.orc (100%)
 rename swh/graph/tests/{ => data}/dataset/orc/origin_visit_status/origin_visit_status-all.orc (100%)
 rename swh/graph/tests/{ => data}/dataset/orc/release/release-all.orc (100%)
 rename swh/graph/tests/{ => data}/dataset/orc/revision/revision-all.orc (100%)
 rename swh/graph/tests/{ => data}/dataset/orc/revision_extra_headers/revision_extra_headers-all.orc (100%)
 rename swh/graph/tests/{ => data}/dataset/orc/revision_history/revision_history-all.orc (100%)
 rename swh/graph/tests/{ => data}/dataset/orc/skipped_content/skipped_content-all.orc (100%)
 rename swh/graph/tests/{ => data}/dataset/orc/snapshot/snapshot-all.orc (100%)
 rename swh/graph/tests/{ => data}/dataset/orc/snapshot_branch/snapshot_branch-all.orc (100%)
 create mode 100644 swh/graph/tests/test_luigi.py
Changes applied before test
commit ae852b67246978eb887a23835c5e0d23674ae46d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 28 16:37:30 2022 +0100

    Add luigi task to compress the graph

commit d9035888a81a620cf7cd542faed55551f81b3ab4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 28 16:39:55 2022 +0100

    Move test dataset to the standard datadir location

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/284/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/284/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Nov 28 2022, 4:50 PM)
Harbormaster failed remote builds in B32996: Diff 32047!

give up on changing the test dataset path

Build has FAILED

Patch application report for D8891 (id=32049)

Rebasing onto 6572c43136...

Current branch diff-target is up to date.
Changes applied before test
commit 09ffd126f3b7c74dda54719c400b174d040172aa
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 28 17:13:52 2022 +0100

    Add luigi task to compress the graph

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/285/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/285/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Nov 28 2022, 5:17 PM)
Harbormaster failed remote builds in B32998: Diff 32049!

tests will pass when swh-dataset is released

  • luigi: Simplify compression task

Build has FAILED

Patch application report for D8891 (id=32058)

Rebasing onto 6572c43136...

Current branch diff-target is up to date.
Changes applied before test
commit 95cdad3505d45360037bdd1840bd1d02ac0088bf
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 29 11:05:53 2022 +0100

    luigi: Simplify compression task

commit 09ffd126f3b7c74dda54719c400b174d040172aa
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 28 17:13:52 2022 +0100

    Add luigi task to compress the graph

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/286/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/286/console

fix tox.ini + remove option to configure graph name

Build has FAILED

Patch application report for D8891 (id=32059)

Rebasing onto 6572c43136...

Current branch diff-target is up to date.
Changes applied before test
commit 074fd71f9ee52acd586f0cbd752f0d2bfe2573b9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 29 11:12:13 2022 +0100

    luigi: Remove option to configure graph name
    
    It's hard to handle correctly, and it does not seem useful to use
    outside the compression pipeline.
    
    In practice, we use separate directories for this.

commit 1c35a5ef45a8edac03cf5be5e547fd6e66fb9282
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 29 11:05:53 2022 +0100

    luigi: Simplify compression task

commit 540acc470fca45556fdd4f38f44834a976b593c3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 28 17:13:52 2022 +0100

    Add luigi task to compress the graph

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/287/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/287/console
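
Note on the "Remove option to configure graph name" commit above: its message argues that separate directories, rather than a configurable graph name, are what distinguish one compressed graph from another. The sketch below only illustrates that idea and is not taken from swh/graph/luigi.py. The constant GRAPH_NAME, the helper graph_files, and the directory names are hypothetical; the file extensions are the ones visible in the test dataset listing of an earlier report.

```python
from pathlib import Path
from typing import List, Sequence

# Hypothetical fixed basename; the value actually used by swh-graph
# is not shown in this revision's timeline.
GRAPH_NAME = "graph"


def graph_files(graph_dir: Path, extensions: Sequence[str]) -> List[Path]:
    """List the files of one compressed graph stored in graph_dir.

    Because the basename is fixed, the directory alone identifies
    which graph is meant.
    """
    return [graph_dir / f"{GRAPH_NAME}{ext}" for ext in extensions]


if __name__ == "__main__":
    # Two hypothetical compression runs, kept apart by directory only.
    for run in ("2022-11-28", "2022-11-29"):
        for path in graph_files(Path("/srv/graphs") / run, [".graph", ".offsets"]):
            print(path)
```

With a fixed basename, a task needs a single directory parameter, and two graphs can never collide as long as their directories differ.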

Build is green

Patch application report for D8891 (id=32060)

Rebasing onto 6572c43136...

Current branch diff-target is up to date.
Changes applied before test
commit 0a4b706400d4dd921ebcd2afb7aca8bf4f12f541
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 29 11:12:13 2022 +0100

    luigi: Remove option to configure graph name
    
    It's hard to handle correctly, and it does not seem useful to use
    outside the compression pipeline.
    
    In practice, we use separate directories for this.

commit 2b634bf4192dd9194f1e21e7c2c39dec47cf94d1
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Nov 29 11:05:53 2022 +0100

    luigi: Simplify compression task

commit 60e4707feca6d8677035dc5f20dd11b5f0876402
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 28 17:13:52 2022 +0100

    Add luigi task to compress the graph

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/288/ for more details.

This revision is now accepted and ready to land. (Nov 29 2022, 11:51 AM)