
Add graph dataset reading classes (orc+edges)
Closed, Public

Authored by seirl on Jan 21 2022, 7:05 PM.

Details

Summary

Add new dataset reading classes so that the ORC dataset can be used as input
to graph compression.

Diff Detail

Repository
rDGRPH Compressed graph representation
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D7021 (id=25446)

Rebasing onto 20787e69ba...

First, rewinding head to replay your work on top of it...
Applying: Add graph dataset reading classes (orc+edges)
Changes applied before test
commit 0518ea68ab3f995566a0b65bd554aa4579b3fe8f
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Wed Jan 19 17:38:08 2022 +0100

    Add graph dataset reading classes (orc+edges)

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/161/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/161/console

Build has FAILED

Patch application report for D7021 (id=25447)

Rebasing onto 20787e69ba...

First, rewinding head to replay your work on top of it...
Applying: Add graph dataset reading classes (orc+edges)
Changes applied before test
commit e09c1f3fba94d892f9e228756940512fd0860b4c
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Wed Jan 19 17:38:08 2022 +0100

    Add graph dataset reading classes (orc+edges)

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/162/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/162/console

Harbormaster returned this revision to the author for changes because remote builds failed. Jan 21 2022, 7:08 PM
Harbormaster failed remote builds in B26286: Diff 25447!
seirl edited the summary of this revision.

Finalize, ready for review

Build is green

Patch application report for D7021 (id=25537)

Rebasing onto 20787e69ba...

First, rewinding head to replay your work on top of it...
Applying: Add graph dataset reading classes (orc+edges)
Changes applied before test
commit 37f1926d931eb4927399e04a6b387b64ffa9e498
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Wed Jan 19 17:38:08 2022 +0100

    Add graph dataset reading classes (orc+edges)

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/163/ for more details.

seirl requested review of this revision. Jan 26 2022, 7:21 PM
olasd added inline comments.
java/src/main/java/org/softwareheritage/graph/Node.java
92–121

Why aren't you using the map?

vlorentz added inline comments.
java/src/main/java/org/softwareheritage/graph/Node.java
92–121

"performance-critical deserialization", I assume

102

uh?

Build is green

Patch application report for D7021 (id=25542)

Rebasing onto 20787e69ba...

First, rewinding head to replay your work on top of it...
Applying: Add graph dataset reading classes (orc+edges)
Changes applied before test
commit dba3848b0ad334877254cfdba3208d711527053f
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Wed Jan 19 17:38:08 2022 +0100

    Add graph dataset reading classes (orc+edges)

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/164/ for more details.

Build is green

Patch application report for D7021 (id=25543)

Rebasing onto 20787e69ba...

First, rewinding head to replay your work on top of it...
Applying: Add graph dataset reading classes (orc+edges)
Changes applied before test
commit bd1d178755d3d2c78a1d15c707b25db2e1095b07
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Wed Jan 19 17:38:08 2022 +0100

    Add graph dataset reading classes (orc+edges)

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/165/ for more details.

This revision is now accepted and ready to land. Jan 28 2022, 6:00 PM

a few minor remarks

java/src/main/java/org/softwareheritage/graph/Node.java
101

this method looks like it is called very often; caching the byte arrays in static variables could be more efficient, since each getBytes call re-encodes the string
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L1797

(I don't know how the JIT will optimize it)
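
To illustrate the suggestion above, here is a minimal sketch of caching the encoded names in static fields. It is not the code under review: the class name `NodeTypeCodes`, the method `byteNameToInt`, the three type names, and the integer codes are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not the actual Node.java from the diff.
final class NodeTypeCodes {
    // Encoded once when the class is initialized, instead of calling
    // getBytes() (and re-encoding the string) on every lookup.
    private static final byte[] CNT = "cnt".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] DIR = "dir".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] REV = "rev".getBytes(StandardCharsets.US_ASCII);

    private NodeTypeCodes() {
    }

    // Maps an encoded type name to an illustrative integer code,
    // or -1 when the name is not one of the known types.
    static int byteNameToInt(byte[] name) {
        if (Arrays.equals(name, CNT)) {
            return 0;
        }
        if (Arrays.equals(name, DIR)) {
            return 1;
        }
        if (Arrays.equals(name, REV)) {
            return 2;
        }
        return -1;
    }
}
```

Whether this is measurably faster than inline `getBytes` calls depends on what the JIT already does, so a benchmark on the real deserialization path would be needed to confirm the gain.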

java/src/main/java/org/softwareheritage/graph/compress/ExtractNodes.java
68

Maybe stop here to avoid an NPE later?
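
As a rough sketch of what "stopping here" could look like, the example below rejects an unknown value at the point where it is looked up, rather than letting a null propagate. The class `DatasetFormats`, the method `readerFor`, the format keys, and the reader names in the map are hypothetical and only loosely modeled on the revision title; the real ExtractNodes.java may be structured differently.

```java
import java.util.Map;

// Hypothetical lookup, not the actual ExtractNodes.java from the diff.
final class DatasetFormats {
    // Illustrative mapping from a format name to a reader class name.
    private static final Map<String, String> READERS = Map.of(
            "orc", "ORCGraphDataset",
            "edges", "EdgesGraphDataset");

    private DatasetFormats() {
    }

    // Fails immediately with a descriptive message instead of returning
    // null and triggering a NullPointerException further down the line.
    static String readerFor(String format) {
        String reader = READERS.get(format);
        if (reader == null) {
            throw new IllegalArgumentException("Unknown dataset format: " + format);
        }
        return reader;
    }
}
```

Failing at the lookup keeps the error message close to the bad input, which is usually easier to diagnose than a later NPE.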

java/src/main/java/org/softwareheritage/graph/compress/ORCGraphDataset.java
172

Magic number? ;)

Update from review comments

This revision was landed with ongoing or failed builds. Feb 2 2022, 4:56 PM
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D7021 (id=25641)

Rebasing onto 20787e69ba...

First, rewinding head to replay your work on top of it...
Applying: Add graph dataset reading classes (orc+edges)
Changes applied before test
commit 2044a5a6e4b57fa405733f350388215313d7d045
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Wed Jan 19 17:38:08 2022 +0100

    Add graph dataset reading classes (orc+edges)

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/166/ for more details.

Build is green

Patch application report for D7021 (id=25642)

Rebasing onto 20787e69ba...

Current branch diff-target is up to date.
Changes applied before test
commit 2d9529b20d4ea2eba8b3a90851f3834e9943e574
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Wed Jan 19 17:38:08 2022 +0100

    Add graph dataset reading classes (orc+edges)

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/167/ for more details.