
Move SWHID<->node ID conversion in the Java backend
Closed, Public

Authored by seirl on Tue, Nov 23, 5:28 PM.

Details

Summary

Doing the SWHID <-> node ID conversion on the Python side prevents us from
using the minimal perfect hash function (MPH) for SWHID -> node ID lookups.
This design dates from when we thought we could keep the Java layer very
thin and write most of the code in Python, but this turned out to be
infeasible in practice.

Instead, we are moving more and more functionality into the Java layer,
until ultimately the entire Python RPC API might be replaced by a pure
Java implementation.
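To show why a minimal perfect hash function (MPH) is attractive for SWHID -> node ID lookups, here is a minimal, self-contained sketch. It assumes a simple hash-and-displace construction and a hypothetical `SwhidNodeMap` class. It is a toy stand-in, not swh-graph's implementation or API. An MPH maps the n known SWHIDs one-to-one onto [0, n), so a permutation array can turn the MPH value into a node ID. An MPH also returns some in-range value for any key it was not built on, so a lookup has to be checked against the reverse (node ID -> SWHID) map.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy class, not the swh-graph API: SWHID <-> node ID mapping
// through a minimal perfect hash (MPH) plus a permutation array.
class SwhidNodeMap {
    private final int n;
    private final int[] displacement;   // per-bucket seed found at build time
    private final long[] mphToNode;     // permutation: MPH value -> node ID
    private final String[] nodeToSwhid; // reverse map: node ID -> SWHID

    // Seeded 64-bit FNV-1a, followed by a MurmurHash3-style finalizer.
    private static long hash(String key, long seed) {
        long h = 0xcbf29ce484222325L ^ (seed * 0x9E3779B97F4A7C15L);
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        return h ^ (h >>> 33);
    }

    private int bucketOf(String key) {
        return (int) Math.floorMod(hash(key, 0), (long) displacement.length);
    }

    private int slotOf(String key, int d) {
        return (int) Math.floorMod(hash(key, d + 1L), (long) n);
    }

    // The MPH itself: collision-free on the n known keys, values in [0, n).
    private int mph(String key) {
        return slotOf(key, displacement[bucketOf(key)]);
    }

    // swhids[i] is the SWHID of node i; all SWHIDs must be distinct.
    SwhidNodeMap(String[] swhids) {
        n = swhids.length;
        displacement = new int[Math.max(1, n / 2)];
        List<List<String>> buckets = new ArrayList<>();
        for (int b = 0; b < displacement.length; b++) buckets.add(new ArrayList<>());
        for (String s : swhids) buckets.get(bucketOf(s)).add(s);

        // Place the largest buckets first, while most slots are still free.
        Integer[] order = new Integer[displacement.length];
        for (int b = 0; b < order.length; b++) order[b] = b;
        Arrays.sort(order, (x, y) -> Integer.compare(buckets.get(y).size(), buckets.get(x).size()));

        // For each bucket, search for a seed d that sends all its keys to free, distinct slots.
        boolean[] taken = new boolean[n];
        for (int b : order) {
            List<String> keys = buckets.get(b);
            int[] slots = new int[keys.size()];
            for (int d = 0; !keys.isEmpty(); d++) {
                if (d > 1_000_000) throw new IllegalStateException("no displacement found");
                boolean ok = true;
                for (int i = 0; i < keys.size() && ok; i++) {
                    slots[i] = slotOf(keys.get(i), d);
                    ok = !taken[slots[i]];
                    for (int j = 0; j < i && ok; j++) ok = slots[j] != slots[i];
                }
                if (ok) {
                    for (int s : slots) taken[s] = true;
                    displacement[b] = d;
                    break;
                }
            }
        }

        mphToNode = new long[n];
        nodeToSwhid = swhids.clone();
        for (int node = 0; node < n; node++) mphToNode[mph(swhids[node])] = node;
    }

    long getNodeId(String swhid) {
        if (n > 0) {
            long node = mphToNode[mph(swhid)];
            // The MPH returns *some* value for unknown keys too, so verify it.
            if (nodeToSwhid[(int) node].equals(swhid)) return node;
        }
        throw new IllegalArgumentException("Unknown SWHID: " + swhid);
    }

    String getSwhid(long nodeId) {
        return nodeToSwhid[(int) nodeId];
    }

    public static void main(String[] args) {
        String[] swhids = {
            "swh:1:rev:" + String.format("%040x", 3),
            "swh:1:dir:" + String.format("%040x", 2),
            "swh:1:cnt:" + String.format("%040x", 1),
        };
        SwhidNodeMap map = new SwhidNodeMap(swhids);
        for (String s : swhids) System.out.println(s + " -> node " + map.getNodeId(s));
    }
}
```

A production MPH stores only a few bits per key and is built once at compression time. The reverse-map check is what turns a "some node" answer for an unknown SWHID into an explicit error.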

For the same reason (graph traversal algorithms are infeasible to write
in pure Python), this commit also drops the swh/graph/graph.py file,
along with its associated tests.
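Once SWHIDs are converted at the boundary, traversal code in Java works only with integer node IDs. The following is a minimal breadth-first traversal over a toy graph stored as plain CSR (offset/target) arrays. swh-graph itself traverses a WebGraph-compressed graph, so the hypothetical `CsrGraph` class and its array layout are assumptions for illustration only. A real implementation would also use a more compact visited set than a `boolean[]`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy graph in CSR form: the successors of node v are
// targets[offsets[v]] .. targets[offsets[v + 1] - 1].
class CsrGraph {
    private final int[] offsets;
    private final long[] targets;

    CsrGraph(int[] offsets, long[] targets) {
        this.offsets = offsets;
        this.targets = targets;
    }

    int numNodes() {
        return offsets.length - 1;
    }

    // Breadth-first traversal from src; returns node IDs in visit order.
    List<Long> bfs(long src) {
        boolean[] seen = new boolean[numNodes()];
        List<Long> order = new ArrayList<>();
        ArrayDeque<Long> queue = new ArrayDeque<>();
        seen[(int) src] = true;
        queue.add(src);
        while (!queue.isEmpty()) {
            long v = queue.poll();
            order.add(v);
            for (int i = offsets[(int) v]; i < offsets[(int) v + 1]; i++) {
                long w = targets[i];
                if (!seen[(int) w]) {
                    seen[(int) w] = true;
                    queue.add(w);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // 0 = rev, 1 = dir, 2 = cnt, 3 = parent rev; edges 0->1, 0->3, 1->2, 3->1.
        CsrGraph g = new CsrGraph(new int[] {0, 2, 3, 3, 4}, new long[] {1, 3, 2, 1});
        System.out.println(g.bfs(0)); // prints [0, 1, 3, 2]
    }
}
```

The inner loop is a tight scan over primitive arrays. This kind of per-edge work is expensive to express efficiently in pure Python.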

Diff Detail

Repository
rDGRPH Compressed graph representation
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D6676 (id=24257)

Rebasing onto c29ee2787e...

Current branch diff-target is up to date.
Changes applied before test
commit 117fda901247d78dc91ab8c897e79d0573a45c3a
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Tue Nov 23 17:22:00 2021 +0100

    Move SWHID<->node ID conversion in the Java backend
    

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/139/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/139/console

Harbormaster returned this revision to the author for changes because remote builds failed. Tue, Nov 23, 5:28 PM
Harbormaster failed remote builds in B25134: Diff 24257!

Build is green

Patch application report for D6676 (id=24258)

Rebasing onto c29ee2787e...

Current branch diff-target is up to date.
Changes applied before test
commit aa8a578f4891869acf70fd2d3524d411fea7310e
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Tue Nov 23 17:22:00 2021 +0100

    Move SWHID<->node ID conversion in the Java backend
    

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/140/ for more details.

seirl requested review of this revision. Tue, Nov 23, 5:35 PM
vlorentz added inline comments.
java/src/main/java/org/softwareheritage/graph/Entry.java
Lines 158–166

silent fallback on invalid dst? why?

Lines 177–185

ditto

java/src/main/java/org/softwareheritage/graph/Entry.java
Lines 158–166

It should be new SWHID(dst). It's very weird that the tests didn't catch that...

Fix src/dst inversion, add regression test
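The review thread concerns how Entry.java turns the src and dst parameters into nodes. That code is not reproduced on this page, so the following is a hypothetical sketch. `EndpointResolver`, its regex for the core SWHID syntax, and the `Map` standing in for the MPH-based lookup are all assumptions. The sketch shows the two behaviours the review asks for:

- An invalid dst is rejected with an exception instead of being silently replaced.
- dst is converted from its own parameter rather than from src.

The accompanying check uses distinct src and dst nodes. If both endpoints resolved to the same node, a src/dst inversion would have no visible effect and the test could not detect it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual Entry.java code.
class EndpointResolver {
    // Core SWHID syntax: swh:1:<object type>:<40 lowercase hex digits>.
    private static final Pattern SWHID =
            Pattern.compile("swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}");

    private final Map<String, Long> nodeIds; // stand-in for the MPH-based map

    EndpointResolver(Map<String, Long> nodeIds) {
        this.nodeIds = nodeIds;
    }

    // Fails loudly on malformed or unknown SWHIDs instead of falling back.
    long toNodeId(String swhid) {
        if (!SWHID.matcher(swhid).matches()) {
            throw new IllegalArgumentException("Malformed SWHID: " + swhid);
        }
        Long id = nodeIds.get(swhid);
        if (id == null) {
            throw new IllegalArgumentException("Unknown SWHID: " + swhid);
        }
        return id;
    }

    // Each endpoint is converted from its own parameter; deriving dst from
    // src is the kind of inversion discussed in the review.
    long[] resolve(String src, String dst) {
        return new long[] {toNodeId(src), toNodeId(dst)};
    }

    public static void main(String[] args) {
        String rev = "swh:1:rev:" + String.format("%040x", 1);
        String cnt = "swh:1:cnt:" + String.format("%040x", 2);
        Map<String, Long> ids = new HashMap<>();
        ids.put(rev, 0L);
        ids.put(cnt, 1L);
        EndpointResolver r = new EndpointResolver(ids);

        // Regression check: distinct endpoints must map to distinct nodes.
        long[] nodes = r.resolve(rev, cnt);
        System.out.println(nodes[0] + " -> " + nodes[1]); // prints "0 -> 1"
        try {
            r.resolve(rev, "swh:1:cnt:not-a-hash");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```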

Build is green

Patch application report for D6676 (id=24325)

Rebasing onto 32bab89d44...

Current branch diff-target is up to date.
Changes applied before test
commit 0b33cff0d228876964144662d4968d906b32b856
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Tue Nov 23 17:22:00 2021 +0100

    Move SWHID<->node ID conversion in the Java backend
    

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/141/ for more details.

This revision is now accepted and ready to land. Fri, Nov 26, 4:55 PM