Page MenuHomeSoftware Heritage

Reimplement REST API in Python with Py4J + aiohttp
ClosedPublic

Authored by seirl on Sep 17 2019, 2:18 AM.

Diff Detail

Repository
rDGRPH Compressed graph representation
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

zack requested changes to this revision.Sep 17 2019, 9:17 AM

In addition to the minor comment above, a few bare-bone tests that spawn the aiohttp daemon and try to actually query would be really nice. The toolchain has several moving parts, I can easily see it breaking unexpectedly.

@haltode: I didn't look much at the Java side, can you please have a look when you have a moment? TIA

swh/graph/server/__main__.py
42 ↗(On Diff #6705)

So, what is the return format here?
Content-Type says application/json, but the PIDs are streamed as newline-separated.

Is this line-delimited JSON, and would it actually work with common JSON tools, e.g., jq?

I'm fine with any approach to stream JSON, but we should make sure it is not an ad hoc format hard to use in practice.

swh/graph/server/backend.py
32–33 ↗(On Diff #6705)

these suffixes should be constants

35 ↗(On Diff #6705)

left-over comment

123 ↗(On Diff #6705)

A prefix="swh-graph." for TemporaryDirectory would be nice here. (Yes, it's already in the FIFO name later, but having it in the dir name would be nice to ls /tmp users.

This revision now requires changes to proceed.Sep 17 2019, 9:17 AM

FWIW, Jenkins fails on this diff with:

Checking patch java/server/src/main/java/org/softwareheritage/graph/Entry.java...
error: java/server/src/main/java/org/softwareheritage/graph/Entry.java: does not exist in index

Looks like a spurious CI error?

Yes, the CI cherrypicked the commit with an incorrect base.

swh/graph/server/__main__.py
42 ↗(On Diff #6705)

I think you might have misread the code, I don't set the Content-Type to application/json anywhere for these views.

zack added inline comments.
swh/graph/server/__main__.py
42 ↗(On Diff #6705)

oh it was only for stats indeed, my bad

This revision is now accepted and ready to land.Sep 19 2019, 9:37 AM
java/server/src/main/java/org/softwareheritage/graph/Entry.java
30–32 ↗(On Diff #6746)

We need the snake case conversion to be applied to *all* endpoints, not only the stats. For example in traversal endpoints you have the nbEdgesAccessed metadata that needs to be transformed in snake case.

  • Move example dataset in common tests/dataset directory, add binary maps
  • Server API: handle visits of paths
  • Server API: reorganize app/main
  • API client: update to use the new streaming lines format
  • Update tests to work with the new Python server

java: remove useless StreamTraversal.java

zack requested changes to this revision.Sep 27 2019, 8:28 PM

Nice work!

I've only commented on minor stuff here and there.

java/server/src/main/java/org/softwareheritage/graph/Entry.java
67 ↗(On Diff #6852)

a named constant would be nice(r) than -1

75 ↗(On Diff #6852)

The FIFO is created by the client, right? (if not, we have a race condition…)

If so this message should be something like "Cannot open client FIFO".

30–32 ↗(On Diff #6746)

Ping on this comment by @haltode. I think you two discussed it on IRC after it, but I'm not 100% sure.

swh/graph/cli.py
109

(nice catch, thanks)

swh/graph/server/app.py
1 ↗(On Diff #6852)

the usual copyright/license header is missing here

… and while we are at it, a brief docstring stating that this is a proxy server talking to java via py4j + fifo would be welcome too.

16 ↗(On Diff #6852)

maybe add a link to https://docs.softwareheritage.org/devel/swh-graph/api.html in the response body?

swh/graph/server/backend.py
51 ↗(On Diff #6852)

I've seen this list of simple traversal method names at least twice. Would be nice to have a constant with the list in one of the two module, and used in both cases.

55 ↗(On Diff #6852)

… oh, look, here's another place where you can reuse the path separator constant :-)

74 ↗(On Diff #6852)

we have this list in swh.model.identifiers.PID_TYPES

This revision now requires changes to proceed.Sep 27 2019, 8:28 PM
seirl added inline comments.
java/server/src/main/java/org/softwareheritage/graph/Entry.java
75 ↗(On Diff #6852)

Yes, good catch.

30–32 ↗(On Diff #6746)

Yes, this comment is not applicable to my code because I no longer return these metadata in the streaming APIs

swh/graph/server/__main__.py
15 ↗(On Diff #6852)

Even if you don't personally like click, it is the standard in the swh codebase and uniformity rocks.
So please drop argparse in favor of click. (Sorry.)

18 ↗(On Diff #6852)

Given the multi-file on disk representation of compressed graph, users will have no idea what they should pass here. The help strings should be expanded to explain that.

I've tried passing the basename (e.g., all, for a graph stored as all.{compressed,obl,…}, but it didn't work and there was no meaningful error message.

swh/graph/server/__main__.py
15 ↗(On Diff #6852)

Oh, and in fact this main.py should be branched as a subcommand of the existing swh graph in cli.py.
Other modules use swh MODULE rpc-serve for similar needs.

seirl marked 12 inline comments as done.Sep 30 2019, 5:21 PM
  • server: proper JAR deployment method
  • server: refactor some constants
  • server: refactor handler proxying to backed
  • server: move serve command to cli.py
zack requested changes to this revision.Sep 30 2019, 8:20 PM

looks great ! I've only noticed minor stuff (this time for real ;-))

Makefile.local
4 ↗(On Diff #6863)

maybe it's just phabricator playing tricks here, but I see two spaces while it should be 1 TAB

setup.py
39 ↗(On Diff #6863)

I'm a bit worried about what will happen if multiple *.jar files, with different version numbers (e.g., due to previously built versions), are around.

Does mvn compile ensure that previous versions will be removed? Is there a way to make it so?

swh/graph/cli.py
130

looks like a copy-paste error in the port number, it should be 5009

swh/graph/server/app.py
16 ↗(On Diff #6852)

this is marked as done but it's not actually done in the diff (?)

This revision now requires changes to proceed.Sep 30 2019, 8:20 PM
seirl marked an inline comment as done.
  • cli: fix rpc-serve default port
  • api client: handle stream decoding in swh.graph
  • api server: add links to API doc to index
Makefile.local
4 ↗(On Diff #6863)

Yes, it's already a tab.

setup.py
39 ↗(On Diff #6863)

Here we just install everything. I think the proper way of doing that, theoretically, is to sync the versions between maven and the pypi package through a version.txt or similar, and then just load this one. I can investigate how to do that with our current setup.

This revision is now accepted and ready to land.Oct 1 2019, 10:07 AM
This revision was landed with ongoing or failed builds.Oct 1 2019, 10:10 AM
This revision was automatically updated to reflect the committed changes.