Service providing fast access to an in-memory compressed graph representation of the Software Heritage archive.
The service is based on a representation of the Software Heritage Merkle DAG, compressed using the WebGraph framework.
GRPC update: cancelling a GRPC stream works fine, but aiohttp does not seem to cancel it when the HTTP stream is closed.
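A minimal sketch (not the swh-graph code) of the behaviour we would want on the HTTP side: cancelling the task that consumes the stream should propagate into the stream itself. A dummy async generator stands in for the GRPC call, and all names here are hypothetical.

```python
import asyncio

async def fake_grpc_stream(cancelled):
    # Stand-in for a GRPC server-streaming call; records whether it was cancelled.
    try:
        for i in range(1000):
            yield i
            await asyncio.sleep(0)
    except asyncio.CancelledError:
        cancelled.append(True)  # this is what should happen on client disconnect
        raise

async def proxy(cancelled):
    # Consume the stream the way an HTTP handler would; cancel mid-stream
    # to simulate the client closing the HTTP connection.
    task = asyncio.current_task()
    async for item in fake_grpc_stream(cancelled):
        if item == 3:
            task.cancel()

def main():
    cancelled = []
    try:
        asyncio.run(proxy(cancelled))
    except asyncio.CancelledError:
        pass
    return cancelled
```

Running `main()` returns `[True]`: cancelling the consuming task propagates into the underlying stream, which is the behaviour aiohttp does not appear to trigger on its own.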
We requested a larger quota here: https://github.com/pypa/pypi-support/issues/1998
We are migrating away from Debian packages as a deployment tool, closing this as WONTFIX.
No longer happens after the GRPC migration:
We removed the Py4J dependency by migrating to GRPC.
We no longer support multiple algorithms for shortest path requests.
Obsoleted by the migration to GRPC. We now use GRPC's threading model, with a thread pool configurable by passing --threads to the Java service; by default, nproc is used.
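As a sketch of the default pool sizing described above, in Python terms (the helper name is hypothetical; the actual service is Java):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def make_executor(threads=None):
    # Mirror the --threads flag: use the explicit value if given,
    # otherwise fall back to the CPU count (what `nproc` reports).
    return ThreadPoolExecutor(max_workers=threads or os.cpu_count())
```

For example, `make_executor(8)` pins the pool at 8 workers, while `make_executor()` sizes it to the machine's CPU count.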
Obsoleted by the migration to GRPC. We no longer create iterators over the decoded stream of a UNIX pipeline; we use GRPC stream iterators directly.
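For illustration, the two consumption patterns look roughly like this. This is a sketch, not the actual swh-graph code; the GRPC stub and method names in the comment are hypothetical.

```python
import subprocess

def iter_pipeline(cmd):
    # Old pattern: spawn a UNIX pipeline and iterate over its decoded stdout.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        proc.wait()

# New pattern: a GRPC server-streaming response is already an iterator
# of messages, so it can be consumed directly, with no decoding step:
#     for node in stub.Traverse(request):
#         handle(node)
```

The GRPC iterator removes both the subprocess management and the text-decoding layer that the pipeline approach needed.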
Fixed by the migration to GRPC.
Fixed in D8050
It will be solved by D7890
The issue has been confirmed to be on the Python side of the current implementation, so I'm eager to see D7890 land ;)
I reverse-engineered the Py4J communication protocol, so the next time it hangs, we should be able to tell whether the issue is on the gateway server side or on the Python side:
mkfifo /tmp/test
chmod a+w /tmp/test
tail -F /tmp/test

ss -ltp | grep java
# <get the port number>
telnet localhost <port number>
c
o0
get_handler
s/tmp/test
e
Seems to be this issue: https://sentry.softwareheritage.org/share/issue/b0b9753142404227a6d7421eb014fdb3/ (if it helps...)
So, as was mentioned during the IRC discussion, one of the possible ways forward is to:
(4. doesn't work because libhadoop.so isn't packaged in Maven)
For future reference, it looks like we are still "small" players as "big" packages go on PyPI: https://pypi.org/stats/ (e.g., tf-nightly is currently the largest package on PyPI and it weighs 427 GiB).
While it is still not nice to ship a big fat JAR in a PyPI package, our extension requests will likely be granted.
We've asked for another bump at https://github.com/pypa/pypi-support/issues/1998.
IIRC, we cannot reduce the size, and I think it is unreasonable to ask PyPI for a higher limit.
Fixed in D7839
Build is green
Review fixes
Build is green
Review fixes
Thanks for all the reviews, I made a first pass with the easiest fixes.
Monumental documentation work, thanks!
I think this is generally great, and I've pointed out only some minor issues/suggestions here and there.
Still reviewing the rest but went through the quickstart. I wanted to submit these comments before it got too late over there.