this has been merged into master with commit 9cece1f6722ce836cec9353b928bb4bb4b7b77e6
Nov 18 2019
Nov 16 2019
this has been merged into master in commit 40daed1eaa06da82afd14652200b903c807c34ae
Upon review, /last only makes sense for the /walk and /randomwalk endpoints. For other endpoints it doesn't, as it would return arbitrary results (e.g., what's the point of knowing the last neighbor of a node, given that neighbors come in arbitrary order?)
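To illustrate the distinction, here is a minimal sketch (not the swh-graph implementation) of a random walk over an in-memory adjacency dict. The walk produces an ordered path, so returning only its final node (what the hypothetical `last=True` flag does here) is well defined, whereas a node's neighbor list has no meaningful order. The function name, the graph representation and the give-up-on-dead-end behavior are all assumptions.

```python
import random
from typing import Callable, Dict, List, Optional


def random_walk(
    graph: Dict[str, List[str]],
    src: str,
    is_target: Callable[[str], bool],
    rng: random.Random,
    last: bool = False,
    max_steps: int = 1000,
) -> Optional[List[str]]:
    """Walk from `src` by picking random successors until `is_target` holds.

    Returns the whole path, or only its final node if `last` is set.
    Returns None on a dead end or when the step budget is exhausted.
    """
    path = [src]
    while not is_target(path[-1]):
        successors = graph.get(path[-1], [])
        if not successors or len(path) > max_steps:
            return None  # this sketch simply gives up; no restart logic
        path.append(rng.choice(successors))
    # The path is ordered, so "the last node" has a precise meaning here.
    return path[-1:] if last else path
```

For example, on the chain `{"a": ["b"], "b": ["c"], "c": []}` a walk from `"a"` looking for `"c"` yields `["a", "b", "c"]`, and with `last=True` just `["c"]`.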
Nov 13 2019
Nov 12 2019
Another simple way to reproduce is just removing the *.jar file and running pytest on test_api_client.py.
This is not even a Java exception, but chances are that fixing that case will fix at least a significant part of the general problem, if not all of it.
Nov 11 2019
AFAICT this is a more general problem: the Java backend can hang forever in unexpected situations (uncaught exceptions? I really don't know…), after which it stops responding to incoming requests without producing any visible output.
We should make this visible and debuggable.
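One way to make such hangs visible from the test side is to never wait unboundedly for the backend: poll for a listening socket with a deadline, and fail immediately (surfacing the child's stderr) if the process has already died. The sketch below assumes the backend is launched with `subprocess.Popen` in binary mode, ideally with `stderr=subprocess.PIPE`; `wait_for_server` is a hypothetical helper, not an existing swh-graph fixture.

```python
import socket
import subprocess
import time


def wait_for_server(proc: subprocess.Popen, host: str, port: int, timeout: float = 30.0) -> None:
    """Block until host:port accepts connections, failing loudly otherwise.

    Raises RuntimeError (including the child's stderr, if it was piped) when
    the backend exits early, and TimeoutError when it neither exits nor
    starts listening within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            # The backend died: report why instead of waiting forever.
            _, err = proc.communicate()  # assumes binary mode (bytes)
            details = err.decode(errors="replace") if err else "(no stderr captured)"
            raise RuntimeError(f"backend exited with code {proc.returncode}: {details}")
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return  # something is listening: the server is up
        except OSError:
            time.sleep(0.1)  # not listening yet; retry shortly
    proc.kill()
    proc.wait()
    raise TimeoutError(f"backend did not listen on {host}:{port} within {timeout} s")
```

The deadline turns a silent hang into a test failure with a message, which is the property we are after; the exact timeout value is arbitrary.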
initial skeleton (not yet working) in rDGRPHbc368c1775e6, branch feature/random-walk
Nov 9 2019
This was actually a false alarm. Because the build toolchain for the whole of docs.s.o does not use --separate, submodules (as opposed to sub-*packages*) of swh.graph were not listed in the TOC and could only be found by scrolling down the page. I've fixed this with 6547df80508fa8d467475a8ca8db307ceb2f9972 in swh-docs.
Nov 8 2019
this is now done in the aiohttp server, which says (with links):
Nov 7 2019
Closed in rDGRPH998a44353612
Closed in 6d2f04b4d5a4
Probably not. I'm working on adding support for other objects.
In T2053#38352, @vlorentz wrote: Added parallelism. 450k/s with 16 workers and no compression. I won't try with 32 workers because Python processes would use too much CPU on my machine.
Added parallelism. 450k/s with 16 workers and no compression. I won't try with 32 workers because Python processes would use too much CPU on my machine.
Throughput improved to 34k/s just by not querying unneeded fields.
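Both improvements above (parallel workers, and fetching only the needed columns) can be sketched together, assuming the table uses Cassandra's default Murmur3Partitioner, whose tokens span the signed 64-bit range. The idea is to cut the token ring into one contiguous range per worker and give each worker a query restricted with `token(...)`. The partition key name `id` and the column names in the usage below are hypothetical, and actually executing the queries (e.g. with cassandra-driver) is left out.

```python
from typing import List, Sequence, Tuple

# Token bounds of Cassandra's default Murmur3Partitioner (signed 64-bit).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_token_range(n_workers: int) -> List[Tuple[int, int]]:
    """Split the whole token ring into n contiguous, inclusive [lo, hi] ranges."""
    span = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 possible tokens
    bounds = [MIN_TOKEN + span * i // n_workers for i in range(n_workers + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(n_workers)]


def range_queries(table: str, columns: Sequence[str], n_workers: int, key: str = "id") -> List[str]:
    """One CQL query per worker, selecting only the columns the export needs."""
    cols = ", ".join(columns)
    return [
        f"SELECT {cols} FROM {table} WHERE token({key}) >= {lo} AND token({key}) <= {hi}"
        for lo, hi in split_token_range(n_workers)
    ]
```

For instance, `range_queries("revision", ("id", "directory"), 16)` would yield 16 non-overlapping queries covering every row exactly once, one per worker process.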
Nov 5 2019
reopen, as it's not fixed in master yet
reopen, as it's not closed in master yet
Looks good, thanks !
Nov 4 2019
I wrote a prototype for exporting revisions: https://forge.softwareheritage.org/source/snippets/browse/master/vlorentz/cassandra_stream_graph.py
The .jar file is never installed within the tox environment, so the graph backend process fixture never actually succeeds in launching the server. FWIW, when running tox on my system, the tests hang just the same.
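A cheap safeguard against this particular case is to locate the jar before launching the backend and raise right away when it is missing, so the fixture fails with a clear message instead of hanging. The helper name, the list of search directories and the `swh-graph-*.jar` glob pattern are all assumptions for illustration.

```python
from pathlib import Path
from typing import Sequence


def find_backend_jar(search_dirs: Sequence[Path], pattern: str = "swh-graph-*.jar") -> Path:
    """Return a jar matching `pattern`, searching directories in order.

    Raises FileNotFoundError immediately instead of letting a test fixture
    wait forever for a server that can never start.
    """
    for directory in search_dirs:
        matches = sorted(Path(directory).glob(pattern))
        if matches:
            return matches[-1]  # lexicographically greatest file name
    searched = ", ".join(str(d) for d in search_dirs)
    raise FileNotFoundError(f"no file matching {pattern!r} in: {searched}")
```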
Nov 3 2019
now that we have the swh graph compress CLI, we're moving toward using docker for automated compression, so this has become moot
Oct 31 2019
Sep 13 2019
Neither of the two extremes, "fully sort in RAM, then write sequentially" and "write randomly", is satisfactory here.
What we want is in-memory sorting within the limits of available RAM, plus swapping partially sorted subsets out to disk and back in, plus a sequential write at the end.
We could implement this in Java in the Setup class, but that is exactly what /usr/bin/sort is good at. So I propose to shell out to it from Setup and serialize the sorted output with a writer for the binary format of T1944.
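The strategy described above is an external merge sort. Purely to illustrate the technique (Setup is Java, and the plan is to delegate to /usr/bin/sort), here is a minimal Python sketch: sort bounded chunks in memory, spill each sorted run to a temporary file, then merge the runs sequentially. It assumes input lines contain no newlines; the chunk size stands in for "the limits allowed by available RAM". GNU sort does the same internally, with its buffer size and temporary directory controlled by `-S` and `-T`, and byte-wise ordering obtained with `LC_ALL=C`.

```python
import heapq
import itertools
import tempfile
from typing import Iterable, Iterator


def external_sort(lines: Iterable[str], max_lines_in_memory: int) -> Iterator[str]:
    """Yield `lines` in sorted order, holding at most one chunk in RAM at a time."""
    runs = []
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, max_lines_in_memory))
        if not chunk:
            break
        chunk.sort()  # in-memory sort of one bounded chunk
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(line + "\n" for line in chunk)  # spill the sorted run to disk
        run.seek(0)
        runs.append(run)
    try:
        # k-way merge of the sorted runs, read back sequentially
        yield from heapq.merge(*((line.rstrip("\n") for line in run) for run in runs))
    finally:
        for run in runs:
            run.close()
```

The merged output is produced in order, so it can be streamed straight into a sequential writer.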
Status update: we now have binary serialization formats for the two maps; see the docstrings of PidToIntMap and IntToPidMap in swh.graph.pid.
This means that Python code can read the compact maps (and also write them, but too slowly to be practical for generating them). Conversion of the textual maps generated for the most recent compressed graph is ongoing and almost complete.
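To give an idea of why fixed-width binary maps are convenient to read, here is a sketch under an assumed layout (not necessarily the one documented in swh.graph.pid): each PID is packed into 22 bytes (1 byte version, 1 byte object type, 20 bytes SHA1), the PID-to-int map is a sequence of 30-byte records sorted by binary PID and looked up by binary search, and the int-to-PID map is a plain array indexed by node id. The type codes and helper names are hypothetical; in practice the buffers would be memory-mapped files rather than in-memory `bytes`.

```python
import struct

# Hypothetical object-type codes; the real swh.graph.pid format may differ.
TYPE_CODES = {"cnt": 0, "dir": 1, "rel": 2, "rev": 3, "snp": 4}
TYPE_NAMES = {v: k for k, v in TYPE_CODES.items()}
PID_BIN_SIZE = 22  # 1 byte version + 1 byte type + 20 bytes SHA1
PID_TO_INT_RECORD = PID_BIN_SIZE + 8  # binary PID + big-endian signed 64-bit node id


def pid_to_bytes(pid: str) -> bytes:
    scheme, version, obj_type, digest = pid.split(":")
    if scheme != "swh" or len(digest) != 40:
        raise ValueError(f"malformed PID: {pid}")
    return struct.pack("BB", int(version), TYPE_CODES[obj_type]) + bytes.fromhex(digest)


def bytes_to_pid(data: bytes) -> str:
    version, type_code = struct.unpack("BB", data[:2])
    return f"swh:{version}:{TYPE_NAMES[type_code]}:{data[2:].hex()}"


def build_maps(pids):
    """Node ids are positions in `pids`; returns (pid_to_int, int_to_pid) buffers."""
    int_to_pid = b"".join(pid_to_bytes(p) for p in pids)
    records = sorted(pid_to_bytes(p) + struct.pack(">q", i) for i, p in enumerate(pids))
    return b"".join(records), int_to_pid


def lookup_int(pid_to_int: bytes, pid: str) -> int:
    """Binary search over fixed-size records sorted by binary PID."""
    key = pid_to_bytes(pid)
    lo, hi = 0, len(pid_to_int) // PID_TO_INT_RECORD
    while lo < hi:
        mid = (lo + hi) // 2
        start = mid * PID_TO_INT_RECORD
        if pid_to_int[start:start + PID_BIN_SIZE] < key:
            lo = mid + 1
        else:
            hi = mid
    start = lo * PID_TO_INT_RECORD
    if pid_to_int[start:start + PID_BIN_SIZE] != key:  # empty slice past the end
        raise KeyError(pid)
    (node_id,) = struct.unpack(">q", pid_to_int[start + PID_BIN_SIZE:start + PID_TO_INT_RECORD])
    return node_id


def lookup_pid(int_to_pid: bytes, node_id: int) -> str:
    start = node_id * PID_BIN_SIZE  # direct indexing, no search needed
    return bytes_to_pid(int_to_pid[start:start + PID_BIN_SIZE])
```

Reading is cheap (one binary search or one offset computation per lookup), while producing the sorted record file for billions of nodes is the expensive part, which is consistent with writing from Python being impractical for generation.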
Aug 27 2019
Aug 26 2019
Aug 25 2019
The swh-graph repo is now fully integrated and has CI
This was fixed in 87192dfddd4b by using a hash map. See T1969 for the long-term solution.
Done, see latex report in https://forge.softwareheritage.org/source/swh-graph/browse/master/reports/benchmarks/benchmarks.tex
However, swh-docs uses Java 8, which cannot generate the javadoc (Java >= 9 is required); see https://jenkins.softwareheritage.org/view/all/job/DDOC/job/publish/lastFailedBuild/console
Aug 24 2019
I recompiled the Java server from scratch and rebooted the Azure VM: everything works as expected, no more hanging queries.
Aug 23 2019
Aug 16 2019
Aug 15 2019
Aug 14 2019
From the Javalin documentation [1]:
Both big arrays are meant to be used with all the graph nodes; here is their RAM usage: