Nov 18 2019

zack closed T1969: graph: reduce RAM usage for /walk as Resolved by committing rDGRPH3c3f004352f8: REST API: disable /walk.
Nov 18 2019, 1:38 PM · Compressed graph service
zack added a revision to T1969: graph: reduce RAM usage for /walk: D2296: REST API: disable /walk.
Nov 18 2019, 1:26 PM · Compressed graph service

Nov 16 2019

zack closed T1888: graph API documentation: clarify the relationship between directory=backward and edges= as Resolved by committing rDGRPHacb78fb9051a: REST API doc: clarify edge restriciton semantics for the backward graph.
Nov 16 2019, 5:02 PM · Documentation, Compressed graph service
zack changed the status of T1888: graph API documentation: clarify the relationship between directory=backward and edges= from Open to Work in Progress.
Nov 16 2019, 3:41 PM · Documentation, Compressed graph service
zack added a revision to T1888: graph API documentation: clarify the relationship between directory=backward and edges=: D2290: REST API doc: clarify edge restriciton semantics for the backward graph.
Nov 16 2019, 3:41 PM · Documentation, Compressed graph service
zack closed T2072: common configuration file for swh graph rpc-serve, compress, … as Resolved.

this has been merged into master with commit 9cece1f6722ce836cec9353b928bb4bb4b7b77e6

Nov 16 2019, 3:23 PM · Compressed graph service
zack closed T2077: add random walk endpoint with limited retries as Resolved.

this has been merged into master in commit 40daed1eaa06da82afd14652200b903c807c34ae

Nov 16 2019, 3:22 PM · Compressed graph service
zack triaged T2096: CNAME for graph service: graph.internal.softwareheritage.org (?) as Low priority.
Nov 16 2019, 3:21 PM · Compressed graph service, System administration
zack changed the status of T2084: swh-graph: add /last endpoint variants to the REST API from Open to Work in Progress.

Upon review, /last only makes sense for the /walk and /randomwalk endpoints. For other endpoints it would return arbitrary results (e.g., what's the point of knowing the last neighbor of a node, given that neighbors come in arbitrary order?).

Nov 16 2019, 3:15 PM · Compressed graph service
zack added a revision to T2084: swh-graph: add /last endpoint variants to the REST API: D2289: add /last sub-endpoint to only return destination in walks.
Nov 16 2019, 3:14 PM · Compressed graph service

Nov 13 2019

seirl closed T2055: swh-graph CI hangs badly when py4j doesn't find needed files as Resolved.

Fixed in https://forge.softwareheritage.org/rDGRPHcd135e6607350710ec5b3403b19a92c1d5a28cf5 and https://forge.softwareheritage.org/rDGRPH164bf7b1464ea3b9eb38c91c2a7caee7d6b149f7

Nov 13 2019, 5:54 PM · Continuous Integration, Compressed graph service
zack triaged T2084: swh-graph: add /last endpoint variants to the REST API as Normal priority.
Nov 13 2019, 5:12 PM · Compressed graph service
zack renamed T1968: existing graph endpoints should not return 404 upon missing arguments from existing graph endpoints should not return 404 upon for missing arguments to existing graph endpoints should not return 404 upon missing arguments.
Nov 13 2019, 3:32 PM · Easy hack, Compressed graph service
zack triaged T2083: provide systemd service file for swh-graph as Low priority.
Nov 13 2019, 3:32 PM · Compressed graph service

Nov 12 2019

zack assigned T2055: swh-graph CI hangs badly when py4j doesn't find needed files to seirl.

Another simple way to reproduce is just removing the *.jar file and running pytest on test_api_client.py.
This is not even a Java exception, but chances are fixing that case will fix at least a significant part of the general problem, if not all.

Nov 12 2019, 4:16 PM · Continuous Integration, Compressed graph service
zack triaged T2081: swh-graph: "Cannot open client FIFO" when answering HEAD requests as Low priority.
Nov 12 2019, 2:58 PM · Compressed graph service

Nov 11 2019

zack added a revision to T2077: add random walk endpoint with limited retries: D2249: swh-graph: add random walk endpoint.
Nov 11 2019, 7:05 PM · Compressed graph service
zack added a comment to T2055: swh-graph CI hangs badly when py4j doesn't find needed files.

AFAICT this is a more general problem: the Java backend can hang forever in unexpected situations (uncaught exceptions? I really don't know…), in which case it stops responding to incoming requests and produces no visible output.
We should make this visible and debuggable.

Nov 11 2019, 1:45 PM · Continuous Integration, Compressed graph service
zack changed the status of T2077: add random walk endpoint with limited retries from Open to Work in Progress.

initial skeleton (not yet working) in rDGRPHbc368c1775e6, branch feature/random-walk

Nov 11 2019, 12:45 PM · Compressed graph service
zack triaged T2077: add random walk endpoint with limited retries as Normal priority.
Nov 11 2019, 12:42 PM · Compressed graph service
zack renamed T1969: graph: reduce RAM usage for /walk from Reduce RAM usage for graph backtracking to reduce RAM usage for /walk.
Nov 11 2019, 12:39 PM · Compressed graph service

Nov 9 2019

zack closed T2056: fix swh-graph sphinx table of content as Invalid.

This was actually a false alarm. Due to the lack of --separate in the build toolchain for the entire docs.s.o, submodules (as opposed to sub-*packages*) of swh.graph were not visible in the TOC and only visible by scrolling down the page. I've fixed this with 6547df80508fa8d467475a8ca8db307ceb2f9972 in swh-docs.

Nov 9 2019, 6:28 PM · Documentation, Compressed graph service

Nov 8 2019

zack triaged T2072: common configuration file for swh graph rpc-serve, compress, … as Normal priority.
Nov 8 2019, 3:03 PM · Compressed graph service
zack closed T1937: nicer landing page for the swh-graph REST API as Resolved.

this is now done in the aiohttp server, which says (with links):

Nov 8 2019, 2:25 PM · Compressed graph service
vlorentz claimed T2053: support graph export for the cassandra backend.
Nov 8 2019, 11:52 AM · Compressed graph service, Storage manager

Nov 7 2019

zack closed T1944: use a compact, binary format for node ids mapping files as Resolved.

Closed in rDGRPH998a44353612

Nov 7 2019, 11:36 PM · Compressed graph service
zack closed T1944: use a compact, binary format for node ids mapping files, a subtask of T1950: Reduce RAM usage for generating mapping files, as Resolved.
Nov 7 2019, 11:36 PM · Compressed graph service
zack closed T1950: Reduce RAM usage for generating mapping files as Resolved.

Closed in 6d2f04b4d5a4

Nov 7 2019, 11:35 PM · Compressed graph service
vlorentz added a comment to T2053: support graph export for the cassandra backend.

Probably not. I'm working on adding support for other objects.

Nov 7 2019, 5:24 PM · Compressed graph service, Storage manager
zack added a comment to T2053: support graph export for the cassandra backend.
Nov 7 2019, 5:18 PM · Compressed graph service, Storage manager
vlorentz added a comment to T2053: support graph export for the cassandra backend.

Added parallelism. 450k/s with 16 workers and no compression. I won't try with 32 workers because Python processes would use too much CPU on my machine.

Nov 7 2019, 4:47 PM · Compressed graph service, Storage manager
vlorentz added a comment to T2053: support graph export for the cassandra backend.

Throughput improved to 34k/s just by not querying unneeded fields.

Nov 7 2019, 3:29 PM · Compressed graph service, Storage manager

Nov 5 2019

zack reopened T1944: use a compact, binary format for node ids mapping files as "Open".

reopen, as it's not fixed in master yet

Nov 5 2019, 4:18 PM · Compressed graph service
zack reopened T1944: use a compact, binary format for node ids mapping files, a subtask of T1950: Reduce RAM usage for generating mapping files, as Open.
Nov 5 2019, 4:18 PM · Compressed graph service
zack reopened T1950: Reduce RAM usage for generating mapping files as "Open".

reopen, as it's not closed in master yet

Nov 5 2019, 4:18 PM · Compressed graph service
zack closed T1950: Reduce RAM usage for generating mapping files as Resolved by committing rDGRPH6d2f04b4d5a4: Setup.java: shell out node2pid map generation to sort.
Nov 5 2019, 3:46 PM · Compressed graph service
zack added a comment to T2053: support graph export for the cassandra backend.

Looks good, thanks!

Nov 5 2019, 2:05 PM · Compressed graph service, Storage manager
zack updated the task description for T2053: support graph export for the cassandra backend.
Nov 5 2019, 2:00 PM · Compressed graph service, Storage manager
zack triaged T2056: fix swh-graph sphinx table of content as Low priority.
Nov 5 2019, 10:24 AM · Documentation, Compressed graph service

Nov 4 2019

seirl closed T1884: python bindings for compressed graph access as Resolved.
Nov 4 2019, 2:58 PM · Compressed graph service
vlorentz added a comment to T2053: support graph export for the cassandra backend.

I wrote a prototype for exporting revisions: https://forge.softwareheritage.org/source/snippets/browse/master/vlorentz/cassandra_stream_graph.py

Nov 4 2019, 1:53 PM · Compressed graph service, Storage manager
zack renamed T2055: swh-graph CI hangs badly when py4j doesn't find needed files from swh-graph CI hangs badly on test_api_client.py to swh-graph CI hangs badly when py4j doesn't find needed files.
Nov 4 2019, 1:45 PM · Continuous Integration, Compressed graph service
zack closed T1944: use a compact, binary format for node ids mapping files as Resolved by committing rDGRPH7c40a7d2b722: switch Java map generation from CSV to binary format.
Nov 4 2019, 11:47 AM · Compressed graph service
zack closed T1944: use a compact, binary format for node ids mapping files, a subtask of T1950: Reduce RAM usage for generating mapping files, as Resolved.
Nov 4 2019, 11:47 AM · Compressed graph service
olasd added a comment to T2055: swh-graph CI hangs badly when py4j doesn't find needed files.

The .jar file is never installed within the tox environment, so the graph backend process fixture never actually succeeds in launching the server. FWIW, when running tox on my system, the tests hang just the same.

Nov 4 2019, 11:43 AM · Continuous Integration, Compressed graph service

Nov 3 2019

zack triaged T2055: swh-graph CI hangs badly when py4j doesn't find needed files as Unbreak Now! priority.
Nov 3 2019, 4:46 PM · Continuous Integration, Compressed graph service
zack closed T2054: CI: ImportMismatchError when running on swh-graph as Resolved by committing rDGRPH677daca371fe: tox.ini: fix pytest ImportMismatchError.
Nov 3 2019, 4:25 PM · Compressed graph service, Continuous Integration
zack updated the task description for T2054: CI: ImportMismatchError when running on swh-graph.
Nov 3 2019, 4:12 PM · Compressed graph service, Continuous Integration
zack updated the task description for T2054: CI: ImportMismatchError when running on swh-graph.
Nov 3 2019, 4:07 PM · Compressed graph service, Continuous Integration
zack triaged T2054: CI: ImportMismatchError when running on swh-graph as High priority.
Nov 3 2019, 4:05 PM · Compressed graph service, Continuous Integration
zack closed T1941: Automatically generate mapping files after compressing graph as Resolved by committing rDGRPH545be725d34b: webgraph.py: autoatically generate mappings at the end of compression.
Nov 3 2019, 3:46 PM · Compressed graph service
zack raised the priority of T1941: Automatically generate mapping files after compressing graph from Normal to High.
Nov 3 2019, 3:12 PM · Compressed graph service
zack raised the priority of T1944: use a compact, binary format for node ids mapping files from Normal to High.
Nov 3 2019, 3:11 PM · Compressed graph service
zack raised the priority of T1950: Reduce RAM usage for generating mapping files from Normal to High.
Nov 3 2019, 3:11 PM · Compressed graph service
zack closed T1930: swh-graph: ship swh-graph.jar in the docker container as Wontfix.

now that we have the swh graph compress CLI, we're moving away from using docker for automated compression, so this has become moot

Nov 3 2019, 3:10 PM · Compressed graph service

Oct 31 2019

zack triaged T2053: support graph export for the cassandra backend as Normal priority.
Oct 31 2019, 2:09 PM · Compressed graph service, Storage manager

Sep 13 2019

zack updated subscribers of T1950: Reduce RAM usage for generating mapping files.

Neither end of the spectrum, "fully sort in RAM then write sequentially" or "write randomly", is satisfactory here.
What we want is: in-memory sorting within the limits of available RAM, swapping partially sorted subsets out to disk and back in, and a sequential write at the end.
We can implement this in Java in the Setup class but, in fact, that is exactly what /usr/bin/sort is good at. So I propose to shell out to it from Setup and serialize the sort result through a writer for the binary format of T1944.
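
For illustration only, the strategy described here (sort chunks that fit in RAM, spill each sorted chunk to disk, then merge everything in one sequential pass) can be sketched in Python. This is not the code in Setup and not how /usr/bin/sort is implemented; the names `external_sort`, `_spill`, `_read_run` and the `max_in_ram` parameter are hypothetical, and records are assumed to be newline-free strings.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _spill(sorted_chunk: List[str], tmpdir: str) -> str:
    """Write one already-sorted chunk to its own temporary file, one record per line."""
    fd, path = tempfile.mkstemp(dir=tmpdir, text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(line + "\n" for line in sorted_chunk)
    return path


def _read_run(path: str) -> Iterator[str]:
    """Stream the records of one sorted run back from disk."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")


def external_sort(records: Iterable[str], max_in_ram: int = 100_000) -> Iterator[str]:
    """Yield records in sorted order, buffering at most max_in_ram of them while sorting."""
    with tempfile.TemporaryDirectory() as tmpdir:
        runs: List[str] = []
        chunk: List[str] = []
        for rec in records:
            chunk.append(rec)
            if len(chunk) >= max_in_ram:
                # Chunk is as large as RAM allows: sort it and spill it to disk.
                runs.append(_spill(sorted(chunk), tmpdir))
                chunk = []
        if chunk:
            runs.append(_spill(sorted(chunk), tmpdir))
        # k-way merge of the sorted runs: one sequential pass over the output.
        yield from heapq.merge(*(_read_run(p) for p in runs))
```

A consumer would iterate over `external_sort(...)` and write each record to the output file as it arrives, so the final write stays sequential.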

Sep 13 2019, 1:22 PM · Compressed graph service
zack changed the status of T1944: use a compact, binary format for node ids mapping files, a subtask of T1950: Reduce RAM usage for generating mapping files, from Open to Work in Progress.
Sep 13 2019, 1:19 PM · Compressed graph service
zack changed the status of T1944: use a compact, binary format for node ids mapping files from Open to Work in Progress.

Status update: we now have binary serialization formats for the two maps; see the docstrings of PidToIntMap and IntToPidMap in swh.graph.pid.
That means that Python code can read the compact maps (and also write them, but at a speed that is impractical for generation). Conversion of the textual maps generated for the most recent compressed graph is ongoing and almost completed.
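
As a rough idea of what reading such a compact map can look like, here is a minimal sketch. The layout is an assumption made for the example (fixed-size records sorted by key, each a 22-byte identifier followed by a signed 64-bit big-endian integer); the authoritative layout is the one in the PidToIntMap and IntToPidMap docstrings. `KEY_SIZE`, `RECORD`, `dump_map` and `lookup` are hypothetical names.

```python
import struct
from typing import Dict, Optional

KEY_SIZE = 22  # assumed size of a binary identifier, in bytes
# Assumed record layout: key bytes followed by a signed 64-bit big-endian integer.
RECORD = struct.Struct(f">{KEY_SIZE}sq")


def dump_map(mapping: Dict[bytes, int]) -> bytes:
    """Serialize the mapping as fixed-size records sorted by key."""
    return b"".join(RECORD.pack(k, v) for k, v in sorted(mapping.items()))


def lookup(blob: bytes, key: bytes) -> Optional[int]:
    """Binary-search the sorted records; return the integer for key, or None."""
    lo, hi = 0, len(blob) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        k, v = RECORD.unpack_from(blob, mid * RECORD.size)
        if k == key:
            return v
        if k < key:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Because `lookup` only needs `len()` and `struct.unpack_from`, the same function also works on an `mmap.mmap` object, so a large map file does not have to be loaded into RAM to be queried.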

Sep 13 2019, 1:19 PM · Compressed graph service

Aug 27 2019

haltode created P516 Updated README for compressed graph on annex in the S1 Public space.
Aug 27 2019, 9:20 PM · Compressed graph service
haltode created P514 haltode swh-graph commits list in the S1 Public space.
Aug 27 2019, 2:02 PM · Compressed graph service

Aug 26 2019

zack added a project to T1971: Integrate swh-graph javadoc in swh-docs: Documentation.
Aug 26 2019, 10:45 PM · Documentation, Compressed graph service
haltode triaged T1971: Integrate swh-graph javadoc in swh-docs as Low priority.
Aug 26 2019, 10:25 PM · Documentation, Compressed graph service
zack closed T1943: Publish swh-graph to PyPI as Resolved.
Aug 26 2019, 10:29 AM · Compressed graph service
zack closed T1887: publish swh-graph documentation at docs.s.o as Resolved.

now at https://docs.softwareheritage.org/devel/swh-graph/

Aug 26 2019, 10:28 AM · Documentation, Compressed graph service
zack closed T1904: build developer documentation for swh-graph as Resolved.
Aug 26 2019, 10:28 AM · Documentation, Compressed graph service
zack closed T1904: build developer documentation for swh-graph, a subtask of T1887: publish swh-graph documentation at docs.s.o, as Resolved.
Aug 26 2019, 10:28 AM · Documentation, Compressed graph service

Aug 25 2019

haltode closed T1851: Integrate graph-compression git repo in swh-environment as Resolved.

The swh-graph repo is now fully integrated and has CI

Aug 25 2019, 2:56 PM · Compressed graph service
haltode closed T1851: Integrate graph-compression git repo in swh-environment, a subtask of T1887: publish swh-graph documentation at docs.s.o, as Resolved.
Aug 25 2019, 2:56 PM · Documentation, Compressed graph service
haltode placed T1904: build developer documentation for swh-graph up for grabs.
Aug 25 2019, 2:55 PM · Documentation, Compressed graph service
haltode placed T1941: Automatically generate mapping files after compressing graph up for grabs.
Aug 25 2019, 2:55 PM · Compressed graph service
haltode closed T1951: Reduce RAM usage in graph API endpoints as Resolved.

This was fixed in 87192dfddd4b by using a hash map. See T1969 for long term solution.
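
The commit itself is not reproduced in this feed. As a hedged sketch of the general idea, a traversal can track visited nodes in a hash-based structure whose size grows with the nodes actually reached, rather than in an array sized for every node of the graph. The `bfs` function and its `successors` callback are hypothetical and stand in for whatever the Java server really does.

```python
from collections import deque
from typing import Callable, Iterable, Iterator


def bfs(start: int, successors: Callable[[int], Iterable[int]]) -> Iterator[int]:
    """Breadth-first traversal; memory is proportional to the nodes reached."""
    visited = {start}  # hash-based set instead of an array over all graph nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        yield node
        for succ in successors(node):
            if succ not in visited:
                visited.add(succ)
                queue.append(succ)
```

The trade-off is a higher per-entry cost than a flat array, which pays off when a query touches only a small fraction of the graph.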

Aug 25 2019, 2:54 PM · Compressed graph service
haltode triaged T1969: graph: reduce RAM usage for /walk as Normal priority.
Aug 25 2019, 2:54 PM · Compressed graph service
zack triaged T1968: existing graph endpoints should not return 404 upon missing arguments as Low priority.
Aug 25 2019, 2:50 PM · Easy hack, Compressed graph service
haltode closed T1885: benchmark swh-graph use cases on the full graph as Resolved.

Done, see the LaTeX report in https://forge.softwareheritage.org/source/swh-graph/browse/master/reports/benchmarks/benchmarks.tex

Aug 25 2019, 2:40 PM · Compressed graph service
haltode updated subscribers of T1943: Publish swh-graph to PyPI.

However, swh-docs uses Java 8, which cannot generate the javadoc (Java >= 9 is required); see https://jenkins.softwareheritage.org/view/all/job/DDOC/job/publish/lastFailedBuild/console

Aug 25 2019, 2:36 PM · Compressed graph service

Aug 24 2019

haltode closed T1967: REST server hangs when loading entire graph as Resolved.

I recompiled the Java server from scratch and rebooted the Azure VM; everything now works as expected, with no more hanging queries.

Aug 24 2019, 8:35 PM · Compressed graph service

Aug 23 2019

zack raised the priority of T1967: REST server hangs when loading entire graph from High to Unbreak Now!.
Aug 23 2019, 8:13 PM · Compressed graph service
haltode added a project to T1967: REST server hangs when loading entire graph: Compressed graph service.
Aug 23 2019, 8:11 PM · Compressed graph service

Aug 16 2019

haltode closed T1952: Log raw datapoint in graph benchmarks as Resolved by committing rDGRPH30218b7dc68f: server: benchmark: output raw datapoints in CSV log file.
Aug 16 2019, 6:12 PM · Compressed graph service
haltode added a revision to T1952: Log raw datapoint in graph benchmarks: D1855: Benchmark: output raw datapoints in CSV log file.
Aug 16 2019, 12:14 PM · Compressed graph service

Aug 15 2019

haltode triaged T1952: Log raw datapoint in graph benchmarks as Normal priority.
Aug 15 2019, 9:49 PM · Compressed graph service
haltode created P507 Graph benchmarks: vault use-case in the S1 Public space.
Aug 15 2019, 9:46 PM · Compressed graph service
haltode added a project to P505 Browsing benchmarks early results: Compressed graph service.
Aug 15 2019, 9:46 PM · Compressed graph service

Aug 14 2019

haltode added a comment to T1951: Reduce RAM usage in graph API endpoints.

From Javalin documentation [1]:

Aug 14 2019, 9:52 PM · Compressed graph service
haltode added a comment to T1951: Reduce RAM usage in graph API endpoints.

Both big arrays are meant to be used with all the graph nodes, here is their RAM usage:

Aug 14 2019, 9:47 PM · Compressed graph service
haltode updated the task description for T1951: Reduce RAM usage in graph API endpoints.
Aug 14 2019, 9:47 PM · Compressed graph service
haltode updated subscribers of T1951: Reduce RAM usage in graph API endpoints.
Aug 14 2019, 9:33 PM · Compressed graph service
haltode triaged T1951: Reduce RAM usage in graph API endpoints as High priority.
Aug 14 2019, 9:26 PM · Compressed graph service

Aug 10 2019

zack renamed T1950: Reduce RAM usage for generating mapping files from Implement mapping files dumping with less RAM usage to Reduce RAM usage for generating mapping files.
Aug 10 2019, 3:45 PM · Compressed graph service
haltode added a subtask for T1950: Reduce RAM usage for generating mapping files: T1944: use a compact, binary format for node ids mapping files.
Aug 10 2019, 9:23 AM · Compressed graph service
haltode added a parent task for T1944: use a compact, binary format for node ids mapping files: T1950: Reduce RAM usage for generating mapping files.
Aug 10 2019, 9:23 AM · Compressed graph service
haltode triaged T1950: Reduce RAM usage for generating mapping files as Normal priority.
Aug 10 2019, 9:22 AM · Compressed graph service

Aug 9 2019

haltode closed T1945: Return timings instead of simply logging them as Resolved by committing rDGRPH7e1917a236f3: server: add endpoints wrapper class to return metadata.
Aug 9 2019, 4:12 PM · Compressed graph service

Aug 8 2019

haltode added a revision to T1945: Return timings instead of simply logging them: D1832: Endpoints now return timings instead of logging them.
Aug 8 2019, 3:53 PM · Compressed graph service
zack renamed T1944: use a compact, binary format for node ids mapping files from More compact format for node ids mapping files to use a compact, binary format for node ids mapping files.
Aug 8 2019, 1:00 PM · Compressed graph service
haltode triaged T1945: Return timings instead of simply logging them as Normal priority.
Aug 8 2019, 10:34 AM · Compressed graph service
haltode triaged T1944: use a compact, binary format for node ids mapping files as Normal priority.
Aug 8 2019, 10:29 AM · Compressed graph service

Aug 5 2019

haltode closed T1877: Add contextual info to compression pipeline as Resolved by committing rDGRPH403f1e010c3e: dockerfiles: add contextual info to compression script.
Aug 5 2019, 3:37 PM · Compressed graph service
haltode added a revision to T1877: Add contextual info to compression pipeline: D1817: Add contextual info to compression script.
Aug 5 2019, 2:37 PM · Compressed graph service