
graph service: allow loading in memory only one direction of the graph
Closed, Migrated

Description

As per title, in order to reduce memory footprint for specific use cases.

This would also imply some decent failure mode for the API, which won't be able to support one or the other direction when it hasn't been loaded.

Event Timeline

zack triaged this task as Low priority. Jan 21 2021, 10:09 AM
zack created this task.

Should this still allow the possibility of browsing both directions?

Say we have a --direction=forward|backward|both flag with forward being the default option.

Nope, if loaded with only one direction, traversals will only be possible in the loaded direction.
This will essentially be a trade-off setting for people who cannot (or do not want to) load both directions.
It is fine to fail (gracefully, with an error) traversals requested in a direction that corresponds to a non-loaded graph.
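
A minimal sketch of that failure mode, assuming a hypothetical LoadedGraph holder and a hypothetical DirectionNotLoadedException (neither name comes from swh-graph). Plain maps stand in for the compressed graphs:

```java
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not the swh-graph API.
enum Direction { FORWARD, BACKWARD }

class DirectionNotLoadedException extends RuntimeException {
    DirectionNotLoadedException(Direction d) {
        super("graph was not loaded in direction " + d
              + "; traversals in that direction are unavailable");
    }
}

class LoadedGraph {
    // Either map may be null when that direction was not loaded.
    private final Map<Integer, List<Integer>> forward;
    private final Map<Integer, List<Integer>> backward;

    LoadedGraph(Map<Integer, List<Integer>> forward,
                Map<Integer, List<Integer>> backward) {
        if (forward == null && backward == null) {
            throw new IllegalArgumentException("at least one direction must be loaded");
        }
        this.forward = forward;
        this.backward = backward;
    }

    // Returns the neighbors of `node` in direction `d`, or fails with a
    // descriptive error when that direction is not in memory.
    List<Integer> neighbors(Direction d, int node) {
        Map<Integer, List<Integer>> adj = (d == Direction.FORWARD) ? forward : backward;
        if (adj == null) {
            throw new DirectionNotLoadedException(d);
        }
        return adj.getOrDefault(node, List.of());
    }
}
```

An API layer could catch the exception and turn it into a client-facing error instead of crashing the request handler.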

The default loading behavior should either be "both" (overridable "down" to a single direction), or alternatively "whatever is available on disk".
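
One way to combine the proposed flag values with the "whatever is available on disk" default could look like the sketch below. The DirectionResolver class and the ".graph" / "-transposed.graph" file suffixes are assumptions made for illustration, not something confirmed in this task:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names; the file suffixes below are assumptions.
enum LoadDirection { FORWARD, BACKWARD }

class DirectionResolver {
    // Reports which directions have a graph file next to `basename`.
    static Set<LoadDirection> availableOnDisk(Path basename) {
        Set<LoadDirection> found = EnumSet.noneOf(LoadDirection.class);
        String name = basename.getFileName().toString();
        if (Files.exists(basename.resolveSibling(name + ".graph"))) {
            found.add(LoadDirection.FORWARD);
        }
        if (Files.exists(basename.resolveSibling(name + "-transposed.graph"))) {
            found.add(LoadDirection.BACKWARD);
        }
        return found;
    }

    // `requested` is "forward", "backward", "both", or null meaning
    // "whatever is available on disk".
    static Set<LoadDirection> resolve(String requested, Path basename) {
        Set<LoadDirection> available = availableOnDisk(basename);
        Set<LoadDirection> wanted;
        if (requested == null) {
            wanted = available;
        } else {
            switch (requested) {
                case "forward": wanted = EnumSet.of(LoadDirection.FORWARD); break;
                case "backward": wanted = EnumSet.of(LoadDirection.BACKWARD); break;
                case "both": wanted = EnumSet.allOf(LoadDirection.class); break;
                default: throw new IllegalArgumentException("unknown direction: " + requested);
            }
        }
        // Fail early at startup rather than on the first traversal.
        if (wanted.isEmpty() || !available.containsAll(wanted)) {
            throw new IllegalStateException(
                "requested " + wanted + " but found on disk: " + available);
        }
        return wanted;
    }
}
```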

Copying my comment from a linked diff:

The design I have in mind is to split the SwhGraph into two different layers:

  • a SwhGraph interface that will contain the swh-specific metadata handling (node types, SWHIDs etc)
  • a BidirectionalGraph interface on top of ImmutableGraph that will take a graph and its transposed version, and define indegree(), predecessors() and transpose().

Using this, we could:

  • Have a generic bidirectional graph by using a BidirectionalGraph(graph, graph_transposed)
  • Have a swh-specific unidirectional graph, either forward or backward, by using a SwhGraph(graph) or a SwhGraph(graph_transposed)
  • Have a swh-specific bidirectional graph by using a SwhGraph(BidirectionalGraph(graph, graph_transposed))
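
The two layers described above could be sketched roughly as follows. The Graph interface here is a hypothetical, much-reduced stand-in for WebGraph's ImmutableGraph, and ArrayGraph exists only to make the sketch runnable. The method names indegree(), predecessors() and transpose() follow the comment above; everything else is an assumption:

```java
// Hypothetical stand-in for ImmutableGraph, reduced to what the sketch needs.
interface Graph {
    int numNodes();
    int[] successors(int node);
    default int outdegree(int node) { return successors(node).length; }
}

// Simple adjacency-array graph, used only to make the sketch runnable.
class ArrayGraph implements Graph {
    private final int[][] adj;
    ArrayGraph(int[][] adj) { this.adj = adj; }
    public int numNodes() { return adj.length; }
    public int[] successors(int node) { return adj[node]; }
}

// Pairs a graph with its transpose: predecessors are successors in the transpose.
class BidirectionalGraph implements Graph {
    private final Graph forward;
    private final Graph backward;

    BidirectionalGraph(Graph forward, Graph backward) {
        if (forward.numNodes() != backward.numNodes()) {
            throw new IllegalArgumentException("graph and transpose differ in size");
        }
        this.forward = forward;
        this.backward = backward;
    }

    public int numNodes() { return forward.numNodes(); }
    public int[] successors(int node) { return forward.successors(node); }
    int[] predecessors(int node) { return backward.successors(node); }
    int indegree(int node) { return backward.outdegree(node); }
    // Transposing is free: swap the two underlying graphs.
    BidirectionalGraph transpose() { return new BidirectionalGraph(backward, forward); }
}

// swh-specific layer: node metadata on top of any Graph, one- or two-directional.
class SwhGraph {
    private final Graph graph;
    private final String[] nodeTypes; // e.g. "rev", "dir", "cnt"

    SwhGraph(Graph graph, String[] nodeTypes) {
        this.graph = graph;
        this.nodeTypes = nodeTypes;
    }

    Graph graph() { return graph; }
    String nodeType(int node) { return nodeTypes[node]; }
    boolean isBidirectional() { return graph instanceof BidirectionalGraph; }
}
```

With this split, a SwhGraph wrapping a plain Graph only exposes successors, so a request for predecessors can be rejected up front by checking isBidirectional().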

Note: I don't know yet what the correct data model is to implement this in Java, as I'm not too proficient with interfaces/mixins.

seirl claimed this task.
seirl moved this task from In progress to Implemented on the Compressed graph service board.