Page MenuHomeSoftware Heritage

Add browsing use-cases benchmarks
ClosedPublic

Authored by haltode on Aug 12 2019, 4:43 PM.

Diff Detail

Repository
rDGRPH Compressed graph representation
Branch
benchmark-browsing
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 7240
Build 10231: tox-on-jenkinsJenkins
Build 10230: arc lint + arc unit

Event Timeline

zack requested changes to this revision.Aug 13 2019, 8:28 AM
zack added inline comments.
docs/api.rst
38

A bit awkward/unclear. Simpler: "avg time needed to traverse an edge", or something such.

java/server/src/main/java/org/softwareheritage/graph/Endpoint.java
59

It's not great to return digested data at this inner level. It'd be best to return raw info here, like the total traversal time (that you already have). And the number of traversed edges. Then, do the avg computation later, when you have to output/use it.

java/server/src/main/java/org/softwareheritage/graph/benchmark/Common.java
42

I haven't checked the code as I'm on my phone, but Statistics sounds like the right place where you want to compute the avg. You should just make sure the raw data needed to do so reach it.

This revision now requires changes to proceed.Aug 13 2019, 8:28 AM
java/server/src/main/java/org/softwareheritage/graph/Endpoint.java
59

The number of traversed edges is already returned implicitly as the size of the result field.

java/server/src/main/java/org/softwareheritage/graph/benchmark/Common.java
42

Right now the Statistics class is a general class to compute stats on double values because AccessEdge does not handle Endpoint.Output but raw timings. I'll try to think about a way to have a more specific endpoint related statistics class/methods.

java/server/src/main/java/org/softwareheritage/graph/Endpoint.java
59

Yes and no, for instance, I suspect there is an off-by-one, due to the difference in returning the number of *nodes* and the number of *edges*. Also, for endpoints that only return the final node (e.g., for the provenance use cases), you will not have anything meaningful on which to call size(). Hence my suggestion to return as some sort of timings (it's not time, but it's still a measure of how much work the endpoint calculation did) the metric of how many edges you've traversed. That can be consumed by downstream users to compute and output average cost.

java/server/src/main/java/org/softwareheritage/graph/Endpoint.java
59

Oh I see, so yes I need to add a new meta info indeed. However I'm not sure what would be the proper way to do so.

I don't want to have yet another wrapper class to return both the result and meta information, especially since meta information are all calculated in Endpoint, and this class was only created to be a wrapper class to do all the higher-level computation around low-level traversal methods.

  • Add a new meta info nbEdgesAccessed
  • Move utils/ into benchmark/utils/

I ended up using a member variable to count number of edges accessed in the
lower-level class, I find this better than a wrapper class but I'm still open to
other suggestions (or arguments for wrapper class!). ;)

This revision is now accepted and ready to land.Aug 13 2019, 2:48 PM
This revision was landed with ongoing or failed builds.Aug 13 2019, 3:05 PM
This revision was automatically updated to reflect the committed changes.