Page MenuHomeSoftware Heritage

Add browsing use-cases benchmarks
ClosedPublic

Authored by haltode on Aug 12 2019, 4:43 PM.

Diff Detail

Repository
rDGRPH Graph service
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

haltode created this revision.Aug 12 2019, 4:43 PM
zack requested changes to this revision.Aug 13 2019, 8:28 AM
zack added inline comments.
docs/api.rst
39

A bit awkward/unclear. Simpler: "avg time needed to traverse an edge", or something such.

java/server/src/main/java/org/softwareheritage/graph/Endpoint.java
60

It's not great to return digested data at this inner level. It'd be best to return raw info here, like the total traversal time (that you already have). And the number of traversed edges. Then, do the avg computation later, when you have to output/use it.

java/server/src/main/java/org/softwareheritage/graph/benchmark/Common.java
43

I haven't checked the code as I'm on my phone, but Statistics sounds like the right place where you want to compute the avg. You should just make sure the raw data needed to do so reach it.

This revision now requires changes to proceed.Aug 13 2019, 8:28 AM
haltode added inline comments.Aug 13 2019, 10:41 AM
java/server/src/main/java/org/softwareheritage/graph/Endpoint.java
60

The number of traversed edges is already returned implicitly as the size of the result field.

java/server/src/main/java/org/softwareheritage/graph/benchmark/Common.java
43

Right now the Statistics class is a general class to compute stats on double values because AccessEdge does not handle Endpoint.Output but raw timings. I'll try to think about a way to have a more specific endpoint related statistics class/methods.

zack added inline comments.Aug 13 2019, 10:50 AM
java/server/src/main/java/org/softwareheritage/graph/Endpoint.java
60

Yes and no, for instance, I suspect there is an off-by-one, due to the difference in returning the number of *nodes* and the number of *edges*. Also, for endpoints that only return the final node (e.g., for the provenance use cases), you will not have anything meaningful on which to call size(). Hence my suggestion to return as some sort of timings (it's not time, but it's still a measure of how much work the endpoint calculation did) the metric of how many edges you've traversed. That can be consumed by downstream users to compute and output average cost.

haltode added inline comments.Aug 13 2019, 11:33 AM
java/server/src/main/java/org/softwareheritage/graph/Endpoint.java
60

Oh I see, so yes I need to add a new meta info indeed. However I'm not sure what would be the proper way to do so.

I don't want to have yet another wrapper class to return both the result and meta information, especially since meta information are all calculated in Endpoint, and this class was only created to be a wrapper class to do all the higher-level computation around low-level traversal methods.

haltode updated this revision to Diff 6210.Aug 13 2019, 2:24 PM
  • Add a new meta info nbEdgesAccessed
  • Move utils/ into benchmark/utils/

I ended up using a member variable to count number of edges accessed in the
lower-level class, I find this better than a wrapper class but I'm still open to
other suggestions (or arguments for wrapper class!). ;)

zack accepted this revision.Aug 13 2019, 2:48 PM
This revision is now accepted and ready to land.Aug 13 2019, 2:48 PM
This revision was landed with ongoing or failed builds.Aug 13 2019, 3:05 PM
This revision was automatically updated to reflect the committed changes.