Page MenuHomeSoftware Heritage

Add browsing use-cases benchmarks

Authored by haltode on Aug 12 2019, 4:43 PM.

Diff Detail

rDGRPH Graph service
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

haltode created this revision.Aug 12 2019, 4:43 PM
zack requested changes to this revision.Aug 13 2019, 8:28 AM
zack added inline comments.

A bit awkward/unclear. Simpler: "avg time needed to traverse an edge", or something such.


It's not great to return digested data at this inner level. It'd be best to return raw info here, like the total traversal time (that you already have). And the number of traversed edges. Then, do the avg computation later, when you have to output/use it.


I haven't checked the code as I'm on my phone, but Statistics sounds like the right place where you want to compute the avg. You should just make sure the raw data needed to do so reach it.

This revision now requires changes to proceed.Aug 13 2019, 8:28 AM
haltode added inline comments.Aug 13 2019, 10:41 AM

The number of traversed edges is already returned implicitly as the size of the result field.


Right now the Statistics class is a general class to compute stats on double values because AccessEdge does not handle Endpoint.Output but raw timings. I'll try to think about a way to have a more specific endpoint related statistics class/methods.

zack added inline comments.Aug 13 2019, 10:50 AM

Yes and no, for instance, I suspect there is an off-by-one, due to the difference in returning the number of *nodes* and the number of *edges*. Also, for endpoints that only return the final node (e.g., for the provenance use cases), you will not have anything meaningful on which to call size(). Hence my suggestion to return as some sort of timings (it's not time, but it's still a measure of how much work the endpoint calculation did) the metric of how many edges you've traversed. That can be consumed by downstream users to compute and output average cost.

haltode added inline comments.Aug 13 2019, 11:33 AM

Oh I see, so yes I need to add a new meta info indeed. However I'm not sure what would be the proper way to do so.

I don't want to have yet another wrapper class to return both the result and meta information, especially since meta information are all calculated in Endpoint, and this class was only created to be a wrapper class to do all the higher-level computation around low-level traversal methods.

haltode updated this revision to Diff 6210.Aug 13 2019, 2:24 PM
  • Add a new meta info nbEdgesAccessed
  • Move utils/ into benchmark/utils/

I ended up using a member variable to count number of edges accessed in the
lower-level class, I find this better than a wrapper class but I'm still open to
other suggestions (or arguments for wrapper class!). ;)

zack accepted this revision.Aug 13 2019, 2:48 PM
This revision is now accepted and ready to land.Aug 13 2019, 2:48 PM
This revision was landed with ongoing or failed builds.Aug 13 2019, 3:05 PM
This revision was automatically updated to reflect the committed changes.