
graph service: add anti-DoS limit on the number of edges traversed
Open, Normal, Public

Description

It is currently fairly easy to DoS the graph service, e.g., by performing a visit on the transposed graph starting from the empty file.

We should add safeguards against this. We already account for the number of edges traversed during a visit, so the easiest first step would be to add a maximum number of edges traversed during a visit. It should be a configuration option for the service, not overridable at runtime.

Event Timeline

zack triaged this task as Normal priority. Mar 22 2021, 9:12 AM
zack created this task.

To be consistent with swh-storage, swh-graph should take a limit as a query parameter, but not have a hardcoded upper bound for that limit. Instead, swh-web provides that upper bound.

zack renamed this task from "graph service: add limit on the number of edges traversed" to "graph service: add anti-DoS limit on the number of edges traversed". Mar 22 2021, 9:43 AM
zack added a parent task: T2220: swh-graph in production.

I have two questions to make sure I understand correctly:

1/ If I wanted to reproduce a DoS, would I have to do it on the whole dataset? Is the py3k dataset not big enough?
2/ Do we want an argument on the visit requests of the REST API to limit the number of visited edges, or an argument at another level?

Thanks.

  1. I don't think you need to reproduce a DoS, just make sure we don't return more results than we should
  2. yes, a query parameter would be good

To complement what @vlorentz mentioned, we should actually stop the visit once the maximum number of edges has been reached, because it is continuing the visit (no matter how many results are returned afterwards) that can DoS the swh-graph backend.
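
To illustrate the difference between truncating results and actually stopping the traversal, here is a minimal sketch in Python. It is not the swh-graph implementation (the backend is Java); the function name `bounded_bfs`, the adjacency-dict graph representation, and the counter name `edges_traversed` are all hypothetical, chosen only for illustration.

```python
from collections import deque
from typing import Dict, Iterator, List, Optional


def bounded_bfs(
    successors: Dict[str, List[str]],
    start: str,
    max_edges: Optional[int] = None,
) -> Iterator[str]:
    """Yield nodes reachable from `start` in BFS order.

    The visit itself stops as soon as `max_edges` edges have been
    traversed; with `max_edges=None` it runs to completion.
    (Hypothetical helper, not the actual swh-graph code.)
    """
    visited = {start}
    queue = deque([start])
    edges_traversed = 0
    yield start
    while queue:
        node = queue.popleft()
        for succ in successors.get(node, []):
            # Check the budget before following the edge, so that no
            # further work is done once the limit is reached.
            if max_edges is not None and edges_traversed >= max_edges:
                return
            edges_traversed += 1
            if succ not in visited:
                visited.add(succ)
                queue.append(succ)
                yield succ
```

The key point is that the budget check sits inside the traversal loop: the cost of a request is then bounded by the limit, regardless of how many results the caller ends up consuming.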

@vlorentz about the query parameters: should there be a default value for it or not? (We want this to be consistent with swh-storage, as you mentioned, but it's not clear to me if there's a default value in that case or not.)

should there be a default value for it or not? (We want this to be consistent with swh-storage

swh-storage is not consistent with itself, so either way is fine

OK, so @Hakimb: go for no default value. If the query param is not passed, the visit will not stop before the end. If it's given, the visit will stop once the limit is reached. Call the query param ?max_edges. You will find that the Java code already keeps track of the number of edges traversed, so you should just need to compare against that.
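
As a rough sketch of the "no default value" semantics described above, the snippet below parses an optional max_edges query parameter. The helper name `parse_max_edges` and the plain-mapping representation of the query string are assumptions made for this example; the real REST layer may handle and validate parameters differently.

```python
from typing import Mapping, Optional


def parse_max_edges(query: Mapping[str, str]) -> Optional[int]:
    """Return the edge limit from the query string, or None if absent.

    None means "no limit": the visit runs until the end.
    (Hypothetical helper, not part of the swh-graph API.)
    """
    raw = query.get("max_edges")
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"max_edges must be an integer, got {raw!r}") from None
    if value < 0:
        raise ValueError(f"max_edges must be non-negative, got {value}")
    return value
```

With this shape, a request without ?max_edges yields `None` and the visit is unbounded, while a request such as ?max_edges=1000 yields `1000`, which the traversal can compare against its edge counter.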