Page MenuHomeSoftware Heritage

api/graph: Implement anti-DoS policies for graph visits
ClosedPublic

Authored by anlambert on Jan 11 2022, 2:32 PM.

Details

Summary

In order to allow public access to the graph API, we need to prevent
users to DoS the graph service with resource intensive queries.

Fortunately, the graph API supports a max_edges parameter to limit the
number of edges traversed by the graph.

According to the type of user querying the graph service through the
Web API proxy, a maximum value for the max_edges will be set before
sending a request to the graph API.

Three type of users are considered: anonymous, standard and staff.
The maximum values for the max_edges query parameter according to
user types are retrieved from the webapp configuration.

Related to T3836

Depends on D6919

Diff Detail

Repository
rDWAPPS Web applications
Branch
api-graph-anti-dos-policies
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 25966
Build 40582: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 40581: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6914 (id=25057)

Rebasing onto e0a181daf4...

Current branch diff-target is up to date.
Changes applied before test
commit 86675f4b3c3349ca4671291fdcd038a6d5734fec
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 14:25:08 2022 +0100

    api/graph: Implement anti-DoS policies for graph visits
    
    In order to allow public access to the graph API, we need to prevent
    users to DoS the graph service with resource intensive queries.
    
    Fortunately, the graph API supports a max_edges parameter to limit the
    number of edges traversed by the graph.
    
    According to the type of user querying the graph service through the
    Web API proxy, a maximum value for the max_edges will be set before
    sending a request to the graph API.
    
    Three type of users are considered: anonymous, standard and staff.
    The maximum values for the max_edges query parameter according to
    user types are retrieved from the webapp configuration.
    
    Related to T3836

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1284/ for more details.

seirl requested changes to this revision.Jan 11 2022, 3:15 PM
seirl added a subscriber: seirl.

I have just one security consideration, otherwise LGTM. Thanks!

swh/web/api/views/graph.py
158

I think this is potentially dangerous if an endpoint gets passed with a query parameter.

If someone passes something like graph_query == "/graph/visit/nodes?prod=true" to the function, the resulting URL will look like https://graph.internal.softwareheritage.org/graph/visit/nodes?prod=true?max_edges=42 and the max_edges parameter will get silently ignored.

Instead, we could use something from urllib like this: https://stackoverflow.com/a/2506477 to ensure that the argument is properly added to the query (with a & if there is already a ?, or with a ? otherwise).

This revision now requires changes to proceed.Jan 11 2022, 3:15 PM
swh/web/api/views/graph.py
158

This should not happen as the Web API URL will be parsed by django so graph_query will not contain any ? character and the query parameters will be available in request.GET.

Nevertheless, we can have such issue if the ? got quoted to %3F in the Web API URL. I will handle that case in another diff.

I also found out that I did not handle really well the error responses coming from the graph API, I will also handle that in a new diff.

Nevertheless, we can have such issue if the ? got quoted to %3F in the Web API URL. I will handle that case in another diff.

D6919

I also found out that I did not handle really well the error responses coming from the graph API, I will also handle that in a new diff.

D6918

I will rebase that diff on top of those once they are landed.

Build was aborted

Patch application report for D6914 (id=25072)

Could not rebase; Attempt merge onto 15e6e1988a...

Updating 15e6e198..a0d3ab36
Fast-forward
 swh/web/api/views/graph.py            |  32 +++++++--
 swh/web/config.py                     |   5 +-
 swh/web/tests/api/views/test_graph.py | 130 +++++++++++++++++++++++++++++++++-
 3 files changed, 160 insertions(+), 7 deletions(-)
Changes applied before test
commit a0d3ab367464a55e946a4adaff5e3de388eb6011
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 14:25:08 2022 +0100

    api/graph: Implement anti-DoS policies for graph visits
    
    In order to allow public access to the graph API, we need to prevent
    users to DoS the graph service with resource intensive queries.
    
    Fortunately, the graph API supports a max_edges parameter to limit the
    number of edges traversed by the graph.
    
    According to the type of user querying the graph service through the
    Web API proxy, a maximum value for the max_edges will be set before
    sending a request to the graph API.
    
    Three type of users are considered: anonymous, standard and staff.
    The maximum values for the max_edges query parameter according to
    user types are retrieved from the webapp configuration.
    
    Related to T3836

commit 43525e69eaeb058b5d41c7031c422e7de088af16
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 17:40:34 2022 +0100

    api/graph: Handle query parameters that might be passed in graph_query
    
    It is possible to pass query parameters in the graph_query URL argument
    of the Web API graph endpoint if the ? character is quoted.
    
    So add extra processing in the endpoint implementation to merge the
    query parameters extracted from the graph_query value and those
    coming from the django request.
    
    The purpose is to avoid some query parameters to be silently ignored
    when requesting the graph API.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1290/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1290/console

Build has FAILED

Patch application report for D6914 (id=25072)

Could not rebase; Attempt merge onto 15e6e1988a...

Updating 15e6e198..a0d3ab36
Fast-forward
 swh/web/api/views/graph.py            |  32 +++++++--
 swh/web/config.py                     |   5 +-
 swh/web/tests/api/views/test_graph.py | 130 +++++++++++++++++++++++++++++++++-
 3 files changed, 160 insertions(+), 7 deletions(-)
Changes applied before test
commit a0d3ab367464a55e946a4adaff5e3de388eb6011
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 14:25:08 2022 +0100

    api/graph: Implement anti-DoS policies for graph visits
    
    In order to allow public access to the graph API, we need to prevent
    users to DoS the graph service with resource intensive queries.
    
    Fortunately, the graph API supports a max_edges parameter to limit the
    number of edges traversed by the graph.
    
    According to the type of user querying the graph service through the
    Web API proxy, a maximum value for the max_edges will be set before
    sending a request to the graph API.
    
    Three type of users are considered: anonymous, standard and staff.
    The maximum values for the max_edges query parameter according to
    user types are retrieved from the webapp configuration.
    
    Related to T3836

commit 43525e69eaeb058b5d41c7031c422e7de088af16
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 17:40:34 2022 +0100

    api/graph: Handle query parameters that might be passed in graph_query
    
    It is possible to pass query parameters in the graph_query URL argument
    of the Web API graph endpoint if the ? character is quoted.
    
    So add extra processing in the endpoint implementation to merge the
    query parameters extracted from the graph_query value and those
    coming from the django request.
    
    The purpose is to avoid some query parameters to be silently ignored
    when requesting the graph API.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1291/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1291/console

Build is green

Patch application report for D6914 (id=25072)

Could not rebase; Attempt merge onto 15e6e1988a...

Updating 15e6e198..a0d3ab36
Fast-forward
 swh/web/api/views/graph.py            |  32 +++++++--
 swh/web/config.py                     |   5 +-
 swh/web/tests/api/views/test_graph.py | 130 +++++++++++++++++++++++++++++++++-
 3 files changed, 160 insertions(+), 7 deletions(-)
Changes applied before test
commit a0d3ab367464a55e946a4adaff5e3de388eb6011
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 14:25:08 2022 +0100

    api/graph: Implement anti-DoS policies for graph visits
    
    In order to allow public access to the graph API, we need to prevent
    users to DoS the graph service with resource intensive queries.
    
    Fortunately, the graph API supports a max_edges parameter to limit the
    number of edges traversed by the graph.
    
    According to the type of user querying the graph service through the
    Web API proxy, a maximum value for the max_edges will be set before
    sending a request to the graph API.
    
    Three type of users are considered: anonymous, standard and staff.
    The maximum values for the max_edges query parameter according to
    user types are retrieved from the webapp configuration.
    
    Related to T3836

commit 43525e69eaeb058b5d41c7031c422e7de088af16
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 17:40:34 2022 +0100

    api/graph: Handle query parameters that might be passed in graph_query
    
    It is possible to pass query parameters in the graph_query URL argument
    of the Web API graph endpoint if the ? character is quoted.
    
    So add extra processing in the endpoint implementation to merge the
    query parameters extracted from the graph_query value and those
    coming from the django request.
    
    The purpose is to avoid some query parameters to be silently ignored
    when requesting the graph API.

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1292/ for more details.

This revision is now accepted and ready to land.Jan 12 2022, 2:08 PM

Build is green

Patch application report for D6914 (id=25092)

Could not rebase; Attempt merge onto 15e6e1988a...

Updating 15e6e198..e870ec2b
Fast-forward
 swh/web/api/views/graph.py            |  32 +++++++--
 swh/web/config.py                     |   5 +-
 swh/web/tests/api/views/test_graph.py | 130 +++++++++++++++++++++++++++++++++-
 3 files changed, 160 insertions(+), 7 deletions(-)
Changes applied before test
commit e870ec2b9a40f9f61d2e50407b1795473487329a
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 14:25:08 2022 +0100

    api/graph: Implement anti-DoS policies for graph visits
    
    In order to allow public access to the graph API, we need to prevent
    users to DoS the graph service with resource intensive queries.
    
    Fortunately, the graph API supports a max_edges parameter to limit the
    number of edges traversed by the graph.
    
    According to the type of user querying the graph service through the
    Web API proxy, a maximum value for the max_edges will be set before
    sending a request to the graph API.
    
    Three type of users are considered: anonymous, standard and staff.
    The maximum values for the max_edges query parameter according to
    user types are retrieved from the webapp configuration.
    
    Related to T3836

commit 9e7732ccebe33b7f4f37ad92a13998136580b2c9
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Jan 11 17:40:34 2022 +0100

    api/graph: Handle query parameters that might be passed in graph_query
    
    It is possible to pass query parameters in the graph_query URL argument
    of the Web API graph endpoint if the ? character is quoted.
    
    So add extra processing in the endpoint implementation to merge the
    query parameters extracted from the graph_query value and those
    coming from the django request.
    
    The purpose is to avoid some query parameters to be silently ignored
    when requesting the graph API.

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1295/ for more details.