Page MenuHomeSoftware Heritage

Add a static query cost calculator to reject malicious quries
Needs ReviewPublic

Authored by jayeshv on Jul 4 2022, 3:34 PM.

Details

Reviewers
None
Group Reviewers
Reviewers
Maniphest Tasks
T4299: GraphQL: Static query validation and max cost limiting
Summary

Calculate the total query cost from the query AST. This will block big queries
before executing them.

Related to T4299

Diff Detail

Repository
rDGQL GraphQL API
Branch
query-cost-8077
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 32317
Build 50619: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 50618: arc lint + arc unit

Unit TestsFailed

TimeTest
14 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.graphql.tests.functional.test_branch_connection::test_get_after_arg
client = <FlaskClient <Flask 'swh.graphql.tests.conftest'>> def test_get_after_arg(client):
14 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.graphql.tests.functional.test_branch_connection::test_get_branches_with_alias
client = <FlaskClient <Flask 'swh.graphql.tests.conftest'>> def test_get_branches_with_alias(client):
418 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.graphql.tests.functional.test_branch_connection::test_get_data
client = <FlaskClient <Flask 'swh.graphql.tests.conftest'>> def test_get_data(client):
16 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.graphql.tests.functional.test_branch_connection::test_get_first_arg[1]
client = <FlaskClient <Flask 'swh.graphql.tests.conftest'>>, count = 1 @pytest.mark.parametrize("count", [1, 2])
14 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.graphql.tests.functional.test_branch_connection::test_get_first_arg[2]
client = <FlaskClient <Flask 'swh.graphql.tests.conftest'>>, count = 2 @pytest.mark.parametrize("count", [1, 2])
View Full Test Results (169 Failed · 83 Passed)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jayeshv retitled this revision from Add basic rate limiting to Add a basic query cost validator.Jul 5 2022, 6:44 AM
jayeshv edited the summary of this revision. (Show Details)

Build is green

Patch application report for D8077 (id=29161)

Rebasing onto e4d4a30b1f...

Current branch diff-target is up to date.
Changes applied before test
commit f5ded23f97c8e912e2aa5e6dfa48cbcbe636b5f6
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Jul 4 10:41:16 2022 +0200

    Add a basic query cost validator
    
    Use default query validator from ariadne
    Add a cost of 2 for a connections and 1 for a node
    Reject the query from executing if the total cost is
    greater than the max limit (set to 200 at the moment).
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/86/ for more details.

Build is green

Patch application report for D8077 (id=29162)

Rebasing onto e4d4a30b1f...

Current branch diff-target is up to date.
Changes applied before test
commit f69aae62ed1c6d1c1ac6cddf5023b0f981f7e58a
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Jul 4 10:41:16 2022 +0200

    Add a basic query cost validator
    
    Use default query validator from ariadne
    Add a cost of 2 for a connections and 1 for a node
    Reject the query from executing if the total cost is
    greater than the max limit (set to 200 at the moment).
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/87/ for more details.

These @cost directives will be ignored by parsers that don't support it, right?

These @cost directives will be ignored by parsers that don't support it, right?

These @cost directives will be ignored by parsers that don't support it, right?

It is an add on. A parser must ignore any unknown directive and the schema will still be valid.

An issue I have is with calculating cost for connections. As of now, I have hard coded that to 2 for every connection. But I don't really like this.

My initial ideas was to calculate cost by multiplying the value of 'first' (number of nodes requested) with the unit node cost (1)
But this doesn't make sense as a connection query is not 'n' times complex to retrieve (it is a constant in most cases). Do you have some thoughts here?

An issue I have is with calculating cost for connections. As of now, I have hard coded that to 2 for every connection. But I don't really like this.

Why?

My initial ideas was to calculate cost by multiplying the value of 'first' (number of nodes requested) with the unit node cost (1)
But this doesn't make sense as a connection query is not 'n' times complex to retrieve (it is a constant in most cases). Do you have some thoughts here?

Isn't it how GitHub does it, though? (Not that I like it...)

An issue I have is with calculating cost for connections. As of now, I have hard coded that to 2 for every connection. But I don't really like this.

Why?

It is a bit unfair, if you request 5 nodes or 50 nodes, it costs you the same. No incentive for retrieving minimal data.

My initial ideas was to calculate cost by multiplying the value of 'first' (number of nodes requested) with the unit node cost (1)
But this doesn't make sense as a connection query is not 'n' times complex to retrieve (it is a constant in most cases). Do you have some thoughts here?

Isn't it how GitHub does it, though? (Not that I like it...)

Yes, one idea is to have big numbers. say, 1000 for a single node and 100 for one node in connection. It is not perfect either.
Maybe that is a possible future improvement.

An issue I have is with calculating cost for connections. As of now, I have hard coded that to 2 for every connection. But I don't really like this.

Why?

It is a bit unfair, if you request 5 nodes or 50 nodes, it costs you the same. No incentive for retrieving minimal data.

But there is also a per-node cost of 1, so that's the incensentive, right?

But there is also a per-node cost of 1, so that's the incensentive, right?

No, every connection, irrespective of the number of nodes we retrieve will cost 2 units.
Every direct node retrieval (say get an origin using URL) will cost 1 unit.

For every node in a connection to cost 1 unit, we have to use a multiplier (that was my original idea). The issue here is,
reverting n nodes in a connection will cost n units and 2*n will cost double.
Isn't it too bad?. From a backed point of view retrieving n or 2*n nodes will cost almost the same resources.

Where is this "2*n" coming from? Is this by assuming every connection multiplies by a factor of 2?

If so, why have this factor at all?

Oh wait I get it. Doing a connection with n nodes costs c*n (for a constant c you implicitly decided to be 1), then fetching each of these nodes also costs n, so it's c*n+n.

Yeah that makes sense to me.

Oh wait I get it. Doing a connection with n nodes costs c*n (for a constant c you implicitly decided to be 1), then fetching each of these nodes also costs n, so it's c*n+n.

Yeah that makes sense to me.

Yes, I am trying to add some functional tests to make it a bit clearer.

vlorentz retitled this revision from Add a basic query cost validator to [WIP] Add a basic query cost validator.Aug 10 2022, 11:33 AM
jayeshv retitled this revision from [WIP] Add a basic query cost validator to [WIP] Add a static query cost calculator to reject malicious quries.Aug 30 2022, 3:23 PM
jayeshv edited the summary of this revision. (Show Details)

Build has FAILED

Patch application report for D8077 (id=30145)

Rebasing onto 4478a533a9...

Current branch diff-target is up to date.
Changes applied before test
commit 5b7e79d29412c70663f4658f17ad3ee3c1a64af8
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/137/
See console output for more information: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/137/console

Build is green

Patch application report for D8077 (id=30147)

Rebasing onto 4478a533a9...

Current branch diff-target is up to date.
Changes applied before test
commit 8b61c8dd17b448d40aef0325c29d460ef8104993
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/138/ for more details.

Build is green

Patch application report for D8077 (id=30149)

Rebasing onto 4478a533a9...

Current branch diff-target is up to date.
Changes applied before test
commit d749dc0b5a4e469b537e2ed34248c2e59e0535cb
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/139/ for more details.

Build has FAILED

Patch application report for D8077 (id=30259)

Rebasing onto 6b1c5797f7...

Current branch diff-target is up to date.
Changes applied before test
commit 4c738cb215639633605881130ea4c6d2c01f6803
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/148/
See console output for more information: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/148/console

Build has FAILED

Patch application report for D8077 (id=30845)

Rebasing onto ffb88715d7...

Current branch diff-target is up to date.
Changes applied before test
commit b53dae99e0c33fc0fcfc7f97f91a0fd207316a8a
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/207/
See console output for more information: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/207/console

Build has FAILED

Patch application report for D8077 (id=31006)

Rebasing onto 6e1c3e8ecf...

Current branch diff-target is up to date.
Changes applied before test
commit 4f869fd6786617434608257dea5c5ad4eb30cbd5
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/211/
See console output for more information: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/211/console

Build has FAILED

Patch application report for D8077 (id=31369)

Rebasing onto bc6f6a7269...

Current branch diff-target is up to date.
Changes applied before test
commit 093e73a417f8aafbcabc00d1c0ebab0ffb1ec477
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/215/
See console output for more information: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/215/console

Build has FAILED

Patch application report for D8077 (id=31597)

Rebasing onto 7522a848fa...

Current branch diff-target is up to date.
Changes applied before test
commit 9d092af54e66e14940b08397746b7c121b1da408
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/226/
See console output for more information: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/226/console

Support query variables with the cost validator

Build is green

Patch application report for D8077 (id=31927)

Rebasing onto 7522a848fa...

Current branch diff-target is up to date.
Changes applied before test
commit 7e80795a92b23380bfd5659de94e9139988fa187
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/227/ for more details.

jayeshv retitled this revision from [WIP] Add a static query cost calculator to reject malicious quries to Add a static query cost calculator to reject malicious quries.Nov 18 2022, 3:01 PM

Build is green

Patch application report for D8077 (id=32119)

Rebasing onto c28b65feaf...

Current branch diff-target is up to date.
Changes applied before test
commit 1731ae2f51d9b4a9ad1691383abf5b2c7b3237f7
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/244/ for more details.

Build is green

Patch application report for D8077 (id=32297)

Rebasing onto d3aee2f9eb...

Current branch diff-target is up to date.
Changes applied before test
commit 79d62b250919d9c88537c783b9818d6b8dfc0bd2
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/254/ for more details.

add extra cost of directory entry path

Build is green

Patch application report for D8077 (id=32343)

Rebasing onto 5446d4f87f...

Current branch diff-target is up to date.
Changes applied before test
commit df5c4f2262a292073208319d900048b0439416af
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/261/ for more details.

Build is green

Patch application report for D8077 (id=32372)

Rebasing onto f6b5a2d5e5...

Current branch diff-target is up to date.
Changes applied before test
commit 0f1d344a5cc21375957ee511e81ba4dc9df36d6d
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/274/ for more details.

add cost for new entrypoints

Build is green

Patch application report for D8077 (id=32390)

Rebasing onto f2fae75b9c...

Current branch diff-target is up to date.
Changes applied before test
commit b9d9b5affdc679aa8db2887dac9df0246e9a05cb
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/283/ for more details.

Build is green

Patch application report for D8077 (id=32391)

Rebasing onto f2fae75b9c...

Current branch diff-target is up to date.
Changes applied before test
commit 5833b90bb4e2e681a8a5d9a698983eef2c705d2c
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/284/ for more details.

Build is green

Patch application report for D8077 (id=32393)

Rebasing onto f2fae75b9c...

Current branch diff-target is up to date.
Changes applied before test
commit 96b233023d2c565bd6b2ae48ba79517daf97cc69
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/285/ for more details.