Page MenuHomeSoftware Heritage

Add a static query cost calculator to reject malicious quries
Needs ReviewPublic

Authored by jayeshv on Jul 4 2022, 3:34 PM.

Details

Reviewers
None
Group Reviewers
Reviewers
Maniphest Tasks
T4299: GraphQL: Static query validation and max cost limiting
Summary

Calculate the total query cost from the query AST. This will block big queries
before executing them.

Related to T4299

Diff Detail

Repository
rDGQL GraphQL API
Branch
query-cost-8077
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 33072
Build 51839: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 51838: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8077 (id=29149)

Rebasing onto e4d4a30b1f...

Current branch diff-target is up to date.
Changes applied before test
commit 08452545334fb7cb22dbb92f58bad184f2609951
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Jul 4 10:41:16 2022 +0200

    Add basic rate limiting
    
    Add ariadne's deafult rate limiter and query validator.
    Complexity set only for query fields
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/82/ for more details.

Build is green

Patch application report for D8077 (id=29150)

Rebasing onto e4d4a30b1f...

Current branch diff-target is up to date.
Changes applied before test
commit c640de26081cdf64bcd55d826e84fa254bfd8d94
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Jul 4 10:41:16 2022 +0200

    Add basic rate limiting
    
    Add ariadne's deafult rate limiter and query validator.
    Complexity set only for query fields
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/83/ for more details.

Remove multiplier for connections; set complexity to 2

Build is green

Patch application report for D8077 (id=29151)

Rebasing onto e4d4a30b1f...

Current branch diff-target is up to date.
Changes applied before test
commit dbcb1710da6bf73004bf88eca89dbfa90597f669
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Jul 4 10:41:16 2022 +0200

    Add basic rate limiting for quries
    
    Use default rate limiter from ariadne
    Set complexity 2 for connections and 1 for a node
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/84/ for more details.

jayeshv retitled this revision from WIP: Add basic rate limiting to Add basic rate limiting.Jul 4 2022, 7:22 PM

Build is green

Patch application report for D8077 (id=29153)

Rebasing onto e4d4a30b1f...

Current branch diff-target is up to date.
Changes applied before test
commit a4c64c65da06e5dba71f31cd37e25d17062e67d0
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Jul 4 10:41:16 2022 +0200

    Add basic rate limiting for quries
    
    Use default rate limiter from ariadne
    Set complexity 2 for connections and 1 for a node
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/85/ for more details.

jayeshv retitled this revision from Add basic rate limiting to Add a basic query cost validator.Jul 5 2022, 6:44 AM
jayeshv edited the summary of this revision. (Show Details)

Build is green

Patch application report for D8077 (id=29161)

Rebasing onto e4d4a30b1f...

Current branch diff-target is up to date.
Changes applied before test
commit f5ded23f97c8e912e2aa5e6dfa48cbcbe636b5f6
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Jul 4 10:41:16 2022 +0200

    Add a basic query cost validator
    
    Use default query validator from ariadne
    Add a cost of 2 for a connections and 1 for a node
    Reject the query from executing if the total cost is
    greater than the max limit (set to 200 at the moment).
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/86/ for more details.

Build is green

Patch application report for D8077 (id=29162)

Rebasing onto e4d4a30b1f...

Current branch diff-target is up to date.
Changes applied before test
commit f69aae62ed1c6d1c1ac6cddf5023b0f981f7e58a
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Jul 4 10:41:16 2022 +0200

    Add a basic query cost validator
    
    Use default query validator from ariadne
    Add a cost of 2 for a connections and 1 for a node
    Reject the query from executing if the total cost is
    greater than the max limit (set to 200 at the moment).
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/87/ for more details.

These @cost directives will be ignored by parsers that don't support it, right?

These @cost directives will be ignored by parsers that don't support it, right?

These @cost directives will be ignored by parsers that don't support it, right?

It is an add on. A parser must ignore any unknown directive and the schema will still be valid.

An issue I have is with calculating cost for connections. As of now, I have hard coded that to 2 for every connection. But I don't really like this.

My initial ideas was to calculate cost by multiplying the value of 'first' (number of nodes requested) with the unit node cost (1)
But this doesn't make sense as a connection query is not 'n' times complex to retrieve (it is a constant in most cases). Do you have some thoughts here?

An issue I have is with calculating cost for connections. As of now, I have hard coded that to 2 for every connection. But I don't really like this.

Why?

My initial ideas was to calculate cost by multiplying the value of 'first' (number of nodes requested) with the unit node cost (1)
But this doesn't make sense as a connection query is not 'n' times complex to retrieve (it is a constant in most cases). Do you have some thoughts here?

Isn't it how GitHub does it, though? (Not that I like it...)

An issue I have is with calculating cost for connections. As of now, I have hard coded that to 2 for every connection. But I don't really like this.

Why?

It is a bit unfair, if you request 5 nodes or 50 nodes, it costs you the same. No incentive for retrieving minimal data.

My initial ideas was to calculate cost by multiplying the value of 'first' (number of nodes requested) with the unit node cost (1)
But this doesn't make sense as a connection query is not 'n' times complex to retrieve (it is a constant in most cases). Do you have some thoughts here?

Isn't it how GitHub does it, though? (Not that I like it...)

Yes, one idea is to have big numbers. say, 1000 for a single node and 100 for one node in connection. It is not perfect either.
Maybe that is a possible future improvement.

An issue I have is with calculating cost for connections. As of now, I have hard coded that to 2 for every connection. But I don't really like this.

Why?

It is a bit unfair, if you request 5 nodes or 50 nodes, it costs you the same. No incentive for retrieving minimal data.

But there is also a per-node cost of 1, so that's the incensentive, right?

But there is also a per-node cost of 1, so that's the incensentive, right?

No, every connection, irrespective of the number of nodes we retrieve will cost 2 units.
Every direct node retrieval (say get an origin using URL) will cost 1 unit.

For every node in a connection to cost 1 unit, we have to use a multiplier (that was my original idea). The issue here is,
reverting n nodes in a connection will cost n units and 2*n will cost double.
Isn't it too bad?. From a backed point of view retrieving n or 2*n nodes will cost almost the same resources.

Where is this "2*n" coming from? Is this by assuming every connection multiplies by a factor of 2?

If so, why have this factor at all?

Oh wait I get it. Doing a connection with n nodes costs c*n (for a constant c you implicitly decided to be 1), then fetching each of these nodes also costs n, so it's c*n+n.

Yeah that makes sense to me.

Oh wait I get it. Doing a connection with n nodes costs c*n (for a constant c you implicitly decided to be 1), then fetching each of these nodes also costs n, so it's c*n+n.

Yeah that makes sense to me.

Yes, I am trying to add some functional tests to make it a bit clearer.

vlorentz retitled this revision from Add a basic query cost validator to [WIP] Add a basic query cost validator.Aug 10 2022, 11:33 AM
jayeshv retitled this revision from [WIP] Add a basic query cost validator to [WIP] Add a static query cost calculator to reject malicious quries.Aug 30 2022, 3:23 PM
jayeshv edited the summary of this revision. (Show Details)

Build has FAILED

Patch application report for D8077 (id=30145)

Rebasing onto 4478a533a9...

Current branch diff-target is up to date.
Changes applied before test
commit 5b7e79d29412c70663f4658f17ad3ee3c1a64af8
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/137/
See console output for more information: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/137/console

Build is green

Patch application report for D8077 (id=30147)

Rebasing onto 4478a533a9...

Current branch diff-target is up to date.
Changes applied before test
commit 8b61c8dd17b448d40aef0325c29d460ef8104993
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/138/ for more details.

Build is green

Patch application report for D8077 (id=30149)

Rebasing onto 4478a533a9...

Current branch diff-target is up to date.
Changes applied before test
commit d749dc0b5a4e469b537e2ed34248c2e59e0535cb
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/139/ for more details.

Build has FAILED

Patch application report for D8077 (id=30259)

Rebasing onto 6b1c5797f7...

Current branch diff-target is up to date.
Changes applied before test
commit 4c738cb215639633605881130ea4c6d2c01f6803
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/148/
See console output for more information: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/148/console

Build has FAILED

Patch application report for D8077 (id=30845)

Rebasing onto ffb88715d7...

Current branch diff-target is up to date.
Changes applied before test
commit b53dae99e0c33fc0fcfc7f97f91a0fd207316a8a
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/207/
See console output for more information: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/207/console

Build has FAILED

Patch application report for D8077 (id=31006)

Rebasing onto 6e1c3e8ecf...

Current branch diff-target is up to date.
Changes applied before test
commit 4f869fd6786617434608257dea5c5ad4eb30cbd5
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/211/
See console output for more information: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/211/console

Build has FAILED

Patch application report for D8077 (id=31369)

Rebasing onto bc6f6a7269...

Current branch diff-target is up to date.
Changes applied before test
commit 093e73a417f8aafbcabc00d1c0ebab0ffb1ec477
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/215/
See console output for more information: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/215/console

Build has FAILED

Patch application report for D8077 (id=31597)

Rebasing onto 7522a848fa...

Current branch diff-target is up to date.
Changes applied before test
commit 9d092af54e66e14940b08397746b7c121b1da408
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/226/
See console output for more information: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/226/console

Support query variables with the cost validator

Build is green

Patch application report for D8077 (id=31927)

Rebasing onto 7522a848fa...

Current branch diff-target is up to date.
Changes applied before test
commit 7e80795a92b23380bfd5659de94e9139988fa187
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/227/ for more details.

jayeshv retitled this revision from [WIP] Add a static query cost calculator to reject malicious quries to Add a static query cost calculator to reject malicious quries.Fri, Nov 18, 3:01 PM

Build is green

Patch application report for D8077 (id=32119)

Rebasing onto c28b65feaf...

Current branch diff-target is up to date.
Changes applied before test
commit 1731ae2f51d9b4a9ad1691383abf5b2c7b3237f7
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Mon Aug 29 13:37:18 2022 +0200

    Add a static query cost calculator to reject malicious quries
    
    Calculate the total query cost from the query AST. This will block big queries
    before executing them.
    
    Related to T4299

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/244/ for more details.