Page MenuHomeSoftware Heritage

[WIP] :Make numeric cursors a bit more opaque to avoid cursor hacks
AbandonedPublic

Authored by jayeshv on Nov 30 2022, 12:13 PM.

Details

Reviewers
vlorentz
Group Reviewers
Reviewers
Maniphest Tasks
T4647: GraphQL: Make cursors truly opaque
Summary

WIP: Base64 encoded numeric cursors are easy to read. Add extra chars
to make them a bit more opaque.

Related to T4647

Diff Detail

Repository
rDGQL GraphQL API
Branch
opaque-cursor
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 33050
Build 51800: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 51799: arc lint + arc unit

Event Timeline

jayeshv retitled this revision from Make numeric cursors a bit more opaque to avoid cursor hacks to [WIP] Make numeric cursors a bit more opaque to avoid cursor hacks.Nov 30 2022, 12:14 PM
jayeshv edited the summary of this revision. (Show Details)

Build is green

Patch application report for D8906 (id=32096)

Rebasing onto c28b65feaf...

Current branch diff-target is up to date.
Changes applied before test
commit b9fea6dbe422a89d154c60c13b06d85ca1b04f03
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Wed Nov 30 12:04:43 2022 +0100

    Make numeric cursors a bit more opaque to avoid cursor hacks
    
    Base64 encoded numeric cursors are easy to read. Add extra chars
    to make them a bit more opaque.
    
    Related to T4647

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/237/ for more details.

Build is green

Patch application report for D8906 (id=32097)

Rebasing onto c28b65feaf...

Current branch diff-target is up to date.
Changes applied before test
commit 1b958cf2cfa65354f745e38cd511ada4da93e352
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Wed Nov 30 12:04:43 2022 +0100

    Make numeric cursors a bit more opaque to avoid cursor hacks
    
    Base64 encoded numeric cursors are easy to read. Add extra chars
    to make them a bit more opaque.
    
    Related to T4647

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/238/ for more details.

Build is green

Patch application report for D8906 (id=32098)

Rebasing onto c28b65feaf...

Current branch diff-target is up to date.
Changes applied before test
commit 460dfa577687cc8f0eef330982476fde5173b401
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Wed Nov 30 12:04:43 2022 +0100

    Make numeric cursors a bit more opaque to avoid cursor hacks
    
    Base64 encoded numeric cursors are easy to read. Add extra chars
    to make them a bit more opaque.
    
    Related to T4647

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/239/ for more details.

jayeshv retitled this revision from [WIP] Make numeric cursors a bit more opaque to avoid cursor hacks to Make numeric cursors a bit more opaque to avoid cursor hacks.Nov 30 2022, 12:26 PM
jayeshv edited the summary of this revision. (Show Details)
vlorentz added a subscriber: vlorentz.

.lstrip() and .rstrip() are not .removeprefix() and .removesuffix():

>>> get_decoded_cursor(get_encoded_cursor("FOO"))
'F'
This revision now requires changes to proceed.Nov 30 2022, 2:27 PM

.lstrip() and .rstrip() are not .removeprefix() and .removesuffix():

>>> get_decoded_cursor(get_encoded_cursor("FOO"))
'F'

Thanks, just fixed it :)
This fix looks somewhat like a hack to me. Do you think we should use some other method (maybe encrypting the cursor) to fix this issue?

Build has FAILED

Patch application report for D8906 (id=32099)

Rebasing onto c28b65feaf...

Current branch diff-target is up to date.
Changes applied before test
commit 3c28bc55d492fc91293ef35d6c18a4ad07da8d85
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Wed Nov 30 12:04:43 2022 +0100

    Make numeric cursors a bit more opaque to avoid cursor hacks
    
    Base64 encoded numeric cursors are easy to read. Add extra chars
    to make them a bit more opaque.
    
    Related to T4647

Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/240/
See console output for more information: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/240/console

Thanks, just fixed it :)

You can't use .removeprefix and .removesuffix because we still need to support Python 3.7. Also, the fix is not enough, because you need to change the encoding scheme, as it is still not injective:

>>> get_decoded_cursor(get_encoded_cursor("foo"))
'foo'
>>> get_decoded_cursor(get_encoded_cursor("TRKJJeGRfooJJEHDNOHT"))
'foo'

This fix looks somewhat like a hack to me. Do you think we should use some other method (maybe encrypting the cursor) to fix this issue?

Is it really needed, though? If yes, encrypting looks overkill. If you just want to make it hard to work with, append the (b64-encoded) output of zlib.crc32() or something

Thanks, just fixed it :)

You can't use .removeprefix and .removesuffix because we still need to support Python 3.7. Also, the fix is not enough, because you need to change the encoding scheme, as it is still not injective:

>>> get_decoded_cursor(get_encoded_cursor("foo"))
'foo'
>>> get_decoded_cursor(get_encoded_cursor("TRKJJeGRfooJJEHDNOHT"))
'foo'

This fix looks somewhat like a hack to me. Do you think we should use some other method (maybe encrypting the cursor) to fix this issue?

Is it really needed, though? If yes, encrypting looks overkill. If you just want to make it hard to work with, append the (b64-encoded) output of zlib.crc32() or something

I am not sure, @olasd felt a bit uncomfortable with easy to read cursors. Let me try some alternatives anyway.

jayeshv retitled this revision from Make numeric cursors a bit more opaque to avoid cursor hacks to [WIP] :Make numeric cursors a bit more opaque to avoid cursor hacks.Nov 30 2022, 3:16 PM
jayeshv edited the summary of this revision. (Show Details)

Build is green

Patch application report for D8906 (id=32100)

Rebasing onto c28b65feaf...

Current branch diff-target is up to date.
Changes applied before test
commit 13929c9e7050d9c11cd497e0fc2b8972402f3dab
Author: Jayesh Velayudhan <jayesh@softwareheritage.org>
Date:   Wed Nov 30 12:04:43 2022 +0100

    Make numeric cursors a bit more opaque to avoid cursor hacks
    
    Base64 encoded numeric cursors are easy to read. Add extra chars
    to make them a bit more opaque.
    
    Related to T4647

See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/241/ for more details.

So, overall, what I'm uncomfortable with is decoding incoming cursors without validating that we have output and generated them ourselves. They're currently untrusted user input, that we're decoding with a very simple algorithm, and it'd be easy enough for anyone to fiddle with them (even with the prefix/suffix addition).

I'm worried that we're opening ourselves up to some surprises when we implement cursors that will be more complex than numeric ids, and that some form of (cryptographic) authentification scheme would close that door once and for all.

Maybe I'm worried for nothing. One thing I'm certain of is that we shouldn't *obfuscate* the cursors, either they're cryptographically authenticated or we make a conscious choice of leaving the cursors raw and ignoring the problem.

If we decide to go with a cursor signature scheme, then it should be:

  • based on cryptographic primitives
  • using a secret seed to generate the signing key
  • and cursors for different queries should combine the seed with a per-query namespace
  • ideally it would support secret rotation

Django's signing primitives are interesting, their main caveat in my opinion is that the signatures are pretty long. I think it uses itsdangerous under the hood, which itself implements a robust HMAC scheme that can be salted as needed (and supports key rotation as well).

Dropping this revision. A better signature based cursor can be implemented later.