WIP: Base64 encoded numeric cursors are easy to read. Add extra chars
to make them a bit more opaque.
Related to T4647
Differential D8906
[WIP] :Make numeric cursors a bit more opaque to avoid cursor hacks jayeshv on Nov 30 2022, 12:13 PM. Authored by
Details
WIP: Base64 encoded numeric cursors are easy to read. Add extra chars Related to T4647
Diff Detail
Event TimelineComment Actions Build is green Patch application report for D8906 (id=32096)Rebasing onto c28b65feaf... Current branch diff-target is up to date. Changes applied before testcommit b9fea6dbe422a89d154c60c13b06d85ca1b04f03 Author: Jayesh Velayudhan <jayesh@softwareheritage.org> Date: Wed Nov 30 12:04:43 2022 +0100 Make numeric cursors a bit more opaque to avoid cursor hacks Base64 encoded numeric cursors are easy to read. Add extra chars to make them a bit more opaque. Related to T4647 See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/237/ for more details. Comment Actions Build is green Patch application report for D8906 (id=32097)Rebasing onto c28b65feaf... Current branch diff-target is up to date. Changes applied before testcommit 1b958cf2cfa65354f745e38cd511ada4da93e352 Author: Jayesh Velayudhan <jayesh@softwareheritage.org> Date: Wed Nov 30 12:04:43 2022 +0100 Make numeric cursors a bit more opaque to avoid cursor hacks Base64 encoded numeric cursors are easy to read. Add extra chars to make them a bit more opaque. Related to T4647 See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/238/ for more details. Comment Actions Build is green Patch application report for D8906 (id=32098)Rebasing onto c28b65feaf... Current branch diff-target is up to date. Changes applied before testcommit 460dfa577687cc8f0eef330982476fde5173b401 Author: Jayesh Velayudhan <jayesh@softwareheritage.org> Date: Wed Nov 30 12:04:43 2022 +0100 Make numeric cursors a bit more opaque to avoid cursor hacks Base64 encoded numeric cursors are easy to read. Add extra chars to make them a bit more opaque. Related to T4647 See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/239/ for more details. Comment Actions .lstrip() and .rstrip() are not .removeprefix() and .removesuffix(): >>> get_decoded_cursor(get_encoded_cursor("FOO")) 'F' Comment Actions Thanks, just fixed it :) Comment Actions Build has FAILED Patch application report for D8906 (id=32099)Rebasing onto c28b65feaf... Current branch diff-target is up to date. Changes applied before testcommit 3c28bc55d492fc91293ef35d6c18a4ad07da8d85 Author: Jayesh Velayudhan <jayesh@softwareheritage.org> Date: Wed Nov 30 12:04:43 2022 +0100 Make numeric cursors a bit more opaque to avoid cursor hacks Base64 encoded numeric cursors are easy to read. Add extra chars to make them a bit more opaque. Related to T4647 Link to build: https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/240/ Comment Actions You can't use .removeprefix and .removesuffix because we still need to support Python 3.7. Also, the fix is not enough, because you need to change the encoding scheme, as it is still not injective: >>> get_decoded_cursor(get_encoded_cursor("foo")) 'foo' >>> get_decoded_cursor(get_encoded_cursor("TRKJJeGRfooJJEHDNOHT")) 'foo'
Is it really needed, though? If yes, encrypting looks overkill. If you just want to make it hard to work with, append the (b64-encoded) output of zlib.crc32() or something Comment Actions I am not sure, @olasd felt a bit uncomfortable with easy to read cursors. Let me try some alternatives anyway. Comment Actions Build is green Patch application report for D8906 (id=32100)Rebasing onto c28b65feaf... Current branch diff-target is up to date. Changes applied before testcommit 13929c9e7050d9c11cd497e0fc2b8972402f3dab Author: Jayesh Velayudhan <jayesh@softwareheritage.org> Date: Wed Nov 30 12:04:43 2022 +0100 Make numeric cursors a bit more opaque to avoid cursor hacks Base64 encoded numeric cursors are easy to read. Add extra chars to make them a bit more opaque. Related to T4647 See https://jenkins.softwareheritage.org/job/DGQL/job/tests-on-diff/241/ for more details. Comment Actions So, overall, what I'm uncomfortable with is decoding incoming cursors without validating that we have output and generated them ourselves. They're currently untrusted user input, that we're decoding with a very simple algorithm, and it'd be easy enough for anyone to fiddle with them (even with the prefix/suffix addition). I'm worried that we're opening ourselves up to some surprises when we implement cursors that will be more complex than numeric ids, and that some form of (cryptographic) authentification scheme would close that door once and for all. Maybe I'm worried for nothing. One thing I'm certain of is that we shouldn't *obfuscate* the cursors, either they're cryptographically authenticated or we make a conscious choice of leaving the cursors raw and ignoring the problem. If we decide to go with a cursor signature scheme, then it should be:
Django's signing primitives are interesting, their main caveat in my opinion is that the signatures are pretty long. I think it uses itsdangerous under the hood, which itself implements a robust HMAC scheme that can be salted as needed (and supports key rotation as well). |