Page MenuHomeSoftware Heritage

Remove SQLisms from the tests and API.
ClosedPublic

Authored by vlorentz on Nov 9 2018, 12:06 PM.

Active Operations

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz retitled this revision from Remove SQLisms from the tests. to Remove SQLisms from the tests and API..Nov 9 2018, 12:06 PM
olasd requested changes to this revision.Nov 9 2018, 2:03 PM
olasd added a subscriber: olasd.

Thanks, these changes look fine save for a few comments inline

swh/storage/storage.py
163

Can we extract the key that has collided? The diag attribute on the exception should have the info

http://initd.org/psycopg/docs/extensions.html#psycopg2.extensions.Diagnostics

1216–1217

The keys should match what's in the stored procedure for stat_counters (so, content, skipped_content, directory, revision, release, snapshot, origin, origin_visit, and maybe person; directory_entry_dir was removed because it's meaningless)

swh/storage/tests/test_storage.py
571

Also check that the key got extracted ?

This revision now requires changes to proceed.Nov 9 2018, 2:03 PM
vlorentz marked an inline comment as done.
  • Fix column names in refresh_stat_counters.
swh/storage/storage.py
163

The best info I can get from that attribute is the message_detail, which is 'Key (sha1)=(\\x34973274ccef6ab4dfaaf86599792fa9c3fe4689) already exists.'. I could parse it, but eww

swh/storage/storage.py
163

You can check:

e.diag.sqlstate == '23505' (unique constraint failure - documented on https://www.postgresql.org/docs/10/errcodes-appendix.html)
e.diag.table_name == 'content'
and use the e.diag.constraint_name attribute as a mapping:

  • content_pkey -> (currently) duplicate sha1
  • content_XXX_idx -> duplicate XXX (with XXX in sha1_git, sha256)

We don't enforce distinct blake2s256 hashes (we should also probably drop the sha256 enforcement).

Anything else should bubble up as the original IntegrityError exception.

vlorentz added inline comments.
swh/storage/storage.py
163

content_XXX_idx -> duplicate XXX (with XXX in sha1_git, sha256)

grep "_idx" swh/storage/sql/*.sql shows no result.

vlorentz added inline comments.
swh/storage/storage.py
163

nvm, I misunderstood

  • Give the name of the colliding hash when raising HashCollision.
olasd added inline comments.
swh/storage/storage.py
172

missing an else: raise ?

This revision is now accepted and ready to land.Nov 14 2018, 3:54 PM
  • Don't eat all exceptions.
This revision was automatically updated to reflect the committed changes.