Page MenuHomeSoftware Heritage

[WIP] In case of race condition in content_add, raise SerializationFailure instead of HashCollision.
Needs ReviewPublic

Authored by vlorentz on Nov 11 2019, 12:08 AM.

Details

Summary

Partially resolves T2019, by raising the appropriate exception.

This keeps the retry logic on the client side. I can add it on the server side with a decorator to both functions, but it won't be pretty because it needs to create a new cursor, so the test won't work as-is.

Diff Detail

Repository
rDSTO Storage manager
Branch
T2019-SerializationFailure
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 8931
Build 13044: tox-on-jenkinsJenkins
Build 13043: arc lint + arc unit

Event Timeline

vlorentz created this revision.Nov 11 2019, 12:08 AM
vlorentz updated this revision to Diff 7754.Nov 11 2019, 12:09 AM

remove irrelevant TODO

vlorentz edited the summary of this revision. (Show Details)Nov 11 2019, 12:13 AM
vlorentz retitled this revision from In case of race condition in content_add, raise SerializationFailure instead of HashCollision. to [WIP] In case of race condition in content_add, raise SerializationFailure instead of HashCollision..Nov 11 2019, 9:47 AM
vlorentz added a comment.EditedNov 12 2019, 12:23 PM

Actually, I don't understand why the new SELECT is needed; the one in content_missing_from_list (called by content_missing, indirectly called by content_add) should have the same effect...

It looks like what matters is to SELECT only based on sha1 instead of all hashes. Weird...

olasd added a subscriber: olasd.Nov 12 2019, 2:19 PM

Thanks for looking into this.

Forcing a serialization issue instead of a HashCollision is interesting, however...

I don't like that this tweaks the transaction isolation level within the function rather than when we open the transaction: set transaction isolation level must be the first thing called within the transaction, before any queries happen, so this should be handled within the transaction wrapper.

Having to do a bogus select to generate a fake read/write serialization dependency is also quite awful.

While I _think_ serializable is the isolation level that we would want for all queries, I don't know if we really want to eat that cost, for the benefit of "absolute correctness" on one query that fails 1 out of a million times and that we already retry on the client side anyway.

If we really want to go that route, we should also implement a read-only flag, to avoid read queries gratuitously generating serialization dependencies. In addition, the fact that the content_add transactions also include addition to the object storage doesn't play well with the caveats listed in the postgres docs.

Is there anything I'm missing on the status quo on this issue?

This are all good points. They don't seem to hard to address, I just wonder if they are worth spending time on.

Also I don't understand why the extra SELECT changes anything, and I'm hesitant to push a fix I don't fully understand