Page MenuHomeSoftware Heritage

storage: Add proxy storage with retry policy
ClosedPublic

Authored by ardumont on Jan 9 2020, 5:27 PM.

Details

Summary

This implements the actual behavior the loader-core (dvcs ones) exposes. This
will allow to configure other loaders with the same retry (currently limited to
content_add here for discussion purposes [1]).

This should allow to:

  • take care of issues like [2] (hash collision in package loaders otherwise taken care of in dvcs loaders)
  • clean up some more the current loader-core (original loader-core wraps the storage's entrypoints with send_{content,origin,...} methods which adds that retry policy) [3]

[1] I will complete the implementation if we agree to continue in that
direction

[2] https://sentry.softwareheritage.org/share/issue/d4f1208b7eec4b43b11e38494ff039cc/

[3] https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/core/loader.py$331-361

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.Jan 9 2020, 5:27 PM
ardumont edited the summary of this revision. (Show Details)Jan 9 2020, 5:30 PM
vlorentz requested changes to this revision.Jan 9 2020, 5:52 PM
vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/storage/__init__.py
18–19

nitpick: for consistency, it should be a substantive, like "retryer" (which is a neologism, for lack of a better word)

63

same

swh/storage/retrying.py
20 ↗(On Diff #8923)

the name should indicate it's a boolean predicate. eg. should_retry_loading

25–29 ↗(On Diff #8923)

This should be a module-level constant, and the name (and comment) should describe what the list is used for. eg. RETRY_EXCEPTIONS.

27 ↗(On Diff #8923)

Why retry on IntegrityError?

31–32 ↗(On Diff #8923)

For some reason, this conditional is twisting my brain. Could you add a comment?

Or remove the not to switch the two branches

38–42 ↗(On Diff #8923)

Can't traceback.format_exc be used here?

59–61 ↗(On Diff #8923)

and content_metadata_add

swh/storage/tests/test_retrying.py
34–58 ↗(On Diff #8923)

unittest.mock has a neat feature to remove the need for the _count_content_add function:

def test_retrying_proxy_storage_with_retry(sample_data, mocker):
    """Multiple retries for hash collision and psycopg2 error

    """
    mock_memory = mocker.patch('swh.storage.in_memory.Storage.content_add')
    mock_memory.side_effect = [
        HashCollision('Refuse to add content'),
        psycopg2.IntegrityError('Refuse differently to add content'),
        {'content:add': 1}
    ]

Example:

>>> m = unittest.mock.Mock(side_effect=[Exception('foo'), Exception('bar'), 'baz'])
>>> m()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/unittest/mock.py", line 960, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "/usr/lib/python3.7/unittest/mock.py", line 1024, in _mock_call
    raise result
Exception: foo
>>> m()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/unittest/mock.py", line 960, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "/usr/lib/python3.7/unittest/mock.py", line 1024, in _mock_call
    raise result
Exception: bar
>>> m()
'baz'

;)

This revision now requires changes to proceed.Jan 9 2020, 5:52 PM
ardumont added inline comments.Jan 10 2020, 10:00 AM
swh/storage/__init__.py
18–19

hum, i found this ugly.

Plus, we do not really see that consistency (among all storages i mean).

swh/storage/retrying.py
20 ↗(On Diff #8923)

ok, although, here it should even be named should_retry_adding, it was initially from the loader so loading made sense there.

27 ↗(On Diff #8923)

I don't remember what lead to this.

Here is the original function though (loader-core's loader, slightly adapted) [1]

[1] https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/core/loader.py$24-50

38–42 ↗(On Diff #8923)

Maybe, i don't really know ;)

59–61 ↗(On Diff #8923)

yes, like i said in the description, the implementation is incomplete for now.

I will complete the other _add endpoint (and the tests) if we agree to integrate this.

I did not want to spend too much time on adding all endpoints (and the gazillion tests attached to it).

swh/storage/tests/test_retrying.py
34–58 ↗(On Diff #8923)

Oh yeah, i forgot.

That's awesome! Thanks!

ardumont added inline comments.Jan 10 2020, 10:16 AM
swh/storage/retrying.py
27 ↗(On Diff #8923)

Ah, but as the comment says it all.
It could so happens that 2 workers load the same repository mostly at the same time (for some reason).
It happened in the past at least.

That could make concurrency transaction issues (because the other transaction got committed faster and then you have integrity failure on content for example, content already inserted and what not).

Retrying a new transaction should make the issue disappear as the _add methods are filtering data prior to inserting them (at least for the dag objects iirc).

ardumont updated this revision to Diff 8924.Jan 10 2020, 10:59 AM

Adapt according to review:

  • Rename should_retry_adding function
  • Extract exception that triggers retry into a module variable
  • Improve tests using rightfully the .side_effect method
  • Add tests to should_retry_adding function
anlambert added inline comments.
swh/storage/retry.py
23

I think we should only catch HashCollision exceptions here in order to not be tightly linked to a specific storage backend (here PostgreSQL).

At the moment, we want to retry on add operations when two different workers concurrently write the same object (content, directory, release, snapshot, revision) in a table.

Nevertheless, it means that some adaptation needs to be made in storage backend implementations as currently the HashCollision exception is only raised for the content add case.

ardumont added inline comments.Jan 10 2020, 1:06 PM
swh/storage/__init__.py
18–19

Also, see retry as a verb like buffer and filter are, maybe that will help.

ardumont added inline comments.Jan 10 2020, 1:10 PM
swh/storage/retry.py
23

For other dag objects, I'm not sure HashCollision convey the right information
though. Maybe a new exception DuplicatedEntry or something.

In any case, that should go in another diff.

Otherwise, this one will be a tag big.

vlorentz accepted this revision.Jan 10 2020, 2:32 PM
This revision is now accepted and ready to land.Jan 10 2020, 2:32 PM
ardumont updated this revision to Diff 8929.Jan 10 2020, 2:40 PM

Improve docstrings and sentences

ardumont updated this revision to Diff 8930.Jan 10 2020, 2:42 PM

Plug to master branch

This revision was automatically updated to reflect the committed changes.