
race condition during concurrent loading of the same objects from multiple origins
Closed, Migrated


Looking through the Kibana logs, we found the following error occurring quite often (in the storage):

[2019-08-20 00:39:25,728: ERROR/ForkPoolWorker-88373] Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/", line 896, in load
  File "/usr/lib/python3/dist-packages/swh/loader/core/", line 1003, in store_data
  File "/usr/lib/python3/dist-packages/swh/loader/core/", line 649, in send_batch_contents
  File "/usr/lib/python3/dist-packages/swh/loader/core/", line 41, in send_in_packets
  File "/usr/lib/python3/dist-packages/", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/lib/python3/dist-packages/", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/usr/lib/python3/dist-packages/", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/lib/python3/dist-packages/", line 686, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/lib/python3/dist-packages/swh/loader/core/", line 400, in send_contents
    result =
  File "/usr/lib/python3/dist-packages/swh/storage/api/", line 24, in content_add
    return'content/add', {'content': content})
  File "/usr/lib/python3/dist-packages/swh/core/api/", line 198, in post
    return self._decode_response(response)
  File "/usr/lib/python3/dist-packages/swh/core/api/", line 230, in _decode_response
    raise pickle.loads(decode_response(response)) sha1


root@uffizi:~# zgrep -c "" /var/log/syslog.*

Event Timeline

ardumont created this task.
ardumont created this object in space Restricted Space.
ardumont created this object with visibility "Developers (Project)".
olasd shifted this object from the Restricted Space space to the S1 Public space. Sep 30 2019, 1:32 PM
olasd changed the visibility from "Developers (Project)" to "Public (No Login Required)".

This is a race condition that happens when two different workers are loading the exact same content in parallel transactions.

I've added a diff with a minimal reproducer.
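The diff itself is not reproduced here. As a rough illustration of the interleaving described above, the following is a toy sketch only: `ToyStorage`, `DuplicateKey`, and `run_race` are hypothetical names, and an in-memory dict with a uniqueness check stands in for the real storage backend. It assumes each worker first checks whether the content is missing and then adds it, and uses a barrier to force both checks to happen before either insert.

```python
import threading


class DuplicateKey(Exception):
    """Hypothetical stand-in for a unique-constraint violation."""


class ToyStorage:
    """Hypothetical in-memory table with a unique key on the content hash."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def content_missing(self, sha1):
        with self._lock:
            return sha1 not in self._rows

    def content_add(self, sha1, data):
        with self._lock:
            if sha1 in self._rows:
                raise DuplicateKey(sha1)
            self._rows[sha1] = data


def run_race():
    storage = ToyStorage()
    barrier = threading.Barrier(2)
    outcomes = []

    def worker():
        # Step 1: each worker checks whether the content is already stored.
        missing = storage.content_missing("abc")
        # Force the bad interleaving: both checks finish before any insert.
        barrier.wait()
        # Step 2: both workers believe the content is missing and try to add it.
        if missing:
            try:
                storage.content_add("abc", b"same bytes")
                outcomes.append("added")
            except DuplicateKey:
                outcomes.append("duplicate")

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(outcomes)


if __name__ == "__main__":
    print(run_race())
```

In this toy model exactly one worker succeeds and the other hits the duplicate error, even though both are loading identical, valid content. The real failure goes through the storage API and database transactions, so the details differ.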

zack renamed this task from "Investigate hash collision error" to "race condition during concurrent loading of the same objects from multiple origins". Oct 1 2019, 10:58 AM

Tagged and deployed (most loaders have been restarted; the rest are in progress).

ardumont claimed this task.

This can be closed now thanks to D2977.