
Journal error in production
Closed, Resolved · Public


Each time I submit a save request for the cpython repository in production, the following error is reported:

[2019-12-10 14:24:24,047: ERROR/ForkPoolWorker-12505] Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/", line 872, in load
  File "/usr/lib/python3/dist-packages/swh/loader/core/", line 979, in store_data
  File "/usr/lib/python3/dist-packages/", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/lib/python3/dist-packages/", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/usr/lib/python3/dist-packages/", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/lib/python3/dist-packages/", line 693, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/lib/python3/dist-packages/swh/loader/core/", line 490, in send_snapshot[snapshot])
  File "/usr/lib/python3/dist-packages/swh/storage/api/", line 97, in snapshot_add
    return'snapshot/add', {'snapshots': snapshots})
  File "/usr/lib/python3/dist-packages/swh/core/api/", line 206, in post
    return self._decode_response(response)
  File "/usr/lib/python3/dist-packages/swh/core/api/", line 238, in _decode_response
    raise pickle.loads(decode_response(response))
_pickle.PicklingError: Can't pickle : import of module 'cimpl' failed

This seems related to the journal.

The real error does not seem to be reported correctly, due to that pickling issue with confluent-kafka's exceptions.
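The failure mode can be reproduced in isolation: when the backend raises an exception whose class pickle cannot re-import (as with `cimpl`, confluent-kafka's C extension), the RPC layer's attempt to serialize it fails, and the `PicklingError` masks the original error. A minimal sketch, assuming nothing about the actual swh code (`KafkaErrorLike` and `safe_for_rpc` are hypothetical stand-ins):

```python
import pickle

# Stand-in for cimpl.KafkaError: give it a module name that cannot be
# imported, reproducing "Can't pickle ...: import of module ... failed".
class KafkaErrorLike:
    pass

KafkaErrorLike.__module__ = "cimpl_missing"  # pretend the C module is gone

def safe_for_rpc(exc: Exception) -> Exception:
    """Hypothetical helper: fall back to a plain, always-picklable
    Exception when the original one cannot cross an RPC boundary."""
    try:
        pickle.dumps(exc)
        return exc
    except Exception:
        return RuntimeError(f"{type(exc).__name__}: {exc}")

backend_error = Exception(KafkaErrorLike())
try:
    pickle.dumps(backend_error)      # what the RPC layer does today
    survived_pickling = True
except pickle.PicklingError:
    survived_pickling = False        # the real error is lost here

wrapped = safe_for_rpc(backend_error)  # always round-trips through pickle
```

Re-wrapping the exception before shipping it would at least preserve the error's type name and message for the client.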

Event Timeline

anlambert triaged this task as Normal priority. Dec 11 2019, 11:49 AM
anlambert created this task.
olasd added a subscriber: olasd. Dec 11 2019, 2:39 PM

The backend exception is:
cimpl.KafkaException: KafkaError{code=MSG_SIZE_TOO_LARGE,val=10,str="Unable to produce message: Broker: Message size too large"}.

I'll fix the deployment to bump the limit up (the default limit is a bit low for some large objects like directories or snapshots).
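For context, the knob involved is librdkafka's `message.max.bytes` producer setting, whose default (1000000 bytes, about 1 MB) is easily exceeded by large directory or snapshot objects; the broker's `message.max.bytes` (and any per-topic `max.message.bytes`) must be raised to match. A sketch of the kind of tuning involved, with placeholder values and host rather than the actual Software Heritage deployment:

```python
# Placeholder producer configuration (not the real deployment values).
# librdkafka's default message.max.bytes is 1000000 (~1 MB); raising it
# only helps if the broker/topic limits are raised accordingly.
producer_config = {
    "bootstrap.servers": "kafka.example.org:9092",  # placeholder broker
    "message.max.bytes": 100 * 1024 * 1024,         # raise the ~1 MB default
}

# With confluent-kafka installed, this dict would be passed as
# confluent_kafka.Producer(producer_config).
```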

olasd closed this task as Resolved. Wed, Jun 17, 2:25 PM
olasd claimed this task.

I suspect this hasn't happened recently.