indexer: serialization issue on latest deployed indexer
Closed, Migrated

Description

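There is still a serialization issue on the latest deployed indexer: the task result (a GroupResult) cannot be JSON-encoded when the 'task-result' event is sent.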

[2018-10-25 19:23:28,670: ERROR/MainProcess] Task swh.indexer.tasks.OrchestratorAllContents[410b4fe2-f036-421c-a4fa-7d2bdee94feb] raised unexpected: EncodeError(TypeError(TypeError('<GroupResult: c2d43b89-eda2-47f2-9af4-9e81cb8d8c17 [2c339c3d-2d47-4ab6-92c6-67ecefe9c171]> is not JSON serializable',),),)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/scheduler/task.py", line 163, in run
    self.send_event('task-result', result=result)
  File "/usr/lib/python3/dist-packages/celery/app/task.py", line 803, in send_event
    return d.send(type_, uuid=req.id, **fields)
  File "/usr/lib/python3/dist-packages/celery/events/__init__.py", line 238, in send
    self.publish(type, fields, self.producer, blind)
  File "/usr/lib/python3/dist-packages/celery/events/__init__.py", line 215, in publish
    headers=self.headers,
  File "/usr/lib/python3/dist-packages/kombu/messaging.py", line 165, in publish
    compression, headers)
  File "/usr/lib/python3/dist-packages/kombu/messaging.py", line 241, in _prepare
    body) = dumps(body, serializer=serializer)
  File "/usr/lib/python3/dist-packages/kombu/serialization.py", line 164, in dumps
    payload = encoder(data)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python3/dist-packages/kombu/serialization.py", line 59, in _reraise_errors
    reraise(wrapper, wrapper(exc), sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/kombu/five.py", line 131, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/kombu/serialization.py", line 55, in _reraise_errors
    yield
  File "/usr/lib/python3/dist-packages/kombu/serialization.py", line 164, in dumps
    payload = encoder(data)
  File "/usr/lib/python3/dist-packages/anyjson/__init__.py", line 141, in dumps
    return implementation.dumps(value)
  File "/usr/lib/python3/dist-packages/anyjson/__init__.py", line 89, in dumps
    raise TypeError(TypeError(*exc.args)).with_traceback(sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/anyjson/__init__.py", line 87, in dumps
    return self._encode(data)
  File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 380, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3/dist-packages/simplejson/encoder.py", line 291, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3/dist-packages/simplejson/encoder.py", line 373, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3/dist-packages/simplejson/encoder.py", line 268, in default
    raise TypeError(repr(o) + " is not JSON serializable")
kombu.exceptions.EncodeError: <GroupResult: c2d43b89-eda2-47f2-9af4-9e81cb8d8c17 [2c339c3d-2d47-4ab6-92c6-67ecefe9c171]> is not JSON serializable
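
The last frames of the traceback show a JSON encoder rejecting an object it has no encoding for. A minimal sketch of that failure mode, using only the standard-library json module and a hypothetical stand-in class (FakeGroupResult is an assumption for illustration, not the real Celery class):

```python
import json


class FakeGroupResult:
    """Hypothetical stand-in for a Celery GroupResult; not the real class."""

    def __init__(self, group_id, child_ids):
        self.id = group_id
        self.child_ids = list(child_ids)

    def __repr__(self):
        return "<GroupResult: %s [%s]>" % (self.id, ", ".join(self.child_ids))


def try_json_encode(payload):
    """Return (True, encoded_text) on success, (False, error_message) on TypeError."""
    try:
        return True, json.dumps(payload)
    except TypeError as exc:
        return False, str(exc)


# Ids copied from the log line above.
group = FakeGroupResult(
    "c2d43b89-eda2-47f2-9af4-9e81cb8d8c17",
    ["2c339c3d-2d47-4ab6-92c6-67ecefe9c171"],
)

ok_plain, _ = try_json_encode({"result": group.id})    # a plain string encodes fine
ok_object, error = try_json_encode({"result": group})  # an arbitrary object does not
print(ok_plain, ok_object, error)
```

The exact wording of the error depends on the JSON library and Python version; the point is only that the object itself, not its id, is what the encoder refuses.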

dashboard: http://kibana0.internal.softwareheritage.org:5601/app/kibana#/dashboard/289ce780-d88d-11e8-b8ce-cf95f437ce37

Related: P325

Event Timeline

ardumont triaged this task as Normal priority. Oct 25 2018, 9:37 PM
ardumont created this task.
ardumont updated the task description.

This was fixed

It was supposed to be, yes.
I'm missing something somewhere, though.

To be clear, I already checked both the scheduler instance (saatchi) and the worker involved (worker01.euwest.azure) yesterday.
They both have the right version of swh-scheduler and swh-indexer.

ardumont@saatchi:~% dpkg -l python3-swh.indexer | grep python3-swh
ii  python3-swh.indexer 0.0.54-1~bpo9~swh+1 all          Software Heritage Content Indexer
ardumont@saatchi:~% python3
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from swh.scheduler.celery_backend import config; config.app.conf.get('CELERY_RESULT_SERIALIZER')
'msgpack'
>>> import celery; celery.__version__
'3.1.23'

And the worker involved:

ardumont@worker01:~% dpkg -l python3-swh.indexer | grep python3-swh
ii  python3-swh.indexer 0.0.54-1~bpo9~swh+1 all          Software Heritage Content Indexer
ardumont@worker01:~% python3
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from swh.scheduler.celery_backend import config; config.app.conf.get('CELERY_RESULT_SERIALIZER')
'msgpack'
>>> # ~> that draws the right dependency
>>> import celery; celery.__version__
'3.1.23'
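
The two sessions above read only CELERY_RESULT_SERIALIZER, while the traceback goes through celery/events/__init__.py and a JSON encoder. Celery 3.1 also documents CELERY_TASK_SERIALIZER and CELERY_EVENT_SERIALIZER settings; whether the event serializer is the one at play here is an assumption, not something the thread confirms. A small sketch that lists all three from any dict-like configuration (the helper name serializer_settings is hypothetical):

```python
# Setting names as documented for Celery 3.1.
SERIALIZER_KEYS = (
    "CELERY_TASK_SERIALIZER",
    "CELERY_RESULT_SERIALIZER",
    "CELERY_EVENT_SERIALIZER",
)


def serializer_settings(conf):
    """Collect serializer-related settings from a dict-like config.

    Keys that are not present are reported as None.
    """
    return {key: conf.get(key) for key in SERIALIZER_KEYS}


# Stand-in for config.app.conf: only the value actually observed in the
# sessions above is filled in; the other two are unknown here.
observed = {"CELERY_RESULT_SERIALIZER": "msgpack"}

report = serializer_settings(observed)
for key in SERIALIZER_KEYS:
    print(key, "=", report[key])
```

On a real host the same helper could be given config.app.conf, the object queried with .get() in the sessions above, instead of the stand-in dict.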

The hole in my reasoning might be Celery, which remained at 3.1.23.
Indeed, I did not specify any Celery version bump.
I'm unclear on whether the runtime needs a later version or not.
So far, I thought we needed it for the indexer tests...
That will be the next point to check this morning.

Cheers,

Alternatively, you can wait for me to fix T1290 (I'll do it this morning). My plan is to remove the need for result serialization.
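
How T1290 removes the need for result serialization is not shown in this thread. Purely as an illustration of the general idea, and not as the actual fix, here is a sketch of a hypothetical helper (to_plain) that reduces a result to JSON-friendly data before it reaches an encoder, keeping only the id of objects it does not recognise:

```python
import json


def to_plain(result):
    """Reduce a result to JSON-friendly data (hypothetical helper)."""
    if result is None or isinstance(result, (str, int, float, bool)):
        return result
    if isinstance(result, dict):
        return {str(key): to_plain(value) for key, value in result.items()}
    if isinstance(result, (list, tuple)):
        return [to_plain(value) for value in result]
    # Any other object (for instance a group-result-like object):
    # keep its type name and its id, if it has a string id.
    ident = getattr(result, "id", None)
    return {
        "type": type(result).__name__,
        "id": ident if isinstance(ident, str) else None,
    }


class StubGroupResult:
    """Hypothetical stand-in for a Celery GroupResult; not the real class."""

    def __init__(self, group_id):
        self.id = group_id


# Id copied from the log in the task description.
stub = StubGroupResult("c2d43b89-eda2-47f2-9af4-9e81cb8d8c17")
plain = to_plain({"result": stub, "count": 1})
encoded = json.dumps(plain, sort_keys=True)
print(encoded)
```

The trade-off is that the consumer of the event only sees an identifier and has to look the actual result up elsewhere if it needs it.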