
core.lister_base: Ensure deterministic _task_key return value

Authored by anlambert on May 15 2019, 3:42 PM.



While playing with the Phabricator lister, I sometimes encountered the following error:

$ python3 
2019-05-15 15:29:42,959 DEBUG swh.lister.core.lister_base Loading config from lister_phabricator
2019-05-15 15:29:42,960 INFO swh.core.config Loading config file /home/antoine/.config/swh/lister_phabricator.yml
2019-05-15 15:29:42,962 DEBUG swh.lister.core.lister_base <swh.lister.phabricator.lister.PhabricatorLister object at 0x7f747ce698d0> CONFIG={'lister': {'cls': 'local', 'args': {'db': 'postgresql:///lister-phabricator'}}, 'content_size_limit': 104857600, 'log_db': 'dbname=softwareheritage-log', 'cache_responses': False, 'scheduler': {'cls': 'remote', 'args': {'url': 'http://localhost:5008/'}}, 'cache_dir': '/home/antoine/.cache/swh/lister/phabricator', 'storage': {'cls': 'remote', 'args': {'url': 'http://localhost:5002/'}}, 'credentials': []}
2019-05-15 15:29:42,980 DEBUG urllib3.connectionpool Starting new HTTPS connection (1):
2019-05-15 15:29:43,325 DEBUG urllib3.connectionpool "GET /api/[uris]=1&after=&order=oldest&limit=1 HTTP/1.1" 200 None
2019-05-15 15:29:43,353 DEBUG urllib3.connectionpool Starting new HTTP connection (1): localhost:5002
2019-05-15 15:29:43,356 DEBUG urllib3.connectionpool http://localhost:5002 "POST /origin/add_multi HTTP/1.1" 200 77
2019-05-15 15:29:43,357 DEBUG urllib3.connectionpool Starting new HTTP connection (1): localhost:5008
2019-05-15 15:29:43,379 DEBUG urllib3.connectionpool http://localhost:5008 "POST /create_tasks HTTP/1.1" 200 312
Traceback (most recent call last):
  File "", line 19, in <module>
  File "/usr/lib/python3/dist-packages/celery/", line 191, in __call__
    return self._get_current_object()(*a, **kw)
  File "/home/antoine/swh/swh-environment/swh-scheduler/swh/scheduler/", line 45, in __call__
    return super().__call__(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/celery/app/", line 375, in __call__
    return*args, **kwargs)
  File "/home/antoine/swh/swh-environment/swh-lister/swh/lister/phabricator/", line 23, in full_phabricator_lister
  File "/home/antoine/swh/swh-environment/swh-lister/swh/lister/phabricator/", line 102, in run
    min_bound = self._bootstrap_repositories_listing()
  File "/home/antoine/swh/swh-environment/swh-lister/swh/lister/phabricator/", line 87, in _bootstrap_repositories_listing
    self.create_missing_origins_and_tasks(models_list, injected)
  File "/home/antoine/swh/swh-environment/swh-lister/swh/lister/core/", line 503, in create_missing_origins_and_tasks
    ir, m, _ = tasks[_task_key(task)]
KeyError: 'origin-update-git-{"kwargs": {}, "args": [""]}'

This happens because the private function _task_key does not return a deterministic
value. For instance, with the above example, it can return either:

  • 'origin-update-git-{"kwargs": {}, "args": [""]}'
  • 'origin-update-git-{"args": [""], "kwargs": {}}'

The two values differ only in the order of their keys, so the proper fix is to sort the keys of the JSON document when dumping it.
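
A minimal sketch of the idea, not the actual diff: it assumes a task is a dict with a 'type' string and an 'arguments' dict, a layout inferred from the key shown in the traceback. The name task_key and the sample data are illustrative only.

```python
import json


def task_key(task):
    """Build a lookup key that does not depend on dict key order.

    Assumes (hypothetically) that `task` has a 'type' string and an
    'arguments' dict, as suggested by the key in the KeyError above.
    """
    # sort_keys=True makes json.dumps emit object keys in sorted order,
    # so two equal dicts always serialize to the same string.
    return "%s-%s" % (task["type"], json.dumps(task["arguments"], sort_keys=True))


# Two equal tasks whose argument dicts were built in a different order.
task_a = {"type": "origin-update-git", "arguments": {"kwargs": {}, "args": [""]}}
task_b = {"type": "origin-update-git", "arguments": {"args": [""], "kwargs": {}}}

key_a = task_key(task_a)
key_b = task_key(task_b)
print(key_a)
print(key_a == key_b)
```

With sorted keys, both tasks map to the same key, so a lookup such as tasks[task_key(task)] no longer depends on how the arguments dict was constructed.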

Diff Detail

rDLS Listers
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

anlambert created this revision. May 15 2019, 3:42 PM
anlambert edited the summary of this revision. May 15 2019, 3:43 PM
ardumont accepted this revision. May 15 2019, 3:52 PM
This revision is now accepted and ready to land. May 15 2019, 3:52 PM
This revision was automatically updated to reflect the committed changes.