While playing with the Phabricator lister, I sometimes encountered the following error:
$ python3 test_phabricator_lister.py
2019-05-15 15:29:42,959 DEBUG swh.lister.core.lister_base Loading config from lister_phabricator
2019-05-15 15:29:42,960 INFO swh.core.config Loading config file /home/antoine/.config/swh/lister_phabricator.yml
2019-05-15 15:29:42,962 DEBUG swh.lister.core.lister_base <swh.lister.phabricator.lister.PhabricatorLister object at 0x7f747ce698d0> CONFIG={'lister': {'cls': 'local', 'args': {'db': 'postgresql:///lister-phabricator'}}, 'content_size_limit': 104857600, 'log_db': 'dbname=softwareheritage-log', 'cache_responses': False, 'scheduler': {'cls': 'remote', 'args': {'url': 'http://localhost:5008/'}}, 'cache_dir': '/home/antoine/.cache/swh/lister/phabricator', 'storage': {'cls': 'remote', 'args': {'url': 'http://localhost:5002/'}}, 'credentials': []}
2019-05-15 15:29:42,980 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): phabricator.wikimedia.org:443
2019-05-15 15:29:43,325 DEBUG urllib3.connectionpool https://phabricator.wikimedia.org:443 "GET /api/diffusion.repository.search?api.token=api-3hqsvzuf3f7lxvlt33tbl7xcvqnm&order=oldest&attachments[uris]=1&after=&order=oldest&limit=1 HTTP/1.1" 200 None
2019-05-15 15:29:43,353 DEBUG urllib3.connectionpool Starting new HTTP connection (1): localhost:5002
2019-05-15 15:29:43,356 DEBUG urllib3.connectionpool http://localhost:5002 "POST /origin/add_multi HTTP/1.1" 200 77
2019-05-15 15:29:43,357 DEBUG urllib3.connectionpool Starting new HTTP connection (1): localhost:5008
2019-05-15 15:29:43,379 DEBUG urllib3.connectionpool http://localhost:5008 "POST /create_tasks HTTP/1.1" 200 312
Traceback (most recent call last):
  File "test_phabricator_lister.py", line 19, in <module>
    api_token='api-3hqsvzuf3f7lxvlt33tbl7xcvqnm')
  File "/usr/lib/python3/dist-packages/celery/local.py", line 191, in __call__
    return self._get_current_object()(*a, **kw)
  File "/home/antoine/swh/swh-environment/swh-scheduler/swh/scheduler/task.py", line 45, in __call__
    return super().__call__(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/celery/app/task.py", line 375, in __call__
    return self.run(*args, **kwargs)
  File "/home/antoine/swh/swh-environment/swh-lister/swh/lister/phabricator/tasks.py", line 23, in full_phabricator_lister
    lister.run()
  File "/home/antoine/swh/swh-environment/swh-lister/swh/lister/phabricator/lister.py", line 102, in run
    min_bound = self._bootstrap_repositories_listing()
  File "/home/antoine/swh/swh-environment/swh-lister/swh/lister/phabricator/lister.py", line 87, in _bootstrap_repositories_listing
    self.create_missing_origins_and_tasks(models_list, injected)
  File "/home/antoine/swh/swh-environment/swh-lister/swh/lister/core/lister_base.py", line 503, in create_missing_origins_and_tasks
    ir, m, _ = tasks[_task_key(task)]
KeyError: 'origin-update-git-{"kwargs": {}, "args": ["https://phabricator.wikimedia.org/source/mediawiki.git"]}'
This is because the return value of the private _task_key function is not
deterministic. For instance, for the task in the example above, it can return either:
- 'origin-update-git-{"kwargs": {}, "args": ["https://phabricator.wikimedia.org/source/mediawiki.git"]}'
- 'origin-update-git-{"args": ["https://phabricator.wikimedia.org/source/mediawiki.git"], "kwargs": {}}'
The proper fix is to sort the keys when dumping the JSON document, so that
identical task arguments always produce the same key.
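
As a minimal sketch of the idea, the snippet below shows how sorting keys makes the serialized form independent of the order in which the dict was built. The task_key helper and the "type"/"arguments" task layout are assumptions made for illustration, not the actual _task_key code from lister_base.py; only json.dumps and its sort_keys parameter are taken from the standard library.

```python
import json

URL = "https://phabricator.wikimedia.org/source/mediawiki.git"


def task_key(task):
    """Hypothetical stand-in for _task_key: task type plus JSON-dumped arguments.

    sort_keys=True makes the dump independent of dict key order.
    """
    return "%s-%s" % (task["type"], json.dumps(task["arguments"], sort_keys=True))


# Two equal tasks whose argument dicts were built in a different key order
# (assumed task layout, for illustration only).
task_a = {"type": "origin-update-git", "arguments": {"kwargs": {}, "args": [URL]}}
task_b = {"type": "origin-update-git", "arguments": {"args": [URL], "kwargs": {}}}

# Without sort_keys, json.dumps follows each dict's iteration order,
# so the two dumps can differ even though the dicts compare equal.
unsorted_a = json.dumps(task_a["arguments"])
unsorted_b = json.dumps(task_b["arguments"])

# With sort_keys=True, both tasks map to the same key.
key_a = task_key(task_a)
key_b = task_key(task_b)
print(key_a == key_b)
```

With sorted keys, a lookup such as tasks[task_key(task)] no longer depends on how the arguments dict happened to be ordered when the task was created or read back.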