swh-lister: Phabricator lister no longer works
Closed, Migrated

Description

TL;DR

The lister expects a JSON response but receives HTML instead, so it can no longer work.

Details

Latest lister code (v0.0.35), latest docker-dev (ed7f9a36155dca1baa9eeb54f092a3f329532106).

Initially, the error was raised on the staging area, after deploying our latest lister version.
(This may be irrelevant to our latest refactorings though)

Error log excerpt (redacted the phabricator credentials entry):

swh-lister_1                  | [2019-09-11 12:44:11,942: DEBUG/ForkPoolWorker-1] Loading config from lister_phabricator
swh-lister_1                  | [2019-09-11 12:44:11,942: INFO/ForkPoolWorker-1] Loading config file /lister.yml
swh-lister_1                  | [2019-09-11 12:44:11,953: DEBUG/ForkPoolWorker-1] <swh.lister.phabricator.lister.PhabricatorLister object at 0x7fd96f3f2c50> CONFIG={'content_size_limit': 104857600, 'log_db': 'dbname=softwareheritage-log', 'storage': {'cls': 'remote', 'args': {'url': 'http://swh-storage:5002/'}}, 'scheduler': {'cls': 'remote', 'args': {'url': 'http://swh-scheduler-api:5008/'}}, 'lister': {'cls': 'local', 'args': {'db': 'postgresql://postgres@swh-listers-db/swh-listers'}}, 'celery': {'task_broker': 'amqp://guest:guest@amqp//', 'task_modules': ['swh.lister.bitbucket.tasks', 'swh.lister.cgit.tasks', 'swh.lister.cran.tasks', 'swh.lister.debian.tasks', 'swh.lister.github.tasks', 'swh.lister.gitlab.tasks', 'swh.lister.gnu.tasks', 'swh.lister.npm.tasks', 'swh.lister.packagist.tasks', 'swh.lister.phabricator.tasks', 'swh.lister.pypi.tasks'], 'task_queues': ['swh.lister.bitbucket.tasks.FullBitBucketRelister', 'swh.lister.bitbucket.tasks.IncrementalBitBucketLister', 'swh.lister.bitbucket.tasks.RangeBitBucketLister', 'swh.lister.bitbucket.tasks.ping', 'swh.lister.cgit.tasks.CGitListerTask', 'swh.lister.cgit.tasks.ping', 'swh.lister.cran.tasks.CRANListerTask', 'swh.lister.cran.tasks.ping', 'swh.lister.debian.tasks.DebianListerTask', 'swh.lister.debian.tasks.ping', 'swh.lister.github.tasks.FullGitHubRelister', 'swh.lister.github.tasks.IncrementalGitHubLister', 'swh.lister.github.tasks.RangeGitHubLister', 'swh.lister.github.tasks.ping', 'swh.lister.gitlab.tasks.FullGitLabRelister', 'swh.lister.gitlab.tasks.IncrementalGitLabLister', 'swh.lister.gitlab.tasks.RangeGitLabLister', 'swh.lister.gitlab.tasks.ping', 'swh.lister.gnu.tasks.GNUListerTask', 'swh.lister.gnu.tasks.ping', 'swh.lister.npm.tasks.NpmIncrementalListerTask', 'swh.lister.npm.tasks.NpmListerTask', 'swh.lister.npm.tasks.ping', 'swh.lister.packagist.tasks.PackagistListerTask', 'swh.lister.packagist.tasks.ping', 'swh.lister.phabricator.tasks.FullPhabricatorLister', 
'swh.lister.phabricator.tasks.IncrementalPhabricatorLister', 'swh.lister.phabricator.tasks.ping', 'swh.lister.pypi.tasks.PyPIListerTask', 'swh.lister.pypi.tasks.ping']}, 'cache_responses': False, 'cache_dir': '/srv/softwareheritage/.cache/swh/lister/phabricator'}
swh-lister_1                  | [2019-09-11 12:44:12,214: DEBUG/ForkPoolWorker-1] Error submitting statsd packet. Dropping the packet and closing the socket.
swh-lister_1                  | [2019-09-11 12:44:12,226: DEBUG/ForkPoolWorker-1] Error submitting statsd packet. Dropping the packet and closing the socket.
swh-lister_1                  | [2019-09-11 12:44:12,227: ERROR/ForkPoolWorker-1] Task swh.lister.phabricator.tasks.FullPhabricatorLister[0c4f3cd5-3dfc-433d-a446-c29c5e356e72] raised unexpected: JSONDecodeError('Expecting value: line 1 column 1 (char 0)')
swh-lister_1                  | Traceback (most recent call last):
swh-lister_1                  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 385, in trace_task
swh-lister_1                  |     R = retval = fun(*args, **kwargs)
swh-lister_1                  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 45, in __call__
swh-lister_1                  |     return super().__call__(*args, **kwargs)
swh-lister_1                  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 648, in __protected_call__
swh-lister_1                  |     return self.run(*args, **kwargs)
swh-lister_1                  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/lister/phabricator/tasks.py", line 12, in list_phabricator_full
swh-lister_1                  |     PhabricatorLister(**lister_args).run()
swh-lister_1                  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/lister/core/indexing_lister.py", line 234, in run
swh-lister_1                  |     for i in ingest_indexes():
swh-lister_1                  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/lister/core/indexing_lister.py", line 213, in ingest_indexes
swh-lister_1                  |     index = min_bound or self.default_min_bound
swh-lister_1                  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/lister/phabricator/lister.py", line 35, in default_min_bound
swh-lister_1                  |     return self._bootstrap_repositories_listing()
swh-lister_1                  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/lister/phabricator/lister.py", line 121, in _bootstrap_repositories_listing
swh-lister_1                  |     models_list = self.transport_response_simplified(response)
swh-lister_1                  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/lister/phabricator/lister.py", line 88, in transport_response_simplified
swh-lister_1                  |     repos = response.json()
swh-lister_1                  |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/models.py", line 897, in json
swh-lister_1                  |     return complexjson.loads(self.text, **kwargs)
swh-lister_1                  |   File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads
swh-lister_1                  |     return _default_decoder.decode(s)
swh-lister_1                  |   File "/usr/local/lib/python3.7/json/decoder.py", line 337, in decode
swh-lister_1                  |     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
swh-lister_1                  |   File "/usr/local/lib/python3.7/json/decoder.py", line 355, in raw_decode
swh-lister_1                  |     raise JSONDecodeError("Expecting value", s, err.value) from None
swh-lister_1                  | json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Indeed, logging the response.text entry on the staging area shows that the data is not JSON as the code expects; it is raw HTML.

Sep 11 12:37:11 worker1 python3[29944]: [2019-09-11 12:37:11,446: DEBUG/ForkPoolWorker-1] ##### response.raw: <!DOCTYPE html><html><head><meta charset="UTF-8" /><title>Activity</title><meta name="viewport" content="width=device-width, init
ial-scale=1, user-scalable=no" /><link rel="mask-icon" color="#3D4B67" href="https://forge.softwareheritage.org/res/phabricator/db699fe1/rsrc/favicons/mask-icon.svg" /><link rel="apple-touch-icon" sizes="76x76" href="https://forge.software
heritage.org/file/data/zdehrkcp53oozl423ojr/PHID-FILE-kogazo5npeqahqimgprb/favicon" /><link rel="apple-touch-icon" sizes="120x120" href="https://forge.softwareheritage.org/file/data/kv275yqkv2772zxyfphu/PHID-FILE-x42alr2tutfp4wpk3qcl/favic
on" /><link rel="apple-touch-icon" sizes="152x152" href="https://forge.softwareheritage.org/file/data/hpmyxqzkaoyhwosnzooc/PHID-FILE-ea2dkm4nucfwmgfctawo/favicon" /><link rel="icon" id="favicon" href="https://forge.softwareheritage.org/fil
e/data/hncu2zrlv7czuu74rzfb/PHID-FILE-2miz274edt4pr5pc7fxg/favicon" /><meta name="referrer" content="no-referrer" /><link rel="stylesheet" type="text/css" href="https://forge.softwareheritage.org/res/defaultX/phabricator/3dc188c0/core.pkg.
css" /><link rel="stylesheet" type="text/css" href="https://forge.softwareheritage.org/res/defaultX/phabricator/3c8a0668/conpherence.pkg.css" /><script type="text/javascript" src="https://forge.softwareheritage.org/res/defaultX/phabricator
/98e6504a/rsrc/externals/javelin/core/init.js"></script></head><body class="device-desktop phui-theme-red phabricator-home"><div class="main-page-frame" id="main-page-frame"><div id="phabricator-standard-page" class="phabricator-standard-$
age"><div class="phabricator-main-menu phabricator-main-menu-background" id="UQ0_12"><a class=" phabricator-core-user-menu phabricator-core-user-mobile-menu" href="#" role="button" data-sigil="phui-dropdown-menu" data-meta="0_16"><span cl$
ss="aural-only">Page Menu</span><span class="visual-only phui-icon-view phui-font-fa fa-bars" data-meta="0_17" aria-hidden="true"></span><span class="caret"></span></a><a class...
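
The failure above surfaces as a bare JSONDecodeError, which hides what the server actually sent. A minimal sketch of a more explicit check follows. It is not the lister's actual code: `parse_json_response` and `UnexpectedContentError` are hypothetical names, and the response object is only assumed to expose `.status_code`, `.headers` and `.text`, as a requests response does.

```python
import json


class UnexpectedContentError(ValueError):
    """Raised when a response expected to carry JSON carries something else."""


def parse_json_response(response):
    """Decode the body as JSON, or fail with a message describing the body.

    `response` is assumed to look like a requests.Response
    (attributes: status_code, headers, text). Hypothetical helper.
    """
    try:
        return json.loads(response.text)
    except json.JSONDecodeError as exc:
        content_type = response.headers.get("Content-Type", "unknown")
        snippet = response.text[:80]  # enough to tell HTML from JSON at a glance
        raise UnexpectedContentError(
            f"expected JSON, got Content-Type {content_type!r} "
            f"(HTTP {response.status_code}); body starts with {snippet!r}"
        ) from exc
```

With a check like this, the worker log would name the content type and show the start of the HTML page directly, instead of requiring an extra debug statement to see it.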

Event Timeline

ardumont triaged this task as Normal priority. Sep 11 2019, 2:53 PM
ardumont created this task.
ardumont updated the task description.

(This may be irrelevant to our latest refactorings though)

Yes, I'm now more confident it's unrelated, since spawning a list-gitlab-full for Inria's GitLab instance works as expected [1] ;)

[1] https://webapp.internal.staging.swh.network/browse/search/?q=inria&with_visit&with_content

ardumont renamed this task from "staging/docker-dev: Phabricator lister no longer works" to "swh-lister: Phabricator lister no longer works". Sep 12 2019, 10:15 AM
ardumont updated the task description.

As I replied in an email earlier on (a reply that did not make it here), the problem lies with the input.
So no real bug after all!

ardumont claimed this task.