Page MenuHomeSoftware Heritage

[docker-compose] gunicorn: Use same timeout value as in production
ClosedPublic

Authored by anlambert on Mar 7 2019, 4:43 PM.

Details

Summary

When trying to load a git repository through the docker-compose environment, I got the following error:

swh-loader_1                  | Total 78340 (delta 94), reused 80 (delta 43), pack-reused 78162
swh-loader_1                  | [2019-03-07 14:48:56,405: INFO/ForkPoolWorker-1] Listed 3099 refs for repo https://github.com/webpack/webpack
swh-loader_1                  | [2019-03-07 14:49:20,678: DEBUG/ForkPoolWorker-1] Sending 9251 contents
swh-loader_1                  | [2019-03-07 14:49:30,937: DEBUG/ForkPoolWorker-1] Done sending 9251 contents
swh-loader_1                  | [2019-03-07 14:49:33,253: DEBUG/ForkPoolWorker-1] Sending 5920 contents
swh-loader_1                  | [2019-03-07 14:49:42,322: DEBUG/ForkPoolWorker-1] Done sending 5920 contents
swh-loader_1                  | [2019-03-07 14:49:44,763: DEBUG/ForkPoolWorker-1] Sending 6440 contents
swh-loader_1                  | [2019-03-07 14:49:53,880: DEBUG/ForkPoolWorker-1] Done sending 6440 contents
swh-loader_1                  | [2019-03-07 14:49:57,034: DEBUG/ForkPoolWorker-1] Sending 7827 contents
swh-loader_1                  | [2019-03-07 14:50:03,465: DEBUG/ForkPoolWorker-1] Done sending 7827 contents
swh-loader_1                  | [2019-03-07 14:50:14,096: DEBUG/ForkPoolWorker-1] Sending 25000 directories
swh-loader_1                  | [2019-03-07 14:50:45,991: ERROR/ForkPoolWorker-1] Loading failure, updating to `partial` status
swh-loader_1                  | Traceback (most recent call last):
swh-loader_1                  |   File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
swh-loader_1                  |     chunked=chunked)
swh-loader_1                  |   File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 384, in _make_request
swh-loader_1                  |     six.raise_from(e, None)
swh-loader_1                  |   File "<string>", line 2, in raise_from
swh-loader_1                  |   File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 380, in _make_request
swh-loader_1                  |     httplib_response = conn.getresponse()
swh-loader_1                  |   File "/usr/local/lib/python3.6/http/client.py", line 1331, in getresponse
swh-loader_1                  |     response.begin()
swh-loader_1                  |   File "/usr/local/lib/python3.6/http/client.py", line 297, in begin
swh-loader_1                  |     version, status, reason = self._read_status()
swh-loader_1                  |   File "/usr/local/lib/python3.6/http/client.py", line 266, in _read_status
swh-loader_1                  |     raise RemoteDisconnected("Remote end closed connection without"
swh-loader_1                  | http.client.RemoteDisconnected: Remote end closed connection without response
swh-loader_1                  |
swh-loader_1                  | During handling of the above exception, another exception occurred:
swh-loader_1                  |
swh-loader_1                  | Traceback (most recent call last):
swh-loader_1                  |   File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
swh-loader_1                  |     timeout=timeout
swh-loader_1                  |   File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
swh-loader_1                  |     _stacktrace=sys.exc_info()[2])
swh-loader_1                  |   File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 367, in increment
swh-loader_1                  |     raise six.reraise(type(error), error, _stacktrace)
swh-loader_1                  |   File "/usr/local/lib/python3.6/site-packages/urllib3/packages/six.py", line 685, in reraise
swh-loader_1                  |     raise value.with_traceback(tb)
swh-loader_1                  |   File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
swh-loader_1                  |     chunked=chunked)
swh-loader_1                  |   File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 384, in _make_request
swh-loader_1                  |     six.raise_from(e, None)
swh-loader_1                  |   File "<string>", line 2, in raise_from
swh-loader_1                  |   File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 380, in _make_request
swh-loader_1                  |     httplib_response = conn.getresponse()
swh-loader_1                  |   File "/usr/local/lib/python3.6/http/client.py", line 1331, in getresponse
swh-loader_1                  |     response.begin()
swh-loader_1                  |   File "/usr/local/lib/python3.6/http/client.py", line 297, in begin
swh-loader_1                  |     version, status, reason = self._read_status()
swh-loader_1                  |   File "/usr/local/lib/python3.6/http/client.py", line 266, in _read_status
swh-loader_1                  |     raise RemoteDisconnected("Remote end closed connection without"
swh-loader_1                  | urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

Then looking at storage rpc server logs, we can see the following:

swh-storage_1                 | [2019-03-07 14:50:14 +0000] [180] [DEBUG] POST /directory/add
swh-storage_1                 | [2019-03-07 14:50:44 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:180)
swh-storage_1                 | [2019-03-07 14:50:45 +0000] [213] [INFO] Booting worker with pid: 213
swh-storage_1                 | /usr/local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
swh-storage_1                 |   """)

As there is no explicit timeout value set for gunicorn workers, the default one (30)
is used instead and it is not high enough to properly handle some incoming requests.
So set the same timeout value for the gunicorn workers as in production
(https://forge.softwareheritage.org/source/puppet-swh-site/browse/production/data/defaults.yaml?grep=http_timeout)

Diff Detail

Repository
rCDFD Dockerfiles for developers
Branch
fix-storage-gunicorn-timeout
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 4557
Build 6046: arc lint + arc unit