Only 1 worker is currently running.
I expect only 1 query in the backend at a time with such a setup.
That's not what pg_activity -p 5434 (softwareheritage-indexer) currently shows.
In the meantime, the current worker shows the following stacktrace [1].
So my take on this is that the query (using index scans as designed) works on a range too large for the query to finish.
What's not expected, though, is that the worker side fails as in [1] while the query in the backend (indexer-storage's db) happily keeps running.
Thus the load on somerset happily grows...
Maybe the following plan would be acceptable:
- add some @timeout on the indexer-storage's storage api (as we do in swh-storage's)
- rework the ranges defined in the scheduler for the fossology-license indexer (IMSMW, 100k range tasks were created; we should reduce the size of those ranges, thus increasing the number of tasks)
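
As a rough illustration of the second point, here is a minimal sketch of how one large range could be cut into smaller contiguous ones. It assumes the range bounds are 20-byte sha1 values that can be treated as big-endian integers; that is an assumption, since the actual task arguments are not shown here. The helper names split_range and sha1_subranges are hypothetical and are not part of swh.indexer or swh.scheduler.

```python
from typing import Iterator, Tuple

SHA1_BYTES = 20  # assumption: range bounds are raw 20-byte sha1 values


def split_range(start: int, end: int, parts: int) -> Iterator[Tuple[int, int]]:
    """Yield contiguous, non-overlapping inclusive (lo, hi) bounds covering
    [start, end], split into at most `parts` pieces of near-equal size."""
    if parts < 1 or end < start:
        raise ValueError("need parts >= 1 and start <= end")
    total = end - start + 1
    parts = min(parts, total)  # never produce empty sub-ranges
    base, extra = divmod(total, parts)
    lo = start
    for i in range(parts):
        # The first `extra` pieces get one more element to absorb the remainder.
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        yield lo, hi
        lo = hi + 1


def sha1_subranges(parts: int) -> Iterator[Tuple[bytes, bytes]]:
    """Hypothetical helper: split the whole sha1 space into `parts` byte ranges."""
    for lo, hi in split_range(0, 2 ** (8 * SHA1_BYTES) - 1, parts):
        yield lo.to_bytes(SHA1_BYTES, "big"), hi.to_bytes(SHA1_BYTES, "big")
```

Each (lo, hi) pair would then become the arguments of one smaller scheduler task, so a single slow range no longer keeps one backend query running for an hour.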
[1]
Jun 07 06:17:53 worker08 python3[123583]: [2019-06-07 06:17:53,918: INFO/MainProcess] Received task: swh.indexer.tasks.ContentRangeFossologyLicense[452abd0b-8db8-465c-9a2d-eb84d3ed90e5]
Jun 07 07:17:57 worker08 python3[59331]: [2019-06-07 07:17:57,176: ERROR/ForkPoolWorker-3] Problem when computing metadata.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/indexer/indexer.py", line 516, in run
    n=self.config['write_batch_size']):
  File "/usr/lib/python3/dist-packages/swh/core/utils.py", line 48, in grouper
    for _data in itertools.zip_longest(*args, fillvalue=stop_value):
  File "/usr/lib/python3/dist-packages/swh/indexer/indexer.py", line 479, in _index_with_skipping_already_done
    indexed_page = self.indexed_contents_in_range(start, end)
  File "/usr/lib/python3/dist-packages/swh/indexer/fossology_license.py", line 172, in indexed_contents_in_range
    start, end, self.tool['id'])
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 133, in meth_
    return self.post(meth._endpoint_path, post_data)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 198, in post
    return self._decode_response(response)
  File "/usr/lib/python3/dist-packages/swh/core/api/__init__.py", line 235, in _decode_response
    response.content,
swh.core.api.RemoteException: Unexpected status code for API request: 504 (b'<html>\r\n<head><title>504 Gateway Time-out</title></head>\r\n<body bgcolor="white">\r\n<center><h1>504 Gateway Time-out</h1></center>\r\n<hr><center>nginx/1.10.3</center>\r\n</body>\r\n</html>\r\n')