With all the replayers stopped and no particular load on the cluster, the loading of almost all the git repository are failing in timeout.
Not tested with other types of origin
Reference in current production:
swhworker@worker01:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_git.yml swh loader run git https://github.com/slackhq/nebula INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/slackhq/nebula' with type 'git' Enumerating objects: 1461, done. Counting objects: 100% (307/307), done. Compressing objects: 100% (186/186), done. Total 1461 (delta 166), reused 193 (delta 109), pack-reused 1154 INFO:swh.loader.git.loader.GitLoader:Listed 293 refs for repo https://github.com/slackhq/nebula {'status': 'eventful'} real 0m9.096s user 0m1.551s sys 0m0.064s
With the cassandra backend:
swh@6490bac3ba28:/$ time swh loader run git https://github.com/slackhq/nebula INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/slackhq/nebula' with type 'git' Enumerating objects: 3317, done. Counting objects: 100% (1128/1128), done. Compressing objects: 100% (508/508), done. Total 3317 (delta 696), reused 915 (delta 601), pack-reused 2189 INFO:swh.loader.git.loader.GitLoader:Listed 293 refs for repo https://github.com/slackhq/nebula ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status Traceback (most recent call last): File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 339, in load self.store_data() File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 467, in store_data self.flush() File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 168, in flush self.storage.flush() File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 156, in flush stats = add_fn(list(batch)) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 59, in revision_add missing_ids = self._filter_missing_ids("revision", (r.id for r in revisions)) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 116, in _filter_missing_ids return set(fn(missing_ids)) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 181, in meth_ return self.post(meth._endpoint_path, post_data) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 278, in post return self._decode_response(response) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 354, in _decode_response self.raise_for_status(response) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/api/client.py", line 29, in raise_for_status super().raise_for_status(response) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 344, in raise_for_status raise exception from None swh.core.api.RemoteException: <RemoteException 500 ReadTimeout: ['Error from server: code=1200 [Coordinator node timed out waiting for replica nodes\' responses] message="Operation timed out - received only 0 responses." info={\'consistency\': \'LOCAL_ONE\', \'required_responses\': 1, \'received_responses\': 0}']> Traceback (most recent call last): File "/srv/softwareheritage/venv/bin/swh", line 8, in <module> sys.exit(main()) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/cli/__init__.py", line 185, in main return swh(auto_envvar_prefix="SWH") File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__ return self.main(*args, **kwargs) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 782, in main rv = self.invoke(ctx) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke return ctx.invoke(self.callback, **ctx.params) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke return callback(*args, **kwargs) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func return f(get_current_context(), *args, **kwargs) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/cli.py", line 105, in run result = loader.load() File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 382, in load self.flush() File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 168, in flush self.storage.flush() File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 156, in flush stats = add_fn(list(batch)) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 59, in revision_add missing_ids = self._filter_missing_ids("revision", (r.id for r in revisions)) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 116, in _filter_missing_ids return set(fn(missing_ids)) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 181, in meth_ return self.post(meth._endpoint_path, post_data) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 278, in post return self._decode_response(response) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 354, in _decode_response self.raise_for_status(response) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/api/client.py", line 29, in raise_for_status super().raise_for_status(response) File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 344, in raise_for_status raise exception from None swh.core.api.RemoteException: <RemoteException 500 ReadTimeout: ['Error from server: code=1200 [Coordinator node timed out waiting for replica nodes\' responses] message="Operation timed out - received only 0 responses." info={\'consistency\': \'LOCAL_ONE\', \'required_responses\': 1, \'received_responses\': 0}']> real 8m59.338s <------------ zzzZZZZZ user 0m5.499s sys 0m0.179s
Storage logs:
swh-storage_1 | [2021-08-19 13:56:01 +0000] [31] [DEBUG] POST /origin/add_multi swh-storage_1 | [2021-08-19 13:56:01 +0000] [31] [DEBUG] POST /origin/visit/add swh-storage_1 | [2021-08-19 13:56:01 +0000] [31] [DEBUG] POST /origin/visit/get_latest swh-storage_1 | [2021-08-19 13:56:01 +0000] [31] [DEBUG] POST /origin/visit_status/get_latest swh-storage_1 | [2021-08-19 13:56:01 +0000] [31] [DEBUG] POST /snapshot/get_branches swh-storage_1 | [2021-08-19 13:56:05 +0000] [32] [DEBUG] POST /content/missing swh-storage_1 | [2021-08-19 13:56:56 +0000] [32] [DEBUG] POST /content/add swh-storage_1 | [2021-08-19 13:57:40 +0000] [32] [DEBUG] POST /directory/missing swh-storage_1 | [2021-08-19 13:57:41 +0000] [32] [DEBUG] POST /directory/add swh-storage_1 | [2021-08-19 13:59:12 +0000] [32] [DEBUG] POST /revision/missing swh-storage_1 | ERROR:root:Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0} swh-storage_1 | Traceback (most recent call last): swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request swh-storage_1 | rv = self.dispatch_request() swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request swh-storage_1 | return self.view_functions[rule.endpoint](**req.view_args) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 153, in newf swh-storage_1 | return f.negotiator(*args, **kwargs) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 81, in __call__ swh-storage_1 | result = self.func(*args, **kwargs) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 460, in _f swh-storage_1 | return obj_meth(**kw) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/storage.py", line 593, in revision_missing swh-storage_1 | return self._cql_runner.revision_missing(revisions) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 145, in newf swh-storage_1 | self, *args, **kwargs, statement=self._prepared_statements[f.__name__] swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 542, in revision_missing swh-storage_1 | return self._missing(statement, ids) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 305, in _missing swh-storage_1 | rows = self._execute_with_retries(statement, [ids]) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 324, in wrapped_f swh-storage_1 | return self(f, *args, **kw) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 404, in __call__ swh-storage_1 | do = self.iter(retry_state=retry_state) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 349, in iter swh-storage_1 | return fut.result() swh-storage_1 | File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result swh-storage_1 | return self.__get_result() swh-storage_1 | File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result swh-storage_1 | raise self._exception swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 407, in __call__ swh-storage_1 | result = fn(*args, **kwargs) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 272, in _execute_with_retries swh-storage_1 | return self._session.execute(statement, args, timeout=1000.0) swh-storage_1 | File "cassandra/cluster.py", line 2618, in cassandra.cluster.Session.execute swh-storage_1 | File "cassandra/cluster.py", line 4894, in cassandra.cluster.ResponseFuture.result swh-storage_1 | cassandra.ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0} swh-storage_1 | [2021-08-19 13:59:17 +0000] [32] [DEBUG] POST /origin/visit_status/add swh-storage_1 | [2021-08-19 13:59:17 +0000] [32] [DEBUG] POST /content/missing swh-storage_1 | [2021-08-19 14:04:53 +0000] [32] [DEBUG] POST /content/add swh-storage_1 | [2021-08-19 14:04:53 +0000] [32] [DEBUG] POST /directory/missing swh-storage_1 | [2021-08-19 14:04:53 +0000] [32] [DEBUG] POST /directory/add swh-storage_1 | [2021-08-19 14:04:53 +0000] [32] [DEBUG] POST /revision/missing swh-storage_1 | ERROR:root:Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0} swh-storage_1 | Traceback (most recent call last): swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request swh-storage_1 | rv = self.dispatch_request() swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request swh-storage_1 | return self.view_functions[rule.endpoint](**req.view_args) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 153, in newf swh-storage_1 | return f.negotiator(*args, **kwargs) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 81, in __call__ swh-storage_1 | result = self.func(*args, **kwargs) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 460, in _f swh-storage_1 | return obj_meth(**kw) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/storage.py", line 593, in revision_missing swh-storage_1 | return self._cql_runner.revision_missing(revisions) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 145, in newf swh-storage_1 | self, *args, **kwargs, statement=self._prepared_statements[f.__name__] swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 542, in revision_missing swh-storage_1 | return self._missing(statement, ids) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 305, in _missing swh-storage_1 | rows = self._execute_with_retries(statement, [ids]) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 324, in wrapped_f swh-storage_1 | return self(f, *args, **kw) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 404, in __call__ swh-storage_1 | do = self.iter(retry_state=retry_state) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 349, in iter swh-storage_1 | return fut.result() swh-storage_1 | File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result swh-storage_1 | return self.__get_result() swh-storage_1 | File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result swh-storage_1 | raise self._exception swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/tenacity/__init__.py", line 407, in __call__ swh-storage_1 | result = fn(*args, **kwargs) swh-storage_1 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/cassandra/cql.py", line 272, in _execute_with_retries swh-storage_1 | return self._session.execute(statement, args, timeout=1000.0) swh-storage_1 | File "cassandra/cluster.py", line 2618, in cassandra.cluster.Session.execute swh-storage_1 | File "cassandra/cluster.py", line 4894, in cassandra.cluster.ResponseFuture.result swh-storage_1 | cassandra.ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0} swh-storage_1 | [2021-08-19 14:04:58 +0000] [32] [DEBUG] Closing connection.
Some queries are logged in the cassandra logs:
INFO [ScheduledTasks:1] 2021-08-19 16:00:16,534 NoSpamLogger.java:92 - Some operations were slow, details available at debug level (debug.log) DEBUG [ScheduledTasks:1] 2021-08-19 16:00:16,535 MonitoringTask.java:175 - 1 operations were slow in the last 4999 msecs: <SELECT ctime, length, status FROM swh.content WHERE token(sha1, sha1_git, sha256, blake2s256) >= 1987356847901115199 AND token(sha1, sha1_git, sha256, blake2s256) <= 1987356847901115199 LIMIT 5000>, time 1067 msec - slow timeout 1000 msec/cross-node DEBUG [ScheduledTasks:1] 2021-08-19 16:02:11,537 MonitoringTask.java:175 - 1 operations were slow in the last 5000 msecs: <SELECT ctime, length, status FROM swh.content WHERE token(sha1, sha1_git, sha256, blake2s256) >= 8339318752667056020 AND token(sha1, sha1_git, sha256, blake2s256) <= 8339318752667056020 LIMIT 5000>, time 1114 msec - slow timeout 1000 msec DEBUG [ScheduledTasks:1] 2021-08-19 16:02:14,058 SSLFactory.java:354 - Checking whether certificates have been updated []