
Investigate timeouts in the deposit loader in Docker
Closed, Migrated

Description

@SupLinux reported the following issue on IRC:

hi swh, I am uploading a large package to my local deposit instance, and it throws a timeout error. Could you help extend this value from 60s to 600s?

Traceback (most recent call last):
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/adapters.py", line 450, in send
    timeout=timeout
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 786, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 710, in urlopen
    chunked=chunked,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 451, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 341, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='swh-deposit', port=5006): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/loader.py", line 633, in load
    res = self._load_release(p_info, origin)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/loader.py", line 802, in _load_release
    dl_artifacts = self.download_package(p_info, tmpdir)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/deposit/loader.py", line 193, in download_package
    return [self.client.archive_get(self.deposit_id, tmpdir, p_info.filename)]
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/deposit/loader.py", line 341, in archive_get
    return download(url, dest=tmpdir, filename=filename, auth=self.auth)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/utils.py", line 106, in download
    response = requests.get(url, **params, timeout=timeout, stream=True)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='swh-deposit', port=5006): Read timed out. (read timeout=60)

It's not clear to me whether this timeout should happen even on large files, as they should be streamed immediately. This might be an issue with the deposit server.
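For context on the question above: in requests, the read timeout limits how long the client waits between bytes from the server, not the total transfer time. A streamed download should therefore only hit it if the server stays silent for the whole period, for example before it sends the first byte. The sketch below illustrates that behaviour with the standard library only. The toy server, its /stall and /trickle paths, and the timings are made up for the demonstration and are not taken from swh-deposit; it uses urllib rather than requests, on the assumption that both rely on the same per-read socket timeout.

```python
import http.server
import socket
import threading
import time
import urllib.error
import urllib.request


class SlowHandler(http.server.BaseHTTPRequestHandler):
    """Toy server: /stall delays the first byte, /trickle sends slowly but steadily."""

    def do_GET(self):
        if self.path == "/stall":
            time.sleep(1.0)  # nothing is sent while the "archive" is prepared
            chunks, pause = [b"data"], 0.0
        else:
            chunks, pause = [b"x"] * 8, 0.1  # 0.8 s in total, never idle for long
        try:
            self.send_response(200)
            self.send_header("Content-Length", str(sum(len(c) for c in chunks)))
            self.end_headers()
            for chunk in chunks:
                self.wfile.write(chunk)
                time.sleep(pause)
        except OSError:
            pass  # the client already gave up and closed the connection

    def log_message(self, *args):
        pass  # keep the demonstration quiet


# Ignore any proxy settings from the environment for this local test.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, timeout):
    """Return the body, or None if the socket stayed silent longer than `timeout`."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read()
    except (socket.timeout, urllib.error.URLError):
        return None


server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

# The transfer takes longer than the timeout, but no single wait does.
trickle = fetch(base + "/trickle", timeout=0.5)
# The server is silent for longer than the timeout before the first byte.
stall = fetch(base + "/stall", timeout=0.5)
server.shutdown()
```

If the deposit server behaves like the /stall case (for instance by assembling the archive before answering), that would be consistent with a 60-second read timeout firing regardless of streaming, but the thread does not confirm this.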

Event Timeline

vlorentz triaged this task as Normal priority. Jan 15 2022, 11:10 AM
vlorentz created this task.

Thanks for helping me open this issue. In my usage scenario, I need to upload packages that may be larger than 10 GB to the deposit, which raises this timeout error. I used this guide to deploy my environment (https://docs.softwareheritage.org/devel/getting-started.html#getting-started), and my server has 16 cores, 64 GB of RAM, and a 200 GB disk. I hope this timeout value can be made configurable through a configuration file, because the upload time depends on each user's deployment environment.

Another problem is with the swh-deposit client: when I use the command below to upload a large archive (16 GB), it consumes more than 40 GB of memory. This is also a big problem on the client side. I hope swh could split large archives automatically :)

swh deposit upload --username test --password test --archive test.tar --author redhat --name redhat-test --url http://localhost:5080/deposit/1
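As a rough illustration of how a client can keep memory use flat while handling a large archive, here is a minimal sketch that reads a file in fixed-size chunks instead of loading it whole. The `iter_chunks` helper and the chunk size are hypothetical and are not part of the swh-deposit client. HTTP libraries such as requests can stream a file object or an iterator as a request body, but whether the deposit client or server supports that is not stated in this thread.

```python
import hashlib
import os
import tempfile


def iter_chunks(path, chunk_size=8 * 1024 * 1024):
    """Yield the file's content in pieces of at most `chunk_size` bytes.

    Only one chunk is held in memory at a time, whatever the file size.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


# Small demonstration file standing in for a large archive.
payload = os.urandom(10_000)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    archive_path = tmp.name

# Consume the file chunk by chunk, e.g. to checksum it or hand it to an uploader.
digest = hashlib.sha256()
largest_chunk = 0
for chunk in iter_chunks(archive_path, chunk_size=1024):
    digest.update(chunk)
    largest_chunk = max(largest_chunk, len(chunk))

os.remove(archive_path)
```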

Er yeah, the deposit isn't designed for archives this big. You should probably host your tarballs somewhere and point the archive loader to it, instead.

OK, got it. We would still like a patch for my first question: making the timeout value configurable.
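One way such a configurable timeout could look is sketched below. This is not the actual swh loader code: the `download_timeout` configuration key, the `SWH_DOWNLOAD_TIMEOUT` environment variable, and both function names are hypothetical. Only the 60-second default and the `requests.get(..., timeout=..., stream=True)` call shape come from the traceback above.

```python
import os

DEFAULT_TIMEOUT = 60.0  # the read timeout value reported in the traceback


def resolve_timeout(config, env=os.environ):
    """Pick the timeout in seconds: environment first, then config, then default.

    Both the SWH_DOWNLOAD_TIMEOUT variable and the "download_timeout" key
    are hypothetical names chosen for this sketch.
    """
    raw = env.get("SWH_DOWNLOAD_TIMEOUT", config.get("download_timeout", DEFAULT_TIMEOUT))
    timeout = float(raw)
    if timeout <= 0:
        raise ValueError(f"timeout must be positive, got {timeout}")
    return timeout


def download(url, dest_path, config):
    """Stream `url` to `dest_path` using the configured timeout."""
    import requests  # imported here so resolve_timeout() works without requests

    timeout = resolve_timeout(config)
    with requests.get(url, timeout=timeout, stream=True) as response:
        response.raise_for_status()
        with open(dest_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                f.write(chunk)
    return dest_path
```

With this shape, a deployment that needs the 600-second value requested above would set `download_timeout: 600` in its configuration (or export the environment variable) without touching the code.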