Failed ingestion of a GitHub repository
Open, High, Public

Description

The ingestion of the repository at https://github.com/sosy-lab/sv-benchmarks appears to fail systematically:

https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/sosy-lab/sv-benchmarks&timestamp=2020-12-30T21:50:59Z

https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/sosy-lab/sv-benchmarks&timestamp=2021-01-04T17:05:11Z

It is important to understand what is going wrong here. Unfortunately, the admin console does not show anything useful apart from "failed" (see image).
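For anyone wanting to compare visits of this origin, the browse links above share one pattern: a fixed path plus `origin_url` and an optional `timestamp` query parameter. Below is a minimal sketch that rebuilds them, assuming nothing beyond what the links themselves show; the helper name `build_browse_url` is made up for illustration and is not part of any Software Heritage API.

```python
from urllib.parse import urlencode

# Base path copied from the links quoted in this task.
BROWSE_DIRECTORY_URL = "https://archive.softwareheritage.org/browse/origin/directory/"


def build_browse_url(origin_url, timestamp=None):
    """Return a browse link for an origin, optionally pinned to a visit timestamp.

    Hypothetical helper: it only mirrors the URL shape seen in this task.
    """
    params = {"origin_url": origin_url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    # Keep ':' and '/' unescaped so the result matches the links as written here.
    return BROWSE_DIRECTORY_URL + "?" + urlencode(params, safe=":/")


origin = "https://github.com/sosy-lab/sv-benchmarks"
for ts in ("2020-12-30T21:50:59Z", "2021-01-04T17:05:11Z", None):
    print(build_browse_url(origin, ts))
```

Passing `None` as the timestamp yields the link without a `timestamp` parameter.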

Event Timeline

rdicosmo triaged this task as High priority. Mon, Jan 4, 7:29 PM
rdicosmo created this task.

The repository was correctly ingested on 05 January 2021, 11:56 UTC.

https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/sosy-lab/sv-benchmarks

Previous attempts might have failed due to an infrastructure-related error (maybe a lost worker, as the task termination date did not get reported).

Nevertheless, it seems the multiple task executions did not get correctly reported in the save code now list. Only one entry for the repository
is available where there should be several. I will look into it.

For the record, the load failure on 2021-01-04T17:05:11Z was due to a network error (found via Kibana):

[2021-01-04 17:05:11,752: ERROR/ForkPoolWorker-2545] Loading failure, updating to `partial` status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 159, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 80, in create_connection
    raise err
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 70, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network is unreachable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 343, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 841, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 301, in connect
    conn = self._new_conn()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 168, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f0077827470>: Failed to establish a new connection: [Errno 101] Network is unreachable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 318, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 239, in fetch_data
    self.origin.url, self.base_snapshot, do_progress
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 174, in fetch_pack_from_origin
    progress=do_activity,
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 1644, in fetch_pack
    b"git-upload-pack", url)
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 1532, in _discover_references
    resp, read = self._http_request(url, headers, allow_compression=True)
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 1501, in _http_request
    resp = self.pool_manager.request("GET", url, headers=req_headers)
  File "/usr/lib/python3/dist-packages/urllib3/request.py", line 68, in request
    **urlopen_kw)
  File "/usr/lib/python3/dist-packages/urllib3/request.py", line 89, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/usr/lib/python3/dist-packages/urllib3/poolmanager.py", line 323, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 667, in urlopen
    **response_kw)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 667, in urlopen
    **response_kw)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 667, in urlopen
    **response_kw)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /sosy-lab/sv-benchmarks/info/refs?service=git-upload-pack (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f0077827470>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
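The traceback above is a chain of three exceptions: an `OSError` with errno 101 at the root, wrapped by `NewConnectionError`, wrapped in turn by `MaxRetryError`. As a rough sketch of how such a chain could be walked to recognise a transient network failure, the code below uses only the standard library. The `Fake*` classes are stand-ins for the urllib3 exceptions, and both the helper names and the set of "transient" errno values are assumptions made for illustration, not code from the loader.

```python
import errno

# errno values treated as transient here; this set is an assumption for the
# sketch, not something taken from swh.loader or urllib3.
TRANSIENT_ERRNOS = {
    errno.ENETUNREACH,   # "Network is unreachable" (errno 101 in the log above)
    errno.EHOSTUNREACH,
    errno.ECONNREFUSED,
    errno.ETIMEDOUT,
}


class FakeNewConnectionError(Exception):
    """Stand-in for urllib3.exceptions.NewConnectionError."""


class FakeMaxRetryError(Exception):
    """Stand-in for urllib3.exceptions.MaxRetryError."""


def exception_chain(exc):
    """Yield exc, then every exception linked via __cause__ or __context__."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        yield exc
        exc = exc.__cause__ or exc.__context__


def is_transient_network_error(exc):
    """Return True if any exception in the chain is an OSError with a transient errno."""
    return any(
        isinstance(e, OSError) and e.errno in TRANSIENT_ERRNOS
        for e in exception_chain(exc)
    )


def simulate_failed_fetch():
    """Raise a chain shaped like the logged one, using the stand-in classes."""
    try:
        try:
            raise OSError(errno.ENETUNREACH, "Network is unreachable")
        except OSError as err:
            raise FakeNewConnectionError("Failed to establish a new connection: %s" % err)
    except FakeNewConnectionError:
        raise FakeMaxRetryError("Max retries exceeded")


try:
    simulate_failed_fetch()
except FakeMaxRetryError as exc:
    print(is_transient_network_error(exc))
```

A check of this kind could, in principle, help tell a failure worth retrying apart from a problem with the repository itself. Whether the loader or the save code now front end should do so is a separate design question.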

Thanks Antoine. Is there any way to have this kind of error also reported in the admin dashboard for save code now?

This should have been reported, but it looks like there is an issue when multiple save requests have been created for the same repository.
I need to dig further into this, so I am keeping this task open until then.