Multiple types of issues have been reported in our Sentry instance since the loading
started. They are listed here so they are publicly shareable [1] (not investigated yet).
I've added [3], which holds the last log entries of the failing workers; it should give the failing origins plus the actual issues encountered.
- [[ https://sentry.softwareheritage.org/share/issue/57a8778ba1bc4cf4b9ff4e489f1fa176/ | Problem during unpacking ...mssql-2.1.0.tbz. Reason: Unknown archive format '.../mssql-2.1.0.tbz'... ]]
- [[ https://sentry.softwareheritage.org/share/issue/57a8778ba1bc4cf4b9ff4e489f1fa176/ | Problem during unpacking /tmp/tmp9qvp688z/clustershell-1.3.2-1.fc13.src.rpm. Reason: Unknown archive format '/tmp/tmp9qvp688z/clustershell-1.3.2-1.fc13.src.rpm' ]]
- [[ https://sentry.softwareheritage.org/share/issue/c3e95bf9fea94038805cd5160d7da3a8/ | Fail to query 'http://www-verimag.imag.fr/DIST-TOOLS/SYNCHRONE/pool/rdbg.1.70.tgz'. Reason: 404 ]] [2]
- [[ https://sentry.softwareheritage.org/share/issue/c3e95bf9fea94038805cd5160d7da3a8/ | Fail to query 'http://www.lexifi.com/sedlex/sedlex-1.99.1.tar.gz'. Reason: 404 ]]
- [[ https://sentry.softwareheritage.org/share/issue/beb647ddb82c43a9930ab16ff5278079/ | HTTPSConnectionPool(host='gitlab.inria.fr', port=443): Max retries exceeded with url: /fpottier/inferno/-/archive/20201104/ ]]
- [[ https://sentry.softwareheritage.org/share/issue/008fbab7d228449ea1659a45de48d279/ | HTTPSConnectionPool(host='opam.ocaml.org', port=443): Max retries exceeded with url: /cache/md5/f6/f69ee4f693330 ]]
- [[ https://sentry.softwareheritage.org/share/issue/7dc92745d96442d482a493c64b6eae91/ | InvalidSchema(No connection adapters were found for 'git+https://github.com/ivg/emacs-dot.git') ]]
- [[ https://sentry.softwareheritage.org/share/issue/0cf519830d45466582f35968317aa8c0/ | MissingSchema(Invalid URL '': No schema supplied. Perhaps you meant http://?) ]], see also T3294
- [[ https://sentry.softwareheritage.org/share/issue/e06cfad6d7b84dafabdb6c5f1e2ddb38/ ]] IsADirectoryError([Errno 21] Is a directory: '/tmp/tmp1sq9qtky/')
- [[ https://sentry.softwareheritage.org/share/issue/3a2db2bdcceb4421a29d235685b84e81/ ]] OSError([Errno 36] File name too long: '/tmp/tmpd3fex7_2/weberizer-0.6.2.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA5BA2674WEWV2CIOD%2F20210917%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210917T070403Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=d5a0693b69f150d751c884bf44fc35568f7fb1339b038958b2021873b0d10cb8')
At first glance, we cannot do much about the 404s (see P1117#7495 for the origins in question).
We have at least 2 unsupported archive formats: "rpm" and "tbz" ("tbz2"). Fixing those
sounds like the most important. Plus, it's beneficial for other package loaders (e.g.
archive, cran, pypi, nixguix, ...).
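As a rough illustration of the "tbz" part only, here is a minimal sketch that teaches Python's standard `shutil.unpack_archive` about the `.tbz` extension. It assumes the unpacking goes through `shutil`; the names `register_tbz_format` and `_unpack_tar` are hypothetical and not taken from the loader code. The "rpm" format is not covered here.

```python
import shutil
import tarfile


def _unpack_tar(path, extract_dir):
    """Hypothetical helper: extract a (possibly compressed) tarball."""
    # Mode "r:*" lets tarfile detect the compression (bzip2 here) itself.
    with tarfile.open(path, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support filters.
            tar.extractall(extract_dir, filter="data")
        else:
            tar.extractall(extract_dir)


def register_tbz_format():
    """Register '.tbz' with shutil unless some format already claims it."""
    known = {
        ext
        for _name, extensions, _description in shutil.get_unpack_formats()
        for ext in extensions
    }
    if ".tbz" not in known:
        shutil.register_unpack_format(
            "tbz", [".tbz"], _unpack_tar, description="bzip2'ed tar-file (.tbz)"
        )
```

After calling `register_tbz_format()` once, `shutil.unpack_archive("mssql-2.1.0.tbz", target_dir)` should resolve the format from the extension instead of failing with "Unknown archive format".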
The connection errors might be worked around by adding some retry decorators like those
that already exist in the lister.
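To make the retry idea concrete, here is a minimal stdlib-only sketch of such a decorator with exponential backoff. It is not the lister's actual decorator; the name `retry` and its parameters (`attempts`, `base_delay`, `sleep`) are assumptions made for illustration.

```python
import functools
import time


def retry(exceptions, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped callable when it raises one of `exceptions`.

    Waits base_delay, then 2 * base_delay, 4 * base_delay, ... between tries
    and re-raises the last error once `attempts` tries have failed.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of tries: let the caller see the error
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

A download helper could then be decorated with something like `@retry((ConnectionError,), attempts=3)`. Errors such as the 404s should probably stay out of the retried exceptions, since retrying them would not help.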
[1] Kibana is not publicly accessible, so opening my dashboard was not that helpful...
[2] full extract of all events "so far" in F4628907 (contains more than just opam tasks).
[3] F4628948
[4] http://kibana0.internal.softwareheritage.org:5601/goto/079dbfb481d31f3a86b8f41c3133e884 (staff only though)