Description

Multiple types of issues have been reported in our sentry instance since the loading
started. They are listed here so they can be shared publicly [1] (not investigated yet).

I've added [3], which holds the latest logs of the failing worker. It should give the failing origin plus the actual error encountered.

Problem during unpacking ...mssql-2.1.0.tbz. Reason: Unknown archive format '.../mssql-2.1.0.tbz'...

Problem during unpacking /tmp/tmp9qvp688z/clustershell-1.3.2-1.fc13.src.rpm. Reason: Unknown archive format '/tmp/tmp9qvp688z/clustershell-1.3.2-1.fc13.src.rpm'

Fail to query 'http://www-verimag.imag.fr/DIST-TOOLS/SYNCHRONE/pool/rdbg.1.70.tgz'. Reason: 404 [2]

Fail to query 'http://www.lexifi.com/sedlex/sedlex-1.99.1.tar.gz'. Reason: 404

HTTPSConnectionPool(host='gitlab.inria.fr', port=443): Max retries exceeded with url: /fpottier/inferno/-/archive/20201104/

HTTPSConnectionPool(host='opam.ocaml.org', port=443): Max retries exceeded with url: /cache/md5/f6/f69ee4f693330

InvalidSchema(No connection adapters were found for 'git+https://github.com/ivg/emacs-dot.git')

MissingSchema(Invalid URL '': No schema supplied. Perhaps you meant http://?), see also T3294

https://sentry.softwareheritage.org/share/issue/e06cfad6d7b84dafabdb6c5f1e2ddb38/ IsADirectoryError([Errno 21] Is a directory: '/tmp/tmp1sq9qtky/')

https://sentry.softwareheritage.org/share/issue/3a2db2bdcceb4421a29d235685b84e81/ OSError([Errno 36] File name too long: '/tmp/tmpd3fex7_2/weberizer-0.6.2.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA5BA2674WEWV2CIOD%2F20210917%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210917T070403Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=d5a0693b69f150d751c884bf44fc35568f7fb1339b038958b2021873b0d10cb8')

Problem during unpacking /tmp/tmpnof815ac/ec34f9a8d1ee28130bed89ea486cf168 Reason: Unknown archive format '/tmp/tmpnof815ac/ec34f9a8d1ee28130bed89ea486cf168'

From afar, there is not much we can do about the 404 errors (see P1117#7495 for the origins in question).

We have at least 2 unsupported archive formats: "rpm" and "tbz" ("tbz2"). Fixing those
sounds like the most important step. It would also benefit other package loaders (e.g.
archive, cran, pypi, nixguix, ...).
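
As a rough illustration of the "tbz" case, here is a minimal sketch assuming the unpacking goes through Python's `shutil.unpack_archive` (which raises the "Unknown archive format" error seen above). The names `register_tbz` and `_unpack_tbz` are hypothetical and this is not necessarily how swh-core handles it.

```python
import shutil
import tarfile


def _unpack_tbz(filename, extract_dir):
    """Unpack a bzip2-compressed tarball (hypothetical helper)."""
    with tarfile.open(filename, "r:bz2") as tar:
        # Archives from untrusted sources deserve extra care (path traversal).
        tar.extractall(extract_dir)


def register_tbz():
    """Teach shutil about .tbz/.tbz2 extensions it does not know yet."""
    known = {ext for _, exts, _ in shutil.get_unpack_formats() for ext in exts}
    missing = [ext for ext in (".tbz", ".tbz2") if ext not in known]
    if missing:
        shutil.register_unpack_format(
            "tbz", missing, _unpack_tbz, description="bzip2'ed tar-file (tbz)"
        )
```

Once `register_tbz()` has been called, `shutil.unpack_archive("mssql-2.1.0.tbz", dest)` picks the format from the extension. Calling it twice is harmless because already-registered extensions are skipped.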

The connection errors might be worked around by adding some retry decorators like those
existing in the lister.
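
To make the retry idea concrete, here is a stdlib-only sketch of such a decorator with exponential backoff. It is not the decorator used in the lister (which this task does not show); the `retry` name and its parameters are assumptions for illustration.

```python
import functools
import time


def retry(exceptions, attempts=3, initial_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Retry the decorated function when one of `exceptions` is raised.

    Waits `initial_delay` seconds before the second attempt and multiplies
    the wait by `backoff` after each failure. The last failure is re-raised.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise
                    sleep(delay)
                    delay *= backoff

        return wrapper

    return decorator
```

A download function could then be wrapped with `@retry((ConnectionError, TimeoutError))`. Retrying only makes sense for transient failures such as "Max retries exceeded"; a 404 should not be retried.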

[1] kibana is not publicly accessible, so sharing my dashboard would not have been that helpful...

[2] full extract of all events "so far" in F4628907 (contains more than just opam tasks).

[3] F4628948

[4] http://kibana0.internal.softwareheritage.org:5601/goto/079dbfb481d31f3a86b8f41c3133e884 (staff only though)

		Status	Assigned	Task
		Migrated	gitlab-migration	T3424 Opam support
		Migrated	gitlab-migration	T3468 staging: current opam loading issues

Event Timeline

ardumont triaged this task as Normal priority. Aug 5 2021, 6:39 PM

ardumont created this task.

ardumont updated the task description.

ardumont mentioned this in T3424: Opam support. Aug 5 2021, 6:44 PM

ardumont updated the task description. Aug 6 2021, 11:56 AM

ardumont updated the task description. Aug 6 2021, 11:59 AM

ardumont updated the task description. Aug 6 2021, 4:58 PM

ardumont updated the task description.

ardumont mentioned this in rDSNIPc0a25dff5e2b: kibana_fetch_logs: Update and make it work again. Aug 6 2021, 5:20 PM

ardumont updated the task description. Aug 6 2021, 5:27 PM

ardumont updated the task description. Aug 6 2021, 5:32 PM

ardumont added a revision: D6135: tarball: Add support for tbz and tbz2 tarball formats. Aug 25 2021, 5:55 PM

ardumont updated the task description. Aug 25 2021, 6:09 PM

ardumont added a commit: rDCOREcc26836d61c5: tarball: Add support for tbz and tbz2 tarball formats. Aug 25 2021, 6:24 PM

For the rpm support, [1] may help.

[1] https://pypi.org/project/rpmfile/

I'll trigger a new run of loading the opam origins in staging so the dataset of issues is updated.

P1158 and P1159 with some updated errors from the last run.

ardumont added a revision: D6250: wip: Allow download to follow redirection to fetch more tarballs. Sep 13 2021, 6:18 PM

In T3468#70305, @ardumont wrote:

P1158 and P1159 with some updated errors from the last run.

I'll update those tomorrow as the run is still ongoing.

The ingestion is done (queue for opam tasks in staging scheduler is empty).

I updated the pastes with the 404 [1] and unpacking [2] errors,
so they now reference all errors from that last run.

[1] P1158

[2] P1159

ardumont updated the task description. Sep 14 2021, 10:50 AM

ardumont added a revision: D6252: package/utils: Improve downloaded filename extraction. Sep 14 2021, 1:59 PM

anlambert added a commit: rDLDBASE049c41f6144d: package/utils: Use download response URL to extract filename. Sep 16 2021, 10:48 AM

anlambert added a commit: rDLDBASE1bd1827fa274: package/utils: Try to extract download filename from response headers.
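
As an illustration of extracting a filename from response headers, here is a minimal sketch that reads the `Content-Disposition` header with the standard library. The `filename_from_headers` helper is hypothetical and is not the code from the commit above.

```python
import os
from email.message import Message
from typing import Mapping, Optional


def filename_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the filename advertised in Content-Disposition, if any.

    `headers` is any mapping of header names to values. A plain dict is
    case-sensitive, unlike the header objects of most HTTP clients.
    """
    value = headers.get("Content-Disposition")
    if not value:
        return None
    # email.message.Message already knows how to parse header parameters.
    msg = Message()
    msg["Content-Disposition"] = value
    filename = msg.get_filename()
    if not filename:
        return None
    # Keep only the last path component so the server cannot pick a directory.
    return os.path.basename(filename) or None
```

When the function returns `None`, the caller would fall back to another source for the name, such as the final response URL.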

@anlambert fixed plenty of issues including some from this ticket (thanks a bunch).

I'm planning on deploying those changes soon (around noon, I'm on something else currently).

And I'll trigger another run on staging after that ;)

done.

Heads up, it seems the main issues mentioned above have subsided.
The run is still ongoing but the trend seems to be going the right way.

I'm looking at the kibana dashboard again [1]

[1] http://kibana0.internal.softwareheritage.org:5601/goto/82def232f7c606a05e2b451066a948e3

vlorentz updated the task description. Sep 17 2021, 2:55 PM

vlorentz updated the task description.

For the rpm support, [1] may help.

In the end, @anlambert pointed out to me that this error does not come from the opam loader.
Our sentry instance groups the issues of all package loaders together: as they are all part of
swh-loader-core, they are seen as one.
So there is no need for rpm support in the end, as Antoine fixed it for the pypi loader (by dismissing those, iirc).

ardumont added a revision: D6292: Ensure that filename fallback out of an url is properly sanitized. Sep 17 2021, 4:10 PM

ardumont added a commit: rDLDBASEfabd950d3657: Ensure that filename fallback out of an url is properly sanitized. Sep 17 2021, 4:16 PM
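
The "File name too long" error above shows a signed-URL query string ending up in the local filename. Here is a minimal sketch of sanitizing a filename derived from a URL, assuming the name is the last path component. The `filename_from_url` helper and its `default` value are hypothetical, not the code from the commit above.

```python
import os
from urllib.parse import unquote, urlparse


def filename_from_url(url: str, default: str = "download") -> str:
    """Derive a local filename from a URL, ignoring query and fragment."""
    # urlparse separates the path from "?X-Amz-..." style query strings.
    path = unquote(urlparse(url).path)
    name = os.path.basename(path)
    # An empty name would make the target path a directory, so fall back.
    return name or default
```

For a signed URL ending in `/weberizer-0.6.2.tar.gz?X-Amz-Algorithm=...`, this yields `weberizer-0.6.2.tar.gz`. The non-empty fallback might also avoid writing to a bare temporary directory path, as in the IsADirectoryError reported above, though that cause is an assumption.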

anlambert updated the task description. Sep 17 2021, 5:28 PM

anlambert added a revision: D6301: tarball: Try to guess archive format from mime type as last resort. Sep 17 2021, 5:29 PM

anlambert added a commit: rDCOREe268896d7652: tarball: Try to guess archive format from mimetype as last resort. Sep 17 2021, 6:14 PM
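
To illustrate the last-resort idea for files without a usable extension (such as the hash-named file in the description), here is a stdlib-only sketch. It sniffs the leading magic bytes rather than querying a mime type library, so it is a swapped-in technique and not the code from the commit above. The helper names are hypothetical.

```python
import shutil
import tarfile
import zipfile

# Leading bytes of common compressed streams, mapped to shutil format names.
# Assumption: a compressed stream found here wraps a tarball.
_MAGIC = [
    (b"\x1f\x8b", "gztar"),
    (b"BZh", "bztar"),
    (b"\xfd7zXZ\x00", "xztar"),
]


def guess_unpack_format(path):
    """Guess a shutil unpack format from the file content, or return None."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, fmt in _MAGIC:
        if head.startswith(magic):
            return fmt
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    return None


def unpack_with_fallback(path, dest):
    """Unpack by extension first, then by content as a last resort."""
    try:
        shutil.unpack_archive(path, dest)
    except shutil.ReadError:
        fmt = guess_unpack_format(path)
        if fmt is None:
            raise
        shutil.unpack_archive(path, dest, format=fmt)
```

The extension-based lookup stays the primary path, so the content sniffing only runs for names that `shutil` cannot classify.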

Deployed the multiple fixes that @anlambert and I made.
With those and @mclovin's upstream fixes, we should be good.

Closing this now.

ardumont changed the task status from Open to Work in Progress. Sep 20 2021, 2:44 PM

ardumont closed this task as Resolved.

ardumont claimed this task.

ardumont added a project: System administration.

ardumont moved this task from Backlog to Weekly backlog on the System administration board.

ardumont moved this task from Weekly backlog to in-progress on the System administration board.

ardumont moved this task from in-progress to code-review/await-feedback/pause on the System administration board.

ardumont moved this task from code-review/await-feedback/pause to deployed/landed/monitoring on the System administration board.

ardumont moved this task from deployed/landed/monitoring to done on the System administration board.

This task has been migrated to GitLab.

rDLDBASE Generic VCS/Package Loader
	Abandoned		D6250 wip: Allow download to follow redirection to fetch more tarballs
		D6292	rDLDBASEfabd950d3657 Ensure that filename fallback out of an url is properly sanitized
		D6252	rDLDBASE1bd1827fa274 package/utils: Try to extract download filename from response headers
		D6252	rDLDBASE049c41f6144d package/utils: Use download response URL to extract filename
rDCORE Foundations and core functionalities
		D6301	rDCOREe268896d7652 tarball: Try to guess archive format from mimetype as last resort
		D6135	rDCOREcc26836d61c5 tarball: Add support for tbz and tbz2 tarball formats

staging: current opam loading issues
Closed, Migrated (Edits Locked)