An artifact without time info can be provided in the artifacts list
parameter of the loader.
For instance, the last modification date is not available for tarballs coming from
GitHub tags (the date header below corresponds to the request time, not to the
tarball's last modification).

    $ curl -Li https://github.com/chromium/chromium/archive/refs/tags/104.0.5106.1.tar.gz
    HTTP/2 302
    server: GitHub.com
    date: Tue, 07 Jun 2022 13:10:44 GMT
    content-type: text/html; charset=utf-8
    vary: X-PJAX, X-PJAX-Container, Turbo-Visit, Turbo-Frame, Accept-Encoding, Accept, X-Requested-With
    permissions-policy: interest-cohort=()
    location: https://codeload.github.com/chromium/chromium/tar.gz/refs/tags/104.0.5106.1
    cache-control: max-age=0, private
    strict-transport-security: max-age=31536000; includeSubdomains; preload
    x-frame-options: deny
    x-content-type-options: nosniff
    x-xss-protection: 0
    referrer-policy: no-referrer-when-downgrade
    expect-ct: max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"
    content-security-policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; child-src github.com/assets-cdn/worker/ gist.github.com/assets-cdn/worker/; connect-src 'self' uploads.github.com objects-origin.githubusercontent.com www.githubstatus.com collector.github.com raw.githubusercontent.com api.github.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com cdn.optimizely.com logx.optimizely.com/v1/events *.actions.githubusercontent.com wss://*.actions.githubusercontent.com online.visualstudio.com/api/v1/locations github-production-repository-image-32fea6.s3.amazonaws.com github-production-release-asset-2e65be.s3.amazonaws.com insights.github.com wss://alive.github.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com objects-origin.githubusercontent.com; frame-ancestors 'none'; frame-src render.githubusercontent.com viewscreen.githubusercontent.com notebooks.githubusercontent.com; img-src 'self' data: github.githubassets.com identicons.github.com github-cloud.s3.amazonaws.com secured-user-images.githubusercontent.com/ github-production-user-asset-6210df.s3.amazonaws.com *.githubusercontent.com; manifest-src 'self'; media-src github.com user-images.githubusercontent.com/; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com; worker-src github.com/assets-cdn/worker/ gist.github.com/assets-cdn/worker/
    content-length: 0
    x-github-request-id: D358:4A4C:9CBB6E:BCAB87:629F4E54

    HTTP/2 200
    access-control-allow-origin: https://render.githubusercontent.com
    content-disposition: attachment; filename=chromium-104.0.5106.1.tar.gz
    content-security-policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
    content-type: application/x-gzip
    etag: "2ebec60c73390de10b6e84d75838466d939f03a7b468f10873c9023f549a5242"
    strict-transport-security: max-age=31536000
    vary: Authorization,Accept-Encoding,Origin
    x-content-type-options: nosniff
    x-frame-options: deny
    x-xss-protection: 1; mode=block
    date: Tue, 07 Jun 2022 13:10:45 GMT
    x-github-request-id: 867A:7031:7EED7:179E4C:629F4E54

    Warning: Binary output can mess up your terminal. Use "--output -" to tell
    Warning: curl to output it to your terminal anyway, or consider "--output
    Warning: <FILE>" to save to a file.

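
To make the point about the headers concrete, here is a minimal sketch of how
one might check a response for a last modification date. The function name
last_modified_from_headers is hypothetical and only the Python standard
library is used; it is not part of the loader.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def last_modified_from_headers(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the Last-Modified date of a response, or None if it is absent.

    Hypothetical helper: the "date" header is deliberately ignored because it
    only reflects when the request was served.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get("last-modified")
    if value is None:
        return None
    return parsedate_to_datetime(value)


# Headers shaped like the final response for the GitHub tag tarball:
# there is a "date" header but no "last-modified" one.
github_headers = {
    "content-type": "application/x-gzip",
    "date": "Tue, 07 Jun 2022 13:10:45 GMT",
}
print(last_modified_from_headers(github_headers))
```

With headers like these, the helper returns None, so no time can be attached
to the artifact.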
That case was not handled by the archive loader, which resulted in a
loading error, so add a fix for it.
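
A minimal sketch of the kind of guard involved, assuming the artifact time
arrives as an optional ISO 8601 string. The name parse_artifact_time is
hypothetical and datetime.fromisoformat is only a stand-in for whatever date
parsing the loader uses; this is an illustration, not the actual patch.

```python
from datetime import datetime
from typing import Optional


def parse_artifact_time(time: Optional[str]) -> Optional[datetime]:
    """Parse an artifact time, tolerating artifacts that provide none.

    Hypothetical helper: returns None instead of passing a missing time on to
    later timestamp normalization.
    """
    if time is None:
        # No time info was provided for this artifact: nothing to parse.
        return None
    return datetime.fromisoformat(time)


# An artifact entry without time info, as can be passed in the artifacts list.
artifact = {
    "url": "https://github.com/chromium/chromium/archive/refs/tags/104.0.5106.1.tar.gz",
    "time": None,
    "length": None,
    "version": "104.0.5106.1",
}
print(parse_artifact_time(artifact["time"]))
```

Before the fix, loading such an artifact failed with the following log: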

    swh-loader_1 | [2022-06-07 10:00:56,998: INFO/MainProcess] Task swh.loader.package.archive.tasks.LoadArchive[d61d54e5-3163-439a-95a5-2ab57bd75a7d] received
    swh-loader_1 | [2022-06-07 10:00:57,001: DEBUG/ForkPoolWorker-1] Loading config file /loader.yml
    swh-loader_1 | [2022-06-07 10:00:59,059: DEBUG/ForkPoolWorker-1] last snapshot: None
    swh-loader_1 | [2022-06-07 10:00:59,064: DEBUG/ForkPoolWorker-1] package_info: ArchivePackageInfo(url='https://github.com/chromium/chromium/archive/refs/tags/104.0.5106.1.tar.gz', filename='104.0.5106.1.tar.gz', version='104.0.5106.1', directory_extrinsic_metadata=[], raw_info={'url': 'https://github.com/chromium/chromium/archive/refs/tags/104.0.5106.1.tar.gz', 'time': None, 'length': None, 'version': '104.0.5106.1'}, length=None, time=None)
    swh-loader_1 | [2022-06-07 10:01:00,790: DEBUG/ForkPoolWorker-1] filename: 104.0.5106.1.tar.gz
    swh-loader_1 | [2022-06-07 10:01:00,791: DEBUG/ForkPoolWorker-1] filepath: /tmp/tmpgnd1w9fy/104.0.5106.1.tar.gz
    swh-loader_1 | [2022-06-07 10:08:40,664: DEBUG/ForkPoolWorker-1] extrinsic_metadata
    swh-loader_1 | [2022-06-07 10:10:02,826: DEBUG/ForkPoolWorker-1] uncompressed_path: /tmp/tmpgnd1w9fy/src
    swh-loader_1 | [2022-06-07 10:11:38,076: DEBUG/ForkPoolWorker-1] Number of skipped contents: 0
    swh-loader_1 | [2022-06-07 10:11:38,076: DEBUG/ForkPoolWorker-1] Number of contents: 367501
    swh-loader_1 | [2022-06-07 10:11:38,558: DEBUG/ForkPoolWorker-1] Flushing 367501 objects of type content (3423607967 bytes)
    swh-loader_1 | [2022-06-07 10:32:41,504: DEBUG/ForkPoolWorker-1] Number of directories: 34530
    swh-loader_1 | [2022-06-07 10:32:41,542: DEBUG/ForkPoolWorker-1] Flushing 34530 objects of type directory (432087 entries)
    swh-loader_1 | [2022-06-07 10:33:20,750: ERROR/ForkPoolWorker-1] Failed to load branch releases/104.0.5106.1 for https://github.com/chromium/chromium/tags
    swh-loader_1 | Traceback (most recent call last):
    swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/loader.py", line 648, in load
    swh-loader_1 |     res = self._load_release(p_info, origin)
    swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/loader.py", line 826, in _load_release
    swh-loader_1 |     p_info, uncompressed_path, directory=directory.hash
    swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/archive/loader.py", line 148, in build_release
    swh-loader_1 |     normalized_time = TimestampWithTimezone.from_datetime(parsed_time)
    swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/model/model.py", line 488, in from_datetime
    swh-loader_1 |     return cls.from_dict(dt)
    swh-loader_1 |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/model/model.py", line 482, in from_dict
    swh-loader_1 |     f"TimestampWithTimezone.from_dict received non-integer timestamp: "
    swh-loader_1 | ValueError: TimestampWithTimezone.from_dict received non-integer timestamp: None
    swh-loader_1 | [2022-06-07 10:33:20,752: DEBUG/ForkPoolWorker-1] default version: 104.0.5106.1
    swh-loader_1 | [2022-06-07 10:33:20,755: DEBUG/ForkPoolWorker-1] extra branches: {}
    swh-loader_1 | [2022-06-07 10:33:20,755: DEBUG/ForkPoolWorker-1] releases: {'104.0.5106.1': []}
    swh-loader_1 | [2022-06-07 10:33:20,755: DEBUG/ForkPoolWorker-1] snapshot: {'branches': {}}
    swh-loader_1 | [2022-06-07 10:33:20,755: DEBUG/ForkPoolWorker-1] snapshot: Snapshot(branches=ImmutableDict({}), id=hash_to_bytes('1a8893e6a86f444e8be8e7bda6cb34fb1735a00e'))
    swh-loader_1 | [2022-06-07 10:33:20,755: DEBUG/ForkPoolWorker-1] Flushing 1 objects of type snapshot
    swh-loader_1 | [2022-06-07 10:33:22,355: WARNING/ForkPoolWorker-1] 1 failed branches
    swh-loader_1 | [2022-06-07 10:33:22,356: WARNING/ForkPoolWorker-1] Failed branches: releases/104.0.5106.1