Fix deposits stuck in "verified" status, which is usually a symptom of a loading failure
Closed, Migrated

Description

This happened in staging, but it could just as well happen in production.

The following error occurred during the ingestion of a deposit:

Feb 24 20:36:42 worker0 python3[657402]: [2021-02-24 20:36:42,387: INFO/MainProcess] Received task: swh.loader.package.deposit.tasks.LoadDeposit[e3d3bc13-4ce4-4d5d-b6de-aa41f561cba3]
Feb 24 20:36:43 worker0 python3[657407]: [2021-02-24 20:36:43,888: ERROR/ForkPoolWorker-1] Failed loading branch HEAD for https://doi.org/10.5281/6a78d227-ae11-4b78-be69-100ba7faf725
                                         Traceback (most recent call last):
                                           File "/usr/lib/python3/dist-packages/swh/loader/package/loader.py", line 426, in load
                                             res = self._load_revision(p_info, origin)
                                           File "/usr/lib/python3/dist-packages/swh/loader/package/loader.py", line 541, in _load_revision
                                             dl_artifacts = self.download_package(p_info, tmpdir)
                                           File "/usr/lib/python3/dist-packages/swh/loader/package/deposit/loader.py", line 186, in download_package
                                             return [self.client.archive_get(self.deposit_id, tmpdir, p_info.filename)]
                                           File "/usr/lib/python3/dist-packages/swh/loader/package/deposit/loader.py", line 336, in archive_get
                                             return download(url, dest=tmpdir, filename=filename, auth=self.auth)
                                           File "/usr/lib/python3/dist-packages/swh/loader/package/utils.py", line 81, in download
                                             raise ValueError("Fail to query '%s'. Reason: %s" % (url, response.status_code))
                                         ValueError: Fail to query 'https://deposit-rp.internal.staging.swh.network/1/private/98/raw/'. Reason: 500
Feb 24 20:36:45 worker0 python3[657407]: [2021-02-24 20:36:45,179: WARNING/ForkPoolWorker-1] 1 failed branches
Feb 24 20:36:45 worker0 python3[657407]: [2021-02-24 20:36:45,180: WARNING/ForkPoolWorker-1] Failed branches: HEAD

[1] https://sentry.softwareheritage.org/share/issue/d7f7880e5d874498b7ff147cbca9d4c4/

Deposit information:

swh-deposit=> \conninfo
You are connected to database "swh-deposit" as user "guest" on host "db1.internal.staging.swh.network" (address "192.168.130.11") at port "5432".
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
swh-deposit=> select * from deposit where id = 98;
-[ RECORD 1 ]--+-------------------------------------------------------------
id             | 98
reception_date | 2021-02-24 20:35:25.262905+00
complete_date  | 2021-02-24 20:36:11.063726+00
external_id    |
swhid          |
status         | verified
client_id      | 9
collection_id  | 8
parent_id      |
status_detail  |
swhid_context  |
check_task_id  | 18736121
load_task_id   | 18736136
origin_url     | https://doi.org/10.5281/6a78d227-ae11-4b78-be69-100ba7faf725

The associated scheduler task is nonetheless marked as completed:

swh-scheduler=>  select * from task where id=18736136;
-[ RECORD 1 ]----+------------------------------------------------------------------------------------------------------------------
id               | 18736136
type             | load-deposit
arguments        | {"args": [], "kwargs": {"url": "https://doi.org/10.5281/6a78d227-ae11-4b78-be69-100ba7faf725", "deposit_id": 98}}
next_run         | 2021-02-24 20:36:30.43889+00
current_interval | 1 day
status           | completed
policy           | oneshot
retries_left     | 3
priority         |

There are 2 problems:

  • the deposit should be marked as "failed", but it stays in the "verified" state
  • the actual loading issue (the 500 error above)
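
The first problem can be spotted by looking for deposits that remain "verified" long after their complete_date. Below is a minimal sketch, assuming rows shaped like the deposit record above (id, status, complete_date). The helper name stuck_verified, the one-hour threshold, and the second row (id 99) are hypothetical and only there for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Mapping


def stuck_verified(
    deposits: Iterable[Mapping],
    now: datetime,
    max_age: timedelta = timedelta(hours=1),  # hypothetical threshold
) -> List[int]:
    """Return ids of deposits still "verified" more than max_age after complete_date."""
    return [
        d["id"]
        for d in deposits
        if d["status"] == "verified" and now - d["complete_date"] > max_age
    ]


rows = [
    # Values taken from the deposit record shown in this task.
    {"id": 98, "status": "verified",
     "complete_date": datetime(2021, 2, 24, 20, 36, 11, tzinfo=timezone.utc)},
    # Hypothetical row, not from the task: a deposit correctly marked as failed.
    {"id": 99, "status": "failed",
     "complete_date": datetime(2021, 2, 24, 20, 40, 0, tzinfo=timezone.utc)},
]

print(stuck_verified(rows, now=datetime(2021, 2, 25, 10, 31, tzinfo=timezone.utc)))
# prints [98]
```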

Event Timeline

ardumont triaged this task as Normal priority. Feb 25 2021, 10:31 AM
ardumont created this task.
ardumont added a project: SWORD deposit.
ardumont updated the task description.

Regarding the actual loading issue:

The deposit server fails while handling the zip archive, with the error "ZIP does not support timestamps before 1980" [1].
That is why the loader received a 500.

[1] https://sentry.softwareheritage.org/share/issue/e7a5997eb34b40b281900f7602f7639a/

ardumont changed the task status from Open to Work in Progress. Feb 25 2021, 3:29 PM
ardumont moved this task from Backlog to In progress on the SWORD deposit board.

Status: fixes landed.

Deployment *was* in progress. It requires both loader.core 0.19 and deposit 0.10 to be deployed.

I'm currently trying to figure out why the deposit Debian build fails on buster [1].

[1] T3072

Fixes and adaptations deployed.

So now, when a deposit load fails in the same circumstances as in this task, the deposit should be marked as "failed" in the deposit backend (it no longer stays in the "verified" state indefinitely).
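
The idea of always reporting a final status, even when the load raises, can be sketched as follows. This is not the actual loader code: load_and_report, its two callbacks, and the "done" status name are hypothetical stand-ins, and only "failed" and "verified" come from this task.

```python
from typing import Callable


def load_and_report(
    load: Callable[[], bool],
    update_status: Callable[[str], None],
) -> str:
    """Run `load` and report a final status to the backend, whatever happens."""
    try:
        ok = load()
    except Exception:
        # A crash during loading (e.g. the download error in this task)
        # must still end up as an explicit failure in the backend.
        ok = False
    status = "done" if ok else "failed"  # "done" is an assumed success status
    update_status(status)
    return status


def failing_load() -> bool:
    # Mimics the error seen in the traceback; the URL is elided on purpose.
    raise ValueError("Fail to query '<url>'. Reason: 500")


reported = []  # stands in for the call that updates the deposit backend
load_and_report(failing_load, reported.append)
print(reported)  # prints ['failed']
```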

Internally, the deposit server no longer zips the data received from the deposit client.
It now archives the data as a single tarball and sends that tarball to the deposit loader.
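
To illustrate why the switch helps, here is a small standalone sketch using only the Python standard library. It is not the deposit code. It gives a file a modification time in 1975, then tries to add it to a zip archive and to a tarball. With default settings, zipfile is expected to refuse the file with the same message as in this task, while tarfile stores the timestamp as-is. The file name and the chosen timestamp are arbitrary.

```python
import io
import os
import tarfile
import tempfile
import zipfile

OLD_MTIME = 157766400  # 1975-01-01 00:00:00 UTC, before the ZIP minimum of 1980


def zip_error(path):
    """Try to zip `path` in memory; return the error message, or None on success."""
    try:
        with zipfile.ZipFile(io.BytesIO(), "w") as zf:
            zf.write(path, arcname="old.txt")
    except ValueError as exc:
        return str(exc)
    return None


def tar_mtime(path):
    """Archive `path` into an in-memory tarball and read its stored mtime back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        tf.add(path, arcname="old.txt")
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:gz") as tf:
        return int(tf.getmember("old.txt").mtime)


def demo():
    """Create a file dated 1975 and return (zip error message, tar mtime)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "old.txt")
        with open(path, "w") as f:
            f.write("hello")
        os.utime(path, (OLD_MTIME, OLD_MTIME))
        return zip_error(path), tar_mtime(path)


if __name__ == "__main__":
    print(demo())
```

Since Python 3.8, zipfile.ZipFile also accepts strict_timestamps=False, which clamps such dates to 1980 instead of raising. Using a tarball avoids the limitation altogether and keeps the original timestamps.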

ardumont claimed this task.