Software Heritage

Save process seems to be stuck
Closed, Resolved, Public

Description

Unfortunately, the save request for https://github.com/rntmancuso/linux-xlnx-prof seems to be stuck. It has been "running" since 15.6.2021, 08:51:16.

Can you please check?

Related to T3361

Event Timeline

mwagner created this object in space S1 Public.
ardumont triaged this task as Normal priority. Jun 15 2021, 12:25 PM
ardumont added a subscriber: ardumont.

It's not stuck; it is currently running:

Jun 15 06:51:23 worker02 python3[2461343]: [2021-06-15 06:51:23,361: INFO/ForkPoolWorker-1] Load origin 'https://github.com/rntmancuso/linux-xlnx-prof' with type 'git'
Jun 15 06:51:29 worker02 python3[2461343]: [64B blob data]
Jun 15 06:51:29 worker02 python3[2461343]: [3.2K blob data]
Jun 15 06:51:29 worker02 python3[2461343]: [1.2K blob data]
Jun 15 06:53:31 worker02 python3[2461343]: Total 7983291 (delta 83), reused 80 (delta 80), pack-reused 7983173
Jun 15 06:53:31 worker02 python3[2461343]: [2021-06-15 06:53:31,963: INFO/ForkPoolWorker-1] Listed 170 refs for repo https://github.com/rntmancuso/linux-xlnx-prof
...

It recently finished:

Jun 15 13:10:29 worker02 python3[2461343]: [2021-06-15 13:10:29,513: INFO/ForkPoolWorker-1] Task swh.loader.git.tasks.UpdateGitRepository[4b1cdb75-952f-4949-b95f-67259c5bfb62] succeeded in 22746.730732845142s: {'status': 'eventful'}

As expected, ingesting a large repository like this one can take a while: the task ran for about 22,747 seconds (roughly 6.3 hours).
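As a sanity check, the runtime Celery reports (`succeeded in 22746.730732845142s`) matches the gap between the "Load origin" line and the final line of the journal excerpts above. The sketch below only assumes those two log lines (with the journal prefix dropped); the regexes and helper names are illustrative and not part of any Software Heritage tool.

```python
import re
from datetime import datetime, timedelta

# Log lines quoted above, journal prefix ("Jun 15 ... python3[...]:") dropped.
START_LINE = ("[2021-06-15 06:51:23,361: INFO/ForkPoolWorker-1] Load origin "
              "'https://github.com/rntmancuso/linux-xlnx-prof' with type 'git'")
END_LINE = ("[2021-06-15 13:10:29,513: INFO/ForkPoolWorker-1] Task "
            "swh.loader.git.tasks.UpdateGitRepository[4b1cdb75-952f-4949-b95f-67259c5bfb62] "
            "succeeded in 22746.730732845142s: {'status': 'eventful'}")

# Illustrative patterns for the log timestamp prefix and the reported runtime.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):")
RUNTIME_RE = re.compile(r"succeeded in ([\d.]+)s")


def log_timestamp(line: str) -> datetime:
    """Parse the '[YYYY-MM-DD HH:MM:SS,mmm:' prefix of a worker log line."""
    match = TS_RE.match(line)
    if match is None:
        raise ValueError(f"no timestamp in {line!r}")
    # %f accepts the 3-digit millisecond field and pads it to microseconds.
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def reported_runtime(line: str) -> timedelta:
    """Extract the 'succeeded in N s' duration reported for the task."""
    match = RUNTIME_RE.search(line)
    if match is None:
        raise ValueError(f"no runtime in {line!r}")
    return timedelta(seconds=float(match.group(1)))


wall_clock = log_timestamp(END_LINE) - log_timestamp(START_LINE)
runtime = reported_runtime(END_LINE)
print(wall_clock)                                 # 6:19:06.152000
print(round(runtime.total_seconds() / 3600, 2))   # 6.32
```

The reported runtime is about half a second longer than the wall-clock gap, which fits the task having started just before it logged "Load origin".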

It is browsable as well [1].

[1] https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/rntmancuso/linux-xlnx-prof

Perfect. Thanks for the support!

ardumont claimed this task.

Sure.

Thanks @ardumont for following up on this task.

Zooming out a bit, we need to find a way to give Save Code Now users more feedback about what is going on; otherwise users will naturally assume that their requests got stuck, especially for big repos like this one.

Would it be feasible to make the last log lines of the ongoing ingestion (à la tail -f / systemctl status) visible to users? Or, better, to put in place a heartbeat of sorts, explicitly crafted for user feedback, that ends up in the user dashboard? cc: @anlambert @jayeshv

(feel free to split this to a separate issue / merge it with a more relevant one)
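A "last log lines" view could be backed by a bounded buffer that keeps only the most recent N records of a task, much like `tail -n N`. The following is a minimal sketch using only the standard library; `TailHandler` and the logger name `hypothetical.loader` are illustrative assumptions, not existing Software Heritage code.

```python
import logging
from collections import deque


class TailHandler(logging.Handler):
    """Hypothetical handler that keeps only the last `capacity` formatted
    records, like `tail -n capacity` on the task's log."""

    def __init__(self, capacity: int = 20) -> None:
        super().__init__()
        # A deque with maxlen silently drops the oldest entry when full.
        self.lines: deque = deque(maxlen=capacity)

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.lines.append(self.format(record))
        except Exception:
            self.handleError(record)

    def tail(self) -> list:
        """Snapshot of the buffered lines, oldest first."""
        return list(self.lines)


# Example: attach the handler to a logger standing in for the loader's.
logger = logging.getLogger("hypothetical.loader")
logger.setLevel(logging.INFO)
logger.propagate = False
tail_handler = TailHandler(capacity=3)
tail_handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(tail_handler)

for i in range(5):
    logger.info("fetched pack chunk %d", i)

print(tail_handler.tail())
# ['INFO fetched pack chunk 2', 'INFO fetched pack chunk 3', 'INFO fetched pack chunk 4']
```

Since the workers and the web dashboard run in separate processes, a real version would presumably have to publish this buffer to some shared store the dashboard can read; that part is left out here.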

I agree that having (more or less) real-time access to the task logs would be very handy, as one expects from any CI-like tool nowadays.

I prefer access to logs over a heartbeat, but both could be useful. And a heartbeat can be tricky to get right.
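One reason a heartbeat is tricky: a bare "I'm alive" timestamp only proves the worker process is running, not that the ingestion is advancing. The sketch below pairs the timestamp with a progress counter so that "slow" and "stuck" can be told apart. The `Heartbeat` record, the `classify` helper and both timeouts are hypothetical; choosing thresholds that do not misfire on very large repositories (this one ran for over six hours) is exactly the hard part.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Heartbeat:
    """Hypothetical progress beacon a loader could publish periodically."""
    last_beat: datetime       # when the worker last reported in
    objects_seen: int         # progress counter, expected to only grow
    last_progress: datetime   # when objects_seen last increased

    def beat(self, now: datetime, objects_seen: int) -> None:
        """Record a beat; only move last_progress if the counter grew."""
        if objects_seen > self.objects_seen:
            self.objects_seen = objects_seen
            self.last_progress = now
        self.last_beat = now


def classify(hb: Heartbeat, now: datetime,
             beat_timeout: timedelta = timedelta(minutes=5),
             progress_timeout: timedelta = timedelta(hours=1)) -> str:
    """Turn a heartbeat into a user-facing status string."""
    if now - hb.last_beat > beat_timeout:
        return "no recent heartbeat: worker may have died"
    if now - hb.last_progress > progress_timeout:
        return "alive but no progress: possibly stuck"
    return "running"


t0 = datetime(2021, 6, 15, 6, 51, 23)
hb = Heartbeat(last_beat=t0, objects_seen=0, last_progress=t0)
hb.beat(t0 + timedelta(minutes=2), objects_seen=1000)
print(classify(hb, now=t0 + timedelta(minutes=4)))    # running
print(classify(hb, now=t0 + timedelta(minutes=30)))   # no recent heartbeat: worker may have died
```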