
Download antepedia's s3 contents not in sesi nor in swh
Closed, Migrated (Edits Locked)

Description

Event Timeline

ardumont changed the task status from Open to Work in Progress. Feb 27 2016, 8:04 PM
ardumont claimed this task.
ardumont updated the task description.

worker01 is in charge

  • 10 jobs run concurrently
  • each job downloads 1000 S3 files sequentially
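
The batching scheme above (10 concurrent jobs, 1000 files per job, sequential within a job) could be sketched roughly as follows. This is only an illustration, not the actual worker code: it swaps in a local thread pool for the real job queue, and `fetch`, `batches`, `run_job` and `download_all` are hypothetical names introduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

BATCH_SIZE = 1000   # files per job, as stated above
CONCURRENCY = 10    # jobs running at once, as stated above


def batches(keys, size=BATCH_SIZE):
    """Yield successive lists of at most `size` keys."""
    it = iter(keys)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_job(chunk, fetch):
    """Download one batch sequentially; return the keys that failed."""
    failed = []
    for key in chunk:
        try:
            fetch(key)  # hypothetical callable that downloads one file
        except Exception:
            failed.append(key)
    return failed


def download_all(keys, fetch, concurrency=CONCURRENCY, size=BATCH_SIZE):
    """Run the batches concurrently; return every key that failed."""
    failed = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for result in pool.map(lambda c: run_job(c, fetch), batches(keys, size)):
            failed.extend(result)
    return failed
```

Returning the failed keys rather than raising makes it easy to requeue only what is missing after a run.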

~1150 jobs done so far.
~11000 to go.

Note:
The start was a little bumpy (using the boto3 API concurrently created unforeseen locks...), so worker02 is scanning a snapshot of the first ~1M files for corrupted data.

As of yesterday, 10M were done.

A check of whether everything had been injected showed that it had not: ~2M files were missing.
(Probably the hole I 'sensed' last Saturday the 5th around 5 am, after restarting rabbitmq.)

The ~2M missing files are currently being reinjected into the queue (diff between a disk scan of the /srv/storage/space/antelink/s3 folder and the antelink.content_s3_not_in_sesi_nor_in_swh table).
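
As a rough sketch of that kind of diff, assuming each downloaded file is named after its identifier in the table (an assumption made here for illustration, not something stated in this task), the missing set could be computed like this. `files_on_disk` and `missing_from_disk` are hypothetical helper names, and the list of expected identifiers is assumed to have been fetched from the table beforehand.

```python
import os


def files_on_disk(root):
    """Return the set of file names found anywhere under `root`."""
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        found.update(filenames)
    return found


def missing_from_disk(expected_ids, root):
    """Return the expected ids that have no same-named file under `root`."""
    return set(expected_ids) - files_on_disk(root)
```

The resulting set is what would be pushed back into the queue for another download attempt.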

Remaining 2M done.
Now we should be complete.

A check is currently running to determine whether some files are still missing.

There were 1500 missing.
They have been downloaded.

So everything has been retrieved.

olasd changed the visibility from "All Users" to "Public (No Login Required)". May 13 2016, 5:09 PM
gitlab-migration changed the task status from Resolved to Migrated. Jan 8 2023, 4:18 PM
gitlab-migration claimed this task.
gitlab-migration changed the status of subtask T317: Inject sesi files hashes in antelink db from Resolved to Migrated.
gitlab-migration changed the status of subtask T319: S3 content files downloader and injection in swh from Resolved to Migrated.
gitlab-migration added a subscriber: gitlab-migration.