Description
Status | Assigned | Task
---|---|---
Migrated | gitlab-migration | T321 Download antepedia's s3 contents not in sesi nor in swh
Migrated | gitlab-migration | T317 Inject sesi files hashes in antelink db
Migrated | gitlab-migration | T316 List and compute hashes of actual sesi files
Migrated | gitlab-migration | T319 S3 content files downloader and injection in swh
Event Timeline
Comment Actions
worker01 is in charge:
- 10 jobs run concurrently
- each job sequentially downloads a batch of 1000 s3 files
~1150 jobs done so far.
~11000 to go.
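The setup above (10 concurrent jobs, each sequentially downloading a 1000-file batch) can be sketched as follows. This is a minimal illustration, not the actual worker code: the `download_batch` stub and the key-splitting logic are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 1000   # files handled by one job
CONCURRENCY = 10    # jobs running at once

def download_batch(batch):
    """Stub for one job: sequentially fetch each S3 key in the batch.

    A real job would download each key from S3 here; this sketch
    just records the keys it would have fetched.
    """
    return [f"downloaded:{key}" for key in batch]

def run(keys):
    # Split the key list into batches of BATCH_SIZE, then let
    # CONCURRENCY jobs work through the batches in parallel.
    batches = [keys[i:i + BATCH_SIZE] for i in range(0, len(keys), BATCH_SIZE)]
    done = []
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        for result in pool.map(download_batch, batches):
            done.extend(result)
    return done
```

With ~12M files total, this layout yields the roughly 12000 jobs the counts above add up to (~1150 done, ~11000 to go).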
Note:
Since the start was a little bumpy (unforeseen locking issues caused by using the boto3 API concurrently), worker02 is scanning a snapshot of the first ~1M files for corrupted data.
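The locking trouble is consistent with boto3's documented behavior: `Session` objects are not thread-safe, so each worker thread should create its own session and client rather than share one. A generic sketch of that pattern (the factory wrapper below is a hypothetical helper, not code from these workers; in practice `create` would be something like `lambda: boto3.session.Session().client("s3")`):

```python
import threading

def make_thread_local_factory(create):
    """Wrap a non-thread-safe factory so each thread gets its own instance.

    The wrapped object is built lazily, once per thread, and cached
    in thread-local storage.
    """
    local = threading.local()

    def get():
        if not hasattr(local, "obj"):
            local.obj = create()  # first call in this thread builds the object
        return local.obj

    return get
```

Each of the 10 concurrent jobs would call the returned `get()` instead of sharing a single client, avoiding cross-thread contention inside the session.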
Comment Actions
As of yesterday, 10M files were done.
A check that everything had been injected showed it had not: ~2M files were missing.
(Probably the hole I 'sensed' last Saturday the 5th, around 5 am, after restarting rabbitmq.)
Those ~2M files are currently being reinjected into the queue (computed as the diff between a disk scan of the /srv/storage/space/antelink/s3 folder and the antelink.content_s3_not_in_sesi_nor_in_swh table).
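The diff itself is a plain set difference: hashes listed in the tracking table but absent from the disk scan are the ones to re-queue. A minimal sketch, assuming both sides are available as iterables of hashes (the function name is illustrative, not from the actual tooling):

```python
def missing_hashes(expected, on_disk):
    """Return hashes present in the tracking table but absent from the
    disk scan, i.e. the files whose download needs to be re-queued.
    Sorted for reproducible output."""
    return sorted(set(expected) - set(on_disk))
```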
Comment Actions
The remaining 2M are done.
We should now be complete.
A check is currently running to determine whether any files are still missing.
Comment Actions
> Check is currently running to determine if some files are still missing.

The check found 1500 files still missing.
They have been downloaded, so everything has now been retrieved.