When retrieving the archives, we checked for size and md5.
This task is about checking each archive's content, which is either an svndump, a git repository or an hg repository.
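As a rough illustration of what a first-pass content check could look like, here is a minimal sketch in Python. It is not the code from the fetcher package: the function name `detect_content_kind` is hypothetical, and it assumes the archive has already been decompressed or extracted to `path`. It relies only on well-known markers: a Subversion dump starts with a `SVN-fs-dump-format-version:` header, an hg repository contains a `.hg` directory, and a git repository contains a `.git` directory or a bare layout with `HEAD` and `objects`.

```python
import os

# First bytes of an uncompressed Subversion dump file.
SVNDUMP_MAGIC = b"SVN-fs-dump-format-version:"


def detect_content_kind(path):
    """Guess whether `path` holds an svndump, a git or an hg repository.

    Hypothetical helper: returns "svndump", "git", "hg", or None when
    nothing matches. It only looks at markers, it does not validate content.
    """
    if os.path.isfile(path):
        with open(path, "rb") as f:
            if f.read(len(SVNDUMP_MAGIC)) == SVNDUMP_MAGIC:
                return "svndump"
        return None
    if os.path.isdir(os.path.join(path, ".hg")):
        return "hg"
    # Either a working copy (.git directory) or a bare repository layout.
    if os.path.isdir(os.path.join(path, ".git")) or (
        os.path.isfile(os.path.join(path, "HEAD"))
        and os.path.isdir(os.path.join(path, "objects"))
    ):
        return "git"
    return None
```

A deeper check would then hand the path to the matching tool (for example `git fsck` or `hg verify`), which this sketch deliberately leaves out.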
Description
Status | Assigned | Task
---|---|---
Unknown Object (Maniphest Task) | |
Migrated | gitlab-migration | T367 ingest Google Code repositories
Migrated | gitlab-migration | T397 Check retrieved archives from googlecode
Event Timeline
Comment Actions
- done in 86c1353
- packaged in python3-swh.fetcher.googlecode v0.0.3
- deployed on worker01
- worker01 is currently checking those archives
Comment Actions
About 120k done.
It's rather slow, around 1.1 jobs/s.
|-------------------------------+----------------|
| date-snapshot                 | messages_ready |
|-------------------------------+----------------|
| Thu May 05 23:32:35 CEST 2016 |        1302268 |
| Fri May 06 10:47:07 CEST 2016 |        1258667 |
#+BEGIN_SRC lisp
(let ((speed (swh-worker-average-speed-per-second
              "Thu May 05 23:32:35 CEST 2016" 1302268
              "Fri May 06 10:47:07 CEST 2016" 1258667)) ;; 1.0773127100217434 j/s
      (remaining-jobs 1258667))
  (swh-worker-remains-in-days speed remaining-jobs)) ;; 13.522447992188424 remaining days
#+END_SRC
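For readers without the Emacs Lisp helpers at hand, the same estimate can be reproduced in Python. This is a minimal sketch: the function names `average_speed_per_second` and `remaining_days` are hypothetical stand-ins, not the helpers used above, and the date parsing assumes an English locale and that both snapshots are in the same time zone.

```python
from datetime import datetime

# Snapshot date format. "CEST" is matched literally: both snapshots share
# the same zone, so the offset cancels out in the difference.
# Assumes an English locale for the day and month names.
DATE_FORMAT = "%a %b %d %H:%M:%S CEST %Y"


def average_speed_per_second(start, ready_start, end, ready_end):
    """Jobs consumed per second between two queue snapshots."""
    elapsed = datetime.strptime(end, DATE_FORMAT) - datetime.strptime(start, DATE_FORMAT)
    return (ready_start - ready_end) / elapsed.total_seconds()


def remaining_days(speed, remaining_jobs):
    """Days needed to drain the queue at a constant speed (jobs/s)."""
    return remaining_jobs / speed / 86400


speed = average_speed_per_second(
    "Thu May 05 23:32:35 CEST 2016", 1302268,
    "Fri May 06 10:47:07 CEST 2016", 1258667,
)
print(speed)                           # about 1.077 jobs/s
print(remaining_days(speed, 1258667))  # about 13.5 days
```

The arithmetic is 43601 jobs consumed over 40472 seconds, then the remaining 1258667 jobs divided by that rate and by 86400 seconds per day.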
On this sample, there were only 40 errors (which I have not analyzed yet).
psql -c "select level, message from log where src_host='worker01.softwareheritage.org' and ts between '2016-05-04 18:00:00.00+01' and '2016-05-06 10:55:00.00+01' and level = 'error';" service=swh-log > swh-fetcher-googlecode-checks-in-errors-between-04-and-06-may-2016
ardumont@worker01:~$ grep -c FAILURE swh-fetcher-googlecode-checks-in-errors-between-04-and-06-may-2016
40
As this won't complete in the time frame we have left, and I forgot to randomize the sample (duh!), I purged the current queue and rescheduled a complete, randomized sample.
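The point of randomizing is that, if the run is cut short, whatever was checked is still a random sample of all archives rather than just the first ones listed. A minimal sketch of the idea in Python, assuming the jobs are available as a plain list (the name `randomized_order` and the archive identifiers are hypothetical, and the actual scheduling step is left out):

```python
import random


def randomized_order(archives, seed=None):
    """Return a shuffled copy of `archives`, leaving the input untouched.

    Passing a seed makes the order reproducible, which helps when a run
    has to be rescheduled and compared with an earlier one.
    """
    rng = random.Random(seed)
    shuffled = list(archives)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical archive identifiers standing in for the real job list.
archives = [f"archive-{i}" for i in range(10)]
jobs = randomized_order(archives, seed=42)
```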
Comment Actions
Only 4132 out of 1379346 files were in error during the checks (~0.30%).
Manually re-checking some of them gave no error.
It is possible the worker ran out of disk space or memory during the checks (for example, if too many concurrent tasks were run).
So those were rescheduled for checking (with less concurrency this time).
Looking at those checks in the logs (worker01), I see no error either for now.