Check retrieved archives from googlecode
Closed, Migrated

Description

When retrieving the archives, we checked their size and md5.
This task is about checking the archives' content, which is either an svndump, a git repository, or an hg repository.
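
For context, the size and md5 check mentioned above can be sketched as follows. This is only a minimal illustration: the function name `check_archive` and its parameters are hypothetical and are not taken from swh.fetcher.googlecode.

```python
import hashlib
import os


def check_archive(path, expected_size, expected_md5, chunk_size=1 << 20):
    """Return True if the file at `path` has the expected size and md5.

    Hypothetical helper for illustration only.
    """
    # Cheap check first: a size mismatch makes hashing pointless.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large archives are not loaded fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5
```

Such a check only tells us the download is intact; it says nothing about whether the svndump, git, or hg content inside is usable, which is what this task covers.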

Event Timeline

  • done in 86c1353
  • packaged in python3-swh.fetcher.googlecode v0.0.3
  • deployed on worker01
  • worker01 is currently checking those archives

About 120k done.
It's rather slow, around 1.1 jobs/s.

|-------------------------------+----------------|
| date-snapshot                 | messages_ready |
|-------------------------------+----------------|
| Thu May 05 23:32:35 CEST 2016 |        1302268 |
| Fri May 06 10:47:07 CEST 2016 |        1258667 |
|-------------------------------+----------------|

#+BEGIN_SRC lisp
(let ((speed (swh-worker-average-speed-per-second "Thu May 05 23:32:35 CEST 2016" 1302268 "Fri May 06 10:47:07 CEST 2016" 1258667)) ;; 1.0773127100217434 j/s
      (remaining-jobs 1258667))
  (swh-worker-remains-in-days speed remaining-jobs));; 13.522447992188424 remaining days
#+END_SRC
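
The `swh-worker-*` helpers used above are not shown in this task. As a rough equivalent, the same arithmetic can be sketched in Python. The function names below are hypothetical, and the sketch assumes both snapshots are in the same timezone, so the zone token is simply dropped before parsing.

```python
from datetime import datetime


def parse_snapshot(stamp):
    """Parse a stamp like 'Thu May 05 23:32:35 CEST 2016' (zone ignored)."""
    parts = stamp.split()
    del parts[4]  # drop the timezone token; both snapshots share it
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")


def average_speed(start, ready_start, end, ready_end):
    """Jobs consumed per second between two queue snapshots."""
    elapsed = (parse_snapshot(end) - parse_snapshot(start)).total_seconds()
    return (ready_start - ready_end) / elapsed


def remaining_days(speed, remaining_jobs):
    """Days needed to drain the queue at a constant speed."""
    return remaining_jobs / speed / 86400


speed = average_speed("Thu May 05 23:32:35 CEST 2016", 1302268,
                      "Fri May 06 10:47:07 CEST 2016", 1258667)
days = remaining_days(speed, 1258667)
print(speed, days)
```

With the two snapshots from the table, that is 43601 jobs in 40472 seconds, which matches the roughly 1.08 jobs/s and 13.5 remaining days reported above.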

On this sample, only 40 errors (which I have not analyzed yet).

psql -c "select level, message from log where src_host='worker01.softwareheritage.org' and ts between '2016-05-04 18:00:00.00+01' and '2016-05-06 10:55:00.00+01' and level = 'error';" service=swh-log > swh-fetcher-googlecode-checks-in-errors-between-04-and-06-may-2016
ardumont@worker01:~$ grep -c FAILURE swh-fetcher-googlecode-checks-in-errors-between-04-and-06-may-2016
40

As this won't complete in the time frame we have left and I forgot to randomize the sample (duh!), I purged the current queue and rescheduled a complete, randomized sample.
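
Randomizing matters here because the run may be cut short: if jobs are processed in their original order, whatever has been checked by then is a biased slice, whereas after a shuffle any prefix of the queue is a uniform random sample. A minimal sketch of that idea, with a hypothetical `randomized_schedule` helper (the actual scheduling code is not shown in this task):

```python
import random


def randomized_schedule(jobs, seed=None):
    """Return a shuffled copy of `jobs`; the input is left untouched.

    Hypothetical helper for illustration. Passing a seed makes the
    order reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(jobs)
    rng.shuffle(shuffled)
    return shuffled
```

The shuffled list would then be sent to the queue in that order.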

olasd changed the visibility from "All Users" to "Public (No Login Required)". May 13 2016, 5:09 PM

Only 4132 out of 1379346 files failed their checks (~0.30%).

Checking some of them manually gave no error.
The worker may have run out of disk space or memory during the checks (for example, if too many concurrent tasks were run).

So those were rescheduled for checking (with lower concurrency this time).
Looking at those checks in the logs (worker01), I see no errors so far either.
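
To illustrate the idea of re-running only the failed checks with a cap on concurrency, here is a minimal sketch using a standard-library thread pool. It is an assumption for illustration only: the workers actually consume jobs from a message queue, and `recheck`, `check`, and `max_workers` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def recheck(paths, check, max_workers=2):
    """Run check(path) on each path, at most `max_workers` at a time.

    Returns the paths whose check still fails. `check` is any callable
    returning True when the archive content is fine.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        results = list(pool.map(check, paths))
    return [path for path, ok in zip(paths, results) if not ok]
```

Keeping `max_workers` small limits how many archives are unpacked at once, which bounds peak disk and memory use at the cost of a longer run.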