Add checkpointing to storage_checker to avoid repeatedly rechecking the objects at the beginning of ranges
Staging:
- apply swh/scrubber/sql/upgrades/4.sql [1]
- upgrade package on workers and stop all workers
- start one worker with --log-level swh.scrubber.storage_checker:DEBUG [2]
- wait for a couple of "Processing %s range %s to %s" log lines [2]
- restart it (still with debug logs) [3]
- check that it skips the ranges already done instead of processing them again [3]
- restart all workers (without debug logs)
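The "check it is not processing the same ranges" step can also be done mechanically. The sketch below is not part of swh.scrubber: the function names are hypothetical, and the regular expression is inferred only from the log lines shown in [2] and [3]. It assumes the last range logged before the restart was interrupted, so that one is allowed to be processed again.

```python
import re

# Pattern inferred from the DEBUG lines in [2] and [3]; "Skipping processing of ..."
# lines do not match because the message must start with "Processing".
PROCESSING_RE = re.compile(
    r"DEBUG:swh\.scrubber\.storage_checker:Processing (\S+) range (\S+) to (\S+)"
)


def processed_ranges(log_text):
    """Return (object_type, start, end) for each 'Processing' line, in log order."""
    return [match.groups() for match in PROCESSING_RE.finditer(log_text)]


def reprocessed_ranges(first_run_log, second_run_log):
    """Return ranges completed in the first run that the second run processed again.

    Assumption: the last range logged in the first run was cut short by the
    restart, so it is not counted as completed.
    """
    first = processed_ranges(first_run_log)
    completed = set(first[:-1])
    return [r for r in processed_ranges(second_run_log) if r in completed]
```

An empty list from reprocessed_ranges(first_log, second_log) means the restarted worker did not redo any completed range.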
Production:
- apply swh/scrubber/sql/upgrades/4.sql [4]
- upgrade package on workers
- restart all workers
[1]
swhworker@scrubber0:~$ swh db --config-file /etc/softwareheritage/scrubber/primary.yml upgrade scrubber --module-config-key=scrubber_db
INFO:swh.core.db.db_utils:Executing migration script '/usr/lib/python3/dist-packages/swh/scrubber/sql/upgrades/4.sql'
Migration to version 4 done
[2]
swhworker@scrubber0:~$ export SWH_CONFIG_FILENAME=/etc/softwareheritage/scrubber/primary.yml
swhworker@scrubber0:~$ swh --log-level swh.scrubber.storage_checker:DEBUG scrubber check storage --object-type directory --start-object 0000000000000000000000000000000000000000 --end-object 3fffffffffffffffffffffffffffffffffffffff
DEBUG:swh.scrubber.storage_checker:Processing directory range None to 000001
DEBUG:swh.scrubber.storage_checker:Processing directory range 000001 to 000002
DEBUG:swh.scrubber.storage_checker:Processing directory range 000002 to 000003
DEBUG:swh.scrubber.storage_checker:Processing directory range 000003 to 000004
DEBUG:swh.scrubber.storage_checker:Processing directory range 000004 to 000005
[3]
swhworker@scrubber0:~$ swh --log-level swh.scrubber.storage_checker:DEBUG scrubber check storage --object-type directory --start-object 0000000000000000000000000000000000000000 --end-object 3fffffffffffffffffffffffffffffffffffffff
DEBUG:swh.scrubber.storage_checker:Skipping processing of directory range None to 000001: already done at 2022-10-18 08:32:42.926663+00:00
DEBUG:swh.scrubber.storage_checker:Skipping processing of directory range 000001 to 000002: already done at 2022-10-18 08:32:49.098090+00:00
DEBUG:swh.scrubber.storage_checker:Skipping processing of directory range 000002 to 000003: already done at 2022-10-18 08:32:57.651759+00:00
DEBUG:swh.scrubber.storage_checker:Skipping processing of directory range 000003 to 000004: already done at 2022-10-18 08:33:11.836088+00:00
DEBUG:swh.scrubber.storage_checker:Processing directory range 000004 to 000005
[4]
swhworker@scrubber1:~$ swh db --config-file /etc/softwareheritage/scrubber/primary.yml upgrade scrubber --module-config-key=scrubber_db
INFO:swh.core.db.db_utils:Executing migration script '/usr/lib/python3/dist-packages/swh/scrubber/sql/upgrades/4.sql'
Migration to version 4 done
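For readers unfamiliar with the change, the skip-or-process behaviour visible in [2] and [3] can be illustrated with the minimal sketch below. It is not the actual swh.scrubber code: InMemoryCheckpoints, check_ranges and check_range are hypothetical names, and the in-memory dict only stands in for whatever the 4.sql migration adds to the scrubber database. A range is recorded as done only after its check returns, so an interrupted range is processed again on restart, as with range 000004 to 000005 in [3].

```python
import datetime


class InMemoryCheckpoints:
    """Hypothetical stand-in for persistent checkpoint storage."""

    def __init__(self):
        self._done = {}

    def get(self, object_type, start, end):
        """Return the date at which the range was checked, or None."""
        return self._done.get((object_type, start, end))

    def set(self, object_type, start, end, date):
        self._done[(object_type, start, end)] = date


def check_ranges(checkpoints, object_type, ranges, check_range):
    """Check each (start, end) range unless a checkpoint says it is already done.

    Returns the list of ranges actually processed by this call.
    """
    processed = []
    for start, end in ranges:
        if checkpoints.get(object_type, start, end) is not None:
            # Already done in a previous run: skip it.
            continue
        started_at = datetime.datetime.now(tz=datetime.timezone.utc)
        check_range(object_type, start, end)
        # Recorded only once the check has completed without raising.
        checkpoints.set(object_type, start, end, started_at)
        processed.append((start, end))
    return processed
```

Calling check_ranges a second time with the same checkpoints object processes only the ranges that did not complete the first time.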