Details
- Reviewers
douardda - Group Reviewers
Reviewers - Maniphest Tasks
- T4527: scrubber: keep a state file for postgresql datastores
- Commits
- rDSCRUB84fa17c00be8: storage_checker: Notify database when ranges are fully checked
Diff Detail
- Repository
- rDSCRUB Datastore Scrubber
- Branch
- checkpoint
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 32065 Build 50206: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 50205: arc lint + arc unit
Event Timeline
Build has FAILED
Patch application report for D8609 (id=31094)
Could not rebase; Attempt merge onto 630001ce7b...
Merge made by the 'recursive' strategy. docs/README.rst | 4 + swh/scrubber/db.py | 79 +++++++++++++++- swh/scrubber/sql/30-schema.sql | 17 ++++ swh/scrubber/sql/60-indexes.sql | 6 ++ swh/scrubber/sql/upgrades/4.sql | 21 +++++ swh/scrubber/storage_checker.py | 53 ++++++++++- swh/scrubber/tests/test_db.py | 58 ++++++++++++ swh/scrubber/tests/test_storage_postgresql.py | 131 ++++++++++++++++++++++++-- 8 files changed, 358 insertions(+), 11 deletions(-) create mode 100644 swh/scrubber/sql/upgrades/4.sql create mode 100644 swh/scrubber/tests/test_db.py
Changes applied before test
commit 4779dd8f9b4b1b1758c9a904daa938315b76a7c1 Merge: 630001c c4f91f6 Author: Jenkins user <jenkins@localhost> Date: Tue Oct 4 12:19:24 2022 +0000 Merge branch 'diff-target' into HEAD commit c4f91f62fe3ed6ebb16fa6decd0a3b09d018ef87 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Oct 4 14:18:50 2022 +0200 storage_checker: Notify database when ranges are fully checked For now, this does not use this information to deduplicate work. commit d64e557c18213034772c1784eeedc2e89a1c4a2f Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Oct 4 14:17:12 2022 +0200 db: Add table checked_ranges It will be used by the storage_checker to 'remember' what ranges it already checked recently across runs (and crashes), and to monitor progress.
Link to build: https://jenkins.softwareheritage.org/job/DSCRUB/job/tests-on-diff/64/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCRUB/job/tests-on-diff/64/console
Build is green
Patch application report for D8609 (id=31095)
Could not rebase; Attempt merge onto 630001ce7b...
Merge made by the 'recursive' strategy. docs/README.rst | 4 + swh/scrubber/db.py | 79 +++++++++++++++- swh/scrubber/sql/30-schema.sql | 17 ++++ swh/scrubber/sql/60-indexes.sql | 6 ++ swh/scrubber/sql/upgrades/4.sql | 21 +++++ swh/scrubber/storage_checker.py | 53 ++++++++++- swh/scrubber/tests/test_db.py | 58 ++++++++++++ swh/scrubber/tests/test_storage_postgresql.py | 131 ++++++++++++++++++++++++-- 8 files changed, 358 insertions(+), 11 deletions(-) create mode 100644 swh/scrubber/sql/upgrades/4.sql create mode 100644 swh/scrubber/tests/test_db.py
Changes applied before test
commit d2daac04a622ea5020cfa0602cbea06350cd8b08 Merge: 630001c c48f096 Author: Jenkins user <jenkins@localhost> Date: Tue Oct 4 12:25:20 2022 +0000 Merge branch 'diff-target' into HEAD commit c48f096a433c156ac647273728437eb3afe9d278 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Oct 4 14:18:50 2022 +0200 storage_checker: Notify database when ranges are fully checked For now, this does not use this information to deduplicate work. commit d64e557c18213034772c1784eeedc2e89a1c4a2f Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Oct 4 14:17:12 2022 +0200 db: Add table checked_ranges It will be used by the storage_checker to 'remember' what ranges it already checked recently across runs (and crashes), and to monitor progress.
See https://jenkins.softwareheritage.org/job/DSCRUB/job/tests-on-diff/65/ for more details.
swh/scrubber/storage_checker.py | ||
---|---|---|
53 | [a, b[ writing to exclude b from the interval? |
swh/scrubber/storage_checker.py | ||
---|---|---|
53 | [a, b[ is the French notation, [a, b) is the English one. |
Build is green
Patch application report for D8609 (id=31198)
Could not rebase; Attempt merge onto 630001ce7b...
Merge made by the 'recursive' strategy. docs/README.rst | 4 + swh/scrubber/db.py | 79 +++++++++++++++- swh/scrubber/sql/30-schema.sql | 17 ++++ swh/scrubber/sql/60-indexes.sql | 6 ++ swh/scrubber/sql/upgrades/4.sql | 21 +++++ swh/scrubber/storage_checker.py | 53 ++++++++++- swh/scrubber/tests/test_db.py | 58 ++++++++++++ swh/scrubber/tests/test_storage_postgresql.py | 131 ++++++++++++++++++++++++-- 8 files changed, 358 insertions(+), 11 deletions(-) create mode 100644 swh/scrubber/sql/upgrades/4.sql create mode 100644 swh/scrubber/tests/test_db.py
Changes applied before test
commit d5e8cd26b0da48cdb8c764584a2c37785ed73c54 Merge: 630001c 85c3514 Author: Jenkins user <jenkins@localhost> Date: Fri Oct 7 09:14:43 2022 +0000 Merge branch 'diff-target' into HEAD commit 85c35148face203df9f5653efda33ad6c729ca90 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Oct 4 14:18:50 2022 +0200 storage_checker: Notify database when ranges are fully checked For now, this does not use this information to deduplicate work. commit 1f3a00dbcbdaadda7b1625cd186a55febf21c3f2 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Oct 4 14:17:12 2022 +0200 db: Add table checked_ranges It will be used by the storage_checker to 'remember' what ranges it already checked recently across runs (and crashes), and to monitor progress.
See https://jenkins.softwareheritage.org/job/DSCRUB/job/tests-on-diff/67/ for more details.
Build is green
Patch application report for D8609 (id=31209)
Could not rebase; Attempt merge onto 630001ce7b...
Updating 630001c..84fa17c Fast-forward docs/README.rst | 4 + swh/scrubber/db.py | 79 +++++++++++++++- swh/scrubber/sql/30-schema.sql | 17 ++++ swh/scrubber/sql/60-indexes.sql | 6 ++ swh/scrubber/sql/upgrades/4.sql | 21 +++++ swh/scrubber/storage_checker.py | 53 ++++++++++- swh/scrubber/tests/test_db.py | 58 ++++++++++++ swh/scrubber/tests/test_storage_postgresql.py | 131 ++++++++++++++++++++++++-- 8 files changed, 358 insertions(+), 11 deletions(-) create mode 100644 swh/scrubber/sql/upgrades/4.sql create mode 100644 swh/scrubber/tests/test_db.py
Changes applied before test
commit 84fa17c00be8746aa2c08d15eeae68913da40842 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Oct 4 14:18:50 2022 +0200 storage_checker: Notify database when ranges are fully checked For now, this does not use this information to deduplicate work. commit fef8a513dfead265788a24e5f580fbe9c746bd32 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Oct 4 14:17:12 2022 +0200 db: Add table checked_ranges It will be used by the storage_checker to 'remember' what ranges it already checked recently across runs (and crashes), and to monitor progress.
See https://jenkins.softwareheritage.org/job/DSCRUB/job/tests-on-diff/70/ for more details.
lgtm (but with a couple of comments)
swh/scrubber/storage_checker.py | ||
---|---|---|
138 | wouldn't swhids.ObjectType[self.object_type.upper()] work here? | |
swh/scrubber/tests/test_storage_postgresql.py | ||
134 | I don't really see the purpose of these test datasets. Not really a problem, but looks strange you felt like testing half the cases for the first digit (and the first digit only) |
swh/scrubber/storage_checker.py | ||
---|---|---|
138 | maybe, I get confused with all the enum syntaxes | |
swh/scrubber/tests/test_storage_postgresql.py | ||
134 | isn't meant to test range generation (because it's imported from swh.storage.backfill). initially I made the test compute these values dynamically, but the code was messy so I opted to write them out like this |