For now, this has absolutely no effect on the API users,
as rows are already deduplicated based on a subset of the
fields hashed by the id.
Details
- Reviewers
olasd - Group Reviewers
Reviewers - Maniphest Tasks
- T3019: Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql
- Commits
- rDSTO2d540b0580cc: Add raw_extrinsic_metadata.id column in postgresql.
Diff Detail
- Repository
- rDSTO Storage manager
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 19698 Build 30571: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 30570: arc lint + arc unit
Event Timeline
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1135/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1135/console
Build has FAILED
Patch application report for D5029 (id=17923)
Could not rebase; Attempt merge onto efd8815b89...
Merge made by the 'recursive' strategy. sql/upgrades/168.sql | 24 + swh/storage/cassandra/common.py | 5 - swh/storage/cassandra/converters.py | 2 +- swh/storage/cassandra/cql.py | 3 +- swh/storage/cassandra/storage.py | 3 +- swh/storage/cli.py | 82 ++ swh/storage/postgresql/db.py | 14 +- swh/storage/postgresql/storage.py | 1 + swh/storage/sql/30-schema.sql | 4 +- swh/storage/sql/60-indexes.sql | 4 + .../tests/data/sql-v0.18.0/10-superuser-init.sql | 27 + swh/storage/tests/data/sql-v0.18.0/15-flavor.sql | 22 + swh/storage/tests/data/sql-v0.18.0/20-enums.sql | 23 + swh/storage/tests/data/sql-v0.18.0/30-schema.sql | 499 +++++++++++ swh/storage/tests/data/sql-v0.18.0/40-funcs.sql | 960 +++++++++++++++++++++ swh/storage/tests/data/sql-v0.18.0/60-indexes.sql | 283 ++++++ .../logical_replication/replication_source.sql | 25 + swh/storage/tests/storage_tests.py | 59 +- swh/storage/tests/test_postgresql_migrated.py | 63 ++ swh/storage/tests/test_postgresql_migration.py | 194 +++++ swh/storage/utils.py | 5 + 21 files changed, 2276 insertions(+), 26 deletions(-) create mode 100644 sql/upgrades/168.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/10-superuser-init.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/15-flavor.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/20-enums.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/30-schema.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/40-funcs.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/60-indexes.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/logical_replication/replication_source.sql create mode 100644 swh/storage/tests/test_postgresql_migrated.py create mode 100644 swh/storage/tests/test_postgresql_migration.py
Changes applied before test
commit 645dafed5454d6ce7e970dd8c3aeb9239a75cbc1 Merge: efd8815b 14bfef5c Author: Jenkins user <jenkins@localhost> Date: Fri Feb 5 13:40:17 2021 +0000 Merge branch 'diff-target' into HEAD commit 14bfef5ca8780291f71af7eae2e3b9c45051a101 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Fri Feb 5 13:56:15 2021 +0100 Add raw_extrinsic_metadata.id column in postgresql. For now, this has absolutely no effect on the API users, as rows are already deduplicated based on a subset of the fields hashed by the id. commit 27440f9862b7df4e36b400a4fb74f1679d9ec6f4 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Feb 4 14:36:50 2021 +0100 Add basic migration tests for postgresql This adds two test files: * `test_postgresql_migrated.py` applies an old schema definition, runs the migrations, then runs all the usual tests * `test_postgresql_migration.py` applies an old schema definition, inserts data, runs the migrations, and checks the data is still available `test_postgresql_migration.py` will probably break in some releases as it uses the old SQL with the new Python to insert, but it should be good enough, and we can disable it in some releases when needed. commit e9441fef13c11c3eb500403275ba7337ed0c77e1 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Feb 4 14:29:38 2021 +0100 postgresql: Fix dbversion() to return the max version instead of a random one. commit 508399ce2abf21f813acc9c56422cbbccca0ae3d Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Feb 4 13:59:09 2021 +0100 storage_tests: recompute ids when evolving RawExtrinsicMetadata objects. For now this does nothing as RawExtrinsicMetadata has no 'id' field, but the equality assertions will become errors when the next version of swh.model is released.
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1137/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1137/console
Build has FAILED
Patch application report for D5029 (id=17971)
Could not rebase; Attempt merge onto b0383833fe...
Updating b0383833..d3872c70 Fast-forward sql/upgrades/168.sql | 24 + swh/storage/cassandra/common.py | 5 - swh/storage/cassandra/converters.py | 2 +- swh/storage/cassandra/cql.py | 3 +- swh/storage/cassandra/storage.py | 3 +- swh/storage/cli.py | 82 ++ swh/storage/postgresql/db.py | 5 +- swh/storage/postgresql/storage.py | 1 + swh/storage/sql/30-schema.sql | 4 +- swh/storage/sql/60-indexes.sql | 4 + .../tests/data/sql-v0.18.0/10-superuser-init.sql | 27 + swh/storage/tests/data/sql-v0.18.0/15-flavor.sql | 22 + swh/storage/tests/data/sql-v0.18.0/20-enums.sql | 23 + swh/storage/tests/data/sql-v0.18.0/30-schema.sql | 499 +++++++++++ swh/storage/tests/data/sql-v0.18.0/40-funcs.sql | 960 +++++++++++++++++++++ swh/storage/tests/data/sql-v0.18.0/60-indexes.sql | 283 ++++++ .../logical_replication/replication_source.sql | 25 + swh/storage/tests/storage_tests.py | 59 +- swh/storage/tests/test_postgresql_migrated.py | 63 ++ swh/storage/tests/test_postgresql_migration.py | 194 +++++ swh/storage/utils.py | 5 + 21 files changed, 2268 insertions(+), 25 deletions(-) create mode 100644 sql/upgrades/168.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/10-superuser-init.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/15-flavor.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/20-enums.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/30-schema.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/40-funcs.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/60-indexes.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/logical_replication/replication_source.sql create mode 100644 swh/storage/tests/test_postgresql_migrated.py create mode 100644 swh/storage/tests/test_postgresql_migration.py
Changes applied before test
commit d3872c70a5b0a431ef92a3b3c5cc9ddfc8f6251e Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Fri Feb 5 13:56:15 2021 +0100 Add raw_extrinsic_metadata.id column in postgresql. For now, this has absolutely no effect on the API users, as rows are already deduplicated based on a subset of the fields hashed by the id. commit ebe4dab07feed7eb8ae03cd407980ba7bea78b37 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Feb 4 14:36:50 2021 +0100 Add basic migration tests for postgresql This adds two test files: * `test_postgresql_migrated.py` applies an old schema definition, runs the migrations, then runs all the usual tests * `test_postgresql_migration.py` applies an old schema definition, inserts data, runs the migrations, and checks the data is still available `test_postgresql_migration.py` will probably break in some releases as it uses the old SQL with the new Python to insert, but it should be good enough, and we can disable it in some releases when needed. commit 75939e4f9b02e130224a90929eb3ed2ba0730592 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Feb 4 13:59:09 2021 +0100 storage_tests: recompute ids when evolving RawExtrinsicMetadata objects. For now this does nothing as RawExtrinsicMetadata has no 'id' field, but the equality assertions will become errors when the next version of swh.model is released.
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1144/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1144/console
sql/upgrades/168.sql | ||
---|---|---|
18 ↗ | (On Diff #17971) | add? |
Build has FAILED
Patch application report for D5029 (id=17991)
Could not rebase; Attempt merge onto b0383833fe...
Updating b0383833..386bc2fb Fast-forward sql/upgrades/168.sql | 24 + swh/storage/cassandra/common.py | 5 - swh/storage/cassandra/converters.py | 2 +- swh/storage/cassandra/cql.py | 3 +- swh/storage/cassandra/storage.py | 3 +- swh/storage/cli.py | 82 ++ swh/storage/postgresql/db.py | 5 +- swh/storage/postgresql/storage.py | 1 + swh/storage/sql/30-schema.sql | 4 +- swh/storage/sql/60-indexes.sql | 4 + .../tests/data/sql-v0.18.0/10-superuser-init.sql | 27 + swh/storage/tests/data/sql-v0.18.0/15-flavor.sql | 22 + swh/storage/tests/data/sql-v0.18.0/20-enums.sql | 23 + swh/storage/tests/data/sql-v0.18.0/30-schema.sql | 499 +++++++++++ swh/storage/tests/data/sql-v0.18.0/40-funcs.sql | 960 +++++++++++++++++++++ swh/storage/tests/data/sql-v0.18.0/60-indexes.sql | 283 ++++++ .../logical_replication/replication_source.sql | 25 + swh/storage/tests/storage_tests.py | 59 +- swh/storage/tests/test_postgresql_migrated.py | 63 ++ swh/storage/tests/test_postgresql_migration.py | 194 +++++ swh/storage/utils.py | 5 + 21 files changed, 2268 insertions(+), 25 deletions(-) create mode 100644 sql/upgrades/168.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/10-superuser-init.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/15-flavor.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/20-enums.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/30-schema.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/40-funcs.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/60-indexes.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/logical_replication/replication_source.sql create mode 100644 swh/storage/tests/test_postgresql_migrated.py create mode 100644 swh/storage/tests/test_postgresql_migration.py
Changes applied before test
commit 386bc2fbe8de8bc46aba88247c42cd241927e8b2 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Fri Feb 5 13:56:15 2021 +0100 Add raw_extrinsic_metadata.id column in postgresql. For now, this has absolutely no effect on the API users, as rows are already deduplicated based on a subset of the fields hashed by the id. commit ebe4dab07feed7eb8ae03cd407980ba7bea78b37 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Feb 4 14:36:50 2021 +0100 Add basic migration tests for postgresql This adds two test files: * `test_postgresql_migrated.py` applies an old schema definition, runs the migrations, then runs all the usual tests * `test_postgresql_migration.py` applies an old schema definition, inserts data, runs the migrations, and checks the data is still available `test_postgresql_migration.py` will probably break in some releases as it uses the old SQL with the new Python to insert, but it should be good enough, and we can disable it in some releases when needed. commit 75939e4f9b02e130224a90929eb3ed2ba0730592 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Feb 4 13:59:09 2021 +0100 storage_tests: recompute ids when evolving RawExtrinsicMetadata objects. For now this does nothing as RawExtrinsicMetadata has no 'id' field, but the equality assertions will become errors when the next version of swh.model is released.
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1148/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1148/console
Build has FAILED
Patch application report for D5029 (id=18606)
Could not rebase; Attempt merge onto 88ff2c2fa0...
Updating 88ff2c2f..ed597441 Fast-forward sql/upgrades/169.sql | 26 + swh/storage/postgresql/db.py | 5 +- swh/storage/postgresql/storage.py | 1 + swh/storage/sql/30-schema.sql | 4 +- swh/storage/sql/60-indexes.sql | 4 + .../tests/data/sql-v0.18.0/10-superuser-init.sql | 27 + swh/storage/tests/data/sql-v0.18.0/15-flavor.sql | 22 + swh/storage/tests/data/sql-v0.18.0/20-enums.sql | 23 + swh/storage/tests/data/sql-v0.18.0/30-schema.sql | 499 +++++++++++ swh/storage/tests/data/sql-v0.18.0/40-funcs.sql | 960 +++++++++++++++++++++ swh/storage/tests/data/sql-v0.18.0/60-indexes.sql | 283 ++++++ .../logical_replication/replication_source.sql | 25 + swh/storage/tests/test_postgresql_migrated.py | 63 ++ swh/storage/tests/test_postgresql_migration.py | 194 +++++ 14 files changed, 2134 insertions(+), 2 deletions(-) create mode 100644 sql/upgrades/169.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/10-superuser-init.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/15-flavor.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/20-enums.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/30-schema.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/40-funcs.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/60-indexes.sql create mode 100644 swh/storage/tests/data/sql-v0.18.0/logical_replication/replication_source.sql create mode 100644 swh/storage/tests/test_postgresql_migrated.py create mode 100644 swh/storage/tests/test_postgresql_migration.py
Changes applied before test
commit ed597441f79dda6b4eb424e9b596e4c164e966d2 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Fri Feb 5 13:56:15 2021 +0100 Add raw_extrinsic_metadata.id column in postgresql. For now, this has absolutely no effect on the API users, as rows are already deduplicated based on a subset of the fields hashed by the id. commit e5fc49846c1d9ca9961b0b7edf7e0a993401daf6 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Feb 4 14:36:50 2021 +0100 Add basic migration tests for postgresql This adds two test files: * `test_postgresql_migrated.py` applies an old schema definition, runs the migrations, then runs all the usual tests * `test_postgresql_migration.py` applies an old schema definition, inserts data, runs the migrations, and checks the data is still available `test_postgresql_migration.py` will probably break in some releases as it uses the old SQL with the new Python to insert, but it should be good enough, and we can disable it in some releases when needed.
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1178/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1178/console
Build has FAILED
Patch application report for D5029 (id=18607)
Rebasing onto 88ff2c2fa0...
Current branch diff-target is up to date.
Changes applied before test
commit 7ed181fe084152ebb2ad5b1792a8d2d9e5a1c429 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Fri Feb 5 13:56:15 2021 +0100 Add raw_extrinsic_metadata.id column in postgresql. For now, this has absolutely no effect on the API users, as rows are already deduplicated based on a subset of the fields hashed by the id.
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1179/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1179/console
Build is green
Patch application report for D5029 (id=18626)
Rebasing onto 88ff2c2fa0...
Current branch diff-target is up to date.
Changes applied before test
commit 7ed181fe084152ebb2ad5b1792a8d2d9e5a1c429 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Fri Feb 5 13:56:15 2021 +0100 Add raw_extrinsic_metadata.id column in postgresql. For now, this has absolutely no effect on the API users, as rows are already deduplicated based on a subset of the fields hashed by the id.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1183/ for more details.
The comment in the migration doesn't sound accurate; I'm not convinced the production kafka contains all the entries from this table, so if we want to use kafka, we need to truncate and backfill the kafka topic with object ids first, then we can do the update of the table.
Now I wonder if it wouldn't be simpler to just do an ad-hoc migration of the postgresql data directly:
- stop storage
- rename the existing table
- create new table with the final new schema
- restart storage, which will write new objects to the new table
- read entries from the old table, compute their id, write them to the new table
- drop the old table
- truncate and backfill kafka from the new table
I don't remember if loaders actually read data from this table (in which case we can just stop them for the duration of the migration, which should go fairly quickly, for 5m entries) or not.
swh/storage/postgresql/db.py | ||
---|---|---|
1165–1166 | this should be switched to the id column | |
swh/storage/sql/60-indexes.sql | ||
271 | this should probably be deuniqueified |
I prefer backfilling to, then replaying from kafka because
is actually not as easy as it sounds, and I don't want to write and test throwaway code just for this while we already have the replayer.
And the replayer already handles parallelization and restart-on-error thanks to kafka
Build was aborted
Patch application report for D5029 (id=18960)
Rebasing onto 8dd9f7b635...
Current branch diff-target is up to date.
Changes applied before test
commit 2d540b0580cc9699bd8a593db45942b1f14d8e21 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Fri Feb 5 13:56:15 2021 +0100 Add raw_extrinsic_metadata.id column in postgresql. For now, this has absolutely no effect on the API users, as rows are already deduplicated based on a subset of the fields hashed by the id.
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1229/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1229/console
Build is green
Patch application report for D5029 (id=18960)
Rebasing onto 8dd9f7b635...
Current branch diff-target is up to date.
Changes applied before test
commit 2d540b0580cc9699bd8a593db45942b1f14d8e21 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Fri Feb 5 13:56:15 2021 +0100 Add raw_extrinsic_metadata.id column in postgresql. For now, this has absolutely no effect on the API users, as rows are already deduplicated based on a subset of the fields hashed by the id.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1231/ for more details.