
Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql
Closed, Resolved, Public

Description

  1. new column with the hash
  2. fill it (will need a migration with python code)
  3. add unique index
  4. make the new index a primary key?

Event Timeline

vlorentz triaged this task as High priority. (Feb 2 2021, 1:37 PM)
vlorentz created this task.
vlorentz renamed this task from "Allow querying raw_extrinsic_metadata by hash in swh.storage.postgresql" to "Add an index for raw_extrinsic_metadata.id in swh.storage.postgresql". (Feb 2 2021, 2:20 PM)

After a lot of back and forth, and the release of swh.model v2.3.0 and swh.storage v0.26.0, this is now all done and deployed in staging and production.

We carried out the migration as follows:

  • make swh.model write id fields in the journal
  • deploy swh.storage with the new swh.model (so all writes happen with the new model)
  • run swh storage backfill on the raw_extrinsic_metadata topic to fill the journal with objects using the new model
  • (make sure the journal gets compacted to remove old versions of the objects; this took a combination of topic.retention.ms and running the backfill multiple times before it worked on all the real-world data)
  • run swh storage replay on raw_extrinsic_metadata, using a fork of swh.storage that wrote objects to a new table (using the new schema)
  • once the replayer has caught up, run some queries to spot-check that all the data was properly migrated
  • once validated: stop the workers, stop the replayer, deploy the new version of swh.storage with the new schema, move the new table in place of the old one (taking care of logical replication), then restart the workers
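The spot-check step could look roughly like the sketch below. The actual queries that were run are not recorded in this task, so this only illustrates the idea: compare row counts and list the key tuples that exist in the old table but not in the new one. SQLite stands in for PostgreSQL, and the function name, table names and columns are all hypothetical.

```python
import sqlite3


def spot_check(conn, old_table, new_table, key_columns):
    """Return (old_count, new_count, rows present in old but absent from new).

    Table and column names are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    cols = ", ".join(key_columns)
    old_count = conn.execute(f"SELECT count(*) FROM {old_table}").fetchone()[0]
    new_count = conn.execute(f"SELECT count(*) FROM {new_table}").fetchone()[0]
    # EXCEPT keeps the distinct rows of the first query that the second lacks.
    missing = conn.execute(
        f"SELECT {cols} FROM {old_table} EXCEPT SELECT {cols} FROM {new_table}"
    ).fetchall()
    return old_count, new_count, missing


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified tables: the new one carries the extra id column.
    conn.execute("CREATE TABLE rem_old (target TEXT, payload BLOB)")
    conn.execute("CREATE TABLE rem_new (id BLOB, target TEXT, payload BLOB)")
    conn.executemany(
        "INSERT INTO rem_old VALUES (?, ?)",
        [("swh:1:cnt:aa", b"a"), ("swh:1:cnt:bb", b"b")],
    )
    # Only one of the two rows has been replayed into the new table so far.
    conn.execute(
        "INSERT INTO rem_new VALUES (?, ?, ?)", (b"\x01", "swh:1:cnt:aa", b"a")
    )
    return spot_check(conn, "rem_old", "rem_new", ["target", "payload"])
```

An empty list of missing rows together with matching counts is only a sanity check under these assumptions. It does not prove that every field was migrated correctly.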