Fix heterogeneity of names in metadata tables
Closed, Migrated

Description

The tables revision_metadata and origin_intrinsic_metadata in the indexer storage contain mostly the same data. The only differences are that the latter has a from_revision field, which holds the id of a revision, and a metadata_tsvector field, which acts as a kind of cache for fulltext-search queries on the metadata.

However, naming between the two is fairly inconsistent. The following should be done:

  • The table revision_metadata should be renamed to revision_intrinsic_metadata,
  • The column revision_metadata.translated_metadata should be renamed to metadata,
  • The column origin_intrinsic_metadata.origin_id should be renamed id.

The indexer code (swh/indexer/metadata.py) and storage API (swh/indexer/storage/*.py) endpoints should be updated to reflect this, as well as their tests (swh/indexer/tests/).

A database migration script should also be written (sql/upgrades/).
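
As a rough illustration of what the migration has to do, here is a minimal sketch of the three renames, run against an in-memory SQLite database so it can be tried without a PostgreSQL instance. The table schemas below are hypothetical and heavily simplified (the real tables have more columns, plus indexes and constraints that may also need renaming), and the helper names apply_renames, column_names and demo are made up for this example. The actual script in sql/upgrades/ would be plain SQL for PostgreSQL, which accepts the same ALTER TABLE ... RENAME syntax.

```python
import sqlite3

# Rename statements mirroring the three items listed in the description.
# This syntax is accepted by PostgreSQL and by SQLite 3.25 or later.
RENAMES = [
    "ALTER TABLE revision_metadata RENAME TO revision_intrinsic_metadata",
    "ALTER TABLE revision_intrinsic_metadata "
    "RENAME COLUMN translated_metadata TO metadata",
    "ALTER TABLE origin_intrinsic_metadata RENAME COLUMN origin_id TO id",
]


def apply_renames(conn):
    """Run the rename statements in order, then commit."""
    for statement in RENAMES:
        conn.execute(statement)
    conn.commit()


def column_names(conn, table):
    """Return the column names of a table (SQLite-specific helper)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def demo():
    """Build simplified stand-ins for the two tables and migrate them."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schemas: only the columns named in the description.
    conn.execute(
        "CREATE TABLE revision_metadata (id BLOB, translated_metadata TEXT)"
    )
    conn.execute(
        "CREATE TABLE origin_intrinsic_metadata "
        "(origin_id INTEGER, metadata TEXT, from_revision BLOB)"
    )
    apply_renames(conn)
    return conn
```

Calling demo() and then column_names(conn, "revision_intrinsic_metadata") shows that both tables end up with matching id and metadata columns, which is the consistency this task is after.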