
Drop content_metadata table and indexer.
Abandoned, Public

Authored by vlorentz on Aug 9 2019, 4:15 PM.

Details

Reviewers
ardumont
Group Reviewers
Reviewers
Summary

Depends on D1835.
Mutually exclusive with D1836.

Motivation: the content_metadata table is currently used only as a cache when indexing revision/origin intrinsic metadata. This cache is not handled properly: it is not invalidated when indexers are updated.
Fixing this would require extensive changes, and I don't think they are worth it: content_metadata adds a lot of complexity for very little performance benefit as a cache, and it currently has no use other than as a cache.

So I think we should remove it, and maybe add it back later if needed.
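
To illustrate the invalidation problem described in the motivation, here is a minimal sketch. It is not swh-indexer code: the dict stands in for the content_metadata table, and `index_content`, `lookup`, the `versioned` flag and the version strings are all hypothetical names chosen for the example. It shows how a cache keyed only on the content id keeps serving results computed by an older indexer, and how including the indexer version in the key is one possible way to avoid that.

```python
def index_content(content_id, indexer_version):
    """Stand-in for an expensive per-file metadata extraction (hypothetical)."""
    return {"id": content_id, "extracted_by": indexer_version}


def lookup(cache, content_id, indexer_version, versioned):
    """Return cached metadata for a content, computing it on a miss.

    With versioned=False the key ignores the indexer version, so entries
    written by an older indexer are still returned after an upgrade.
    """
    key = (content_id, indexer_version) if versioned else content_id
    if key not in cache:
        cache[key] = index_content(content_id, indexer_version)
    return cache[key]


naive, keyed = {}, {}

# First run, with indexer version "v1": both caches get populated.
lookup(naive, "sha1:abc", "v1", versioned=False)
lookup(keyed, "sha1:abc", "v1", versioned=True)

# Second run, after the indexer is upgraded to "v2".
stale = lookup(naive, "sha1:abc", "v2", versioned=False)
fresh = lookup(keyed, "sha1:abc", "v2", versioned=True)

print(stale["extracted_by"])  # v1: the naive cache was never invalidated
print(fresh["extracted_by"])  # v2: the versioned key forced a recomputation
```

Note that the versioned cache now holds one entry per indexer version, so it trades staleness for extra storage and bookkeeping.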

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
drop-content_metadata
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 7231
Build 10213: tox-on-jenkins (Jenkins)
Build 10212: arc lint + arc unit

Event Timeline

vlorentz retitled this revision from "[WIP] Drop content_metadata table and indexer." to "Drop content_metadata table and indexer." (Aug 12 2019, 1:38 PM)
ardumont added a subscriber: ardumont.

I'm not exactly sure of what that is (no real description attached to the diff).

I'm missing the migration script (-> required changes).

swh/indexer/metadata.py:149

Please remove the print ;)

This revision now requires changes to proceed. (Aug 26 2019, 3:41 PM)
  • rebase
  • remove print
  • add SQL migration
This revision is now accepted and ready to land. (Sep 11 2019, 11:18 AM)

<vlorentz> zack: so, conclusion of the content_metadata discussion: let's keep it but not use it as a cache?
<zack> vlorentz: that'd be fine with me