
Drop content_metadata table and indexer.
AbandonedPublic

Authored by vlorentz on Aug 9 2019, 4:15 PM.

Details

Reviewers
ardumont
Group Reviewers
Reviewers
Summary

Depends on D1835.
Mutually exclusive with D1836.

Motivation: the content_metadata table is currently used only as a cache when indexing revision/origin intrinsic metadata. But this cache is not handled properly: it is not invalidated when indexers are updated.
Fixing this would require extensive changes, and I don't think they are worth it: content_metadata adds a lot of complexity for very little performance benefit as a cache, and it currently has no use other than as a cache.

So I think we should remove it, and maybe add it back later if needed.
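
To illustrate the invalidation problem described above, here is a minimal sketch of a cache whose entries are keyed by both the content id and the indexer version, so that an indexer update never serves stale results. All names here (VersionedMetadataCache, index_v1, index_v2, the "sha1:abc" id) are hypothetical and are not part of swh-indexer; this only shows the kind of bookkeeping a proper fix would need, not how the table is actually used.

```python
from typing import Callable, Dict, Tuple

Metadata = Dict[str, str]


class VersionedMetadataCache:
    """Hypothetical cache keyed by (content id, indexer version)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Metadata] = {}
        self.misses = 0  # number of times the indexer had to run

    def get(
        self,
        content_id: str,
        indexer_version: str,
        compute: Callable[[str], Metadata],
    ) -> Metadata:
        # Including the version in the key means entries produced by an
        # older indexer are simply never looked up again.
        key = (content_id, indexer_version)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = compute(content_id)
        return self._entries[key]


def index_v1(content_id: str) -> Metadata:
    # Stand-in for an old indexer (assumption, for illustration only).
    return {"name": content_id}


def index_v2(content_id: str) -> Metadata:
    # Stand-in for an updated indexer that extracts one more field.
    return {"name": content_id, "license": "unknown"}


cache = VersionedMetadataCache()
first = cache.get("sha1:abc", "1.0", index_v1)    # miss: computed
again = cache.get("sha1:abc", "1.0", index_v1)    # hit: same version
updated = cache.get("sha1:abc", "2.0", index_v2)  # miss: version changed
```

Without the version in the key, the third call would return the result computed by the old indexer, which is the stale-cache behaviour the summary refers to.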

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
drop-content_metadata
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 7759
Build 11155: tox-on-jenkins (Jenkins)
Build 11154: arc lint + arc unit

Event Timeline

vlorentz created this revision. Aug 9 2019, 4:15 PM
vlorentz planned changes to this revision. Aug 12 2019, 10:11 AM
vlorentz updated this revision to Diff 6201. Aug 12 2019, 1:38 PM

remove WIP

vlorentz retitled this revision from "[WIP] Drop content_metadata table and indexer." to "Drop content_metadata table and indexer." Aug 12 2019, 1:38 PM
vlorentz updated this revision to Diff 6202. Aug 12 2019, 1:38 PM

fix base commit

ardumont requested changes to this revision. Aug 26 2019, 3:41 PM
ardumont added a subscriber: ardumont.

I'm not exactly sure of what that is (no real description attached to the diff).

I'm missing the migration script (-> required changes).

swh/indexer/metadata.py:151

Please remove the print ;)

This revision now requires changes to proceed. Aug 26 2019, 3:41 PM
vlorentz edited the summary of this revision. Sep 11 2019, 10:38 AM
vlorentz updated this revision to Diff 6673. Sep 11 2019, 11:10 AM
  • rebase
  • remove print
  • add SQL migration
ardumont accepted this revision. Sep 11 2019, 11:18 AM
This revision is now accepted and ready to land. Sep 11 2019, 11:18 AM
vlorentz abandoned this revision. Sep 11 2019, 1:33 PM

<vlorentz> zack: so, conclusion of the content_metadata discussion: let's keep it but not use it as a cache?
<zack> vlorentz: that'd be fine with me