
Remove metadata deletion endpoints and algorithms
Closed, Public

Authored by vlorentz on Nov 2 2020, 1:49 PM.

Details

Summary

This was expected to be used in these two cases:

  1. if we remove mappings or file detection from a metadata indexer
  2. if an origin removes all its metadata files

but:

  1. if we do so, then we should bump the indexer version, so the old metadata will be preserved anyway, as different indexer versions get different indexer_configuration_ids
  2. this should be a rather rare event, and even if it happens, we might want to keep the old metadata (even if it's outdated) rather than nothing, for search purposes.

Additionally, this commit is motivated by:

  • fewer issues to deal with when writing to Kafka (the journal writer currently doesn't support deletion, and we would also have to add support for deletion in all consumers)
  • less code (~250 lines)
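
The version-bump argument in the first counterpoint above can be illustrated with a minimal sketch. It assumes a toy add-only store in which every row is keyed by the indexed object and the indexer_configuration_id; the class ToyMetadataStore, its methods, and the tool name "metadata-indexer" are hypothetical and are not the actual indexer storage API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyMetadataStore:
    """Hypothetical add-only store; NOT the real indexer storage API."""

    # (tool name, tool version) -> indexer_configuration_id
    tools: Dict[Tuple[str, str], int] = field(default_factory=dict)
    # (object id, indexer_configuration_id) -> metadata
    rows: Dict[Tuple[str, int], dict] = field(default_factory=dict)

    def configuration_id(self, name: str, version: str) -> int:
        """Return the id for this tool version, allocating one if new."""
        key = (name, version)
        if key not in self.tools:
            self.tools[key] = len(self.tools) + 1
        return self.tools[key]

    def add(self, object_id: str, name: str, version: str, metadata: dict) -> None:
        """Write a row; rows from other tool versions are left untouched."""
        tool_id = self.configuration_id(name, version)
        self.rows[(object_id, tool_id)] = metadata

    def get(self, object_id: str) -> List[dict]:
        """Return every row for the object, ordered by configuration id."""
        return [
            {"indexer_configuration_id": tool_id, "metadata": metadata}
            for (oid, tool_id), metadata in sorted(
                self.rows.items(), key=lambda item: item[0]
            )
            if oid == object_id
        ]


store = ToyMetadataStore()
store.add("origin-1", "metadata-indexer", "0.0.1", {"name": "foo", "license": "MIT"})
# A mapping is removed, so the indexer version is bumped before re-indexing.
store.add("origin-1", "metadata-indexer", "0.0.2", {"name": "foo"})
results = store.get("origin-1")
```

Under these assumptions, results holds two rows for origin-1, one per indexer version, so the output of the older version survives without any deletion step.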

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 16787
Build 25881: Phabricator diff pipeline on jenkins
Build 25880: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4392 (id=15557)

Rebasing onto 300b307394...

Current branch diff-target is up to date.
Changes applied before test
commit 94c825919320bf3d3e2608b823dc887ed6122413
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Nov 2 13:47:51 2020 +0100

    Remove metadata deletion endpoints and algorithms
    
    This was expected to be used in these two cases:
    
    1. if we remove mappings or file detection from a metadata indexer
    2. if an origin removes all its metadata files
    
    but:
    
    1. if we do so, then we should bump the indexer version, so the
       old metadata will be preserved anyway, as different indexer
       versions get different indexer_configuration_ids
    2. this should be a rather rare event, and even if it happens, we
       might want to keep the old metadata anyway rather than
       nothing (even if it's outdated), for search purposes.
    
    Additionally, this commit is motivated by:
    
    * fewer issues to deal with when writing to Kafka (the journal
      writer currently doesn't support deletion, and we would also have
      to add support for deletion in all consumers)
    * less code (~250 lines)

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/100/ for more details.

That's a good idea, very "archival" like ;-)

I can accept the diff in principle, but I can't test the code at the moment.

This revision is now accepted and ready to land.
Nov 3 2020, 11:34 AM