Added level of abstraction in indexers to use BaseIndexer for revisions

Description

Added level of abstraction in indexers to use BaseIndexer for revisions

Summary:

  • renaming methods filter_contents to filter and index_content to index in all sub-classes and orchestrator

  • renaming dependencies to ContentIndexer instead of BaseIndexer
  • renaming in tests
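
A minimal sketch of the abstraction described above, assuming a shared base class that exposes the renamed filter and index methods. Only the names BaseIndexer, ContentIndexer, filter and index come from this commit message; the run method, the constructor argument and the result shape are hypothetical and may differ from the actual code.

```python
from abc import ABC, abstractmethod


class BaseIndexer(ABC):
    """Base class shared by all indexers (sketch, not the real swh code)."""

    @abstractmethod
    def filter(self, ids):
        """Return an iterable of the ids that still need indexing."""

    @abstractmethod
    def index(self, id_, data):
        """Return the index result for a single object."""

    def run(self, objects):
        # `objects` maps id -> raw data; this shape is an assumption.
        return [self.index(i, objects[i]) for i in self.filter(objects)]


class ContentIndexer(BaseIndexer):
    """Indexer over contents (methods formerly filter_contents/index_content)."""

    def __init__(self, already_indexed=()):
        # Hypothetical stand-in for a lookup in the indexer storage.
        self.already_indexed = set(already_indexed)

    def filter(self, ids):
        # Skip the ids that were indexed before.
        return (i for i in ids if i not in self.already_indexed)

    def index(self, id_, data):
        # Placeholder result: a real indexer would compute mimetype, etc.
        return {"id": id_, "length": len(data)}


results = ContentIndexer(already_indexed={"a"}).run({"a": b"x", "b": b"hello"})
```

With generic method names on the base class, a revision indexer can implement the same two methods and reuse run unchanged.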

Added RevisionMetadataIndexer with a detection tool for metadata

  • RevisionMetadataIndexer takes a list of revisions and detects, in the root directory, all the file names supported by swh-metadata-detector version 0.0.1 that can contain metadata
  • checks whether the files were already translated in the content_metadata table
  • if not: sends the files for indexing
  • aggregates results

Note: results should be kept in revision_metadata, but this part is not ready in the storage

  • also changed the init of ContentMetadataIndexer to take the tool in its args
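
The RevisionMetadataIndexer steps above could be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the file-name list, the (name, content_id) entry shape, the dict standing in for the content_metadata table, and the translate callable are all hypothetical.

```python
# Hypothetical file names; the real list comes from swh-metadata-detector
# and is not given in this commit message.
SUPPORTED_FILENAMES = {"package.json", "codemeta.json"}


def detect_metadata_files(root_entries):
    """Keep the root-directory entries whose name may hold metadata.

    root_entries: list of (name, content_id) pairs (assumed shape).
    """
    return [(name, cid) for name, cid in root_entries
            if name.lower() in SUPPORTED_FILENAMES]


def index_revision(root_entries, content_metadata, translate):
    """Aggregate the metadata found in one revision's root directory.

    content_metadata: dict content_id -> translated metadata, a stand-in
                      for the content_metadata table.
    translate: callable content_id -> metadata, called only for files
               that were not translated before.
    """
    results = []
    for _name, cid in detect_metadata_files(root_entries):
        if cid not in content_metadata:
            # Not translated yet: send the file for indexing.
            content_metadata[cid] = translate(cid)
        results.append(content_metadata[cid])
    return results


entries = [("README", "c1"), ("package.json", "c2"), ("codemeta.json", "c3")]
cache = {"c3": {"name": "cached"}}
calls = []


def fake_translate(cid):
    # Records which contents were actually sent for indexing.
    calls.append(cid)
    return {"name": "new-" + cid}


aggregated = index_revision(entries, cache, fake_translate)
```

In this sketch only c2 is sent for translation, since c3 is already present in the stand-in table and README is not a supported file name.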

Updated documentation with new revision indexer

Test Plan: WIP (will be updated today)

Reviewers: ardumont

Differential Revision: https://forge.softwareheritage.org/D233

Details

Provenance
moranegg Authored on Jul 25 2017, 3:10 PM
moranegg Pushed on Jul 28 2017, 12:45 PM
Differential Revision
D233: Added level of abstraction in indexers to use BaseIndexer for revisions
Parents
rDCIDX0f0ade3a9b3b: Added RevisionMetadataIndexer with a detection tool for metadata