
[WIP] Implement a more refined origin search machinery
AbandonedPublic

Authored by olasd on Oct 22 2018, 6:56 PM.

Details

Reviewers
None
Group Reviewers
Reviewers
Summary

My plan is to implement search heuristics as separate functions in the
backend, with associated indexes, from easiest to hardest.

  • Simple prefix search if the query term looks like a URL
  • Exact (full) keyword search: split the query into queryable terms. Try to find a match for all terms. Sort by relevance.
  • Exact (partial) keyword search (todo): split the query into queryable terms. Try to find a match for most of the terms. Sort by relevance.
  • Fuzzy keyword search (todo): if the keywords didn't match anything, try to find the closest keyword that matches, and search again.
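The four heuristics above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this diff: the in-memory `ORIGINS` list, the tokenizer, the `min_ratio` parameter and all function names are hypothetical, relevance is reduced to a count of matched terms, and the real backend would rely on database indexes rather than Python loops.

```python
import difflib
import re

# Hypothetical in-memory stand-in for the origin table; the real backend
# would query indexed database columns instead.
ORIGINS = [
    "https://github.com/python/cpython",
    "https://github.com/torvalds/linux",
    "https://gitlab.com/inkscape/inkscape",
    "https://pypi.org/project/requests/",
]


def tokenize(text):
    """Split a query or URL into lowercase alphanumeric terms."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def looks_like_url(query):
    """Crude check (an assumption): a scheme separator marks a URL."""
    return "://" in query


def prefix_search(query, origins=ORIGINS):
    """Heuristic 1: plain prefix match, for queries that look like a URL."""
    return [o for o in origins if o.startswith(query)]


def keyword_search(query, origins=ORIGINS, min_ratio=1.0):
    """Heuristics 2 and 3: keep origins matching at least min_ratio of the terms.

    min_ratio=1.0 is the "full" search (all terms must match); a lower
    value gives the "partial" search (most terms must match).
    Results are sorted by number of matched terms, then alphabetically.
    """
    terms = set(tokenize(query))
    if not terms:
        return []
    scored = []
    for origin in origins:
        matched = len(terms & set(tokenize(origin)))
        if matched and matched / len(terms) >= min_ratio:
            scored.append((matched, origin))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [origin for _, origin in scored]


def fuzzy_keyword_search(query, origins=ORIGINS):
    """Heuristic 4: swap each term for its closest known term, search again."""
    vocabulary = sorted({t for o in origins for t in tokenize(o)})
    corrected = []
    for term in tokenize(query):
        close = difflib.get_close_matches(term, vocabulary, n=1)
        corrected.append(close[0] if close else term)
    return keyword_search(" ".join(corrected), origins)
```

With this toy data, `keyword_search("github linux")` keeps only the origin matching both terms, passing `min_ratio=0.5` also lets through origins matching one of the two, and `fuzzy_keyword_search("linxu")` recovers the linux origin after correcting the typo.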

Keeping these steps separate lets the frontend offer the user a choice: have us
try harder to find a match, or refine their query terms.
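One way a caller could exploit that separation is to run the steps lazily, one at a time, and stop as soon as one returns results. The sketch below is an assumption about how such a driver might look, not part of this diff: `staged_search`, `first_match`, the strategy names, and the two toy strategies (a prefix match and a substring match standing in for the keyword search) are all hypothetical.

```python
# Hypothetical data and strategies, for illustration only.
ORIGINS = [
    "https://github.com/torvalds/linux",
    "https://github.com/python/cpython",
]

STRATEGIES = [
    # Cheapest step first, mirroring the "easiest to hardest" ordering.
    ("prefix", lambda q: [o for o in ORIGINS if o.startswith(q)]),
    # Substring match used here as a simple stand-in for keyword search.
    ("substring", lambda q: [o for o in ORIGINS if q in o]),
]


def staged_search(query, strategies):
    """Lazily yield (step name, results) for each strategy, in order.

    Being a generator, it only runs the next (more expensive) step when
    the caller asks for it, e.g. when the user clicks "try harder".
    """
    for name, strategy in strategies:
        yield name, strategy(query)


def first_match(query, strategies):
    """Return the first step that produced results, or (None, []) if none did."""
    for name, results in staged_search(query, strategies):
        if results:
            return name, results
    return None, []
```

Returning the step name alongside the results lets a frontend tell the user how the match was found, and an empty result from every step is the cue to suggest refining the query.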

Test Plan

Hopefully tests will be added at some point ;)

Diff Detail

Repository
rDSTO Storage manager
Branch
wip/origin-search
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 1738
Build 2089: tox-on-jenkins (Jenkins)
Build 2088: arc lint + arc unit