We would like to have a prototype of full-text search over the archive contents.
An efficient way of doing this is to exploit the Merkle graph as follows:
- index only the file contents, each of which is stored once even though it may appear in many different places
- use the swh-graph and/or provenance index to show the results in context
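The two steps above can be illustrated with a toy, in-memory sketch. Everything in it is an assumption made for illustration: the `ContentIndex` class, the `content_id` helper (a plain SHA-1 standing in for the archive's intrinsic content identifier), and the `occurrences` map, which only plays the role that swh-graph or the provenance index would play in a real prototype. It is not the actual swh-graph or provenance API.

```python
import hashlib
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def content_id(data: bytes) -> str:
    """Hypothetical stand-in for the archive's intrinsic content identifier."""
    return hashlib.sha1(data).hexdigest()


class ContentIndex:
    """Toy index: each distinct content is tokenized once, however often it occurs."""

    def __init__(self):
        # token -> ids of the contents containing it (the full-text index)
        self.postings = defaultdict(set)
        # content id -> (origin, path) pairs; stands in for swh-graph / provenance
        self.occurrences = defaultdict(set)

    def add(self, origin: str, path: str, data: bytes) -> str:
        cid = content_id(data)
        first_time = cid not in self.occurrences
        self.occurrences[cid].add((origin, path))
        if first_time:
            # Only the first occurrence of a content pays the indexing cost.
            text = data.decode("utf-8", errors="replace").lower()
            for token in TOKEN.findall(text):
                self.postings[token].add(cid)
        return cid

    def search(self, query: str) -> dict:
        """Return {content id: sorted occurrences} for contents matching all tokens."""
        tokens = TOKEN.findall(query.lower())
        if not tokens:
            return {}
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return {cid: sorted(self.occurrences[cid]) for cid in hits}
```

For example, adding the same license text under two hypothetical origins yields a single indexed content, and a search for `hereby granted` returns that one content together with both places where it occurs.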
The target is a prototype working on a subset of at least 1% of the archive.
Related work: