We would like to have a prototype of full-text search over the archive contents.
An efficient way of doing this is to exploit the Merkle graph as follows:
- index only the file contents (each content may appear in many different places)
- use the swh-graph and/or provenance index to show the results in context
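The two points above can be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's design: `ContentIndex`, `content_id`, and the in-memory `occurrences` map are hypothetical names, and `occurrences` only stands in for the lookup that swh-graph or the provenance index would provide. Each distinct content is tokenized once, however many paths refer to it.

```python
import hashlib
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def content_id(data: bytes) -> str:
    """Git-style blob hash, used here as the content identifier."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ContentIndex:
    """Hypothetical in-memory index: tokens are stored per content, not per path."""

    def __init__(self):
        # token -> set of content ids containing that token
        self.postings = defaultdict(set)
        # content id -> paths where it occurs
        # (assumption: stand-in for swh-graph / provenance lookups)
        self.occurrences = defaultdict(set)

    def add(self, path: str, data: bytes) -> bool:
        """Record an occurrence; tokenize only if the content is new.

        Returns True when the content was indexed for the first time.
        """
        cid = content_id(data)
        first_time = cid not in self.occurrences
        self.occurrences[cid].add(path)
        if first_time:
            text = data.decode("utf-8", errors="replace").lower()
            for token in TOKEN.findall(text):
                self.postings[token].add(cid)
        return first_time

    def search(self, term: str) -> dict:
        """Map each matching content id to the sorted paths where it occurs."""
        matches = self.postings.get(term.lower(), ())
        return {cid: sorted(self.occurrences[cid]) for cid in matches}


index = ContentIndex()
index.add("repo-a/LICENSE", b"Permission is hereby granted")
index.add("repo-b/COPYING", b"Permission is hereby granted")  # same content, not re-indexed
index.add("repo-a/main.c", b"int main(void) { return 0; }")
hits = index.search("granted")
```

Here `hits` holds a single content id whose context lists both paths, which is the split the list describes: the index answers "which contents match", and the graph answers "where do they live".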
The prototype should work on a subset covering at least 1% of the archive.
Related work:
- [[ https://sourcegraph.com/ | SourceGraph ]] ([[ https://lwn.net/Articles/828748/ | LWN coverage ]])
- [[ https://codesearch.debian.net/ | Debian Code Search ]]
- [[ https://langserver.org/ | Language Server Protocol ]] and its [[ https://microsoft.github.io/language-server-protocol/specifications/lsif/0.5.0/specification/ | index format ]]