Add fulltext search on origin intrinsic metadata.

Description

Summary:
I made several choices while writing this Diff that are open to discussion:

  • Using GIN instead of GiST. That seems the most appropriate choice according to https://www.postgresql.org/docs/9.1/textsearch-indexes.html ; however, that part was removed from the PostgreSQL 10 documentation: https://www.postgresql.org/docs/10/textsearch-indexes.html
  • Using pg_catalog.simple as the dictionary. Since we're dealing with any language and with proper names, it seemed best to use a dictionary with no stop words. Arguably, though, most of the data will be in English, and stop words usually don't appear in names.
  • It only supports conjunctions of search terms. I could easily add support for arbitrary levels of nesting and for disjunctions/negations. That can be done later if we deem it worth it.
  • It indexes JSON keys too. It is probably possible to fix this, at the expense of complicated SQL code or some postprocessing in Python.
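
To make the last two points concrete, here is a minimal Python sketch of what conjunction-only queries and values-only postprocessing could look like. It is not the code in this Diff: the helper names json_values_text and conjunction_tsquery are hypothetical, and it assumes the resulting strings would be passed to PostgreSQL's to_tsvector('pg_catalog.simple', ...) and to_tsquery('pg_catalog.simple', ...) respectively.

```python
import re


def json_values_text(document):
    """Concatenate the leaf values of a decoded JSON document, skipping keys.

    Hypothetical helper: one possible Python-side postprocessing step to
    avoid indexing JSON keys. Booleans and nulls are ignored.
    """
    parts = []

    def walk(node):
        if isinstance(node, dict):
            # Only descend into the values; the keys are never collected.
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str):
            parts.append(node)
        elif isinstance(node, (int, float)) and not isinstance(node, bool):
            parts.append(str(node))

    walk(document)
    return " ".join(parts)


def conjunction_tsquery(terms):
    """Build a tsquery string that ANDs every word of every search term.

    Hypothetical helper: only word characters are kept, so tsquery
    operators typed by the user (&, |, !, parentheses) cannot leak into
    the query. Disjunctions and negations are deliberately unsupported.
    """
    words = []
    for term in terms:
        words.extend(re.findall(r"\w+", term))
    return " & ".join(words)


# Example metadata, made up for illustration.
metadata = {
    "name": "swh-storage",
    "author": {"name": "Jane Doe"},
    "keywords": ["archive", "git"],
}
indexed_text = json_values_text(metadata)
query = conjunction_tsquery(["jane doe", "archive"])
```

Stripping everything but word characters keeps the query builder trivial while conjunctions are the only supported operator; supporting nesting, disjunctions, or negations would need a real parser instead.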

Resolves T1334 and T1335.

Test Plan: tox

Reviewers: Reviewers, ardumont, olasd

Reviewed By: Reviewers, ardumont, olasd

Subscribers: olasd, ardumont, swh-public-ci

Maniphest Tasks: T1335, T1334

Differential Revision: https://forge.softwareheritage.org/D658