I made several choices while writing this Diff that are open to discussion:
- Using GIN instead of GiST. That seemed the most appropriate choice based on https://www.postgresql.org/docs/9.1/textsearch-indexes.html ; however, that part was removed from the PostgreSQL 10 documentation: https://www.postgresql.org/docs/10/textsearch-indexes.html
- Using pg_catalog.simple as the dictionary. Since we're dealing with any language and with proper names, it seemed best to use a dictionary with no stop words. Arguably, though, most of the data will be in English, and stop words usually don't appear in names.
- It only supports conjunctions of search terms. I could easily add support for arbitrary nesting levels and for disjunctions/negations. That can be done later if we deem it worth it.
- It indexes JSON keys too. It is probably possible to fix this, at the expense of complicated SQL code or some postprocessing in Python.
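
To make the last point concrete, here is a rough sketch of what the Python postprocessing could look like. It is not part of this Diff. It assumes each search result comes back with its decoded JSON document, and it drops results whose terms only matched object keys. The names `iter_json_strings` and `matches_in_values` are hypothetical, and the regex tokenizer only approximates what PostgreSQL's text search parser does.

```python
import re
from typing import Any, Iterable, Iterator, Set

# Rough approximation of the text search tokenizer (an assumption, not the real parser).
_TOKEN_RE = re.compile(r"\w+")


def iter_json_strings(node: Any) -> Iterator[str]:
    """Yield every string value in a decoded JSON document, skipping object keys."""
    if isinstance(node, dict):
        for child in node.values():  # keys are deliberately ignored
            yield from iter_json_strings(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_json_strings(child)
    elif isinstance(node, str):
        yield node
    # Numbers, booleans and null are ignored in this sketch.


def _tokens(text: str) -> Set[str]:
    """Lowercase word tokens, with no stop word removal."""
    return {token.lower() for token in _TOKEN_RE.findall(text)}


def matches_in_values(document: Any, terms: Iterable[str]) -> bool:
    """Return True if every search term occurs in some string value (conjunction)."""
    value_tokens: Set[str] = set()
    for text in iter_json_strings(document):
        value_tokens |= _tokens(text)
    wanted: Set[str] = set()
    for term in terms:
        wanted |= _tokens(term)
    return wanted <= value_tokens
```

With a document such as `{"name": "Jane Doe"}`, a search for `jane` would be kept, while a search for `name` would be filtered out because it only matches a key. The cost is that every candidate row has to be fetched and rechecked in Python, so pagination and result counts would need to account for the dropped rows.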