diff --git a/PKG-INFO b/PKG-INFO
index a20045e..7e3c130 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,90 +1,86 @@
 Metadata-Version: 2.1
 Name: swh.search
-Version: 0.14.1
+Version: 0.15.0
 Summary: Software Heritage search service
 Home-page: https://forge.softwareheritage.org/diffusion/DSEA
 Author: Software Heritage developers
 Author-email: swh-devel@inria.fr
-License: UNKNOWN
 Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
 Project-URL: Funding, https://www.softwareheritage.org/donate
 Project-URL: Source, https://forge.softwareheritage.org/source/swh-search
 Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-search/
-Platform: UNKNOWN
 Classifier: Programming Language :: Python :: 3
 Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
 Classifier: Operating System :: OS Independent
 Classifier: Development Status :: 3 - Alpha
 Requires-Python: >=3.7
 Description-Content-Type: text/markdown
 Provides-Extra: testing
 License-File: LICENSE
 License-File: AUTHORS
 
 swh-search
 ==========
 
 Search service for the Software Heritage archive.
 
 It is similar to swh-storage in what it contains,
 but provides different ways to query it: while swh-storage is mostly
 a key-value store that returns an object from a primary key,
 swh-search is focused on reverse indices, to allow finding objects that
 match some criteria; for example full-text search.
 
 Currently uses ElasticSearch and provides only origin search (by URL and metadata).
 
 ## Dependencies
 
 - Some Python tests for this module cannot run without a local ElasticSearch
 instance, so the ElasticSearch server executable must be installed on your
 machine (a running ElasticSearch server is not needed).
 
     - Debian-like host
 
         The `elasticsearch` package is required. As it is not part of Debian stable,
         [an additional Debian repository must be
         configured](https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html#deb-repo).
 
     - Non-Debian-like host
 
         The tests expect:
         - `/usr/share/elasticsearch/jdk/bin/java` to exist.
         - `org.elasticsearch.bootstrap.Elasticsearch` to be in Java's classpath.
 - Emscripten is required to generate the tree-sitter WASM module. To set it up, run:
     ```bash
     cd /opt && git clone https://github.com/emscripten-core/emsdk.git && cd emsdk && \
     ./emsdk install latest && ./emsdk activate latest
     PATH="${PATH}:/opt/emsdk/upstream/emscripten"
     ```
 
     **Note:** If emsdk is not found in `PATH`, the tree-sitter CLI automatically pulls the `emscripten/emsdk` image from Docker Hub when `make ts-build-wasm` or `make ts-build` is run.
 
 
 ## Make targets
 
 The following make targets can be run from the root directory of swh-search to build and/or run swh-search under various configurations:
 
 * **ts-install**: Installs node_modules and the Emscripten SDK required by TreeSitter
 
 * **ts-generate**: Generates the parser files (C and JSON) from the grammar
 
 * **ts-repl**: Starts a web-based playground for the TreeSitter grammar. It is the recommended way to develop the grammar.
 
 * **ts-dev**: Parses `query_language/sample_query` and prints the corresponding syntax tree
 along with the start and end positions of all the nodes.
 
 * **ts-dev sanitize=1**: Same as **ts-dev**, but without the start and end positions of the nodes.
 This is the format expected by TreeSitter's native test command; `sanitize=1` strips the positions
 from the **ts-dev** output using `sed`.
 
 * **ts-test**: Executes TreeSitter's native tests
 
 * **ts-build-so**: Generates the `swh_ql.so` file from the previously generated parser using py-tree-sitter
 
 * **ts-build-wasm**: Generates the `swh_ql.wasm` file from the previously generated parser using Emscripten
 
 * **ts-build**: Executes both **ts-build-so** and **ts-build-wasm**
-
-
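The **ts-dev sanitize=1** target described above strips node positions from the parse output with `sed`. A rough Python equivalent is sketched below, assuming each node name is followed by a position of the form ` [row, column] - [row, column]` in the tree-sitter CLI's parse output; the exact `sed` expression used by the Makefile is not shown here, and `sanitize` is a hypothetical helper, not part of swh-search.

```python
import re

# Assumed position format printed after each node name by the tree-sitter CLI:
# " [start_row, start_col] - [end_row, end_col]"
POSITION_RE = re.compile(r" \[\d+, \d+\] - \[\d+, \d+\]")


def sanitize(parse_output: str) -> str:
    """Remove every node's start/end position, keeping only the S-expression."""
    return POSITION_RE.sub("", parse_output)


if __name__ == "__main__":
    raw = "(query [0, 0] - [0, 12] (filters [0, 0] - [0, 12]))"
    print(sanitize(raw))  # (query (filters))
```

Removing the positions makes the output independent of the exact query layout, which is what allows it to be compared against the expected trees in TreeSitter's native tests.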
diff --git a/debian/changelog b/debian/changelog
index d84bb42..906669a 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,415 +1,419 @@
-swh-search (0.14.1-1~swh1~bpo10+1) buster-swh; urgency=medium
+swh-search (0.15.0-1~swh1) unstable-swh; urgency=medium
 
-  * Rebuild for buster-swh
+  * New upstream release 0.15.0     - (tagged by Valentin Lorentz
+    <vlorentz@softwareheritage.org> on 2022-07-18 17:54:39 +0200)
+  * Upstream changes:     - v0.15.0     - * Prevent 'version' field (and
+    others) from being dynamically inferred as double     - * minor mypy
+    fix
 
- -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Fri, 06 May 2022 12:57:24 +0000
+ -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Mon, 18 Jul 2022 15:59:31 +0000
 
 swh-search (0.14.1-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.14.1     - (tagged by Antoine Lambert
     <anlambert@softwareheritage.org> on 2022-05-06 14:45:03 +0200)
   * Upstream changes:     - version 0.14.1
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Fri, 06 May 2022 12:51:31 +0000
 
 swh-search (0.14.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.14.0     - (tagged by Antoine R. Dumont
     (@ardumont) <ardumont@softwareheritage.org> on 2022-04-29 15:43:46
     +0200)
   * Upstream changes:     - v0.14.0     - Add new supported visit type
     'maven'     - Fix query language metadata filter example     -
     Internals:     - Bump mypy to v0.942     - conftest: Fix tests hang
     with elasticsearch 7.17.3     - pre-commit: Remove codespell commit-
     msg hook     - Add .git-blame-ignore-revs file with automatic
     reformatting commits     - python: Reformat code with black 22.3.0
     - pre-commit, tox: Bump black from 19.10b0 to 22.3.0     -
     requirements-test: Remove pytest pinning to < 7
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Fri, 29 Apr 2022 13:49:57 +0000
 
 swh-search (0.13.2-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.13.2     - (tagged by Valentin Lorentz
     <vlorentz@softwareheritage.org> on 2022-03-29 10:40:56 +0200)
   * Upstream changes:     - v0.13.2     - * server: Return
     SearchQuerySyntaxError as 400 instead of 500     - * refactorings
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Tue, 29 Mar 2022 12:51:59 +0000
 
 swh-search (0.13.1-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.13.1     - (tagged by Valentin Lorentz
     <vlorentz@softwareheritage.org> on 2022-03-07 12:49:31 +0100)
   * Upstream changes:     - v0.13.1     - * docs: Update examples of the
     query language
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Mon, 07 Mar 2022 11:54:17 +0000
 
 swh-search (0.13.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.13.0     - (tagged by Valentin Lorentz
     <vlorentz@softwareheritage.org> on 2022-02-16 13:12:20 +0100)
   * Upstream changes:     - v0.13.0     - * Use ':' for substring
     matching instead of '='     - * translator: Fix 'visited = false'
     queries to actually return results.     - * grammar: Prevent
     'isoDateTime' rule from being too greedy
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Wed, 16 Feb 2022 12:17:38 +0000
 
 swh-search (0.12.1-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.12.1     - (tagged by Valentin Lorentz
     <vlorentz@softwareheritage.org> on 2022-02-14 15:26:46 +0100)
   * Upstream changes:     - v0.12.1     - * Make RemoteSearch reraise
     specific exceptions instead of generic RemoteException     - * Fix
     crash when no filter but the main query is given
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Mon, 14 Feb 2022 14:31:40 +0000
 
 swh-search (0.12.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.12.0     - (tagged by Valentin Lorentz
     <vlorentz@softwareheritage.org> on 2022-01-12 13:55:44 +0100)
   * Upstream changes:     - v0.12.0     - * search: Ensure CodeMeta
     dates are properly formatted     - * setup.py: use yarnpkg instead
     of yarn if present in PATH     - * swh.search.utils: Fix type     -
     * conftest: Fix tests hang since elasticsearch 7.16 release     - *
     Unpin tree-sitter dependency     - * tests: Use
     TimestampWithTimezone.from_datetime() instead of the constructor
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Wed, 12 Jan 2022 13:00:33 +0000
 
 swh-search (0.11.6-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.11.6     - (tagged by Antoine Lambert
     <anlambert@softwareheritage.org> on 2021-09-29 15:47:53 +0200)
   * Upstream changes:     - version 0.11.6
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Wed, 29 Sep 2021 13:53:44 +0000
 
 swh-search (0.11.5-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.11.5     - (tagged by Antoine Lambert
     <anlambert@softwareheritage.org> on 2021-09-28 17:39:15 +0200)
   * Upstream changes:     - version 0.11.5
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Tue, 28 Sep 2021 15:48:00 +0000
 
 swh-search (0.11.4-2~swh1) unstable-swh; urgency=medium
 
   * Use --no-ext-rename in dh_python3 to avoid renaming swh_ql.so
 
  -- Nicolas Dandrimont <olasd@debian.org>  Wed, 01 Sep 2021 17:12:49 +0200
 
 swh-search (0.11.4-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.11.4     - (tagged by Valentin Lorentz
     <vlorentz@softwareheritage.org> on 2021-08-31 15:01:41 +0200)
   * Upstream changes:     - v0.11.4     - * Fix debian build
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Tue, 31 Aug 2021 13:15:08 +0000
 
 swh-search (0.11.3-3~swh1) unstable-swh; urgency=medium
 
   * This package is now architecture-dependent
   * Make pytest more verbose
 
  -- Nicolas Dandrimont <olasd@debian.org>  Tue, 31 Aug 2021 15:00:42 +0200
 
 swh-search (0.11.3-2~swh1) unstable-swh; urgency=medium
 
   * Add python3-tree-sitter build-dependency
 
  -- Nicolas Dandrimont <olasd@debian.org>  Tue, 31 Aug 2021 14:18:43 +0200
 
 swh-search (0.11.3-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.11.3     - (tagged by Valentin Lorentz
     <vlorentz@softwareheritage.org> on 2021-08-31 14:04:03 +0200)
   * Upstream changes:     - v0.11.3     - * clean up sdist
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Tue, 31 Aug 2021 12:14:47 +0000
 
 swh-search (0.11.2-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.11.2     - (tagged by Valentin Lorentz
     <vlorentz@softwareheritage.org> on 2021-08-18 12:02:09 +0200)
   * Upstream changes:     - v0.11.2     - * cli.py: Add rpc-serve
     command     - * grammar.js: Improve grammar and export tokens
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Wed, 18 Aug 2021 10:07:04 +0000
 
 swh-search (0.11.1-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.11.1     - (tagged by Vincent SELLIER
     <vincent.sellier@softwareheritage.org> on 2021-08-16 18:33:00 +0200)
   * Upstream changes:     - v0.11.1     - fix the tree-sitter dependency
     management during the pypi build
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Mon, 16 Aug 2021 16:40:38 +0000
 
 swh-search (0.11.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.11.0     - (tagged by Valentin Lorentz
     <vlorentz@softwareheritage.org> on 2021-08-09 17:27:33 +0200)
   * Upstream changes:     - v0.11.0     - * Add logging for search terms
     in debug mode     - * journal_client: use origin_visit_status.type
     instead of origin_visit     - * Add query language     - * Disable
     fetch_last_revision_release_date outside tests
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Fri, 13 Aug 2021 14:42:01 +0000
 
 swh-search (0.10.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.10.0     - (tagged by Nicolas Dandrimont
     <nicolas@dandrimont.eu> on 2021-07-21 10:35:59 +0200)
   * Upstream changes:     - Release swh.search v0.10.0
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Wed, 21 Jul 2021 08:41:27 +0000
 
 swh-search (0.9.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.9.0     - (tagged by Vincent SELLIER
     <vincent.sellier@softwareheritage.org> on 2021-06-17 16:54:50 +0200)
   * Upstream changes:     - v0.9.0     - Changelog:     - * Fix boolean
     mapping in metadata document     - * Store nb_visits and
     last_visit_date     - *
     test_origin_intrinsic_metadata_long_description: Re-increase
     description size     - * tests/test_search: Use a reasonably long
     description value     - * tests/elasticsearch: Catch painless script
     errors and pretty print them     - * mypy: Fix errors with release
     >= v0.900
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Thu, 17 Jun 2021 15:01:42 +0000
 
 swh-search (0.8.1-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.8.1     - (tagged by Antoine Lambert
     <antoine.lambert@inria.fr> on 2021-04-29 14:36:43 +0200)
   * Upstream changes:     - version 0.8.1
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Thu, 29 Apr 2021 12:41:23 +0000
 
 swh-search (0.8.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.8.0     - (tagged by Nicolas Dandrimont
     <nicolas@dandrimont.eu> on 2021-04-08 17:37:41 +0200)
   * Upstream changes:     - Release swh.search 0.8.0     - Implement a
     blocklist for origin results     - Fix docs typesetting
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Thu, 08 Apr 2021 15:42:22 +0000
 
 swh-search (0.7.1-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.7.1     - (tagged by Vincent SELLIER
     <vincent.sellier@softwareheritage.org> on 2021-03-04 15:59:28 +0100)
   * Upstream changes:     - v0.7.1     - Changelog:     - * Allow to
     instantiate the service with default indexes configuration
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Thu, 04 Mar 2021 15:06:34 +0000
 
 swh-search (0.7.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.7.0     - (tagged by Vincent SELLIER
     <vincent.sellier@softwareheritage.org> on 2021-03-04 12:09:12 +0100)
   * Upstream changes:     - v0.7.0     - Changelog:     - * Ensure the
     elasticsearch indexes are initialized before the first request     -
     * Use elasticsearch aliases to simplify maintenance operations     -
     * search.cli: Drop unused and untested rpc-serve cli entrypoint     -
     * api.wsgi: Drop unused wsgi module     - * Add missing server tests
     - * Add typing to origin_update's argument and origin_search's
     return
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Thu, 04 Mar 2021 11:19:29 +0000
 
 swh-search (0.6.1-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.6.1     - (tagged by Antoine Lambert
     <antoine.lambert@inria.fr> on 2021-02-18 18:55:56 +0100)
   * Upstream changes:     - version 0.6.1
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Thu, 18 Feb 2021 18:00:51 +0000
 
 swh-search (0.6.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.6.0     - (tagged by Antoine Lambert
     <antoine.lambert@inria.fr> on 2021-02-18 15:28:07 +0100)
   * Upstream changes:     - version 0.6.0
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Thu, 18 Feb 2021 14:31:07 +0000
 
 swh-search (0.5.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.5.0     - (tagged by Vincent SELLIER
     <vincent.sellier@softwareheritage.org> on 2021-02-18 11:20:43 +0100)
   * Upstream changes:     - v0.5.0     - Add monitoring metrics
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Thu, 18 Feb 2021 10:25:39 +0000
 
 swh-search (0.4.2-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.4.2     - (tagged by Antoine Lambert
     <antoine.lambert@inria.fr> on 2021-02-17 11:09:21 +0100)
   * Upstream changes:     - version 0.4.2
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Wed, 17 Feb 2021 10:14:16 +0000
 
 swh-search (0.4.1-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.4.1     - (tagged by Vincent SELLIER
     <vincent.sellier@softwareheritage.org> on 2021-01-07 16:15:23 +0100)
   * Upstream changes:     - v0.4.1
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Thu, 07 Jan 2021 15:18:24 +0000
 
 swh-search (0.4.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.4.0     - (tagged by Vincent SELLIER
     <vincent.sellier@softwareheritage.org> on 2020-12-23 16:37:18 +0100)
   * Upstream changes:     - Support an index name prefix
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Wed, 23 Dec 2020 15:41:09 +0000
 
 swh-search (0.3.5-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.3.5     - (tagged by Valentin Lorentz
     <vlorentz@softwareheritage.org> on 2020-12-22 17:32:26 +0100)
   * Upstream changes:     - v0.3.5     - * Write some basic
     documentation to describe what swh-search is.     - * Add more
     comments in elasticsearch.py
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Tue, 22 Dec 2020 16:38:29 +0000
 
 swh-search (0.3.4-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.3.4     - (tagged by Antoine R. Dumont
     (@ardumont) <ardumont@softwareheritage.org> on 2020-12-17 12:13:49
     +0100)
   * Upstream changes:     - v0.3.4     - search.journal_client: Actually
     filter on full origin_visit_status
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Thu, 17 Dec 2020 11:16:32 +0000
 
 swh-search (0.3.3-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.3.3     - (tagged by Antoine R. Dumont
     (@ardumont) <ardumont@softwareheritage.org> on 2020-12-11 15:20:01
     +0100)
   * Upstream changes:     - v0.3.3     - Use cross-field search.     -
     Normalize Codemeta documents by expanding them.     - Add test for
     long descriptions.
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Fri, 11 Dec 2020 14:22:59 +0000
 
 swh-search (0.3.2-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.3.2     - (tagged by Antoine R. Dumont
     (@ardumont) <ardumont@softwareheritage.org> on 2020-12-10 09:49:35
     +0100)
   * Upstream changes:     - v0.3.2     - search.journal_client: Fix key
     error     - test_journal_client: Migrate to pytest
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Thu, 10 Dec 2020 08:54:53 +0000
 
 swh-search (0.3.1-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.3.1     - (tagged by Antoine R. Dumont
     (@ardumont) <ardumont@softwareheritage.org> on 2020-12-09 18:21:33
     +0100)
   * Upstream changes:     - v0.3.1     - Allow configuration through cli
     or config file
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Wed, 09 Dec 2020 18:53:39 +0000
 
 swh-search (0.3.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.3.0     - (tagged by Antoine R. Dumont
     (@ardumont) <ardumont@softwareheritage.org> on 2020-12-08 11:30:33
     +0100)
   * Upstream changes:     - v0.3.0     - cli: Subscribe journal client
     to origin_intrinsic_metadata topic     - cli: Subscribe journal
     client to origin_visit_status     - cli: Allow topic prefix
     declaration through cli or configuration     - cli: Allow object-
     type declaration through cli or configuration     - tox.ini: pin
     black to the pre-commit version (19.10b0) to avoid flip-flops     -
     Run isort after the CLI import changes
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Tue, 08 Dec 2020 10:33:30 +0000
 
 swh-search (0.2.3-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.2.3     - (tagged by David Douard
     <david.douard@sdfa3.org> on 2020-09-25 12:51:11 +0200)
   * Upstream changes:     - v0.2.3
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Fri, 25 Sep 2020 10:53:12 +0000
 
 swh-search (0.2.2-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.2.2     - (tagged by Antoine R. Dumont
     (@ardumont) <ardumont@softwareheritage.org> on 2020-08-03 11:58:53
     +0200)
   * Upstream changes:     - v0.2.2     - Fix test_cli.invoke for old
     PyYAML versions (such as 3.13, in Debian 10).
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Mon, 03 Aug 2020 10:00:05 +0000
 
 swh-search (0.2.1-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.2.1     - (tagged by Antoine R. Dumont
     (@ardumont) <ardumont@softwareheritage.org> on 2020-08-03 10:59:31
     +0200)
   * Upstream changes:     - v0.2.1     - setup.py: Migrate from
     vcversioner to setuptools-scm
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Mon, 03 Aug 2020 09:00:39 +0000
 
 swh-search (0.2.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.2.0     - (tagged by Antoine R. Dumont
     (@ardumont) <ardumont@softwareheritage.org> on 2020-08-03 10:40:39
     +0200)
   * Upstream changes:     - v0.2.0     - swh.search: Define an interface
     for search backends and use it     - swh.search.get_search: Simplify
     instantiation
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Mon, 03 Aug 2020 08:42:45 +0000
 
 swh-search (0.1.0-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.1.0     - (tagged by Antoine R. Dumont
     (@ardumont) <ardumont@softwareheritage.org> on 2020-07-31 14:05:22
     +0200)
   * Upstream changes:     - v0.1.0     - Type origin_search(...) ->
     PagedResult[Dict]     - README: Update necessary dependencies for
     test purposes     - Fixes on journal updates     - Blackify strings
     - setup: Update the minimum required runtime python3 version
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Fri, 31 Jul 2020 12:10:22 +0000
 
 swh-search (0.0.4-1~swh1) unstable-swh; urgency=medium
 
   * New upstream release 0.0.4     - (tagged by Antoine R. Dumont
     (@ardumont) <antoine.romain.dumont@gmail.com> on 2020-01-23 15:00:50
     +0100)
   * Upstream changes:     - v0.0.4 docs: Remove swh-py-template label -
     Only return results where all terms match. - Don't use
     refresh='wait_for' when updating origins. - Add a 'sha1' field to
     origin documents, used for sorting. - Add a pre-commit config file -
     Migrate tox.ini to extras = xxx instead of deps = .[testing] - De-
     specify testenv:py3 - Include all requirements in MANIFEST.in
 
  -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org>  Thu, 23 Jan 2020 14:04:17 +0000
 
 swh-search (0.0.3-1~swh2) unstable-swh; urgency=medium
 
   * Filter out swh/__init__.py from package
 
  -- Nicolas Dandrimont <olasd@debian.org>  Tue, 14 Jan 2020 16:38:23 +0100
 
 swh-search (0.0.3-1~swh1) unstable-swh; urgency=medium
 
   * Initial packaging
 
  -- Nicolas Dandrimont <olasd@debian.org>  Mon, 13 Jan 2020 16:59:11 +0100
diff --git a/swh.search.egg-info/PKG-INFO b/swh.search.egg-info/PKG-INFO
index a20045e..7e3c130 100644
--- a/swh.search.egg-info/PKG-INFO
+++ b/swh.search.egg-info/PKG-INFO
@@ -1,90 +1,86 @@
 Metadata-Version: 2.1
 Name: swh.search
-Version: 0.14.1
+Version: 0.15.0
 Summary: Software Heritage search service
 Home-page: https://forge.softwareheritage.org/diffusion/DSEA
 Author: Software Heritage developers
 Author-email: swh-devel@inria.fr
-License: UNKNOWN
 Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
 Project-URL: Funding, https://www.softwareheritage.org/donate
 Project-URL: Source, https://forge.softwareheritage.org/source/swh-search
 Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-search/
-Platform: UNKNOWN
 Classifier: Programming Language :: Python :: 3
 Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
 Classifier: Operating System :: OS Independent
 Classifier: Development Status :: 3 - Alpha
 Requires-Python: >=3.7
 Description-Content-Type: text/markdown
 Provides-Extra: testing
 License-File: LICENSE
 License-File: AUTHORS
 
 swh-search
 ==========
 
 Search service for the Software Heritage archive.
 
 It is similar to swh-storage in what it contains,
 but provides different ways to query it: while swh-storage is mostly
 a key-value store that returns an object from a primary key,
 swh-search is focused on reverse indices, to allow finding objects that
 match some criteria; for example full-text search.
 
 Currently uses ElasticSearch and provides only origin search (by URL and metadata).
 
 ## Dependencies
 
 - Some Python tests for this module cannot run without a local ElasticSearch
 instance, so the ElasticSearch server executable must be installed on your
 machine (a running ElasticSearch server is not needed).
 
     - Debian-like host
 
         The `elasticsearch` package is required. As it is not part of Debian stable,
         [an additional Debian repository must be
         configured](https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html#deb-repo).
 
     - Non-Debian-like host
 
         The tests expect:
         - `/usr/share/elasticsearch/jdk/bin/java` to exist.
         - `org.elasticsearch.bootstrap.Elasticsearch` to be in Java's classpath.
 - Emscripten is required to generate the tree-sitter WASM module. To set it up, run:
     ```bash
     cd /opt && git clone https://github.com/emscripten-core/emsdk.git && cd emsdk && \
     ./emsdk install latest && ./emsdk activate latest
     PATH="${PATH}:/opt/emsdk/upstream/emscripten"
     ```
 
     **Note:** If emsdk is not found in `PATH`, the tree-sitter CLI automatically pulls the `emscripten/emsdk` image from Docker Hub when `make ts-build-wasm` or `make ts-build` is run.
 
 
 ## Make targets
 
 The following make targets can be run from the root directory of swh-search to build and/or run swh-search under various configurations:
 
 * **ts-install**: Installs node_modules and the Emscripten SDK required by TreeSitter
 
 * **ts-generate**: Generates the parser files (C and JSON) from the grammar
 
 * **ts-repl**: Starts a web-based playground for the TreeSitter grammar. It is the recommended way to develop the grammar.
 
 * **ts-dev**: Parses `query_language/sample_query` and prints the corresponding syntax tree
 along with the start and end positions of all the nodes.
 
 * **ts-dev sanitize=1**: Same as **ts-dev**, but without the start and end positions of the nodes.
 This is the format expected by TreeSitter's native test command; `sanitize=1` strips the positions
 from the **ts-dev** output using `sed`.
 
 * **ts-test**: Executes TreeSitter's native tests
 
 * **ts-build-so**: Generates the `swh_ql.so` file from the previously generated parser using py-tree-sitter
 
 * **ts-build-wasm**: Generates the `swh_ql.wasm` file from the previously generated parser using Emscripten
 
 * **ts-build**: Executes both **ts-build-so** and **ts-build-wasm**
-
-
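The functional change in 0.15.0 is the pair of dynamic templates (`floats_as_string` and `longs_as_string`) added to `ORIGIN_MAPPING` in `swh/search/elasticsearch.py`. Without them, the first indexed document whose `intrinsic_metadata` holds a JSON number (for example a `version` of `1.0`) would make Elasticsearch map that field dynamically as a numeric field, and later values such as `"1.0.2"` would then fail to index. The sketch below is a simplified, hypothetical model of how a template is selected: `detected_type` approximates Elasticsearch's JSON type detection and `fnmatchcase` stands in for `path_match` wildcards. Neither helper is part of swh-search, and the field path used in the usage example is illustrative only.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional

# The three dynamic templates of ORIGIN_MAPPING as of swh.search 0.15.0.
DYNAMIC_TEMPLATES: List[Dict[str, Dict[str, Any]]] = [
    {"booleans_as_string": {"match_mapping_type": "boolean",
                            "path_match": "intrinsic_metadata.*",
                            "mapping": {"type": "keyword"}}},
    {"floats_as_string": {"match_mapping_type": "double",
                          "path_match": "intrinsic_metadata.*",
                          "mapping": {"type": "text"}}},
    {"longs_as_string": {"match_mapping_type": "long",
                         "path_match": "intrinsic_metadata.*",
                         "mapping": {"type": "text"}}},
]


def detected_type(value: Any) -> str:
    """Approximate the JSON type Elasticsearch detects for a new field value."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    return "string"


def resolve_mapping(path: str, value: Any) -> Optional[Dict[str, Any]]:
    """Return the mapping of the first matching template, in declaration order.

    None means no template applies, so Elasticsearch's default dynamic
    mapping would be used for that field.
    """
    kind = detected_type(value)
    for template in DYNAMIC_TEMPLATES:
        (spec,) = template.values()  # each template has exactly one named entry
        if spec["match_mapping_type"] == kind and fnmatchcase(path, spec["path_match"]):
            return spec["mapping"]
    return None


if __name__ == "__main__":
    version_path = "intrinsic_metadata.http://schema.org/version.@value"  # illustrative
    print(resolve_mapping(version_path, 1.0))      # {'type': 'text'}
    print(resolve_mapping(version_path, "1.0.2"))  # None
```

Mapping numbers as `text` means the type of such a field no longer depends on whichever document happens to be indexed first, at the cost of losing numeric range queries on it.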
diff --git a/swh/search/elasticsearch.py b/swh/search/elasticsearch.py
index 35431c5..f8dcb4b 100644
--- a/swh/search/elasticsearch.py
+++ b/swh/search/elasticsearch.py
@@ -1,582 +1,600 @@
 # Copyright (C) 2019-2022  The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU General Public License version 3, or any later version
 # See top-level LICENSE file for more information
 
 import base64
 from collections import Counter
 import logging
 import pprint
 from textwrap import dedent
-from typing import Any, Dict, Iterable, List, Optional
+from typing import Any, Dict, Iterable, List, Optional, cast
 
 from elasticsearch import Elasticsearch, helpers
 import msgpack
 
 from swh.indexer import codemeta
 from swh.model import model
 from swh.model.hashutil import hash_to_hex
 from swh.search.interface import (
     SORT_BY_OPTIONS,
     MinimalOriginDict,
     OriginDict,
     PagedResult,
 )
 from swh.search.metrics import send_metric, timed
 from swh.search.translator import Translator
 from swh.search.utils import escape, get_expansion, parse_and_format_date
 
 logger = logging.getLogger(__name__)
 
 INDEX_NAME_PARAM = "index"
 READ_ALIAS_PARAM = "read_alias"
 WRITE_ALIAS_PARAM = "write_alias"
 
 ORIGIN_DEFAULT_CONFIG = {
     INDEX_NAME_PARAM: "origin",
     READ_ALIAS_PARAM: "origin-read",
     WRITE_ALIAS_PARAM: "origin-write",
 }
 
 ORIGIN_MAPPING = {
     "dynamic_templates": [
         {
             "booleans_as_string": {
                 # All fields stored as string in the metadata
                 # even the booleans
                 "match_mapping_type": "boolean",
                 "path_match": "intrinsic_metadata.*",
                 "mapping": {"type": "keyword"},
             }
-        }
+        },
+        {
+            "floats_as_string": {
+                # All fields stored as string in the metadata
+                # even the floats
+                "match_mapping_type": "double",
+                "path_match": "intrinsic_metadata.*",
+                "mapping": {"type": "text"},
+            }
+        },
+        {
+            "longs_as_string": {
+                # All fields stored as string in the metadata
+                # even the longs
+                "match_mapping_type": "long",
+                "path_match": "intrinsic_metadata.*",
+                "mapping": {"type": "text"},
+            }
+        },
     ],
     "date_detection": False,
     "properties": {
         # sha1 of the URL; used as the document id
         "sha1": {
             "type": "keyword",
             "doc_values": True,
         },
         # Used both to search URLs, and as the result to return
         # as a response to queries
         "url": {
             "type": "text",
             # To split URLs into tokens on any character
             # that is not a letter
             "analyzer": "simple",
             # 2-gram and partial-3-gram search (i.e. with the end of the
             # third word potentially missing)
             "fields": {
                 "as_you_type": {
                     "type": "search_as_you_type",
                     "analyzer": "simple",
                 }
             },
         },
         "visit_types": {"type": "keyword"},
         # used to filter out origins that were never visited
         "has_visits": {
             "type": "boolean",
         },
         "nb_visits": {"type": "integer"},
         "snapshot_id": {"type": "keyword"},
         "last_visit_date": {"type": "date"},
         "last_eventful_visit_date": {"type": "date"},
         "last_release_date": {"type": "date"},
         "last_revision_date": {"type": "date"},
         "intrinsic_metadata": {
             "type": "nested",
             "properties": {
                 "@context": {
                     # don't bother indexing tokens in these URIs, as they
                     # are used as namespaces
                     "type": "keyword",
                 },
                 "http://schema": {
                     "properties": {
                         "org/dateCreated": {
                             "properties": {
                                 "@value": {
                                     "type": "date",
                                 }
                             }
                         },
                         "org/dateModified": {
                             "properties": {
                                 "@value": {
                                     "type": "date",
                                 }
                             }
                         },
                         "org/datePublished": {
                             "properties": {
                                 "@value": {
                                     "type": "date",
                                 }
                             }
                         },
                     }
                 },
             },
         },
         # Has this origin been taken down?
         "blocklisted": {
             "type": "boolean",
         },
     },
 }
 
 # painless script that will be executed when updating an origin document
 ORIGIN_UPDATE_SCRIPT = dedent(
     """
     // utility function to get and parse date
     ZonedDateTime getDate(def ctx, String date_field) {
         String default_date = "0001-01-01T00:00:00Z";
         String date = ctx._source.getOrDefault(date_field, default_date);
         return ZonedDateTime.parse(date);
     }
 
     // back up current field values before params overwrite them
     List visit_types = ctx._source.getOrDefault("visit_types", []);
     int nb_visits = ctx._source.getOrDefault("nb_visits", 0);
 
     ZonedDateTime last_visit_date = getDate(ctx, "last_visit_date");
 
     String snapshot_id = ctx._source.getOrDefault("snapshot_id", "");
     ZonedDateTime last_eventful_visit_date =
         getDate(ctx, "last_eventful_visit_date");
     ZonedDateTime last_revision_date = getDate(ctx, "last_revision_date");
     ZonedDateTime last_release_date = getDate(ctx, "last_release_date");
 
     // update origin document with new field values
     ctx._source.putAll(params);
 
     // re-add previous visit types that the incoming visit_types overwrote
     if (ctx._source.containsKey("visit_types")) {
         for (int i = 0; i < visit_types.length; ++i) {
             if (!ctx._source.visit_types.contains(visit_types[i])) {
                 ctx._source.visit_types.add(visit_types[i]);
             }
         }
     }
 
     // Undo overwrite if incoming nb_visits is smaller
     if (ctx._source.containsKey("nb_visits")) {
         int incoming_nb_visits = ctx._source.getOrDefault("nb_visits", 0);
         if(incoming_nb_visits < nb_visits){
             ctx._source.nb_visits = nb_visits;
         }
     }
 
     // Undo overwrite if incoming last_visit_date is older
     if (ctx._source.containsKey("last_visit_date")) {
         ZonedDateTime incoming_last_visit_date = getDate(ctx, "last_visit_date");
         int difference =
             // returns -1, 0 or 1
             incoming_last_visit_date.compareTo(last_visit_date);
         if(difference < 0){
             ctx._source.last_visit_date = last_visit_date;
         }
     }
 
     // Undo update of last_eventful_visit_date and snapshot_id if
     // snapshot_id hasn't changed OR incoming last_eventful_visit_date is older
     if (ctx._source.containsKey("snapshot_id")) {
         String incoming_snapshot_id = ctx._source.getOrDefault("snapshot_id", "");
         ZonedDateTime incoming_last_eventful_visit_date =
             getDate(ctx, "last_eventful_visit_date");
         int difference =
             // returns -1, 0 or 1
             incoming_last_eventful_visit_date.compareTo(last_eventful_visit_date);
         if(snapshot_id == incoming_snapshot_id || difference < 0){
             ctx._source.snapshot_id = snapshot_id;
             ctx._source.last_eventful_visit_date = last_eventful_visit_date;
         }
     }
 
     // Undo overwrite if incoming last_revision_date is older
     if (ctx._source.containsKey("last_revision_date")) {
         ZonedDateTime incoming_last_revision_date =
             getDate(ctx, "last_revision_date");
         int difference =
             // returns -1, 0 or 1
             incoming_last_revision_date.compareTo(last_revision_date);
         if(difference < 0){
             ctx._source.last_revision_date = last_revision_date;
         }
     }
 
     // Undo overwrite if incoming last_release_date is older
     if (ctx._source.containsKey("last_release_date")) {
         ZonedDateTime incoming_last_release_date =
             getDate(ctx, "last_release_date");
         // returns -1, 0 or 1
         int difference = incoming_last_release_date.compareTo(last_release_date);
         if(difference < 0){
             ctx._source.last_release_date = last_release_date;
         }
     }
     """
 )
 
 
 def _sanitize_origin(origin):
     origin = origin.copy()
 
     # Whitelist fields to be saved in Elasticsearch
     res = {"url": origin.pop("url")}
     for field_name in (
         "blocklisted",
         "has_visits",
         "intrinsic_metadata",
         "visit_types",
         "nb_visits",
         "snapshot_id",
         "last_visit_date",
         "last_eventful_visit_date",
         "last_revision_date",
         "last_release_date",
     ):
         if field_name in origin:
             res[field_name] = origin.pop(field_name)
 
     # Run the JSON-LD expansion algorithm
     # <https://www.w3.org/TR/json-ld-api/#expansion>
     # to normalize the Codemeta metadata.
     # This is required because Elasticsearch needs each field to have a consistent
     # type across documents to be searchable; and non-expanded JSON-LD documents
     # can have various types in the same field. For example, all these are
     # equivalent in JSON-LD:
     # * {"author": "Jane Doe"}
     # * {"author": ["Jane Doe"]}
     # * {"author": {"@value": "Jane Doe"}}
     # * {"author": [{"@value": "Jane Doe"}]}
     # and JSON-LD expansion will convert them all to the last one.
     if "intrinsic_metadata" in res:
         intrinsic_metadata = res["intrinsic_metadata"]
         for date_field in ["dateCreated", "dateModified", "datePublished"]:
             if date_field in intrinsic_metadata:
                 date = intrinsic_metadata[date_field]
 
                 # If the date{Created,Modified,Published} value isn't parsable,
                 # it is rejected and not stored (unlike other fields)
                 formatted_date = parse_and_format_date(date)
                 if formatted_date is None:
                     intrinsic_metadata.pop(date_field)
                 else:
                     intrinsic_metadata[date_field] = formatted_date
 
         res["intrinsic_metadata"] = codemeta.expand(intrinsic_metadata)
 
     return res
 
 
 def token_encode(index_to_tokenize: Dict[bytes, Any]) -> str:
     """Encode a search pagination cursor as an opaque string page token"""
     page_token = base64.b64encode(msgpack.dumps(index_to_tokenize))
     return page_token.decode()
 
 
 def token_decode(page_token: str) -> Dict[bytes, Any]:
     """Decode a page token produced by token_encode"""
     return msgpack.loads(base64.b64decode(page_token.encode()), raw=True)
 
 
 class ElasticSearch:
     def __init__(self, hosts: List[str], indexes: Dict[str, Dict[str, str]] = {}):
         self._backend = Elasticsearch(hosts=hosts)
         self._translator = Translator()
 
         # Merge current configuration with default values
         origin_config = indexes.get("origin", {})
         self.origin_config = {**ORIGIN_DEFAULT_CONFIG, **origin_config}
 
     def _get_origin_index(self) -> str:
         return self.origin_config[INDEX_NAME_PARAM]
 
     def _get_origin_read_alias(self) -> str:
         return self.origin_config[READ_ALIAS_PARAM]
 
     def _get_origin_write_alias(self) -> str:
         return self.origin_config[WRITE_ALIAS_PARAM]
 
     @timed
     def check(self):
         return self._backend.ping()
 
     def deinitialize(self) -> None:
         """Removes all indices from the Elasticsearch backend"""
         self._backend.indices.delete(index="*")
 
     def initialize(self) -> None:
         """Declare Elasticsearch indices, aliases and mappings"""
 
         if not self._backend.indices.exists(index=self._get_origin_index()):
             self._backend.indices.create(index=self._get_origin_index())
 
         if not self._backend.indices.exists_alias(name=self._get_origin_read_alias()):
             self._backend.indices.put_alias(
                 index=self._get_origin_index(), name=self._get_origin_read_alias()
             )
 
         if not self._backend.indices.exists_alias(name=self._get_origin_write_alias()):
             self._backend.indices.put_alias(
                 index=self._get_origin_index(), name=self._get_origin_write_alias()
             )
 
         self._backend.indices.put_mapping(
             index=self._get_origin_index(), body=ORIGIN_MAPPING
         )
 
     @timed
     def flush(self) -> None:
         self._backend.indices.refresh(index=self._get_origin_write_alias())
 
     @timed
     def origin_update(self, documents: Iterable[OriginDict]) -> None:
         write_index = self._get_origin_write_alias()
         documents = map(_sanitize_origin, documents)
         documents_with_sha1 = (
             (hash_to_hex(model.Origin(url=document["url"]).id), document)
             for document in documents
         )
 
         actions = [
             {
                 "_op_type": "update",
                 "_id": sha1,
                 "_index": write_index,
                 "scripted_upsert": True,
                 "upsert": {
-                    **document,
+                    **cast(dict, document),
                     "sha1": sha1,
                 },
                 "retry_on_conflict": 10,
                 "script": {
                     "source": ORIGIN_UPDATE_SCRIPT,
                     "lang": "painless",
                     "params": document,
                 },
             }
             for (sha1, document) in documents_with_sha1
         ]
 
         indexed_count, errors = helpers.bulk(self._backend, actions, index=write_index)
         assert isinstance(errors, List)  # Make mypy happy
 
         send_metric("document:index", count=indexed_count, method_name="origin_update")
         send_metric(
             "document:index_error", count=len(errors), method_name="origin_update"
         )
 
     @timed
     def origin_search(
         self,
         *,
         query: str = "",
         url_pattern: Optional[str] = None,
         metadata_pattern: Optional[str] = None,
         with_visit: bool = False,
         visit_types: Optional[List[str]] = None,
         min_nb_visits: int = 0,
         min_last_visit_date: str = "",
         min_last_eventful_visit_date: str = "",
         min_last_revision_date: str = "",
         min_last_release_date: str = "",
         min_date_created: str = "",
         min_date_modified: str = "",
         min_date_published: str = "",
         programming_languages: Optional[List[str]] = None,
         licenses: Optional[List[str]] = None,
         keywords: Optional[List[str]] = None,
         sort_by: Optional[List[str]] = None,
         page_token: Optional[str] = None,
         limit: int = 50,
     ) -> PagedResult[MinimalOriginDict]:
         query_clauses: List[Dict[str, Any]] = []
 
         query_filters = []
         if url_pattern:
             query_filters.append(f"origin : {escape(url_pattern)}")
 
         if metadata_pattern:
             query_filters.append(f"metadata : {escape(metadata_pattern)}")
 
         # if not query_clauses:
         #     raise ValueError(
         #         "At least one of url_pattern and metadata_pattern must be provided."
         #     )
 
         if with_visit:
             query_filters.append("visited = true")
         if min_nb_visits:
             query_filters.append(f"visits >= {min_nb_visits}")
         if min_last_visit_date:
             query_filters.append(
                 f"last_visit >= {min_last_visit_date.replace('Z', '+00:00')}"
             )
         if min_last_eventful_visit_date:
             query_filters.append(
                 "last_eventful_visit >= "
                 f"{min_last_eventful_visit_date.replace('Z', '+00:00')}"
             )
         if min_last_revision_date:
             query_filters.append(
                 f"last_revision >= {min_last_revision_date.replace('Z', '+00:00')}"
             )
         if min_last_release_date:
             query_filters.append(
                 f"last_release >= {min_last_release_date.replace('Z', '+00:00')}"
             )
         if keywords:
             query_filters.append(f"keyword in {escape(keywords)}")
         if licenses:
             query_filters.append(f"license in {escape(licenses)}")
 
         if programming_languages:
             query_filters.append(f"language in {escape(programming_languages)}")
 
         if min_date_created:
             query_filters.append(
                 f"created >= {min_date_created.replace('Z', '+00:00')}"
             )
         if min_date_modified:
             query_filters.append(
                 f"modified >= {min_date_modified.replace('Z', '+00:00')}"
             )
         if min_date_published:
             query_filters.append(
                 f"published >= {min_date_published.replace('Z', '+00:00')}"
             )
 
         if visit_types is not None:
             query_filters.append(f"visit_type = {escape(visit_types)}")
 
         combined_filters = " and ".join(query_filters)
         if combined_filters and query:
             query = f"{combined_filters} and {query}"
         else:
             query = combined_filters or query
         parsed_query = self._translator.parse_query(query)
         query_clauses.append(parsed_query["filters"])
 
         field_map = {
             "visits": "nb_visits",
             "last_visit": "last_visit_date",
             "last_eventful_visit": "last_eventful_visit_date",
             "last_revision": "last_revision_date",
             "last_release": "last_release_date",
             "created": "date_created",
             "modified": "date_modified",
             "published": "date_published",
         }
 
         if "sortBy" in parsed_query:
             if sort_by is None:
                 sort_by = []
             for sort_by_option in parsed_query["sortBy"]:
                 if sort_by_option[0] == "-":
                     sort_by.append("-" + field_map[sort_by_option[1:]])
                 else:
                     sort_by.append(field_map[sort_by_option])
         if parsed_query.get("limit", 0):
             limit = parsed_query["limit"]
 
         sorting_params: List[Dict[str, Any]] = []
 
         if sort_by:
             for field in sort_by:
                 order = "asc"
                 if field and field[0] == "-":
                     field = field[1:]
                     order = "desc"
 
                 if field in ["date_created", "date_modified", "date_published"]:
                     sorting_params.append(
                         {
                             get_expansion(field, "."): {
                                 "nested_path": "intrinsic_metadata",
                                 "order": order,
                             }
                         }
                     )
                 elif field in SORT_BY_OPTIONS:
                     sorting_params.append({field: order})
 
         sorting_params.extend(
             [
                 {"_score": "desc"},
                 {"sha1": "asc"},
             ]
         )
 
         body = {
             "query": {
                 "bool": {
                     "must": query_clauses,
                     "must_not": [{"term": {"blocklisted": True}}],
                 }
             },
             "sort": sorting_params,
         }
 
         if page_token:
             # TODO: use ElasticSearch's scroll API?
             page_token_content = token_decode(page_token)
             body["search_after"] = [
                 page_token_content[b"score"],
                 page_token_content[b"sha1"].decode("ascii"),
             ]
 
         if logger.isEnabledFor(logging.DEBUG):
             formatted_body = pprint.pformat(body)
             logger.debug("Search query body: %s", formatted_body)
 
         res = self._backend.search(
             index=self._get_origin_read_alias(), body=body, size=limit
         )
 
         hits = res["hits"]["hits"]
 
         next_page_token: Optional[str] = None
 
         if len(hits) == limit:
             # There are more results after this page; return a pagination token
             # to get them in a future query
             last_hit = hits[-1]
             next_page_token_content = {
                 b"score": last_hit["_score"],
                 b"sha1": last_hit["_source"]["sha1"],
             }
             next_page_token = token_encode(next_page_token_content)
 
         assert len(hits) <= limit
 
         return PagedResult(
             results=[{"url": hit["_source"]["url"]} for hit in hits],
             next_page_token=next_page_token,
         )
 
     def visit_types_count(self) -> Counter:
         body = {
             "aggs": {
                 "not_blocklisted": {
                     "filter": {"bool": {"must_not": [{"term": {"blocklisted": True}}]}},
                     "aggs": {
                         "visit_types": {"terms": {"field": "visit_types", "size": 1000}}
                     },
                 }
             }
         }
 
         res = self._backend.search(
             index=self._get_origin_read_alias(), body=body, size=0
         )
 
         buckets = (
             res.get("aggregations", {})
             .get("not_blocklisted", {})
             .get("visit_types", {})
             .get("buckets", [])
         )
         return Counter({bucket["key"]: bucket["doc_count"] for bucket in buckets})
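The merge rules encoded in `ORIGIN_UPDATE_SCRIPT` are easier to check outside Painless. A minimal Python sketch of them follows, assuming `stored` stands for `ctx._source` before the update, `params` for the incoming sanitized document, and that dates are ISO 8601 strings carrying an explicit UTC offset. `merge_origin_update`, `DEFAULT_DATE` and `MONOTONIC_DATE_FIELDS` are hypothetical names, not part of swh.search, and the sketch ignores Elasticsearch concerns such as `retry_on_conflict` and the `sha1` field.

```python
from datetime import datetime
from typing import Any, Dict

# Painless getDate() falls back to 0001-01-01T00:00:00Z; same instant here.
DEFAULT_DATE = "0001-01-01T00:00:00+00:00"
# Fields where an older incoming value must not overwrite a newer stored one.
MONOTONIC_DATE_FIELDS = ("last_visit_date", "last_revision_date", "last_release_date")


def _get_date(doc: Dict[str, Any], field: str) -> datetime:
    # Assumes offset-aware ISO 8601 strings; "Z" is rewritten because
    # datetime.fromisoformat() rejects it before Python 3.11.
    value = doc.get(field, DEFAULT_DATE).replace("Z", "+00:00")
    return datetime.fromisoformat(value)


def merge_origin_update(
    stored: Dict[str, Any], params: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical Python mirror of the merge rules in ORIGIN_UPDATE_SCRIPT."""
    merged = {**stored, **params}  # like ctx._source.putAll(params)

    # visit_types: keep the incoming types, then re-add previously stored ones.
    if "visit_types" in merged:
        merged["visit_types"] = list(merged["visit_types"])  # don't mutate params
        for visit_type in stored.get("visit_types", []):
            if visit_type not in merged["visit_types"]:
                merged["visit_types"].append(visit_type)

    # nb_visits never decreases.
    if "nb_visits" in merged and merged["nb_visits"] < stored.get("nb_visits", 0):
        merged["nb_visits"] = stored.get("nb_visits", 0)

    # These dates never move backwards.
    for field in MONOTONIC_DATE_FIELDS:
        if field in merged and _get_date(merged, field) < _get_date(stored, field):
            merged[field] = stored.get(field, DEFAULT_DATE)

    # snapshot_id and last_eventful_visit_date change together, and only when
    # the snapshot differs AND the incoming eventful visit is not older.
    if "snapshot_id" in merged:
        stored_snapshot = stored.get("snapshot_id", "")
        older = _get_date(merged, "last_eventful_visit_date") < _get_date(
            stored, "last_eventful_visit_date"
        )
        if merged["snapshot_id"] == stored_snapshot or older:
            merged["snapshot_id"] = stored_snapshot
            merged["last_eventful_visit_date"] = stored.get(
                "last_eventful_visit_date", DEFAULT_DATE
            )

    return merged
```

Replaying the three visit statuses of `test_journal_client_origin_visit_status_permutation` through this sketch in any order should end on the newest snapshot and its date, which is the property that test checks against the real backend.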
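`origin_search` paginates with Elasticsearch's `search_after` rather than offsets: the page token records the `_score` and `sha1` of the last hit, which are the final two sort keys. The stdlib-only simulation below illustrates that cursor logic against an in-memory list of hits. `search_page`, `encode_token` and `decode_token` are hypothetical names, JSON stands in for msgpack purely to avoid a third-party dependency, and only the default two-key sort is modelled.

```python
import base64
import json
from typing import Any, Dict, List, Optional, Tuple

Hit = Dict[str, Any]  # {"_score": float, "_source": {"sha1": str, "url": str}}


def encode_token(score: float, sha1: str) -> str:
    # swh.search serializes with msgpack; JSON keeps this sketch stdlib-only.
    payload = json.dumps({"score": score, "sha1": sha1}).encode()
    return base64.b64encode(payload).decode()


def decode_token(token: str) -> Tuple[float, str]:
    content = json.loads(base64.b64decode(token.encode()))
    return content["score"], content["sha1"]


def _sort_key(hit: Hit) -> Tuple[float, str]:
    # Default sort order: _score descending, then sha1 ascending.
    return (-hit["_score"], hit["_source"]["sha1"])


def search_page(
    hits: List[Hit], limit: int, page_token: Optional[str] = None
) -> Tuple[List[str], Optional[str]]:
    ordered = sorted(hits, key=_sort_key)
    if page_token:
        score, sha1 = decode_token(page_token)
        # search_after: keep only hits sorting strictly after the cursor.
        ordered = [hit for hit in ordered if _sort_key(hit) > (-score, sha1)]
    page = ordered[:limit]
    next_token = None
    if len(page) == limit:
        # As in origin_search, a full page always yields a token, even when
        # nothing remains; the next call then returns an empty page.
        last = page[-1]
        next_token = encode_token(last["_score"], last["_source"]["sha1"])
    return [hit["_source"]["url"] for hit in page], next_token
```

Because the cursor holds sort values rather than a position, each page costs the same to fetch no matter how deep the client has paged.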
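`origin_search` turns each `sort_by` entry into an Elasticsearch sort clause, where a leading `-` means descending, and always appends `_score` and `sha1` as tie-breakers. A stripped-down sketch of that translation follows. `build_sort_params` is a hypothetical name, `allowed` stands in for `SORT_BY_OPTIONS` (its exact contents are assumed here), and the nested intrinsic-metadata date fields, which `origin_search` routes through `get_expansion` with a `nested_path`, are left out.

```python
from typing import Any, Dict, Iterable, List, Set


def build_sort_params(sort_by: Iterable[str], allowed: Set[str]) -> List[Dict[str, Any]]:
    """Hypothetical sketch of origin_search's sort clause construction."""
    params: List[Dict[str, Any]] = []
    for field in sort_by:
        order = "asc"
        if field.startswith("-"):  # a leading "-" requests descending order
            field, order = field[1:], "desc"
        if field in allowed:  # unknown fields are silently ignored
            params.append({field: order})
    # Tie-breakers always come last so that the resulting order is total.
    params.extend([{"_score": "desc"}, {"sha1": "asc"}])
    return params
```

Ending on `sha1`, which is unique per origin since it also serves as the document id, makes the order total, so a `search_after` cursor neither skips nor repeats a hit as long as the index does not change between requests.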
diff --git a/swh/search/tests/test_search.py b/swh/search/tests/test_search.py
index 9535100..a4803ce 100644
--- a/swh/search/tests/test_search.py
+++ b/swh/search/tests/test_search.py
@@ -1,1286 +1,1290 @@
 # Copyright (C) 2019-2022  The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU General Public License version 3, or any later version
 # See top-level LICENSE file for more information
 
 from collections import Counter
 from datetime import datetime, timedelta, timezone
 from itertools import permutations
 
 from hypothesis import given, settings, strategies
 import pytest
 
 from swh.core.api.classes import stream_results
 
 
 class CommonSearchTest:
     def test_origin_url_unique_word_prefix(self):
         origin_foobar_baz = {"url": "http://foobar.baz"}
         origin_barbaz_qux = {"url": "http://barbaz.qux"}
         origin_qux_quux = {"url": "http://qux.quux"}
         origins = [origin_foobar_baz, origin_barbaz_qux, origin_qux_quux]
 
         self.search.origin_update(origins)
         self.search.flush()
 
         actual_page = self.search.origin_search(url_pattern="foobar")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin_foobar_baz]
 
         actual_page = self.search.origin_search(url_pattern="barb")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin_barbaz_qux]
 
         # 'bar' is part of 'foobar', but is not the beginning of it
         actual_page = self.search.origin_search(url_pattern="bar")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin_barbaz_qux]
 
         actual_page = self.search.origin_search(url_pattern="barbaz")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin_barbaz_qux]
 
     def test_origin_url_unique_word_prefix_multiple_results(self):
         origin_foobar_baz = {"url": "http://foobar.baz"}
         origin_barbaz_qux = {"url": "http://barbaz.qux"}
         origin_qux_quux = {"url": "http://qux.quux"}
 
         self.search.origin_update(
             [origin_foobar_baz, origin_barbaz_qux, origin_qux_quux]
         )
         self.search.flush()
 
         actual_page = self.search.origin_search(url_pattern="qu")
         assert actual_page.next_page_token is None
         results = [r["url"] for r in actual_page.results]
         expected_results = [o["url"] for o in [origin_qux_quux, origin_barbaz_qux]]
         assert sorted(results) == sorted(expected_results)
 
         actual_page = self.search.origin_search(url_pattern="qux")
         assert actual_page.next_page_token is None
         results = [r["url"] for r in actual_page.results]
         expected_results = [o["url"] for o in [origin_qux_quux, origin_barbaz_qux]]
         assert sorted(results) == sorted(expected_results)
 
     def test_origin_url_all_terms(self):
         origin_foo_bar_baz = {"url": "http://foo.bar/baz"}
         origin_foo_bar_foo_bar = {"url": "http://foo.bar/foo.bar"}
         origins = [origin_foo_bar_baz, origin_foo_bar_foo_bar]
 
         self.search.origin_update(origins)
         self.search.flush()
 
         # Only results containing all terms should be returned.
         actual_page = self.search.origin_search(url_pattern="foo bar baz")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin_foo_bar_baz]
 
     def test_origin_with_visit(self):
         origin_foobar_baz = {"url": "http://foobar/baz"}
 
         self.search.origin_update(
             [{**o, "has_visits": True} for o in [origin_foobar_baz]]
         )
         self.search.flush()
 
         actual_page = self.search.origin_search(url_pattern="foobar", with_visit=True)
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin_foobar_baz]
 
     def test_origin_with_visit_added(self):
         origin_foobar_baz = {"url": "http://foobar.baz"}
 
         self.search.origin_update([origin_foobar_baz])
         self.search.flush()
 
         actual_page = self.search.origin_search(url_pattern="foobar", with_visit=True)
         assert actual_page.next_page_token is None
         assert actual_page.results == []
 
         self.search.origin_update(
             [{**o, "has_visits": True} for o in [origin_foobar_baz]]
         )
         self.search.flush()
 
         actual_page = self.search.origin_search(url_pattern="foobar", with_visit=True)
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin_foobar_baz]
 
     def test_origin_no_visit_types_search(self):
         origins = [{"url": "http://foobar.baz"}]
 
         self.search.origin_update(origins)
         self.search.flush()
 
         actual_page = self.search.origin_search(url_pattern="http", visit_types=["git"])
         assert actual_page.next_page_token is None
         results = [r["url"] for r in actual_page.results]
         expected_results = []
         assert sorted(results) == sorted(expected_results)
 
         actual_page = self.search.origin_search(url_pattern="http", visit_types=None)
         assert actual_page.next_page_token is None
         results = [r["url"] for r in actual_page.results]
         expected_results = [origin["url"] for origin in origins]
         assert sorted(results) == sorted(expected_results)
 
     def test_origin_visit_types_search(self):
         origins = [
             {"url": "http://foobar.baz", "visit_types": ["git"]},
             {"url": "http://barbaz.qux", "visit_types": ["svn"]},
             {"url": "http://qux.quux", "visit_types": ["hg"]},
         ]
 
         self.search.origin_update(origins)
         self.search.flush()
 
         for origin in origins:
             actual_page = self.search.origin_search(
                 url_pattern="http", visit_types=origin["visit_types"]
             )
             assert actual_page.next_page_token is None
             results = [r["url"] for r in actual_page.results]
             expected_results = [origin["url"]]
             assert sorted(results) == sorted(expected_results)
 
         actual_page = self.search.origin_search(url_pattern="http", visit_types=None)
         assert actual_page.next_page_token is None
         results = [r["url"] for r in actual_page.results]
         expected_results = [origin["url"] for origin in origins]
         assert sorted(results) == sorted(expected_results)
 
     def test_origin_visit_types_update_search(self):
         origin_url = "http://foobar.baz"
         self.search.origin_update([{"url": origin_url}])
         self.search.flush()
 
         def _add_visit_type(visit_type):
             self.search.origin_update(
                 [{"url": origin_url, "visit_types": [visit_type]}]
             )
             self.search.flush()
 
         def _check_visit_types(visit_types_list):
             for visit_types in visit_types_list:
                 actual_page = self.search.origin_search(
                     url_pattern="http", visit_types=visit_types
                 )
                 assert actual_page.next_page_token is None
                 results = [r["url"] for r in actual_page.results]
                 expected_results = [origin_url]
                 assert sorted(results) == sorted(expected_results)
 
         _add_visit_type("git")
         _check_visit_types([["git"], ["git", "hg"]])
 
         _add_visit_type("svn")
         _check_visit_types([["git"], ["svn"], ["svn", "git"], ["git", "hg", "svn"]])
 
         _add_visit_type("hg")
         _check_visit_types(
             [
                 ["git"],
                 ["svn"],
                 ["hg"],
                 ["svn", "git"],
                 ["hg", "git"],
                 ["hg", "svn"],
                 ["git", "hg", "svn"],
             ]
         )
 
     def test_origin_nb_visits_update_search(self):
         origin_url = "http://foobar.baz"
         self.search.origin_update([{"url": origin_url}])
         self.search.flush()
 
         def _update_nb_visits(nb_visits):
             self.search.origin_update([{"url": origin_url, "nb_visits": nb_visits}])
             self.search.flush()
 
         def _check_min_nb_visits(min_nb_visits):
             actual_page = self.search.origin_search(
                 url_pattern=origin_url,
                 min_nb_visits=min_nb_visits,
             )
             assert actual_page.next_page_token is None
             results = [r["url"] for r in actual_page.results]
             expected_results = [origin_url]
             assert sorted(results) == sorted(expected_results)
 
         _update_nb_visits(2)
         _check_min_nb_visits(2)  # Works for = 2
         _check_min_nb_visits(1)  # Works for < 2
 
         with pytest.raises(AssertionError):
             _check_min_nb_visits(
                 5
             )  # No results for nb_visits >= 5 (should throw error)
 
         _update_nb_visits(5)
         _check_min_nb_visits(5)  # Works for = 5
         _check_min_nb_visits(3)  # Works for < 5
 
     def test_origin_last_visit_date_update_search(self):
         origin_url = "http://foobar.baz"
         self.search.origin_update([{"url": origin_url}])
         self.search.flush()
 
         def _update_last_visit_date(last_visit_date):
             self.search.origin_update(
                 [{"url": origin_url, "last_visit_date": last_visit_date}]
             )
             self.search.flush()
 
         def _check_min_last_visit_date(min_last_visit_date):
             actual_page = self.search.origin_search(
                 url_pattern=origin_url,
                 min_last_visit_date=min_last_visit_date,
             )
             assert actual_page.next_page_token is None
             results = [r["url"] for r in actual_page.results]
             expected_results = [origin_url]
             assert sorted(results) == sorted(expected_results)
 
         now = datetime.now(tz=timezone.utc).isoformat()
         now_minus_5_hours = (
             datetime.now(tz=timezone.utc) - timedelta(hours=5)
         ).isoformat()
         now_plus_5_hours = (
             datetime.now(tz=timezone.utc) + timedelta(hours=5)
         ).isoformat()
 
         _update_last_visit_date(now)
 
         _check_min_last_visit_date(now)  # Works for =
         _check_min_last_visit_date(now_minus_5_hours)  # Works for <
         with pytest.raises(AssertionError):
             _check_min_last_visit_date(now_plus_5_hours)  # Fails for >
 
         _update_last_visit_date(now_plus_5_hours)
 
         _check_min_last_visit_date(now_plus_5_hours)  # Works for =
         _check_min_last_visit_date(now)  # Works for <
 
     def test_journal_client_origin_visit_status_permutation(self):
         NOW = datetime.now(tz=timezone.utc).isoformat()
         NOW_MINUS_5_HOURS = (
             datetime.now(tz=timezone.utc) - timedelta(hours=5)
         ).isoformat()
         NOW_PLUS_5_HOURS = (
             datetime.now(tz=timezone.utc) + timedelta(hours=5)
         ).isoformat()
 
         VISIT_STATUSES = [
             {
                 "url": "http://foobar.baz",
                 "snapshot_id": "SNAPSHOT_1",
                 "last_eventful_visit_date": NOW,
             },
             {
                 "url": "http://foobar.baz",
                 "snapshot_id": "SNAPSHOT_1",
                 "last_eventful_visit_date": NOW_MINUS_5_HOURS,
             },
             {
                 "url": "http://foobar.baz",
                 "snapshot_id": "SNAPSHOT_2",
                 "last_eventful_visit_date": NOW_PLUS_5_HOURS,
             },
         ]
 
         for visit_statuses in permutations(VISIT_STATUSES, len(VISIT_STATUSES)):
             self.search.origin_update(visit_statuses)
             self.search.flush()
             origin_url = "http://foobar.baz"
             actual_page = self.search.origin_search(
                 url_pattern=origin_url,
                 min_last_eventful_visit_date=NOW_PLUS_5_HOURS,
             )
             assert actual_page.next_page_token is None
             results = [r["url"] for r in actual_page.results]
             expected_results = [origin_url]
             assert sorted(results) == sorted(expected_results)
 
             self.reset()
 
     def test_origin_last_eventful_visit_date_update_search(self):
         origin_url = "http://foobar.baz"
         self.search.origin_update([{"url": origin_url}])
         self.search.flush()
 
         def _update_last_eventful_visit_date(snapshot_id, last_eventful_visit_date):
             self.search.origin_update(
                 [
                     {
                         "url": origin_url,
                         "snapshot_id": snapshot_id,
                         "last_eventful_visit_date": last_eventful_visit_date,
                     }
                 ]
             )
             self.search.flush()
 
         def _check_min_last_eventful_visit_date(min_last_eventful_visit_date):
             actual_page = self.search.origin_search(
                 url_pattern=origin_url,
                 min_last_eventful_visit_date=min_last_eventful_visit_date,
             )
             assert actual_page.next_page_token is None
             results = [r["url"] for r in actual_page.results]
             expected_results = [origin_url]
             assert sorted(results) == sorted(expected_results)
 
         now = datetime.now(tz=timezone.utc).isoformat()
         now_minus_5_hours = (
             datetime.now(tz=timezone.utc) - timedelta(hours=5)
         ).isoformat()
         now_plus_5_hours = (
             datetime.now(tz=timezone.utc) + timedelta(hours=5)
         ).isoformat()
 
         snapshot_1 = "SNAPSHOT_1"
         snapshot_2 = "SNAPSHOT_2"
 
         _update_last_eventful_visit_date(snapshot_1, now)
 
         _check_min_last_eventful_visit_date(now)  # Works for =
         _check_min_last_eventful_visit_date(now_minus_5_hours)  # Works for <
         with pytest.raises(AssertionError):
             _check_min_last_eventful_visit_date(now_plus_5_hours)  # Fails for >
 
         _update_last_eventful_visit_date(
             snapshot_1, now_plus_5_hours
         )  # Revisit (not eventful) of the same origin
 
         _check_min_last_eventful_visit_date(
             now
         )  # Should remain unchanged because the latest visit was not eventful
         with pytest.raises(AssertionError):
             _check_min_last_eventful_visit_date(now_plus_5_hours)
 
         _update_last_eventful_visit_date(
             snapshot_2, now_plus_5_hours
         )  # Revisit (eventful) of the same origin
         _check_min_last_eventful_visit_date(now_plus_5_hours)  # Works for =
         _check_min_last_eventful_visit_date(now)  # Works for <
 
     def _test_origin_last_revision_release_date_update_search(self, date_type):
         origin_url = "http://foobar.baz"
         self.search.origin_update([{"url": origin_url}])
         self.search.flush()
 
         def _update_last_revision_release_date(date):
             self.search.origin_update(
                 [
                     {
                         "url": origin_url,
                         date_type: date,
                     }
                 ]
             )
             self.search.flush()
 
         def _check_min_last_revision_release_date(date):
             actual_page = self.search.origin_search(
                 url_pattern=origin_url,
                 **{f"min_{date_type}": date},
             )
             assert actual_page.next_page_token is None
             results = [r["url"] for r in actual_page.results]
             expected_results = [origin_url]
             assert sorted(results) == sorted(expected_results)
 
         now = datetime.now(tz=timezone.utc).isoformat()
         now_minus_5_hours = (
             datetime.now(tz=timezone.utc) - timedelta(hours=5)
         ).isoformat()
         now_plus_5_hours = (
             datetime.now(tz=timezone.utc) + timedelta(hours=5)
         ).isoformat()
 
         _update_last_revision_release_date(now)
 
         _check_min_last_revision_release_date(now)
         _check_min_last_revision_release_date(now_minus_5_hours)
         with pytest.raises(AssertionError):
             _check_min_last_revision_release_date(now_plus_5_hours)
 
         _update_last_revision_release_date(now_plus_5_hours)
 
         _check_min_last_revision_release_date(now_plus_5_hours)
         _check_min_last_revision_release_date(now)
 
     def test_origin_last_revision_date_update_search(self):
         self._test_origin_last_revision_release_date_update_search(
             date_type="last_revision_date"
         )
 
     def test_origin_last_release_date_update_search(self):
         self._test_origin_last_revision_release_date_update_search(
             date_type="last_release_date"
         )
 
     def test_origin_intrinsic_metadata_dates_filter_sorting_search(self):
 
         DATE_0 = "1999-06-28"
         DATE_1 = "2001-02-13"
         DATE_2 = "2005-10-02"
 
         ORIGINS = [
             {
                 "url": "http://foobar.0.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "dateCreated": DATE_0,
                     "dateModified": DATE_1,
                     "datePublished": DATE_2,
                 },
             },
             {
                 "url": "http://foobar.1.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "dateCreated": DATE_1,
                     "dateModified": DATE_2,
                     "datePublished": DATE_2,
                 },
             },
             {
                 "url": "http://foobar.2.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "dateCreated": DATE_2,
                     "dateModified": DATE_2,
                     "datePublished": DATE_2,
                 },
             },
         ]
         self.search.origin_update(ORIGINS)
         self.search.flush()
 
         def _check_results(origin_indices, sort_results=True, **kwargs):
             page = self.search.origin_search(url_pattern="foobar", **kwargs)
             results = [r["url"] for r in page.results]
             if sort_results:
                 assert sorted(results) == sorted(
                     [ORIGINS[index]["url"] for index in origin_indices]
                 )
             else:
                 assert results == [ORIGINS[index]["url"] for index in origin_indices]
 
         _check_results(min_date_created=DATE_0, origin_indices=[0, 1, 2])
         _check_results(min_date_created=DATE_1, origin_indices=[1, 2])
         _check_results(min_date_created=DATE_2, origin_indices=[2])
 
         _check_results(min_date_modified=DATE_0, origin_indices=[0, 1, 2])
         _check_results(min_date_modified=DATE_1, origin_indices=[0, 1, 2])
         _check_results(min_date_modified=DATE_2, origin_indices=[1, 2])
 
         _check_results(min_date_published=DATE_0, origin_indices=[0, 1, 2])
         _check_results(min_date_published=DATE_1, origin_indices=[0, 1, 2])
         _check_results(min_date_published=DATE_2, origin_indices=[0, 1, 2])
 
         # Sorting
         _check_results(
             sort_by=["-date_created"], origin_indices=[2, 1, 0], sort_results=False
         )
         _check_results(
             sort_by=["date_created"], origin_indices=[0, 1, 2], sort_results=False
         )
 
     def test_origin_intrinsic_metadata_dates_processing(self):
 
         DATE_0 = "foo"  # will be discarded
         DATE_1 = "2001-2-13"  # will be formatted to 2001-02-13
         DATE_2 = "2005-10-2"  # will be formatted to 2005-10-02
 
         ORIGINS = [
             {
                 "url": "http://foobar.0.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "dateCreated": DATE_0,
                     "dateModified": DATE_1,
                     "datePublished": DATE_2,
                 },
             },
             {
                 "url": "http://foobar.1.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "dateCreated": DATE_1,
                     "dateModified": DATE_2,
                     "datePublished": DATE_2,
                 },
             },
             {
                 "url": "http://foobar.2.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "dateCreated": DATE_2,
                     "dateModified": DATE_2,
                     "datePublished": DATE_2,
                 },
             },
         ]
         self.search.origin_update(ORIGINS)
         self.search.flush()
 
         # check origins have been successfully processed
         page = self.search.origin_search(url_pattern="foobar")
         assert {r["url"] for r in page.results} == {
             "http://foobar.0.com",
             "http://foobar.2.com",
             "http://foobar.1.com",
         }
 
     def test_origin_keywords_search(self):
         ORIGINS = [
             {
                 "url": "http://foobar.1.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "description": "Django is a backend framework for applications",
                     "keywords": "django,backend,server,web,framework",
                 },
             },
             {
                 "url": "http://foobar.2.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "description": "Native Android applications are fast",
                     "keywords": "android,mobile,ui",
                 },
             },
             {
                 "url": "http://foobar.3.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "description": "React framework helps you build web applications",
                     "keywords": "react,web,ui",
                 },
             },
         ]
         self.search.origin_update(ORIGINS)
         self.search.flush()
 
         def _check_results(keywords, origin_indices, sorting=False):
             page = self.search.origin_search(url_pattern="foobar", keywords=keywords)
             results = [r["url"] for r in page.results]
             if sorting:
                 assert sorted(results) == sorted(
                     [ORIGINS[index]["url"] for index in origin_indices]
                 )
             else:
                 assert results == [ORIGINS[index]["url"] for index in origin_indices]
 
         _check_results(["build"], [2])
 
         _check_results(["web"], [2, 0])
         _check_results(["ui"], [1, 2])
 
         # The following checks ensure that boosts work properly
 
         # Baseline: "applications" is common in all origin descriptions
         _check_results(["applications"], [1, 0, 2], True)
 
         # ORIGINS[0] has 'framework' in: keyword + description
         # ORIGINS[2] has 'framework' in: description
         # ORIGINS[1] has 'framework' in: None
         _check_results(["framework", "applications"], [0, 2, 1])
 
         # ORIGINS[1] has 'ui' in: keyword
         # ORIGINS[2] has 'ui' in: keyword
         # ORIGINS[0] has 'ui' in: None
         _check_results(["applications", "ui"], [1, 2, 0])
 
         # ORIGINS[2] has 'web' in: keyword + description
         # ORIGINS[0] has 'web' in: keyword
         # ORIGINS[1] has 'web' in: None
         _check_results(["web", "applications"], [2, 0, 1])
 
     def test_origin_sort_by_search(self):
 
         now = datetime.now(tz=timezone.utc).isoformat()
         now_minus_5_hours = (
             datetime.now(tz=timezone.utc) - timedelta(hours=5)
         ).isoformat()
         now_plus_5_hours = (
             datetime.now(tz=timezone.utc) + timedelta(hours=5)
         ).isoformat()
 
         ORIGINS = [
             {
                 "url": "http://foobar.1.com",
                 "nb_visits": 1,
                 "last_visit_date": now_minus_5_hours,
             },
             {
                 "url": "http://foobar.2.com",
                 "nb_visits": 2,
                 "last_visit_date": now,
             },
             {
                 "url": "http://foobar.3.com",
                 "nb_visits": 3,
                 "last_visit_date": now_plus_5_hours,
             },
         ]
         self.search.origin_update(ORIGINS)
         self.search.flush()
 
         def _check_results(sort_by, origins):
             page = self.search.origin_search(url_pattern="foobar", sort_by=sort_by)
             results = [r["url"] for r in page.results]
             assert results == [origin["url"] for origin in origins]
 
         _check_results(["nb_visits"], ORIGINS)
         _check_results(["-nb_visits"], ORIGINS[::-1])
 
         _check_results(["last_visit_date"], ORIGINS)
         _check_results(["-last_visit_date"], ORIGINS[::-1])
 
         _check_results(["nb_visits", "-last_visit_date"], ORIGINS)
         _check_results(["-last_visit_date", "nb_visits"], ORIGINS[::-1])
 
     def test_origin_intrinsic_metadata_license_search(self):
         ORIGINS = [
             {
                 "url": "http://foobar.1.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "description": "foo bar",
                     "license": "https://spdx.org/licenses/MIT",
                 },
             },
             {
                 "url": "http://foobar.2.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "description": "foo bar",
                     "license": "BSD-3-Clause",
                 },
             },
         ]
         self.search.origin_update(ORIGINS)
         self.search.flush()
 
         def _check_results(licenses, origin_indices):
             page = self.search.origin_search(url_pattern="foobar", licenses=licenses)
             results = [r["url"] for r in page.results]
             assert sorted(results) == sorted(
                 [ORIGINS[i]["url"] for i in origin_indices]
             )
 
         _check_results(["MIT"], [0])
         _check_results(["bsd"], [1])
         _check_results(["mit", "3-Clause"], [0, 1])
 
     def test_origin_intrinsic_metadata_programming_language_search(self):
         ORIGINS = [
             {
                 "url": "http://foobar.1.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "description": "foo bar",
                     "programmingLanguage": "python",
                 },
             },
             {
                 "url": "http://foobar.2.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "description": "foo bar",
                     "programmingLanguage": "javascript",
                 },
             },
         ]
         self.search.origin_update(ORIGINS)
         self.search.flush()
 
         def _check_results(programming_languages, origin_indices):
             page = self.search.origin_search(
                 url_pattern="foobar", programming_languages=programming_languages
             )
             results = [r["url"] for r in page.results]
             assert sorted(results) == sorted(
                 [ORIGINS[i]["url"] for i in origin_indices]
             )
 
         _check_results(["python"], [0])
         _check_results(["javascript"], [1])
         _check_results(["python", "javascript"], [0, 1])
 
     def test_origin_intrinsic_metadata_multiple_field_search(self):
         ORIGINS = [
             {
                 "url": "http://foobar.1.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "description": "foo bar 1",
                     "programmingLanguage": "python",
                     "license": "https://spdx.org/licenses/MIT",
                 },
             },
             {
                 "url": "http://foobar.2.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "description": "foo bar 2",
                     "programmingLanguage": ["javascript", "html", "css"],
                     "license": [
                         "https://spdx.org/licenses/CC-BY-1.0",
                         "https://spdx.org/licenses/Apache-1.0",
                     ],
                 },
             },
             {
                 "url": "http://foobar.3.com",
                 "intrinsic_metadata": {
                     "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                     "description": "foo bar 3",
                     "programmingLanguage": ["Cpp", "c"],
                     "license": "https://spdx.org/licenses/LGPL-2.0-only",
                 },
             },
         ]
         self.search.origin_update(ORIGINS)
         self.search.flush()
 
         def _check_result(programming_languages, licenses, origin_indices):
             page = self.search.origin_search(
                 url_pattern="foobar",
                 programming_languages=programming_languages,
                 licenses=licenses,
             )
             results = [r["url"] for r in page.results]
             assert sorted(results) == sorted(
                 [ORIGINS[i]["url"] for i in origin_indices]
             )
 
         _check_result(["javascript"], ["CC"], [1])
         _check_result(["css"], ["CC"], [1])
         _check_result(["css"], ["CC", "apache"], [1])
 
         _check_result(["python", "javascript"], ["MIT"], [0])
 
         _check_result(["c", "python"], ["LGPL", "mit"], [2, 0])
 
     def test_origin_update_with_no_visit_types(self):
         """
         Update an origin with visit types first then with no visit types,
         check origin can still be searched with visit types afterwards.
         """
         origin_url = "http://foobar.baz"
         self.search.origin_update([{"url": origin_url, "visit_types": ["git"]}])
         self.search.flush()
 
         self.search.origin_update([{"url": origin_url}])
         self.search.flush()
 
         actual_page = self.search.origin_search(url_pattern="http", visit_types=["git"])
         assert actual_page.next_page_token is None
         results = [r["url"] for r in actual_page.results]
         expected_results = [origin_url]
         assert results == expected_results
 
     def test_origin_intrinsic_metadata_description(self):
         origin1_nothin = {"url": "http://origin1"}
         origin2_foobar = {"url": "http://origin2"}
         origin3_barbaz = {"url": "http://origin3"}
 
         self.search.origin_update(
             [
                 {
                     **origin1_nothin,
                     "intrinsic_metadata": {},
                 },
                 {
                     **origin2_foobar,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "description": "foo bar",
                     },
                 },
                 {
                     **origin3_barbaz,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "description": "bar baz",
                     },
                 },
             ]
         )
         self.search.flush()
 
         actual_page = self.search.origin_search(metadata_pattern="foo")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin2_foobar]
 
         actual_page = self.search.origin_search(metadata_pattern="foo bar")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin2_foobar]
 
         actual_page = self.search.origin_search(metadata_pattern="bar baz")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin3_barbaz]
 
     def test_origin_intrinsic_metadata_all_terms(self):
         origin1_foobarfoobar = {"url": "http://origin1"}
         origin3_foobarbaz = {"url": "http://origin2"}
 
         self.search.origin_update(
             [
                 {
                     **origin1_foobarfoobar,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "description": "foo bar foo bar",
                     },
                 },
                 {
                     **origin3_foobarbaz,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "description": "foo bar baz",
                     },
                 },
             ]
         )
         self.search.flush()
 
         actual_page = self.search.origin_search(metadata_pattern="foo bar baz")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin3_foobarbaz]
 
     def test_origin_intrinsic_metadata_long_description(self):
         """Checks that ElasticSearch does not try to store large values untokenized,
         which would be inefficient and make it crash with:
 
         Document contains at least one immense term in field="intrinsic_metadata.http://schema.org/description.@value" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.
         """  # noqa
         origin1 = {"url": "http://origin1"}
 
         self.search.origin_update(
             [
                 {
                     **origin1,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "description": " ".join(f"foo{i}" for i in range(100000)),
                     },
                 },
             ]
         )
         self.search.flush()
 
         actual_page = self.search.origin_search(metadata_pattern="foo42")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin1]
 
     def test_origin_intrinsic_metadata_matches_cross_fields(self):
         """Checks that the backend finds results even if the two words in the query
         are in different fields."""
         origin1 = {"url": "http://origin1"}
 
         self.search.origin_update(
             [
                 {
                     **origin1,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "description": "foo bar",
                         "author": "John Doe",
                     },
                 },
             ]
         )
         self.search.flush()
 
         actual_page = self.search.origin_search(metadata_pattern="foo John")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin1]
 
     def test_origin_intrinsic_metadata_nested(self):
         origin1_nothin = {"url": "http://origin1"}
         origin2_foobar = {"url": "http://origin2"}
         origin3_barbaz = {"url": "http://origin3"}
 
         self.search.origin_update(
             [
                 {
                     **origin1_nothin,
                     "intrinsic_metadata": {},
                 },
                 {
                     **origin2_foobar,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "keywords": ["foo", "bar"],
                     },
                 },
                 {
                     **origin3_barbaz,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "keywords": ["bar", "baz"],
                     },
                 },
             ]
         )
         self.search.flush()
 
         actual_page = self.search.origin_search(metadata_pattern="foo")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin2_foobar]
 
         actual_page = self.search.origin_search(metadata_pattern="foo bar")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin2_foobar]
 
         actual_page = self.search.origin_search(metadata_pattern="bar baz")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin3_barbaz]
 
     def test_origin_intrinsic_metadata_inconsistent_type(self):
         """Checks that the same field can hold a concrete value, an object, or an
         array in different documents."""
         origin1_foobar = {"url": "http://origin1"}
         origin2_barbaz = {"url": "http://origin2"}
         origin3_bazqux = {"url": "http://origin3"}
 
         self.search.origin_update(
             [
                 {
                     **origin1_foobar,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "author": {
                             "familyName": "Foo",
                             "givenName": "Bar",
                         },
                     },
                 },
             ]
         )
         self.search.flush()
         self.search.origin_update(
             [
                 {
                     **origin2_barbaz,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "author": "Bar Baz",
                     },
                 },
                 {
                     **origin3_bazqux,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "author": ["Baz", "Qux"],
                     },
                 },
             ]
         )
         self.search.flush()
 
         actual_page = self.search.origin_search(metadata_pattern="bar")
         assert actual_page.next_page_token is None
         results = [r["url"] for r in actual_page.results]
         expected_results = [o["url"] for o in [origin2_barbaz, origin1_foobar]]
         assert sorted(results) == sorted(expected_results)
 
         actual_page = self.search.origin_search(metadata_pattern="baz")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin2_barbaz, origin3_bazqux]
 
         actual_page = self.search.origin_search(metadata_pattern="foo")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin1_foobar]
 
         actual_page = self.search.origin_search(metadata_pattern="bar baz")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin2_barbaz]
 
         actual_page = self.search.origin_search(metadata_pattern="qux")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin3_bazqux]
 
         actual_page = self.search.origin_search(metadata_pattern="baz qux")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin3_bazqux]
 
         actual_page = self.search.origin_search(metadata_pattern="foo bar")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin1_foobar]
 
     def test_origin_intrinsic_metadata_string_mapping(self):
         """Checks that inserting a date-like value in a field does not update the
         mapping to require every document to use a date in that field, nor require
         search queries to use a date.
         Likewise for numeric and boolean fields."""
         origin1 = {"url": "http://origin1"}
         origin2 = {"url": "http://origin2"}
 
         self.search.origin_update(
             [
                 {
                     **origin1,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "dateCreated": "2021-02-18T10:16:52",
-                        "version": "1.0",
+                        "version": 1.0,
+                        "softwareVersion": "1.0",
                         "isAccessibleForFree": True,
+                        "copyrightYear": 2022,
                     },
                 }
             ]
         )
         self.search.flush()
         self.search.origin_update(
             [
                 {
                     **origin2,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "dateCreated": "a long time ago",
                         "address": "in a galaxy far, far away",
                         "version": "a new hope",
+                        "softwareVersion": "a new hope",
                         "isAccessibleForFree": "it depends",
+                        "copyrightYear": "foo bar",
                     },
                 },
             ]
         )
         self.search.flush()
 
         actual_page = self.search.origin_search(metadata_pattern="1.0")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin1]
 
         actual_page = self.search.origin_search(metadata_pattern="long")
         assert actual_page.next_page_token is None
         assert (
             actual_page.results == []
         )  # "%Y-%m-%d" not followed, so value is rejected
 
         actual_page = self.search.origin_search(metadata_pattern="true")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin1]
 
         actual_page = self.search.origin_search(metadata_pattern="it depends")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin2]
 
     def test_origin_intrinsic_metadata_update(self):
         origin = {"url": "http://origin1"}
         origin_data = {
             **origin,
             "intrinsic_metadata": {
                 "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                 "author": "John Doe",
             },
         }
 
         self.search.origin_update([origin_data])
         self.search.flush()
 
         actual_page = self.search.origin_search(metadata_pattern="John")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin]
 
         origin_data["intrinsic_metadata"]["author"] = "Jane Doe"
 
         self.search.origin_update([origin_data])
         self.search.flush()
 
         actual_page = self.search.origin_search(metadata_pattern="Jane")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin]
 
     # TODO: add more tests with more codemeta terms
 
     # TODO: add more tests with edge cases
 
     @settings(deadline=None)
     @given(strategies.integers(min_value=1, max_value=4))
     def test_origin_url_paging(self, limit):
         # TODO: no hypothesis
         origin1_foo = {"url": "http://origin1/foo"}
         origin2_foobar = {"url": "http://origin2/foo/bar"}
         origin3_foobarbaz = {"url": "http://origin3/foo/bar/baz"}
 
         self.reset()
         self.search.origin_update([origin1_foo, origin2_foobar, origin3_foobarbaz])
         self.search.flush()
 
         results = stream_results(
             self.search.origin_search, url_pattern="foo bar baz", limit=limit
         )
         results = [res["url"] for res in results]
         expected_results = [o["url"] for o in [origin3_foobarbaz]]
         assert sorted(results[0 : len(expected_results)]) == sorted(expected_results)
 
         results = stream_results(
             self.search.origin_search, url_pattern="foo bar", limit=limit
         )
         results = [res["url"] for res in results]
         expected_results = [o["url"] for o in [origin2_foobar, origin3_foobarbaz]]
         assert sorted(results[0 : len(expected_results)]) == sorted(expected_results)
 
         results = stream_results(
             self.search.origin_search, url_pattern="foo", limit=limit
         )
         results = [res["url"] for res in results]
         expected_results = [
             o["url"] for o in [origin1_foo, origin2_foobar, origin3_foobarbaz]
         ]
         assert sorted(results[0 : len(expected_results)]) == sorted(expected_results)
 
     @settings(deadline=None)
     @given(strategies.integers(min_value=1, max_value=4))
     def test_origin_intrinsic_metadata_paging(self, limit):
         # TODO: no hypothesis
         origin1_foo = {"url": "http://origin1"}
         origin2_foobar = {"url": "http://origin2"}
         origin3_foobarbaz = {"url": "http://origin3"}
 
         self.reset()
         self.search.origin_update(
             [
                 {
                     **origin1_foo,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "keywords": ["foo"],
                     },
                 },
                 {
                     **origin2_foobar,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "keywords": ["foo", "bar"],
                     },
                 },
                 {
                     **origin3_foobarbaz,
                     "intrinsic_metadata": {
                         "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
                         "keywords": ["foo", "bar", "baz"],
                     },
                 },
             ]
         )
         self.search.flush()
 
         results = stream_results(
             self.search.origin_search, metadata_pattern="foo bar baz", limit=limit
         )
         assert list(results) == [origin3_foobarbaz]
 
         results = stream_results(
             self.search.origin_search, metadata_pattern="foo bar", limit=limit
         )
         assert list(results) == [origin2_foobar, origin3_foobarbaz]
 
         results = stream_results(
             self.search.origin_search, metadata_pattern="foo", limit=limit
         )
         assert list(results) == [origin1_foo, origin2_foobar, origin3_foobarbaz]
 
     def test_search_blocklisted_results(self):
         origin1 = {"url": "http://origin1"}
         origin2 = {"url": "http://origin2", "blocklisted": True}
 
         self.search.origin_update([origin1, origin2])
         self.search.flush()
 
         actual_page = self.search.origin_search(url_pattern="origin")
         assert actual_page.next_page_token is None
         assert actual_page.results == [origin1]
 
     def test_search_blocklisted_update(self):
         origin1 = {"url": "http://origin1"}
         self.search.origin_update([origin1])
         self.search.flush()
 
         result_page = self.search.origin_search(url_pattern="origin")
         assert result_page.next_page_token is None
         assert result_page.results == [origin1]
 
         self.search.origin_update([{**origin1, "blocklisted": True}])
         self.search.flush()
 
         result_page = self.search.origin_search(url_pattern="origin")
         assert result_page.next_page_token is None
         assert result_page.results == []
 
         self.search.origin_update(
             [{**origin1, "has_visits": True, "visit_types": ["git"]}]
         )
         self.search.flush()
 
         result_page = self.search.origin_search(url_pattern="origin")
         assert result_page.next_page_token is None
         assert result_page.results == []
 
     def test_filter_keyword_in_filter(self):
         origin1 = {
             "url": "foo language in ['foo baz'] bar",
         }
         self.search.origin_update([origin1])
         self.search.flush()
 
         result_page = self.search.origin_search(url_pattern="language in ['foo bar']")
         assert result_page.next_page_token is None
         assert result_page.results == [origin1]
 
         result_page = self.search.origin_search(url_pattern="baaz")
         assert result_page.next_page_token is None
         assert result_page.results == []
 
     def test_visit_types_count(self):
         assert self.search.visit_types_count() == Counter()
 
         origins = [
             {"url": "http://foobar.baz", "visit_types": ["git"], "blocklisted": True}
         ]
 
         for idx, visit_type in enumerate(["git", "hg", "svn"]):
             for i in range(idx + 1):
                 origins.append(
                     {
                         "url": f"http://{visit_type}.foobar.baz.{i}",
                         "visit_types": [visit_type],
                     }
                 )
         self.search.origin_update(origins)
         self.search.flush()
 
         assert self.search.visit_types_count() == Counter(git=1, hg=2, svn=3)
 
     def test_origin_search_empty_url_pattern(self):
         origins = [
             {"url": "http://foobar.baz", "visit_types": ["git"]},
             {"url": "http://barbaz.qux", "visit_types": ["svn"]},
             {"url": "http://qux.quux", "visit_types": ["hg"]},
         ]
 
         self.search.origin_update(origins)
         self.search.flush()
 
         # should match all origins
         actual_page = self.search.origin_search(url_pattern="")
         assert actual_page.next_page_token is None
         results = [r["url"] for r in actual_page.results]
         expected_results = [origin["url"] for origin in origins]
         assert sorted(results) == sorted(expected_results)
 
         # should match all origins with visit type
         for origin in origins:
             actual_page = self.search.origin_search(
                 url_pattern="", visit_types=origin["visit_types"]
             )
             assert actual_page.next_page_token is None
             results = [r["url"] for r in actual_page.results]
             expected_results = [origin["url"]]
             assert results == expected_results