diff --git a/PKG-INFO b/PKG-INFO index b9d6de0..2c855c5 100644 --- a/PKG-INFO +++ b/PKG-INFO @@ -1,90 +1,90 @@ Metadata-Version: 2.1 Name: swh.search -Version: 0.13.0 +Version: 0.13.1 Summary: Software Heritage search service Home-page: https://forge.softwareheritage.org/diffusion/DSEA Author: Software Heritage developers Author-email: swh-devel@inria.fr License: UNKNOWN Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest Project-URL: Funding, https://www.softwareheritage.org/donate Project-URL: Source, https://forge.softwareheritage.org/source/swh-search Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-search/ Platform: UNKNOWN Classifier: Programming Language :: Python :: 3 Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3) Classifier: Operating System :: OS Independent Classifier: Development Status :: 3 - Alpha Requires-Python: >=3.7 Description-Content-Type: text/markdown Provides-Extra: testing License-File: LICENSE License-File: AUTHORS swh-search ========== Search service for the Software Heritage archive. It is similar to swh-storage in what it contains, but provides different ways to query it: while swh-storage is mostly a key-value store that returns an object from a primary key, swh-search is focused on reverse indices, to allow finding objects that match some criteria; for example full-text search. Currently uses ElasticSearch, and provides only origin search (by URL and metadata) ## Dependencies - Python tests for this module include tests that cannot be run without a local ElasticSearch instance, so you need the ElasticSearch server executable on your machine (no need to have a running ElasticSearch server). - Debian-like host The elasticsearch package is required. 
As it's not part of debian-stable, [another debian repository is required to be configured](https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html#deb-repo) - Non Debian-like host The tests expect: - `/usr/share/elasticsearch/jdk/bin/java` to exist. - `org.elasticsearch.bootstrap.Elasticsearch` to be in java's classpath. - Emscripten is required for generating tree-sitter WASM module. The following commands need to be executed for the setup: ```bash cd /opt && git clone https://github.com/emscripten-core/emsdk.git && cd emsdk && \ ./emsdk install latest && ./emsdk activate latest PATH="${PATH}:/opt/emsdk/upstream/emscripten" ``` **Note:** If emsdk isn't found in the PATH, the tree-sitter cli automatically pulls `emscripten/emsdk` image from docker hub when `make ts-build-wasm` or `make ts-build` is used. ## Make targets Below is the list of available make targets that can be executed from the root directory of swh-search in order to build and/or execute the swh-search under various configurations: * **ts-install**: Install node_modules and emscripten SDK required for TreeSitter * **ts-generate**: Generate parser files(C and JSON) from the grammar * **ts-repl**: Starts a web based playground for the TreeSitter grammar. It's the recommended way for developing TreeSitter grammar. * **ts-dev**: Parse the `query_language/sample_query` and print the corresponding syntax expression along with the start and end positions of all the nodes. * **ts-dev sanitize=1**: Same as **ts-dev** but without start and end position of the nodes. This format is expected by TreeSitter's native test command. `sanitize=1` cleans the output of **ts-dev** using `sed` to achieve the desired format. 
* **ts-test**: executes TreeSitter's native tests * **ts-build-so**: Generates `swh_ql.so` file from the previously generated parser using py-tree-sitter * **ts-build-wasm**: Generates `swh_ql.wasm` file from the previously generated parser using emscripten * **ts-build**: Executes both **ts-build-so** and **ts-build-wasm** diff --git a/debian/changelog b/debian/changelog index 75957eb..1a53d94 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,373 +1,376 @@ -swh-search (0.13.0-1~swh1~bpo10+1) buster-swh; urgency=medium +swh-search (0.13.1-1~swh1) unstable-swh; urgency=medium - * Rebuild for buster-swh + * New upstream release 0.13.1 - (tagged by Valentin Lorentz + on 2022-03-07 12:49:31 +0100) + * Upstream changes: - v0.13.1 - * docs: Update examples of the + query language - -- Software Heritage autobuilder (on jenkins-debian1) Wed, 16 Feb 2022 14:08:46 +0000 + -- Software Heritage autobuilder (on jenkins-debian1) Mon, 07 Mar 2022 11:54:17 +0000 swh-search (0.13.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.13.0 - (tagged by Valentin Lorentz on 2022-02-16 13:12:20 +0100) * Upstream changes: - v0.13.0 - * Use ':' for substring matching instead of '=' - * translator: Fix 'visited = false' queries to actually return results. 
- * grammar: Prevent 'isoDateTime' rule from being too greedy -- Software Heritage autobuilder (on jenkins-debian1) Wed, 16 Feb 2022 12:17:38 +0000 swh-search (0.12.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.12.1 - (tagged by Valentin Lorentz on 2022-02-14 15:26:46 +0100) * Upstream changes: - v0.12.1 - * Make RemoteSearch reraise specific exceptions instead of generic RemoteException - * Fix crash when no filter but the main query is given -- Software Heritage autobuilder (on jenkins-debian1) Mon, 14 Feb 2022 14:31:40 +0000 swh-search (0.12.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.12.0 - (tagged by Valentin Lorentz on 2022-01-12 13:55:44 +0100) * Upstream changes: - v0.12.0 - * search: Ensure CodeMeta dates are properly formatted - * setup.py: use yarnpkg instead of yarn if present in PATH - * swh.search.utils: Fix type - * conftest: Fix tests hang since elasticsearch 7.16 release - * Unpin tree-sitter dependency - * tests: Use TimestampWithTimezone.from_datetime() instead of the constructor -- Software Heritage autobuilder (on jenkins-debian1) Wed, 12 Jan 2022 13:00:33 +0000 swh-search (0.11.6-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.6 - (tagged by Antoine Lambert on 2021-09-29 15:47:53 +0200) * Upstream changes: - version 0.11.6 -- Software Heritage autobuilder (on jenkins-debian1) Wed, 29 Sep 2021 13:53:44 +0000 swh-search (0.11.5-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.5 - (tagged by Antoine Lambert on 2021-09-28 17:39:15 +0200) * Upstream changes: - version 0.11.5 -- Software Heritage autobuilder (on jenkins-debian1) Tue, 28 Sep 2021 15:48:00 +0000 swh-search (0.11.4-2~swh1) unstable-swh; urgency=medium * Use --no-ext-rename in dh_python3 to avoid renaming swh_ql.so -- Nicolas Dandrimont Wed, 01 Sep 2021 17:12:49 +0200 swh-search (0.11.4-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.4 - (tagged by Valentin Lorentz on 2021-08-31 15:01:41 +0200) * 
Upstream changes: - v0.11.4 - * Fix debian build -- Software Heritage autobuilder (on jenkins-debian1) Tue, 31 Aug 2021 13:15:08 +0000 swh-search (0.11.3-3~swh1) unstable-swh; urgency=medium * This package is now architecture-dependent * Make pytest more verbose -- Nicolas Dandrimont Tue, 31 Aug 2021 15:00:42 +0200 swh-search (0.11.3-2~swh1) unstable-swh; urgency=medium * Add python3-tree-sitter build-dependency -- Nicolas Dandrimont Tue, 31 Aug 2021 14:18:43 +0200 swh-search (0.11.3-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.3 - (tagged by Valentin Lorentz on 2021-08-31 14:04:03 +0200) * Upstream changes: - v0.11.3 - * clean up sdist -- Software Heritage autobuilder (on jenkins-debian1) Tue, 31 Aug 2021 12:14:47 +0000 swh-search (0.11.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.2 - (tagged by Valentin Lorentz on 2021-08-18 12:02:09 +0200) * Upstream changes: - v0.11.2 - * cli.py: Add rpc-serve command - * grammar.js: Improve grammar and export tokens -- Software Heritage autobuilder (on jenkins-debian1) Wed, 18 Aug 2021 10:07:04 +0000 swh-search (0.11.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.1 - (tagged by Vincent SELLIER on 2021-08-16 18:33:00 +0200) * Upstream changes: - v0.11.1 - fix the tree-sitter dependency management during the pypi build -- Software Heritage autobuilder (on jenkins-debian1) Mon, 16 Aug 2021 16:40:38 +0000 swh-search (0.11.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.0 - (tagged by Valentin Lorentz on 2021-08-09 17:27:33 +0200) * Upstream changes: - v0.11.0 - * Add logging for search terms in debug mode - * journal_client: use origin_visit_status.type instead of origin_visit - * Add query language - * Disable fetch_last_revision_release_date outside tests -- Software Heritage autobuilder (on jenkins-debian1) Fri, 13 Aug 2021 14:42:01 +0000 swh-search (0.10.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.10.0 - (tagged by Nicolas 
Dandrimont on 2021-07-21 10:35:59 +0200) * Upstream changes: - Release swh.search v0.10.0 -- Software Heritage autobuilder (on jenkins-debian1) Wed, 21 Jul 2021 08:41:27 +0000 swh-search (0.9.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.9.0 - (tagged by Vincent SELLIER on 2021-06-17 16:54:50 +0200) * Upstream changes: - v0.9.0 - Changelog: - * Fix boolean mapping in metadata document - * Store nb_visits and last_visit_date - * test_origin_intrinsic_metadata_long_description: Re-increase description size - * tests/test_search: Use a reasonably long description value - * tests/elasticsearch: Catch painless script errors and pretty print them - * mypy: Fix errors with release >= v0.900 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 17 Jun 2021 15:01:42 +0000 swh-search (0.8.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.8.1 - (tagged by Antoine Lambert on 2021-04-29 14:36:43 +0200) * Upstream changes: - version 0.8.1 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 29 Apr 2021 12:41:23 +0000 swh-search (0.8.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.8.0 - (tagged by Nicolas Dandrimont on 2021-04-08 17:37:41 +0200) * Upstream changes: - Release swh.search 0.8.0 - Implement a blocklist for origin results - Fix docs typesetting -- Software Heritage autobuilder (on jenkins-debian1) Thu, 08 Apr 2021 15:42:22 +0000 swh-search (0.7.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.7.1 - (tagged by Vincent SELLIER on 2021-03-04 15:59:28 +0100) * Upstream changes: - v0.7.1 - Changelog: - * Allow to instantiate the service with default indexes configuration -- Software Heritage autobuilder (on jenkins-debian1) Thu, 04 Mar 2021 15:06:34 +0000 swh-search (0.7.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.7.0 - (tagged by Vincent SELLIER on 2021-03-04 12:09:12 +0100) * Upstream changes: - v0.7.0 - Changelog: - * Ensure the elasticsearch indexes are initialized before the 
first request - * Use elasticsearch aliases to simplify maintenance operations - * search.cli: Drop unused and untested rpc-serve cli entrypoint - * api.wsgi: Drop unused wsgi module - * Add missing server tests - * Add typing to origin_update's argument and origin_search's return -- Software Heritage autobuilder (on jenkins-debian1) Thu, 04 Mar 2021 11:19:29 +0000 swh-search (0.6.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.6.1 - (tagged by Antoine Lambert on 2021-02-18 18:55:56 +0100) * Upstream changes: - version 0.6.1 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 18 Feb 2021 18:00:51 +0000 swh-search (0.6.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.6.0 - (tagged by Antoine Lambert on 2021-02-18 15:28:07 +0100) * Upstream changes: - version 0.6.0 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 18 Feb 2021 14:31:07 +0000 swh-search (0.5.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.5.0 - (tagged by Vincent SELLIER on 2021-02-18 11:20:43 +0100) * Upstream changes: - v0.5.0 - Add monitoring metrics -- Software Heritage autobuilder (on jenkins-debian1) Thu, 18 Feb 2021 10:25:39 +0000 swh-search (0.4.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.4.2 - (tagged by Antoine Lambert on 2021-02-17 11:09:21 +0100) * Upstream changes: - version 0.4.2 -- Software Heritage autobuilder (on jenkins-debian1) Wed, 17 Feb 2021 10:14:16 +0000 swh-search (0.4.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.4.1 - (tagged by Vincent SELLIER on 2021-01-07 16:15:23 +0100) * Upstream changes: - v0.4.1 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 07 Jan 2021 15:18:24 +0000 swh-search (0.4.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.4.0 - (tagged by Vincent SELLIER on 2020-12-23 16:37:18 +0100) * Upstream changes: - Support an index name prefix -- Software Heritage autobuilder (on jenkins-debian1) Wed, 23 Dec 2020 15:41:09 +0000 swh-search 
(0.3.5-1~swh1) unstable-swh; urgency=medium * New upstream release 0.3.5 - (tagged by Valentin Lorentz on 2020-12-22 17:32:26 +0100) * Upstream changes: - v0.3.5 - * Write some basic documentation to describe what swh-search is. - * Add more comments in elasticsearch.py -- Software Heritage autobuilder (on jenkins-debian1) Tue, 22 Dec 2020 16:38:29 +0000 swh-search (0.3.4-1~swh1) unstable-swh; urgency=medium * New upstream release 0.3.4 - (tagged by Antoine R. Dumont (@ardumont) on 2020-12-17 12:13:49 +0100) * Upstream changes: - v0.3.4 - search.journal_client: Actually filter on full origin_visit_status -- Software Heritage autobuilder (on jenkins-debian1) Thu, 17 Dec 2020 11:16:32 +0000 swh-search (0.3.3-1~swh1) unstable-swh; urgency=medium * New upstream release 0.3.3 - (tagged by Antoine R. Dumont (@ardumont) on 2020-12-11 15:20:01 +0100) * Upstream changes: - v0.3.3 - Use cross-field search. - Normalize Codemeta documents by expanding them. - Add test for long descriptions. -- Software Heritage autobuilder (on jenkins-debian1) Fri, 11 Dec 2020 14:22:59 +0000 swh-search (0.3.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.3.2 - (tagged by Antoine R. Dumont (@ardumont) on 2020-12-10 09:49:35 +0100) * Upstream changes: - v0.3.2 - search.journal_client: Fix key error - test_journal_client: Migrate to pytest -- Software Heritage autobuilder (on jenkins-debian1) Thu, 10 Dec 2020 08:54:53 +0000 swh-search (0.3.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.3.1 - (tagged by Antoine R. Dumont (@ardumont) on 2020-12-09 18:21:33 +0100) * Upstream changes: - v0.3.1 - Allow configuration through cli or config file -- Software Heritage autobuilder (on jenkins-debian1) Wed, 09 Dec 2020 18:53:39 +0000 swh-search (0.3.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.3.0 - (tagged by Antoine R. 
Dumont (@ardumont) on 2020-12-08 11:30:33 +0100) * Upstream changes: - v0.3.0 - cli: Subscribe journal client to origin_intrinsic_metadata topic - cli: Subscribe journal client to origin_visit_status - cli: Allow topic prefix declaration through cli or configuration - cli: Allow object- type declaration through cli or configuration - tox.ini: pin black to the pre-commit version (19.10b0) to avoid flip-flops - Run isort after the CLI import changes -- Software Heritage autobuilder (on jenkins-debian1) Tue, 08 Dec 2020 10:33:30 +0000 swh-search (0.2.3-1~swh1) unstable-swh; urgency=medium * New upstream release 0.2.3 - (tagged by David Douard on 2020-09-25 12:51:11 +0200) * Upstream changes: - v0.2.3 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 25 Sep 2020 10:53:12 +0000 swh-search (0.2.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.2.2 - (tagged by Antoine R. Dumont (@ardumont) on 2020-08-03 11:58:53 +0200) * Upstream changes: - v0.2.2 - Fix test_cli.invoke for old PyYAML versions (such as 3.13, in Debian 10). -- Software Heritage autobuilder (on jenkins-debian1) Mon, 03 Aug 2020 10:00:05 +0000 swh-search (0.2.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.2.1 - (tagged by Antoine R. Dumont (@ardumont) on 2020-08-03 10:59:31 +0200) * Upstream changes: - v0.2.1 - setup.py: Migrate from vcversioner to setuptools-scm -- Software Heritage autobuilder (on jenkins-debian1) Mon, 03 Aug 2020 09:00:39 +0000 swh-search (0.2.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.2.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-08-03 10:40:39 +0200) * Upstream changes: - v0.2.0 - swh.search: Define an interface for search backends and use it - swh.search.get_search: Simplify instantiation -- Software Heritage autobuilder (on jenkins-debian1) Mon, 03 Aug 2020 08:42:45 +0000 swh-search (0.1.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.1.0 - (tagged by Antoine R. 
Dumont (@ardumont) on 2020-07-31 14:05:22 +0200) * Upstream changes: - v0.1.0 - Type origin_search(...) -> PagedResult[Dict] - README: Update necessary dependencies for test purposes - Fixes on journal updates - Blackify strings - setup: Update the minimum required runtime python3 version -- Software Heritage autobuilder (on jenkins-debian1) Fri, 31 Jul 2020 12:10:22 +0000 swh-search (0.0.4-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.4 - (tagged by Antoine R. Dumont (@ardumont) on 2020-01-23 15:00:50 +0100) * Upstream changes: - v0.0.4 docs: Remove swh-py-template label - Only return results where all terms match. - Don't use refresh='wait_for' when updating origins. - Add a 'sha1' field to origin documents, used for sorting. - Add a pre-commit config file - Migrate tox.ini to extras = xxx instead of deps = .[testing] - De- specify testenv:py3 - Include all requirements in MANIFEST.in -- Software Heritage autobuilder (on jenkins-debian1) Thu, 23 Jan 2020 14:04:17 +0000 swh-search (0.0.3-1~swh2) unstable-swh; urgency=medium * Filter out swh/__init__.py from package -- Nicolas Dandrimont Tue, 14 Jan 2020 16:38:23 +0100 swh-search (0.0.3-1~swh1) unstable-swh; urgency=medium * Initial packaging -- Nicolas Dandrimont Mon, 13 Jan 2020 16:59:11 +0100 diff --git a/docs/query-language.rst b/docs/query-language.rst index 11dba04..b538845 100644 --- a/docs/query-language.rst +++ b/docs/query-language.rst @@ -1,190 +1,199 @@ Search Query Language ===================== Every query is composed of filters separated by ``and`` or ``or``. 
These filters have 3 components in the order: ``Name Operator Value`` Some examples: * ``origin : plasma and language in [python] and visits >= 5`` - * ``last_revision > 2020-01-01 and limit = 10`` * ``last_visit > 2021-01-01 or last_visit < 2020-01-01`` * ``visited = false and metadata : "kubernetes" or origin : "minikube"`` - * ``keyword in ["orchestration", "kubectl"] and language in ["go", "rust"]`` + * ``keyword in ["orchestration", "kubectl"] and license in ["GPLv3+", "GPLv3"]`` * ``(origin : debian or visit_type = ["deb"]) and license in ["GPL-3"]`` + +.. this one is currently disabled, because it is too expensive to add + the 'last_revision' in swh-search: + + * ``last_revision > 2020-01-01 and limit = 10`` + +.. and this one does not work yet because we don't collect enough language metadata: + + * ``keyword in ["orchestration", "kubectl"] and language in ["go", "rust"]`` + **Note**: * Whitespace is optional between the three components of a filter. * The conjunction operators are left-associative. Therefore ``foo and bar and baz`` means ``(foo and bar) and baz`` * ``and`` has higher precedence than ``or``. Therefore ``foo or bar and baz`` means ``foo or (bar and baz)`` * Precedence can be overridden using parentheses: ``(`` and ``)``. For example, you can override the default precedence in the previous query as: ``(foo or bar) and baz`` * To actually search for ``and`` or ``or`` as strings, just put them within quotes. Example: ``metadata : "vcs history and metadata"``, or even just ``metadata : "and"`` to search for the string ``and`` in the metadata The filters have been classified based on the type of value that they expect. 
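The grouping rules from the note above can be illustrated with a small sketch. This is not the tree-sitter grammar that swh-search actually uses; it is a hypothetical recursive-descent parser (the names ``tokenize``, ``parse`` and ``render`` are made up for this example) that treats each filter as an opaque string and only models how ``and``, ``or`` and parentheses group filters.

```python
import re

# Hypothetical tokenizer (an assumption of this sketch): quoted strings and
# [...] lists are kept as single tokens, so an "and"/"or" inside them is
# never mistaken for a conjunction.
TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|\[[^\]]*\]|[()]|[^\s()"\'\[\]]+')


def tokenize(query):
    return TOKEN.findall(query)


def parse(query):
    """Return a nested ("and"|"or", left, right) tuple, or a filter string."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        # "or" binds loosest; the loop builds a left-leaning tree.
        node = parse_and()
        while peek() == "or":
            take()
            node = ("or", node, parse_and())
        return node

    def parse_and():
        # "and" binds tighter than "or"; the loop builds a left-leaning tree.
        node = parse_atom()
        while peek() == "and":
            take()
            node = ("and", node, parse_atom())
        return node

    def parse_atom():
        if peek() == "(":
            take()
            node = parse_or()
            if peek() != ")":
                raise ValueError("expected ')'")
            take()
            return node
        # A filter is every token up to the next conjunction or parenthesis.
        parts = []
        while peek() not in (None, "and", "or", "(", ")"):
            parts.append(take())
        if not parts:
            raise ValueError("expected a filter")
        return " ".join(parts)

    tree = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree


def render(node):
    """Print the tree with explicit parentheses around every conjunction."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"({render(left)} {op} {render(right)})"


print(render(parse("foo or bar and baz")))
print(render(parse("foo and bar and baz")))
print(render(parse("(foo or bar) and baz")))
```

Because a quoted value is a single token, ``metadata : "vcs history and metadata"`` stays one filter in this sketch rather than being split at the inner ``and``.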
Pattern filters --------------- Returns origins having the given keywords in their url or intrinsic metadata * Name: * ``origin``: Keywords from the origin url * ``metadata``: Keywords from all the intrinsic metadata fields * Operator: ``:`` * Value: String wrapped in quotation marks(``"`` or ``'``) **Note:** If a string has no whitespace then the quotation marks become optional. **Examples:** * ``origin : https://github.com/Django/django`` * ``origin : kubernetes`` * ``origin : "github python"`` * ``metadata : orchestration`` * ``metadata : "javascript language"`` Boolean filters --------------- Returns origins having their boolean type values equal to given values * Name: ``visited`` : Whether the origin has been visited * Operator: ``=`` * Value: ``true`` or ``false`` **Examples:** * ``visited = true`` * ``visited = false`` Numeric filters --------------- Returns origins having their numeric type values in the given range * Name: ``visits`` : Number of visits of an origin * Operator: ``<`` ``<=`` ``=`` ``!=`` ``>`` ``>=`` * Value: Positive integer **Examples:** * ``visits > 2`` * ``visits = 5`` * ``visits <= 10`` Un-bounded List filters ----------------------- Returns origins that satisfy the criteria based on a given list * Name: * ``language`` : Programming languages used * ``license`` : License used * ``keyword`` : keywords (often same as tags) or description (includes README) from the metadata * Operator: ``in`` ``not in`` * Value: Array of strings **Note:** * If a string has no whitespace then the quotation marks become optional. * The ``keyword`` filter gives more priority to the keywords field of intrinsic metadata than the description field. So origins having the queried term in their intrinsic metadata keyword will appear first. 
**Examples:** * ``language in [python, js]`` * ``license in ["GPL 3.0 or later", MIT]`` * ``keyword in ["Software Heritage", swh]`` Bounded List filters -------------------- Returns origins that satisfy the criteria based on a list of fixed options **visit_type** * Name: ``visit_type`` : Returns only origins with at least one of the specified visit types * Operator: ``=`` * Value: Array of the following values ``any`` ``cran`` ``deb`` ``deposit`` ``ftp`` ``hg`` ``git`` ``nixguix`` ``npm`` ``pypi`` ``svn`` ``tar`` **sort_by** * Name: ``sort_by`` : Sorts origins based on the given list of origin attributes * Operator: ``=`` * Value: Array of the following values ``visits`` ``last_visit`` ``last_eventful_visit`` ``last_revision`` ``last_release`` ``created`` ``modified`` ``published`` **Examples:** * ``visit_type = [svn, npm]`` * ``visit_type = [nixguix, "ftp"]`` * ``sort_by = ["last_visit", created]`` * ``sort_by = [visits, modified]`` Date filters ------------ Returns origins having their date type values in the given range * Name: * ``last_visit`` : Latest visit date * ``last_eventful_visit`` : Latest visit date where a new snapshot was detected * ``last_revision`` : Latest commit date * ``last_release`` : Latest release date * ``created`` Creation date * ``modified`` Modification date * ``published`` Published date * Operator: ``<`` ``<=`` ``=`` ``!=`` ``>`` ``>=`` * Value: Date in ``Standard ISO`` format **Note:** The last three date filters are based on metadata that has to be manually entered by the repository authors. So they might not be correct or up-to-date. 
**Examples:** * ``last_visit > 2001-01-01 and last_visit < 2101-01-01`` * ``last_revision = "2000-01-01 18:35Z"`` * ``last_release != "2021-07-17T18:35:00Z"`` * ``created <= "2021-07-17 18:35"`` Limit filter ------------ Limits the number of results to at most N * Name: ``limit`` * Operator: ``=`` * Value: Positive Integer **Note:** The default value of the limit is 50 **Examples:** * ``limit = 1`` * ``limit = 15`` diff --git a/swh.search.egg-info/PKG-INFO b/swh.search.egg-info/PKG-INFO index b9d6de0..2c855c5 100644 --- a/swh.search.egg-info/PKG-INFO +++ b/swh.search.egg-info/PKG-INFO @@ -1,90 +1,90 @@ Metadata-Version: 2.1 Name: swh.search -Version: 0.13.0 +Version: 0.13.1 Summary: Software Heritage search service Home-page: https://forge.softwareheritage.org/diffusion/DSEA Author: Software Heritage developers Author-email: swh-devel@inria.fr License: UNKNOWN Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest Project-URL: Funding, https://www.softwareheritage.org/donate Project-URL: Source, https://forge.softwareheritage.org/source/swh-search Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-search/ Platform: UNKNOWN Classifier: Programming Language :: Python :: 3 Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3) Classifier: Operating System :: OS Independent Classifier: Development Status :: 3 - Alpha Requires-Python: >=3.7 Description-Content-Type: text/markdown Provides-Extra: testing License-File: LICENSE License-File: AUTHORS swh-search ========== Search service for the Software Heritage archive. It is similar to swh-storage in what it contains, but provides different ways to query it: while swh-storage is mostly a key-value store that returns an object from a primary key, swh-search is focused on reverse indices, to allow finding objects that match some criteria; for example full-text search. 
Currently uses ElasticSearch, and provides only origin search (by URL and metadata) ## Dependencies - Python tests for this module include tests that cannot be run without a local ElasticSearch instance, so you need the ElasticSearch server executable on your machine (no need to have a running ElasticSearch server). - Debian-like host The elasticsearch package is required. As it's not part of debian-stable, [another debian repository is required to be configured](https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html#deb-repo) - Non Debian-like host The tests expect: - `/usr/share/elasticsearch/jdk/bin/java` to exist. - `org.elasticsearch.bootstrap.Elasticsearch` to be in java's classpath. - Emscripten is required for generating tree-sitter WASM module. The following commands need to be executed for the setup: ```bash cd /opt && git clone https://github.com/emscripten-core/emsdk.git && cd emsdk && \ ./emsdk install latest && ./emsdk activate latest PATH="${PATH}:/opt/emsdk/upstream/emscripten" ``` **Note:** If emsdk isn't found in the PATH, the tree-sitter cli automatically pulls `emscripten/emsdk` image from docker hub when `make ts-build-wasm` or `make ts-build` is used. ## Make targets Below is the list of available make targets that can be executed from the root directory of swh-search in order to build and/or execute the swh-search under various configurations: * **ts-install**: Install node_modules and emscripten SDK required for TreeSitter * **ts-generate**: Generate parser files(C and JSON) from the grammar * **ts-repl**: Starts a web based playground for the TreeSitter grammar. It's the recommended way for developing TreeSitter grammar. * **ts-dev**: Parse the `query_language/sample_query` and print the corresponding syntax expression along with the start and end positions of all the nodes. * **ts-dev sanitize=1**: Same as **ts-dev** but without start and end position of the nodes. 
This format is expected by TreeSitter's native test command. `sanitize=1` cleans the output of **ts-dev** using `sed` to achieve the desired format. * **ts-test**: executes TreeSitter's native tests * **ts-build-so**: Generates `swh_ql.so` file from the previously generated parser using py-tree-sitter * **ts-build-wasm**: Generates `swh_ql.wasm` file from the previously generated parser using emscripten * **ts-build**: Executes both **ts-build-so** and **ts-build-wasm** diff --git a/swh.search.egg-info/entry_points.txt b/swh.search.egg-info/entry_points.txt index 28b8966..2bca812 100644 --- a/swh.search.egg-info/entry_points.txt +++ b/swh.search.egg-info/entry_points.txt @@ -1,4 +1,2 @@ - - [swh.cli.subcommands] - search=swh.search.cli - \ No newline at end of file +[swh.cli.subcommands] +search = swh.search.cli