grammar: Prevent 'isoDateTime' rule from being too greedy
Closed, Public

Authored by vlorentz on Feb 16 2022, 11:36 AM.

Details

Reviewers
anlambert
Group Reviewers
Reviewers
Maniphest Tasks
Restricted Maniphest Task
Restricted Maniphest Task
Commits
rDSEA4b387b6daf45: grammar: Prevent 'isoDateTime' rule from being too greedy
Summary

When given a date with no time, the rule also captures trailing whitespace,
so the date in queries like 'last_visit >= 2020-01-01 and ...' is
parsed as '2020-01-01 ' (with a trailing space), which makes Elasticsearch
return an error.

Resolves T3941.
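
The greedy behaviour described in the summary can be reproduced with ordinary regular expressions. The sketch below is only an illustration: both patterns are hypothetical stand-ins written for this example, not the actual 'isoDateTime' rule from grammar.js, and the helper name `date_token` is made up. It assumes the problem comes from an optional date/time separator that can match on its own, and shows that tying the separator to the time part avoids swallowing the space.

```python
import re

# Hypothetical patterns for illustration only; NOT the project's grammar.
# Too greedy: the optional separator "[T ]?" can match even when no time follows.
GREEDY_ISO = re.compile(r"\d{4}-\d{2}-\d{2}[T ]?(?:\d{2}:\d{2}(?::\d{2})?)?Z?")
# Stricter: the separator is only consumed together with a time part.
STRICT_ISO = re.compile(r"\d{4}-\d{2}-\d{2}(?:[T ]\d{2}:\d{2}(?::\d{2})?Z?)?")


def date_token(pattern, query, start):
    """Return the text the pattern consumes at offset `start`, or None."""
    match = pattern.match(query, start)
    return match.group(0) if match else None


QUERY = "last_visit >= 2020-01-01 and visited = true"
START = QUERY.index("2020")

print(repr(date_token(GREEDY_ISO, QUERY, START)))  # includes the trailing space
print(repr(date_token(STRICT_ISO, QUERY, START)))  # date only
```

With the stricter pattern, a date followed by a time (for example '2020-01-01 12:30:00') is still consumed as a whole, because the space is accepted only when digits of a time part follow it.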

Diff Detail

Repository
rDSEA Archive search
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz added a task: Restricted Maniphest Task.

Build is green

Patch application report for D7185 (id=26024)

Could not rebase; Attempt merge onto 4e635b230a...

Updating 4e635b2..4b387b6
Fast-forward
 docs/query-language.rst                            |  18 ++--
 setup.py                                           |  20 +++-
 swh/search/elasticsearch.py                        |   4 +-
 swh/search/query_language/grammar.js               |  11 ++-
 swh/search/query_language/sample_query             |   4 +-
 .../query_language/test/corpus/combinations.txt    |  20 ++--
 swh/search/query_language/tokens.js                |   2 +
 swh/search/tests/test_elasticsearch.py             | 104 +++++++++++++++++++--
 swh/search/tests/test_translator.py                |  45 ++++++++-
 swh/search/translator.py                           |  18 +++-
 10 files changed, 203 insertions(+), 43 deletions(-)
Changes applied before test
commit 4b387b6daf45a48914a2d2f47437992219449892
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Feb 16 11:35:29 2022 +0100

    grammar: Prevent 'isoDateTime' rule from being too greedy
    
    When giving a date (with no time), it captures trailing whitespaces,
    so the date in queries like 'last_visit >= 2020-01-01 and ...' is
    parsed as '2020-01-01 ', which makes ES error.

commit b35df430b7e7edb920dc870469f63449e140a541
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Feb 16 11:06:09 2022 +0100

    translator: Fix 'visited = false' queries to actually return results.
    
    Non-visited origins don't have a 'has_visits' field at all, so comparing
    it to `false` never returns results.

commit 3107cad2bf837cdcc083cf856a1976dbd715559b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Feb 16 10:36:10 2022 +0100

    Use ':' for substring matching instead of '='
    
    I find it very confusing to use '=' for this operation.

commit 3eed4b99a764eeb5b53cc47e92cf4246856e4caa
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Feb 16 10:27:35 2022 +0100

    setup.py: Regenerate parser when sources were changed

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/311/ for more details.
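
The 'visited = false' commit listed in the report above rests on a point that is easy to miss: a document that lacks a field does not match a comparison of that field with `false`. The sketch below mimics this with plain Python dictionaries; the sample documents and both function names are hypothetical, and it does not show the actual change made in translator.py. In Elasticsearch terms the second function is roughly analogous to excluding documents where 'has_visits' is true, rather than requiring it to be false.

```python
# Hypothetical index documents; unvisited origins have no 'has_visits' key.
DOCS = [
    {"url": "https://example.org/visited", "has_visits": True},
    {"url": "https://example.org/never-visited"},
]


def naive_unvisited(docs):
    """Compare the field to False: a missing field never matches."""
    return [d["url"] for d in docs if d.get("has_visits") is False]


def unvisited(docs):
    """Exclude documents whose field is True, so a missing field matches."""
    return [d["url"] for d in docs if d.get("has_visits") is not True]


print(naive_unvisited(DOCS))  # the never-visited origin is missed
print(unvisited(DOCS))        # the never-visited origin is returned
```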

This revision is now accepted and ready to land.
Feb 16 2022, 11:53 AM