Use cross-field search.
Closed, Public

Authored by vlorentz on Dec 11 2020, 1:56 PM.

Details

Summary

This allows the query 'foo bar' to return a document even if 'foo' and
'bar' appear in different fields (e.g. author and description) of that
document.
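
As an illustration of the behaviour described above, here is a minimal sketch, not the code from this diff. It assumes the change relies on Elasticsearch's `multi_match` query with `"type": "cross_fields"`; the helper names `build_cross_field_query` and `matches_across_fields`, and the field list passed to them, are hypothetical. The second function is only a toy in-memory approximation of the matching rule, with no analyzers or scoring.

```python
from typing import Any, Dict, Iterable, List


def build_cross_field_query(text: str, fields: List[str]) -> Dict[str, Any]:
    """Build a search body using a cross_fields multi_match query.

    Assumption: the field names are chosen by the caller; they are not
    taken from the actual swh-search mapping.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                # Treat the listed fields as one combined field.
                "type": "cross_fields",
                # Require every term to match, in any of the fields.
                "operator": "and",
                "fields": fields,
            }
        }
    }


def matches_across_fields(
    doc: Dict[str, str], text: str, fields: Iterable[str]
) -> bool:
    """Toy approximation: each query term must occur in at least one field."""
    field_tokens = [set(doc.get(name, "").lower().split()) for name in fields]
    return all(
        any(term in tokens for tokens in field_tokens)
        for term in text.lower().split()
    )
```

With a document such as `{"author": "foo", "description": "bar baz"}`, the toy matcher accepts the query 'foo bar' because each term is found in some field, even though no single field contains both.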

Diff Detail

Repository
rDSEA Archive search
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D4723 (id=16725)

Could not rebase; Attempt merge onto c7cd341796...

Updating c7cd341..dd8ba4b
Fast-forward
 requirements-swh.txt               |  1 +
 swh/search/elasticsearch.py        | 10 +++++
 swh/search/tests/test_in_memory.py |  8 ++++
 swh/search/tests/test_search.py    | 90 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 109 insertions(+)
Changes applied before test
commit dd8ba4b63053457000b3bb3fd9ae847b9e57b54a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Dec 11 13:56:00 2020 +0100

    Use cross-field search.
    
    This allows getting results for the query 'foo bar' even if 'foo' and
    'bar' are in different fields (eg. author and description) of the same
    document.

commit 4390bea8dbd89016ba87010d785ed55e16d6038c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Dec 11 13:43:20 2020 +0100

    Normalize Codemeta documents by expanding them.
    
    This causes all fields to be an array of objects, instead of being
    a mix of concrete values, arrays, and objects.
    
    ElasticSearch requires this consistency to be able to dynamically
    create mappings.

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/57/ for more details.

This revision is now accepted and ready to land. (Dec 11 2020, 2:03 PM)
This revision was landed with ongoing or failed builds. (Dec 11 2020, 2:14 PM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D4723 (id=16727)

Could not rebase; Attempt merge onto c7cd341796...

Updating c7cd341..4d860fe
Fast-forward
 requirements-swh.txt               |  1 +
 swh/search/elasticsearch.py        |  4 ++
 swh/search/tests/test_in_memory.py |  8 ++++
 swh/search/tests/test_search.py    | 90 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 103 insertions(+)
Changes applied before test
commit 4d860febb5525b0dbe65a25126e20488a4c4500d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Dec 11 13:56:00 2020 +0100

    Use cross-field search.
    
    This allows getting results for the query 'foo bar' even if 'foo' and
    'bar' are in different fields (eg. author and description) of the same
    document.

commit 407a00e8aa5a69350689a4a02609bf79a0e75ff3
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Dec 11 13:43:20 2020 +0100

    Normalize Codemeta documents by expanding them.
    
    This causes all fields to be an array of objects, instead of being
    a mix of concrete values, arrays, and objects.
    
    ElasticSearch requires this consistency to be able to dynamically
    create mappings.

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/59/ for more details.
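
The "Normalize Codemeta documents by expanding them" commit listed in the reports above makes every field an array of objects. The sketch below is a simplified illustration of that idea, not the project's implementation: the function names `to_object_list` and `normalize_document` are hypothetical, and real JSON-LD expansion also resolves property names against a context, which this toy version does not attempt. It only shows why a uniform shape results, using the JSON-LD `@value` keyword to wrap plain values.

```python
from typing import Any, Dict, List


def to_object_list(value: Any) -> List[Dict[str, Any]]:
    """Return `value` as a list of objects, whatever its original shape."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    result: List[Dict[str, Any]] = []
    for item in items:
        if isinstance(item, list):
            # Flatten nested lists so the result stays a flat list of objects.
            result.extend(to_object_list(item))
        elif isinstance(item, dict):
            # Normalize nested objects recursively.
            result.append({key: to_object_list(sub) for key, sub in item.items()})
        else:
            # Wrap plain values (strings, numbers, booleans) in a value object.
            result.append({"@value": item})
    return result


def normalize_document(doc: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Give every top-level field the same shape: a list of objects."""
    return {key: to_object_list(value) for key, value in doc.items()}
```

After this kind of normalization, a field that is a string in one document and a list or object in another ends up with the same structure in both, which is the consistency the commit message says ElasticSearch needs to create mappings dynamically.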