
tests/elasticsearch: Catch painless script errors and pretty print them
ClosedPublic

Authored by anlambert on Jun 10 2021, 7:48 PM.

Details

Summary

Debugging the Painless update script is hard because errors are returned
as JSON that is not really readable.

To save debugging time, wrap search.origin_update calls when running the
elasticsearch tests in order to catch Painless script errors and
pretty-print them.

Tests will also fail immediately when such errors are detected.
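
As a rough illustration of the wrapping idea, here is a minimal, self-contained sketch. It is not the code from this diff: `fail_on_script_error`, `FakeSearch` and the local `BulkIndexError` stand-in are hypothetical names introduced only so the example runs without an Elasticsearch server, and it raises `AssertionError` directly instead of asserting after the `try` block.

```python
import functools


class BulkIndexError(Exception):
    """Stand-in for elasticsearch.helpers.BulkIndexError (assumption: only
    its `errors` attribute matters for this sketch)."""

    def __init__(self, message, errors):
        super().__init__(message, errors)
        self.errors = errors


def fail_on_script_error(method):
    """Wrap `method` so a Painless script failure becomes an AssertionError."""

    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except BulkIndexError as e:
            error = e.errors[0].get("update", {}).get("error", {}).get("caused_by")
            if not (error and "script_stack" in error):
                raise  # not a script failure: let the original error through
            stack = "\n".join(error["script_stack"])
            raise AssertionError(
                f"Painless update script failed ({error.get('reason')}).\n"
                f"script stack:\n\n{stack}"
            ) from e

    return wrapper


class FakeSearch:
    """Hypothetical backend whose update always hits a broken script."""

    def origin_update(self, documents):
        script_error = {
            "reason": "compile error",
            "script_stack": ["Lst visit_types = ctx._source ...", "^---- HERE"],
        }
        raise BulkIndexError(
            "1 document(s) failed to index.",
            [{"update": {"error": {"caused_by": script_error}}}],
        )


search = FakeSearch()
# Patch the bound method on the instance, as a test fixture or setUp might do.
search.origin_update = fail_on_script_error(search.origin_update)
```

With this in place, any test calling `search.origin_update(...)` fails at the call site with the readable message, and errors unrelated to scripts are re-raised unchanged.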

Below is the kind of pytest report we can obtain (I intentionally
added errors to the script code):

_____________________________________________________________________________ TestElasticsearchSearch.test_search_blocklisted_update _____________________________________________________________________________

self = <swh.search.tests.test_elasticsearch.TestElasticsearchSearch testMethod=test_search_blocklisted_update>

    def test_search_blocklisted_update(self):
        origin1 = {"url": "http://origin1"}
>       self.search.origin_update([origin1])

swh/search/tests/test_search.py:595: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <swh.search.elasticsearch.ElasticSearch object at 0x7f8337865780>, args = ([{'url': 'http://origin1'}],), kwargs = {}, script_error = True
error_detail = 'Painless update script failed (compile error).\nerror type: illegal_argument_exception\nerror reason: invalid declara...nscript stack:\n\n... it_types field value\nLst visit_types = ctx._source ...\n                             ^---- HERE'
error = {'caused_by': {'reason': 'invalid declaration: cannot resolve type [Lst]', 'type': 'illegal_argument_exception'}, 'lang': 'painless', 'position': {'end': 72, 'offset': 47, 'start': 22}, 'reason': 'compile error', ...}
@py_assert2 = False, @py_assert1 = False, @py_format4 = 'True is False'
@py_format6 = 'Painless update script failed (compile error).\n~error type: illegal_argument_exception\n~error reason: invalid decla...types field value\n~Lst visit_types = ctx._source ...\n~                             ^---- HERE\n>assert True is False'

    def _origin_update(self, *args, **kwargs):
        script_error = False
        error_detail = ""
        try:
            origin_update(*args, **kwargs)
        except BulkIndexError as e:
            error = e.errors[0].get("update", {}).get("error", {}).get("caused_by")
            if error and "script_stack" in error:
                script_error = True
                error_detail = (
                    f"Painless update script failed ({error['reason']}).\n"
                )
                error_detail += f"error type: {error['caused_by']['type']}\n"
                error_detail += f"error reason: {error['caused_by']['reason']}\n"
                error_detail += "script stack:\n\n" + "\n".join(
                    error["script_stack"]
                )
            else:
                raise e
>       assert script_error is False, error_detail
E       AssertionError: Painless update script failed (compile error).
E         error type: illegal_argument_exception
E         error reason: invalid declaration: cannot resolve type [Lst]
E         script stack:
E         
E         ... it_types field value
E         Lst visit_types = ctx._source ...
E                                      ^---- HERE
E       assert True is False

swh/search/tests/test_elasticsearch.py:47: AssertionError
_________________________________________________________________________ TestElasticsearchSearch.test_origin_visit_types_update_search __________________________________________________________________________

self = <swh.search.tests.test_elasticsearch.TestElasticsearchSearch testMethod=test_origin_visit_types_update_search>

    def test_origin_visit_types_update_search(self):
        origin_url = "http://foobar.baz"
        self.search.origin_update([{"url": origin_url}])
        self.search.flush()
    
        def _add_visit_type(visit_type):
            self.search.origin_update(
                [{"url": origin_url, "visit_types": [visit_type]}]
            )
            self.search.flush()
    
        def _check_visit_types(visit_types_list):
            for visit_types in visit_types_list:
                actual_page = self.search.origin_search(
                    url_pattern="http", visit_types=visit_types
                )
                assert actual_page.next_page_token is None
                results = [r["url"] for r in actual_page.results]
                expected_results = [origin_url]
                assert sorted(results) == sorted(expected_results)
    
        _add_visit_type("git")
        _check_visit_types([["git"], ["git", "hg"]])
    
>       _add_visit_type("svn")

swh/search/tests/test_search.py:171: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
swh/search/tests/test_search.py:154: in _add_visit_type
    [{"url": origin_url, "visit_types": [visit_type]}]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <swh.search.elasticsearch.ElasticSearch object at 0x7f81232aeeb8>, args = ([{'url': 'http://foobar.baz', 'visit_types': ['svn']}],), kwargs = {}, script_error = True
error_detail = 'Painless update script failed (runtime error).\nerror type: illegal_argument_exception\nerror reason: dynamic method ...ource.visit_typescontains(visit_types[i])) {\n            \n                                                ^---- HERE'
error = {'caused_by': {'reason': 'dynamic method [java.util.LinkedHashMap, visit_typescontains/1] not found', 'type': 'illegal...exception'}, 'lang': 'painless', 'position': {'end': 425, 'offset': 405, 'start': 357}, 'reason': 'runtime error', ...}
@py_assert2 = False, @py_assert1 = False, @py_format4 = 'True is False'
@py_format6 = 'Painless update script failed (runtime error).\n~error type: illegal_argument_exception\n~error reason: dynamic metho...(visit_types[i])) {\n~            \n~                                                ^---- HERE\n>assert True is False'

    def _origin_update(self, *args, **kwargs):
        script_error = False
        error_detail = ""
        try:
            origin_update(*args, **kwargs)
        except BulkIndexError as e:
            error = e.errors[0].get("update", {}).get("error", {}).get("caused_by")
            if error and "script_stack" in error:
                script_error = True
                error_detail = (
                    f"Painless update script failed ({error['reason']}).\n"
                )
                error_detail += f"error type: {error['caused_by']['type']}\n"
                error_detail += f"error reason: {error['caused_by']['reason']}\n"
                error_detail += "script stack:\n\n" + "\n".join(
                    error["script_stack"]
                )
            else:
                raise e
>       assert script_error is False, error_detail
E       AssertionError: Painless update script failed (runtime error).
E         error type: illegal_argument_exception
E         error reason: dynamic method [java.util.LinkedHashMap, visit_typescontains/1] not found
E         script stack:
E         
E         if (!ctx._source.visit_typescontains(visit_types[i])) {
E                     
E                                                         ^---- HERE
E       assert True is False

swh/search/tests/test_elasticsearch.py:47: AssertionError

Diff Detail

Repository
rDSEA Archive search
Branch
tests-catch-and-print-painless-errors
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 21929
Build 34106: Phabricator diff pipeline on jenkins
Build 34105: arc lint + arc unit

Unit Tests: Failed

Time       Test
14,593 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.search.tests.test_api_client.TestRemoteSearch::test_origin_intrinsic_metadata_all_terms
259 ms     Jenkins > .tox.py3.lib.python3.7.site-packages.swh.search.tests.test_api_client.TestRemoteSearch::test_origin_intrinsic_metadata_date
259 ms     Jenkins > .tox.py3.lib.python3.7.site-packages.swh.search.tests.test_api_client.TestRemoteSearch::test_origin_intrinsic_metadata_description
208 ms     Jenkins > .tox.py3.lib.python3.7.site-packages.swh.search.tests.test_api_client.TestRemoteSearch::test_origin_intrinsic_metadata_inconsistent_type
1,684 ms   Jenkins > .tox.py3.lib.python3.7.site-packages.swh.search.tests.test_api_client.TestRemoteSearch::test_origin_intrinsic_metadata_long_description

Full Test Results: 46 Failed · 53 Passed · 1 Skipped

Event Timeline

anlambert added a reviewer: vlorentz.

Build is green

Patch application report for D5853 (id=20936)

Rebasing onto 3e129a3f48...

Current branch diff-target is up to date.
Changes applied before test
commit 453a615f5ad97c82ccf6b802be9ffd12dc1a1014
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Thu Jun 10 19:42:05 2021 +0200

    tests/elasticsearch: Catch painless script errors and pretty print them
    
    Debugging update painless script is hard as errors are returned in a
    not really readable JSON format.
    
    To gain debugging time, wrap search.origin_update calls when running
    elasticsearch tests in order to catch painless script errors and
    pretty print them.
    
    Tests will also immediately fail when such errors are detected.

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/125/ for more details.

Nice, thanks.

swh/search/tests/test_elasticsearch.py, lines 37–44

Can you make it a single multiline string?

This revision is now accepted and ready to land. Jun 11 2021, 11:29 AM

swh/search/tests/test_elasticsearch.py, lines 37–44

Sure, I will also use dict.get to avoid possible errors if the JSON schema changes in the future.

Use multiline string and use dict.get to prevent errors if JSON schema changes.
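
One possible shape for that change, sketched under assumptions: the updated diff itself is not shown on this page, so `format_script_error` is a hypothetical helper name and the `'unknown'` placeholders are an illustrative choice. It builds the message as one multiline f-string expression and reads every field through `dict.get`, so a missing key produces a placeholder rather than a `KeyError`.

```python
def format_script_error(error):
    """Render an Elasticsearch script error dict as readable text.

    Every lookup goes through dict.get, so an unexpected JSON layout
    yields placeholders instead of raising KeyError.
    """
    cause = error.get("caused_by", {})
    stack = "\n".join(error.get("script_stack", []))
    return (
        f"Painless update script failed ({error.get('reason', 'unknown')}).\n"
        f"error type: {cause.get('type', 'unknown')}\n"
        f"error reason: {cause.get('reason', 'unknown')}\n"
        f"script stack:\n\n{stack}"
    )


# Sample input modelled on the compile error shown in the summary above.
sample_error = {
    "reason": "compile error",
    "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "invalid declaration: cannot resolve type [Lst]",
    },
    "script_stack": ["Lst visit_types = ctx._source ...", "^---- HERE"],
}

print(format_script_error(sample_error))
# An empty dict still formats cleanly instead of failing.
print(format_script_error({}))
```

Keeping the formatting in a small pure function like this also makes it easy to unit test without a running Elasticsearch instance.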

Build has FAILED

Patch application report for D5853 (id=20950)

Rebasing onto 3e129a3f48...

Current branch diff-target is up to date.
Changes applied before test
commit afe170c9feab221bf5f6df23948ab3d215d746f5
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Thu Jun 10 19:42:05 2021 +0200

    tests/elasticsearch: Catch painless script errors and pretty print them
    

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/128/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/128/console

Forgot to remove bug in painless code

Build is green

Patch application report for D5853 (id=20951)

Rebasing onto 3e129a3f48...

Current branch diff-target is up to date.
Changes applied before test
commit 870aecd97dc60819984edf713ebf62838b812cf2
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Thu Jun 10 19:42:05 2021 +0200

    tests/elasticsearch: Catch painless script errors and pretty print them
    

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/129/ for more details.