
Diagnose swh-environment build failures
Closed, Migrated

Description

Investigate the failures of the swh-docker and swh-environment builds in the CI:

  • Several segmentation faults:
15:07:10 Thread 0x00007f8b49570740 (most recent call first):
15:07:10   File "/usr/lib/python3.7/selectors.py", line 415 in select
15:07:10   File "/usr/lib/python3.7/socketserver.py", line 232 in serve_forever
15:07:10   File "/home/jenkins/workspace/DENV/tests/.venv/lib/python3.7/site-packages/werkzeug/serving.py", line 777 in serve_forever
...
15:07:10   File "/usr/lib/python3.7/multiprocessing/context.py", line 277 in _Popen
15:07:10   File "/usr/lib/python3.7/multiprocessing/context.py", line 223 in _Popen
15:07:10   File "/usr/lib/python3.7/multiprocessing/process.py", line 112 in start
15:07:10   File "/home/jenkins/workspace/DENV/tests/swh-core/swh/core/api/tests/server_testing.py", line 74 in start_server
15:07:10   File "/home/jenkins/workspace/DENV/tests/swh-core/swh/core/api/tests/server_testing.py", line 35 in setUp
15:07:10   File "/home/jenkins/workspace/DENV/tests/swh-search/swh/search/tests/test_api_client.py", line 39 in setUp
...
15:07:10   File "/home/jenkins/workspace/DENV/tests/.venv/lib/python3.7/site-packages/_pytest/config/__init__.py", line 185 in console_main
15:07:10   File "/home/jenkins/workspace/DENV/tests/.venv/lib/python3.7/site-packages/pytest/__main__.py", line 5 in <module>
15:07:10   File "/usr/lib/python3.7/runpy.py", line 85 in _run_code
15:07:10   File "/usr/lib/python3.7/runpy.py", line 193 in _run_module_as_main
15:07:11 F.........s......... [ 20%]
15:07:22                                                                          [ 20%]
15:07:22 swh/search/tests/test_cli.py ..Fatal Python error: Segmentation fault
14:55:36 swh/indexer/tests/test_codemeta.py ..Fatal Python error: Segmentation fault
14:55:36 
14:55:36 Current thread 0x00007fa582b57740 (most recent call first):
14:55:36   File "/home/jenkins/workspace/DENV/tests/.venv/lib/python3.7/site-packages/pyld/jsonld.py", line 6542 in freeze
14:55:36   File "/home/jenkins/workspace/DENV/tests/.venv/lib/python3.7/site-packages/pyld/jsonld.py", line 5530 in _get_initial_context
14:55:36   File "/home/jenkins/workspace/DENV/tests/.venv/lib/python3.7/site-packages/pyld/jsonld.py", line 855 in expand
14:55:36   File "/home/jenkins/workspace/DENV/tests/.venv/lib/python3.7/site-packages/pyld/jsonld.py", line 163 in expand
14:55:36   File "/home/jenkins/workspace/DENV/tests/swh-indexer/swh/indexer/codemeta.py", line 131 in expand
14:55:36   File "/home/jenkins/workspace/DENV/tests/swh-indexer/swh/indexer/codemeta.py", line 173 in merge_documents
14:55:36   File "/home/jenkins/workspace/DENV/tests/swh-indexer/swh/indexer/tests/test_codemeta.py", line 86 in test_merge_documents
...
14:55:36   File "/home/jenkins/workspace/DENV/tests/.venv/lib/python3.7/site-packages/pytest/__main__.py", line 5 in <module>
14:55:36   File "/usr/lib/python3.7/runpy.py", line 85 in _run_code
14:55:36   File "/usr/lib/python3.7/runpy.py", line 193 in _run_module_as_main
14:55:38 make: *** [../Makefile.python:20: test] Segmentation fault (core dumped)
14:55:38 python3 -m pytest  .
  • Several deposit failures:
15:00:15 __________________________ test_deposit_loading_ok_2 ___________________________
15:00:15 
15:00:15 swh_storage = <swh.storage.proxies.retry.RetryingProxyStorage object at 0x7f89ac1f0240>
15:00:15 deposit_client = <swh.loader.package.deposit.loader.ApiClient object at 0x7f89ac13de80>
15:00:15 requests_mock_datadir = <requests_mock.mocker.Mocker object at 0x7f89ac13d588>
...
15:00:15         # Retrieve the release
15:00:15         release = loader.storage.release_get([hash_to_bytes(release_id)])[0]
15:00:15         assert release
15:00:15 >       assert release.date.to_dict() == raw_meta["deposit"]["author_date"]
15:00:15 E       AssertionError: assert {'negative_ut...: 1507389428}} == {'negative_ut...: 1507389428}}
15:00:15 E         Omitting 3 identical items, use -vv to show
15:00:15 E         Left contains 1 more item:
15:00:15 E         {'offset_bytes': b'+0000'}
15:00:15 E         Use -v to get the full diff
14:55:03 swh/deposit/tests/api/test_deposit_private_read_metadata.py:337: 
14:55:03 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
14:55:03 ../.venv/lib/python3.7/site-packages/rest_framework/test.py:289: in get
14:55:03     response = super().get(path, data=data, **extra)
14:55:03 ../.venv/lib/python3.7/site-packages/rest_framework/test.py:206: in get
14:55:03     return self.generic('GET', path, **r)
14:55:03 ../.venv/lib/python3.7/site-packages/rest_framework/test.py:235: in generic
14:55:03     method, path, data, content_type, secure, **extra)
14:55:03 ../.venv/lib/python3.7/site-packages/django/test/client.py:422: in generic
...
14:55:03 ../.venv/lib/python3.7/site-packages/rest_framework/views.py:506: in dispatch
14:55:03     response = handler(request, *args, **kwargs)
14:55:03 swh/deposit/api/private/__init__.py:86: in get
14:55:03     return super().get(request, collection_name, deposit_id)
14:55:03 swh/deposit/api/common.py:1093: in get
14:55:03     json.dumps(content), status=status, content_type=content_type
14:55:03 /usr/lib/python3.7/json/__init__.py:231: in dumps
14:55:03     return _default_encoder.encode(obj)
14:55:03 /usr/lib/python3.7/json/encoder.py:199: in encode
14:55:03     chunks = self.iterencode(o, _one_shot=True)
14:55:03 /usr/lib/python3.7/json/encoder.py:257: in iterencode
14:55:03     return _iterencode(o, 0)
...
14:55:03 >       raise TypeError(f'Object of type {o.__class__.__name__} '
14:55:03                         f'is not JSON serializable')
14:55:03 E       TypeError: Object of type bytes is not JSON serializable
14:55:03 
14:55:03 /usr/lib/python3.7/json/encoder.py:179: TypeError

None of these problems occurs on the master builds.

Event Timeline

vsellier changed the task status from Open to Work in Progress. Dec 20 2021, 3:23 PM
vsellier created this task.

For the segfault, I suspect an issue caused by the OS difference between the Docker container and the host (Debian 10 / Debian 11).

root@e35f7a024575:/home/jenkins/swh-environment/swh-indexer# gdb python3 core
(gdb) where
#0  raise (sig=11) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  <signal handler called>
#2  0x00007f6548d70d46 in frozendict_new_barebone (type=0x7f6548d800e0 <PyFrozenDict_Type>)
    at /project/frozendict/src/3_7/frozendictobject.c:2214
#3  _frozendict_new (use_empty_frozendict=1, kwds=0x0, args=<optimized out>, type=0x7f6548d800e0 <PyFrozenDict_Type>)
    at /project/frozendict/src/3_7/frozendictobject.c:2255
#4  frozendict_new (type=0x7f6548d800e0 <PyFrozenDict_Type>, args=<optimized out>, kwds=0x0)
    at /project/frozendict/src/3_7/frozendictobject.c:2290
#5  0x00000000005d9bd7 in _PyObject_FastCallKeywords ()
#136 0x000000000065468e in _Py_UnixMain ()
#137 0x00007f654efe109b in __libc_start_main (main=0x4bc560 <main>, argc=9, argv=0x7ffe6f651488, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe6f651478) at ../csu/libc-start.c:308
#138 0x00000000005e0e8a in _start ()
(gdb)

I'm trying to reproduce the problem locally in a VM to check whether a workaround can be found.

(The problem is also present on my laptop (Debian 11) when the build is done in a Debian 10 container.)

Edit: the problem is also present on a Debian 10 VM, so it is not related to the host OS.
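
To check a given machine or container quickly, the crash can be isolated in a child process. This is only a minimal sketch: the gdb backtrace above ends in the constructor of an empty frozendict, so the reproducer assumes that building one in a fresh interpreter is enough to trigger the fault, which the trace suggests but does not prove. The helper name `died_from_segfault` and the `REPRODUCER` command are hypothetical, and the return code check relies on POSIX signal semantics.

```python
import signal
import subprocess
import sys


def died_from_segfault(argv):
    """Run argv in a child process and report whether SIGSEGV killed it."""
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # On POSIX, a return code of -N means the child was killed by signal N.
    return proc.returncode == -signal.SIGSEGV


# Hypothetical reproducer: construct an empty frozendict in a fresh interpreter.
# Whether this alone crashes is an assumption based on the backtrace.
REPRODUCER = [sys.executable, "-c", "import frozendict; frozendict.frozendict()"]
```

Calling `died_from_segfault(REPRODUCER)` inside the virtualenv used by the build should tell a crashing environment apart from a healthy one without taking down the calling process.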

It seems the problem is related to the new version 2.1.2 of the frozendict library, released on December 18th.
Pinning the version to the previous 2.1.1 solved the problem.
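
The pin can be expressed as a requirement line, and a small guard can flag an environment that still has the crashing release. The sketch below is illustrative only: the helper names are hypothetical, and the version parsing assumes plain dotted integers such as `2.1.2`, which may not hold for every release string.

```python
KNOWN_BAD = (2, 1, 2)  # frozendict release reported as crashing in this task
LAST_GOOD = "2.1.1"    # release that the pin falls back to


def parse_version(text):
    """Turn '2.1.2' into (2, 1, 2); assumes plain dotted integers."""
    return tuple(int(part) for part in text.split("."))


def is_affected(installed):
    """Return True if the installed version string is the known-bad release."""
    return parse_version(installed) == KNOWN_BAD


def pin_line():
    """Requirement line that pins frozendict to the last known-good release."""
    return f"frozendict == {LAST_GOOD}"
```

The installed version string can be obtained with `pkg_resources.get_distribution("frozendict").version` on the Python 3.7 used by the build, or with `importlib.metadata.version("frozendict")` on Python 3.8 and later.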

vsellier reopened this task as Work in Progress. Dec 22 2021, 5:07 PM

Thanks for creating the diff and submitting the issue on the frozendict repo.

The deposit tests now pass without any error, but it seems the loader tests are now failing with the same kind of error:

from https://jenkins.softwareheritage.org/job/DENV/job/tests/1950/console

16:56:30 swh/loader/package/deposit/tests/test_deposit.py .....F.                 [ 33%]
...
16:56:46 =================================== FAILURES ===================================
16:56:46 __________________________ test_deposit_loading_ok_2 ___________________________
...
16:56:46 >       assert release.date.to_dict() == raw_meta["deposit"]["author_date"]
16:56:46 E       AssertionError: assert {'negative_ut...: 1507389428}} == {'negative_ut...: 1507389428}}
16:56:46 E         Omitting 3 identical items, use -vv to show
16:56:46 E         Left contains 1 more item:
16:56:46 E         {'offset_bytes': b'+0000'}
16:56:46 E         Use -v to get the full diff
16:56:46 
16:56:46 swh/loader/package/deposit/tests/test_deposit.py:335: AssertionError
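
Both non-segfault failures quoted in this task involve a bytes value: the assertion above fails because the left side carries an extra `offset_bytes` entry, and the earlier deposit traceback ends in `TypeError: Object of type bytes is not JSON serializable`. The sketch below reproduces both symptoms on simplified stand-in data. Only `offset_bytes`, `b'+0000'` and `1507389428` come from the logs. The `seconds` key and the helper names are hypothetical, and treating the two failures as sharing one cause is an assumption, not something the logs confirm.

```python
import json

# Simplified stand-ins for the two sides of the failing assertion.
# "seconds" is a hypothetical placeholder key; the real dicts are truncated
# in the log.
expected = {"seconds": 1507389428}
actual = {"seconds": 1507389428, "offset_bytes": b"+0000"}


def extra_keys(left, right):
    """Keys present in left but missing from right, sorted for stable output."""
    return sorted(set(left) - set(right))


def to_jsonable(value):
    """Recursively decode bytes so that json.dumps accepts the structure."""
    if isinstance(value, bytes):
        return value.decode("ascii")
    if isinstance(value, dict):
        return {key: to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    return value
```

Here `extra_keys(actual, expected)` points at the entry that breaks the equality check, and `json.dumps(actual)` raises the same `TypeError` as the deposit test, while `json.dumps(to_jsonable(actual))` succeeds. Whether decoding the bytes or updating the expected test data is the right fix depends on the intended format of the serialized date.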