TestRemoteObjStorage::test_content_iterator is failing since werkzeug 2.1.0 release
Closed, Migrated

Description

Since the release of werkzeug 2.1.0, TestRemoteObjStorage::test_content_iterator has been failing with the error below:

(swh) ✘-1 ~/swh/swh-environment/swh-objstorage [master|✚ 1] 
15:00 $ pytest -sv swh/objstorage/tests/test_objstorage_api.py::TestRemoteObjStorage::test_content_iterator
================================================================================================================================== test session starts ==================================================================================================================================
platform linux -- Python 3.9.2, pytest-7.1.1, pluggy-1.0.0 -- /home/anlambert/.virtualenvs/swh/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/anlambert/swh/swh-environment/swh-objstorage/.hypothesis/examples')
rootdir: /home/anlambert/swh/swh-environment/swh-objstorage, configfile: pytest.ini
plugins: redis-2.4.0, postgresql-3.1.3, forked-1.4.0, hypothesis-6.40.0, mock-3.7.0, flask-1.2.0, xdist-2.5.0, cov-3.0.0, requests-mock-1.9.3, django-test-migrations-1.2.0, django-4.5.2, dash-2.3.1, asyncio-0.18.3, swh.core-2.4.0, swh.journal-1.0.1.dev3+g3771edb
asyncio: mode=legacy
collected 1 item                                                                                                                                                                                                                                                                        

swh/objstorage/tests/test_objstorage_api.py::TestRemoteObjStorage::test_content_iterator FAILED

======================================================================================================================================= FAILURES ========================================================================================================================================
______________________________________________________________________________________________________________________ TestRemoteObjStorage.test_content_iterator _______________________________________________________________________________________________________________________

self = <urllib3.response.HTTPResponse object at 0x7f5da43faa90>

    def _update_chunk_length(self):
        # First, we'll figure out length of a chunk and then
        # we'll try to read it from socket.
        if self.chunk_left is not None:
            return
        line = self._fp.fp.readline()
        line = line.split(b";", 1)[0]
        try:
>           self.chunk_left = int(line, 16)
E           ValueError: invalid literal for int() with base 16: b''

../../../.virtualenvs/swh/lib/python3.9/site-packages/urllib3/response.py:700: ValueError

During handling of the above exception, another exception occurred:

self = <urllib3.response.HTTPResponse object at 0x7f5da43faa90>

    @contextmanager
    def _error_catcher(self):
        """
        Catch low-level python exceptions, instead re-raising urllib3
        variants, so that low-level exceptions are not leaked in the
        high-level api.
    
        On exit, release the connection back to the pool.
        """
        clean_exit = False
    
        try:
            try:
>               yield

../../../.virtualenvs/swh/lib/python3.9/site-packages/urllib3/response.py:441: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <urllib3.response.HTTPResponse object at 0x7f5da43faa90>, amt = 4096, decode_content = True

    def read_chunked(self, amt=None, decode_content=None):
        """
        Similar to :meth:`HTTPResponse.read`, but with an additional
        parameter: ``decode_content``.
    
        :param amt:
            How much of the content to read. If specified, caching is skipped
            because it doesn't make sense to cache partial content as the full
            response.
    
        :param decode_content:
            If True, will attempt to decode the body based on the
            'content-encoding' header.
        """
        self._init_decoder()
        # FIXME: Rewrite this method and make it a class with a better structured logic.
        if not self.chunked:
            raise ResponseNotChunked(
                "Response is not chunked. "
                "Header 'transfer-encoding: chunked' is missing."
            )
        if not self.supports_chunked_reads():
            raise BodyNotHttplibCompatible(
                "Body should be http.client.HTTPResponse like. "
                "It should have have an fp attribute which returns raw chunks."
            )
    
        with self._error_catcher():
            # Don't bother reading the body of a HEAD request.
            if self._original_response and is_response_to_head(self._original_response):
                self._original_response.close()
                return
    
            # If a response is already read and closed
            # then return immediately.
            if self._fp.fp is None:
                return
    
            while True:
>               self._update_chunk_length()

../../../.virtualenvs/swh/lib/python3.9/site-packages/urllib3/response.py:767: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <urllib3.response.HTTPResponse object at 0x7f5da43faa90>

    def _update_chunk_length(self):
        # First, we'll figure out length of a chunk and then
        # we'll try to read it from socket.
        if self.chunk_left is not None:
            return
        line = self._fp.fp.readline()
        line = line.split(b";", 1)[0]
        try:
            self.chunk_left = int(line, 16)
        except ValueError:
            # Invalid chunked protocol response, abort.
            self.close()
>           raise InvalidChunkLength(self, line)
E           urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)

../../../.virtualenvs/swh/lib/python3.9/site-packages/urllib3/response.py:704: InvalidChunkLength

During handling of the above exception, another exception occurred:

    def generate():
        # Special case for urllib3.
        if hasattr(self.raw, 'stream'):
            try:
>               for chunk in self.raw.stream(chunk_size, decode_content=True):

../../../.virtualenvs/swh/lib/python3.9/site-packages/requests/models.py:760: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <urllib3.response.HTTPResponse object at 0x7f5da43faa90>, amt = 4096, decode_content = True

    def stream(self, amt=2 ** 16, decode_content=None):
        """
        A generator wrapper for the read() method. A call will block until
        ``amt`` bytes have been read from the connection or until the
        connection is closed.
    
        :param amt:
            How much of the content to read. The generator will return up to
            much data per iteration, but may return less. This is particularly
            likely when using compressed data. However, the empty string will
            never be returned.
    
        :param decode_content:
            If True, will attempt to decode the body based on the
            'content-encoding' header.
        """
        if self.chunked and self.supports_chunked_reads():
>           for line in self.read_chunked(amt, decode_content=decode_content):

../../../.virtualenvs/swh/lib/python3.9/site-packages/urllib3/response.py:575: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <urllib3.response.HTTPResponse object at 0x7f5da43faa90>, amt = 4096, decode_content = True

    def read_chunked(self, amt=None, decode_content=None):
        """
        Similar to :meth:`HTTPResponse.read`, but with an additional
        parameter: ``decode_content``.
    
        :param amt:
            How much of the content to read. If specified, caching is skipped
            because it doesn't make sense to cache partial content as the full
            response.
    
        :param decode_content:
            If True, will attempt to decode the body based on the
            'content-encoding' header.
        """
        self._init_decoder()
        # FIXME: Rewrite this method and make it a class with a better structured logic.
        if not self.chunked:
            raise ResponseNotChunked(
                "Response is not chunked. "
                "Header 'transfer-encoding: chunked' is missing."
            )
        if not self.supports_chunked_reads():
            raise BodyNotHttplibCompatible(
                "Body should be http.client.HTTPResponse like. "
                "It should have have an fp attribute which returns raw chunks."
            )
    
        with self._error_catcher():
            # Don't bother reading the body of a HEAD request.
            if self._original_response and is_response_to_head(self._original_response):
                self._original_response.close()
                return
    
            # If a response is already read and closed
            # then return immediately.
            if self._fp.fp is None:
                return
    
            while True:
                self._update_chunk_length()
                if self.chunk_left == 0:
                    break
                chunk = self._handle_chunk(amt)
                decoded = self._decode(
                    chunk, decode_content=decode_content, flush_decoder=False
                )
                if decoded:
                    yield decoded
    
            if decode_content:
                # On CPython and PyPy, we should never need to flush the
                # decoder. However, on Jython we *might* need to, so
                # lets defensively do it anyway.
                decoded = self._flush_decoder()
                if decoded:  # Platform-specific: Jython.
                    yield decoded
    
            # Chunk content ends with \r\n: discard it.
            while True:
                line = self._fp.fp.readline()
                if not line:
                    # Some sites may not end with '\r\n'.
                    break
                if line == b"\r\n":
                    break
    
            # We read everything; close the "file".
            if self._original_response:
>               self._original_response.close()

../../../.virtualenvs/swh/lib/python3.9/site-packages/urllib3/response.py:796: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <contextlib._GeneratorContextManager object at 0x7f5da43fa4c0>, type = <class 'urllib3.exceptions.InvalidChunkLength'>, value = InvalidChunkLength(got length b'', 0 bytes read), traceback = <traceback object at 0x7f5da43f0940>

    def __exit__(self, type, value, traceback):
        if type is None:
            try:
                next(self.gen)
            except StopIteration:
                return False
            else:
                raise RuntimeError("generator didn't stop")
        else:
            if value is None:
                # Need to force instantiation so we can reliably
                # tell if we get the same exception back
                value = type()
            try:
>               self.gen.throw(type, value, traceback)

/usr/lib/python3.9/contextlib.py:135: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <urllib3.response.HTTPResponse object at 0x7f5da43faa90>

    @contextmanager
    def _error_catcher(self):
        """
        Catch low-level python exceptions, instead re-raising urllib3
        variants, so that low-level exceptions are not leaked in the
        high-level api.
    
        On exit, release the connection back to the pool.
        """
        clean_exit = False
    
        try:
            try:
                yield
    
            except SocketTimeout:
                # FIXME: Ideally we'd like to include the url in the ReadTimeoutError but
                # there is yet no clean way to get at it from this context.
                raise ReadTimeoutError(self._pool, None, "Read timed out.")
    
            except BaseSSLError as e:
                # FIXME: Is there a better way to differentiate between SSLErrors?
                if "read operation timed out" not in str(e):
                    # SSL errors related to framing/MAC get wrapped and reraised here
                    raise SSLError(e)
    
                raise ReadTimeoutError(self._pool, None, "Read timed out.")
    
            except (HTTPException, SocketError) as e:
                # This includes IncompleteRead.
>               raise ProtocolError("Connection broken: %r" % e, e)
E               urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

../../../.virtualenvs/swh/lib/python3.9/site-packages/urllib3/response.py:458: ProtocolError

During handling of the above exception, another exception occurred:

self = <swh.objstorage.tests.test_objstorage_api.TestRemoteObjStorage testMethod=test_content_iterator>

    def test_content_iterator(self):
        sto_obj_ids = iter(self.storage)
>       sto_obj_ids = list(sto_obj_ids)

swh/objstorage/tests/objstorage_testing.py:229: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
swh/objstorage/api/client.py:48: in __iter__
    yield from self.list_content()
swh/objstorage/api/client.py:54: in list_content
    yield from iter_chunks(
../../../.virtualenvs/swh/lib/python3.9/site-packages/swh/core/utils.py:83: in iter_chunks
    new_data = next(iterator)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def generate():
        # Special case for urllib3.
        if hasattr(self.raw, 'stream'):
            try:
                for chunk in self.raw.stream(chunk_size, decode_content=True):
                    yield chunk
            except ProtocolError as e:
>               raise ChunkedEncodingError(e)
E               requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

../../../.virtualenvs/swh/lib/python3.9/site-packages/requests/models.py:763: ChunkedEncodingError
=================================================================================================================================== warnings summary ====================================================================================================================================
../../../.virtualenvs/swh/lib/python3.9/site-packages/pytest_asyncio/plugin.py:191
  /home/anlambert/.virtualenvs/swh/lib/python3.9/site-packages/pytest_asyncio/plugin.py:191: DeprecationWarning: The 'asyncio_mode' default value will change to 'strict' in future, please explicitly use 'asyncio_mode=strict' or 'asyncio_mode=auto' in pytest configuration file.
    config.issue_config_time_warning(LEGACY_MODE, stacklevel=2)

swh/objstorage/tests/test_objstorage_api.py::TestRemoteObjStorage::test_content_iterator
  /home/anlambert/swh/swh-environment/swh-objstorage/swh/objstorage/factory.py:92: DeprecationWarning: Explicit "args" key is deprecated for objstorage initialization, use class arguments keys directly instead.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================================================================ short test summary info ================================================================================================================================
FAILED swh/objstorage/tests/test_objstorage_api.py::TestRemoteObjStorage::test_content_iterator - requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

I managed to identify the commit that introduced the regression using git bisect.

The main difference between the previous werkzeug release (2.0.3) and the latest one (2.1.0) is that the HTTP protocol version used by our RPC servers was bumped from 1.0 to 1.1.
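
For context on why the protocol version matters here: an HTTP/1.1 server can stream a body of unknown length using chunked transfer encoding, whereas an HTTP/1.0 body is simply delimited by the connection closing. The sketch below is an illustration only, not urllib3's code; `iter_chunked_body` is a hypothetical helper and the byte strings are made-up payloads. It shows that a chunked body which stops before the terminating zero-length chunk makes the reader see an empty size line, which matches the `b''` reported by `InvalidChunkLength` in the traceback above. It does not by itself establish the root cause.

```python
import io


def iter_chunked_body(stream):
    """Yield the payload of each chunk of an HTTP/1.1 chunked body.

    Hypothetical helper for illustration; not part of urllib3 or werkzeug.
    """
    while True:
        # Each chunk starts with its size in hex, optionally followed by
        # ";extension", and is terminated by CRLF.
        size_line = stream.readline().split(b";", 1)[0].strip()
        try:
            size = int(size_line, 16)
        except ValueError:
            raise ValueError(f"invalid chunk length: {size_line!r}") from None
        if size == 0:
            # A zero-sized chunk marks the end of the body.
            break
        data = stream.read(size)
        stream.readline()  # discard the CRLF that follows the chunk data
        yield data


# Well-formed body: two chunks followed by the terminating "0" chunk.
well_formed = io.BytesIO(b"4\r\nabcd\r\n3\r\nefg\r\n0\r\n\r\n")
print(list(iter_chunked_body(well_formed)))  # [b'abcd', b'efg']

# Body that ends without the terminating chunk: the next size line is empty.
truncated = io.BytesIO(b"4\r\nabcd\r\n")
try:
    list(iter_chunked_body(truncated))
except ValueError as exc:
    print(exc)  # invalid chunk length: b''
```

In the failing test the error is raised with "0 bytes read", so the client apparently received an empty line where the very first chunk size was expected.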