
Mitigate Azure objstorage HTTP(S) pipelining issues by backporting a urllib3 patch
Closed, Migrated

Description

We've noticed that the Azure objstorage sometimes fails to reuse an open connection for subsequent requests. As a result, we spend an unreasonable amount of time doing TLS handshakes with the remote server.

After tracing the HTTP requests made to the Azure object storage, we noticed that the connection was closed and reopened whenever a HEAD request on an object returned an error status.
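One way to check whether a client keeps its connection alive across the HEAD (400) / PUT (201) sequence is to count, on the server side, how many distinct TCP connections the requests arrive on. The sketch below assumes a local stand-in server built on Python's standard library. The handler name, paths and payload are made up for illustration. It is not the Azure endpoint, and it exercises the standard-library `http.client` rather than urllib3. A client showing the behavior described here would spread the requests over more than one connection. A client that reuses its connection correctly uses just one.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StandInStorage(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: HEAD -> 400, PUT -> 201."""

    protocol_version = "HTTP/1.1"  # keep connections open between requests

    def _record(self):
        # Remember which client port (i.e. which TCP connection) sent the request.
        self.server.seen.append((self.command, self.client_address[1]))

    def do_HEAD(self):
        self._record()
        self.send_response(400)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_PUT(self):
        self._record()
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the request body
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging


def trace(n_objects=3):
    """Send n_objects HEAD/PUT pairs over one client; return what the server saw."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StandInStorage)
    server.seen = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        for i in range(n_objects):
            for method, body in (("HEAD", None), ("PUT", b"data")):
                conn.request(method, f"/objects/{i}", body=body)
                conn.getresponse().read()  # read fully so the socket can be reused
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return server.seen


seen = trace()
connections = {port for _, port in seen}
print(f"{len(seen)} requests over {len(connections)} connection(s)")
# prints: 6 requests over 1 connection(s)
```

The same server-side count can be pointed at any other client, for example a urllib3- or botocore-based one, to see whether it opens a new connection after each failed HEAD.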

Searching for this behavior led us to https://github.com/boto/botocore/issues/1248, which was eventually traced to the urllib3 issue https://github.com/urllib3/urllib3/issues/1234.

After (manually) applying the patch in https://github.com/urllib3/urllib3/issues/1235, requests are again sent over a single persistent connection as expected, even when HEAD returns a 400 status code.

The alternating HEAD (400) / PUT (201) request pattern is central to production use of the object storage. We always check whether an object exists with a HEAD request before PUTting a new object. We were therefore paying the reconnection penalty (a new TCP connection plus a TLS handshake) for every object created.
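The check-then-upload pattern described above can be sketched as follows, against a hypothetical in-memory stand-in server rather than the real Azure API. `add_if_absent`, `InMemoryStore` and the `/objects/<id>` paths are illustrative names, not part of swh-objstorage. The function treats any non-200 HEAD status as "absent", which is a simplification. The detail that matters for connection reuse is that each response is read to completion before the next request is sent, so `http.client` can issue both requests over the same socket.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class InMemoryStore(BaseHTTPRequestHandler):
    """Hypothetical stand-in object store: HEAD -> 200 or 400, PUT -> 201."""

    protocol_version = "HTTP/1.1"  # allow keep-alive

    def _reply(self, status):
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_HEAD(self):
        # 200 if the object is stored, 400 otherwise (mirroring the statuses above)
        self._reply(200 if self.path in self.server.objects else 400)

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        self.server.objects[self.path] = self.rfile.read(length)
        self._reply(201)

    def log_message(self, *args):
        pass  # silence per-request logging


def add_if_absent(conn, obj_id, data):
    """Upload data under obj_id unless a HEAD request says it already exists.

    Returns True if a PUT was issued, False if the object was already present.
    """
    conn.request("HEAD", f"/objects/{obj_id}")
    head = conn.getresponse()
    head.read()  # HEAD has no body; reading closes the response so the socket is reused
    if head.status == 200:
        return False
    conn.request("PUT", f"/objects/{obj_id}", body=data)
    put = conn.getresponse()
    put.read()
    if put.status != 201:
        raise RuntimeError(f"PUT {obj_id} failed with status {put.status}")
    return True


server = ThreadingHTTPServer(("127.0.0.1", 0), InMemoryStore)
server.objects = {}
threading.Thread(target=server.serve_forever, daemon=True).start()
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
first = add_if_absent(conn, "abc", b"hello")   # HEAD -> 400, then PUT -> 201
second = add_if_absent(conn, "abc", b"hello")  # HEAD -> 200, no PUT
conn.close()
server.shutdown()
server.server_close()
print(first, second)  # True False
```

Every new object costs two round trips in this pattern. If the connection is dropped after each failed HEAD, the PUT also pays for a fresh TCP and TLS setup.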

This task will track the deployment of a patched urllib3 package.