
swh api clients often fail with a BadStatusLine exception
Closed, Migrated

Description

The swh backend API clients (swh.storage.api.client and swh.objstorage.api.client) often fail with a BadStatusLine exception from deep in the bowels of urllib3.

This is especially significant under high load (e.g. when rehashing millions of objects through the objstorage API).

Basically, there is a race condition between the client, which eagerly pipelines requests, and the server, which reaps children every few hundred thousand requests because of a memory leak (T757).
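The race can be illustrated outside of swh with a minimal sketch. It assumes a toy raw-socket server standing in for a WSGI worker, and the standard library's http.client standing in for the real client stack. The names read_request, flaky_server and reproduce are hypothetical. The server answers one request on a keep-alive connection, then closes the connection after receiving the second request without replying, as a reaped worker might. In Python 3 the client then sees RemoteDisconnected, which is a subclass of BadStatusLine.

```python
import http.client
import socket
import threading


def read_request(conn):
    """Read from the socket until the end of the HTTP request headers."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(4096)
        if not chunk:
            break
        data += chunk
    return data


def flaky_server(listener):
    """Answer one request, then drop the keep-alive connection on the next."""
    conn, _ = listener.accept()
    with conn:
        read_request(conn)
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: 2\r\n"
            b"Connection: keep-alive\r\n"
            b"\r\n"
            b"ok"
        )
        # The client reuses the connection. Pretend the worker is reaped
        # right after receiving the second request, before it can reply.
        read_request(conn)


def reproduce():
    """Send two requests on one connection; return (first body, outcome)."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    thread = threading.Thread(target=flaky_server, args=(listener,), daemon=True)
    thread.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    client.request("GET", "/first")
    body = client.getresponse().read()
    try:
        client.request("GET", "/second")  # sent on the same connection
        client.getresponse()
        outcome = "no error"
    except http.client.BadStatusLine as exc:
        outcome = type(exc).__name__
    finally:
        client.close()
        thread.join(timeout=5)
        listener.close()
    return body, outcome


if __name__ == "__main__":
    print(reproduce())
```

This only mimics the symptom on the client side; it says nothing about how gunicorn or uwsgi actually recycle their workers.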

The HTTP implementation of the WSGI runners (gunicorn, uwsgi) is not supposed to be exposed to external clients; we should probably bite the bullet and front them with an actual reverse proxy such as nginx or apache (ugh).

Event Timeline

As a proof of concept, nginx has been manually deployed to uffizi on port 15003. It does alleviate the BadStatusLine issues the archiver previously encountered under high load. This "just" needs to be properly deployed.

The BadStatusLine error reared its ugly head again on the vault backend. It was reproducible. Putting an nginx instance in front of the vault backend fixed it. I guess it's time to puppetize the configuration of an HTTP frontend for all our RPC servers.

Here is a pcap of the issue observed while a Vault cooker was calling set_progress on the Vault backend. The BadStatusLine packet that only contains "\r\n" is packet 306.

olasd claimed this task.

So, that was a beast of a puppet refactoring, but the end result is: all our RPC servers are now nicely tucked behind an instance of nginx.