
docker/conf/loader: Configure storage with retry proxy
Abandoned, Public

Authored by anlambert on Jan 4 2022, 11:45 AM.

Details

Reviewers
ardumont
Group Reviewers
Reviewers
Summary

While testing the subversion loader on large repositories in
the docker environment, I often encountered storage timeouts
which made the loads fail.

Configuring the storage with the retry proxy in the swh-loader
service prevents this kind of issue.
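A minimal sketch of what a retry layer adds around a storage call, assuming the transient failure surfaces as Python's built-in `ConnectionError` (which `ConnectionResetError` subclasses). `call_with_retry`, its parameters and `flaky_content_add` are hypothetical names for illustration, not the swh.storage retry proxy's code:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying transient errors with exponential backoff (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 1
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_attempts:
                raise  # out of attempts: propagate the last error
            # Back off 1s, 2s, 4s, ... (with the default base_delay).
            sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1


# Usage: a fake content_add whose connection is reset twice, then succeeds.
attempts = []


def flaky_content_add() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionResetError(104, "Connection reset by peer")
    return "ok"


delays = []  # record the requested back-off instead of actually sleeping
result = call_with_retry(flaky_content_add, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

One caveat: in the trace below, the reset reaches the loader wrapped as `swh.storage.exc.StorageAPIError`, so a real retry layer would have to match that wrapper type rather than the built-in exception.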

Diff Detail

Repository: rDENV Development environment
Branch: master
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 25794, Build 40309: arc lint + arc unit

Event Timeline

What kinds of storage timeouts? I'm not against this on principle, but I'm a bit worried that this could be masking a real bug.

> In D6874#178709, @olasd wrote:
> What kinds of storage timeouts? I'm not against this on principle, but I'm a bit worried that this could be masking a real bug.

If I recall correctly, the issue occurred when adding a large number of contents. I should be able to reproduce it; I will paste the stack trace here.

I'd feel more comfortable also if we have a good (aka documented and understood) reason for doing this.

So I managed to reproduce the issue I encountered. It turns out it was not a storage timeout but rather a "Connection reset by peer" error; see the stack trace below:

swh-loader_1                        | [2022-01-04 21:07:53,446: DEBUG/ForkPoolWorker-1] Flushing 1 objects of type revision (1 parents, 145 estimated bytes)
swh-loader_1                        | [2022-01-04 21:07:53,500: DEBUG/ForkPoolWorker-1] Flushing 2012 objects of type directory (39967 entries)
swh-loader_1                        | [2022-01-04 21:07:55,932: DEBUG/ForkPoolWorker-1] rev: 5232, swhrev: a6c32f5ad6136e57f6b1a02726296fd9b274f337, dir: 3d3432d677755ecb3371a13416b7de5f9c6ffa73
swh-loader_1                        | [2022-01-04 21:07:55,972: DEBUG/ForkPoolWorker-1] Flushing 13093 objects of type content (159073357 bytes)
swh-loader_1                        | [2022-01-04 21:07:56,414: DEBUG/ForkPoolWorker-1] Flushing 1 objects of type revision (1 parents, 143 estimated bytes)
swh-loader_1                        | [2022-01-04 21:07:56,463: DEBUG/ForkPoolWorker-1] Flushing 2012 objects of type directory (39967 entries)
swh-loader_1                        | [2022-01-04 21:07:58,587: DEBUG/ForkPoolWorker-1] rev: 5233, swhrev: c55279cf36f060d8a0c62c21832db07dedb84044, dir: 0e2657f6e50ebc418960e7712a3456b7bc52b65c
swh-loader_1                        | [2022-01-04 21:07:58,627: DEBUG/ForkPoolWorker-1] Flushing 13093 objects of type content (159073290 bytes)
swh-loader_1                        | [2022-01-04 21:07:58,652: ERROR/ForkPoolWorker-1] Loading failure, updating to `failed` status
swh-loader_1                        | Traceback (most recent call last):
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
swh-loader_1                        |     chunked=chunked,
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 394, in _make_request
swh-loader_1                        |     conn.request(method, url, **httplib_request_kw)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/connection.py", line 239, in request
swh-loader_1                        |     super(HTTPConnection, self).request(method, url, body=body, headers=headers)
swh-loader_1                        |   File "/usr/local/lib/python3.7/http/client.py", line 1281, in request
swh-loader_1                        |     self._send_request(method, url, body, headers, encode_chunked)
swh-loader_1                        |   File "/usr/local/lib/python3.7/http/client.py", line 1327, in _send_request
swh-loader_1                        |     self.endheaders(body, encode_chunked=encode_chunked)
swh-loader_1                        |   File "/usr/local/lib/python3.7/http/client.py", line 1276, in endheaders
swh-loader_1                        |     self._send_output(message_body, encode_chunked=encode_chunked)
swh-loader_1                        |   File "/usr/local/lib/python3.7/http/client.py", line 1075, in _send_output
swh-loader_1                        |     self.send(chunk)
swh-loader_1                        |   File "/usr/local/lib/python3.7/http/client.py", line 997, in send
swh-loader_1                        |     self.sock.sendall(data)
swh-loader_1                        | ConnectionResetError: [Errno 104] Connection reset by peer
swh-loader_1                        | 
swh-loader_1                        | During handling of the above exception, another exception occurred:
swh-loader_1                        | 
swh-loader_1                        | Traceback (most recent call last):
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/adapters.py", line 450, in send
swh-loader_1                        |     timeout=timeout
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 756, in urlopen
swh-loader_1                        |     method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/util/retry.py", line 532, in increment
swh-loader_1                        |     raise six.reraise(type(error), error, _stacktrace)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/packages/six.py", line 769, in reraise
swh-loader_1                        |     raise value.with_traceback(tb)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
swh-loader_1                        |     chunked=chunked,
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 394, in _make_request
swh-loader_1                        |     conn.request(method, url, **httplib_request_kw)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/urllib3/connection.py", line 239, in request
swh-loader_1                        |     super(HTTPConnection, self).request(method, url, body=body, headers=headers)
swh-loader_1                        |   File "/usr/local/lib/python3.7/http/client.py", line 1281, in request
swh-loader_1                        |     self._send_request(method, url, body, headers, encode_chunked)
swh-loader_1                        |   File "/usr/local/lib/python3.7/http/client.py", line 1327, in _send_request
swh-loader_1                        |     self.endheaders(body, encode_chunked=encode_chunked)
swh-loader_1                        |   File "/usr/local/lib/python3.7/http/client.py", line 1276, in endheaders
swh-loader_1                        |     self._send_output(message_body, encode_chunked=encode_chunked)
swh-loader_1                        |   File "/usr/local/lib/python3.7/http/client.py", line 1075, in _send_output
swh-loader_1                        |     self.send(chunk)
swh-loader_1                        |   File "/usr/local/lib/python3.7/http/client.py", line 997, in send
swh-loader_1                        |     self.sock.sendall(data)
swh-loader_1                        | urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
swh-loader_1                        | 
swh-loader_1                        | During handling of the above exception, another exception occurred:
swh-loader_1                        | 
swh-loader_1                        | Traceback (most recent call last):
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 254, in raw_verb
swh-loader_1                        |     return getattr(self.session, verb)(self._url(endpoint), **opts)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/sessions.py", line 577, in post
swh-loader_1                        |     return self.request('POST', url, data=data, json=json, **kwargs)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/sessions.py", line 529, in request
swh-loader_1                        |     resp = self.send(prep, **send_kwargs)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/sessions.py", line 645, in send
swh-loader_1                        |     r = adapter.send(request, **kwargs)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/requests/adapters.py", line 501, in send
swh-loader_1                        |     raise ConnectionError(err, request=request)
swh-loader_1                        | requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
swh-loader_1                        | 
swh-loader_1                        | During handling of the above exception, another exception occurred:
swh-loader_1                        | 
swh-loader_1                        | Traceback (most recent call last):
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 339, in load
swh-loader_1                        |     self.store_data()
swh-loader_1                        |   File "/src/swh-loader-svn/swh/loader/svn/loader.py", line 489, in store_data
swh-loader_1                        |     self.storage.content_add(self._contents)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 153, in content_add
swh-loader_1                        |     keys=["sha1", "sha1_git", "sha256", "blake2s256"],
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 224, in object_add
swh-loader_1                        |     return self.flush()
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 286, in flush
swh-loader_1                        |     stats = add_fn(list(batch))
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 54, in content_add
swh-loader_1                        |     contents_to_add = self._filter_missing_contents(content)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 113, in _filter_missing_contents
swh-loader_1                        |     return set(self.storage.content_missing(missing_contents, key_hash="sha256",))
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 181, in meth_
swh-loader_1                        |     return self.post(meth._endpoint_path, post_data)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 272, in post
swh-loader_1                        |     **opts,
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 256, in raw_verb
swh-loader_1                        |     raise self.api_exception(e)
swh-loader_1                        | swh.storage.exc.StorageAPIError: An unexpected error occurred in the api backend: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
swh-loader_1                        | [2022-01-04 21:07:58,683: DEBUG/ForkPoolWorker-1] Flushing 13093 objects of type content (159073290 bytes)
swh-loader_1                        | [2022-01-04 21:07:59,137: DEBUG/ForkPoolWorker-1] Flushing 1 objects of type revision (1 parents, 164 estimated bytes)
swh-loader_1                        | [2022-01-04 21:07:59,171: ERROR/ForkPoolWorker-1] NOT FOR PRODUCTION - debug flag activated
swh-loader_1                        | Local repository not cleaned up for investigation: /tmp/swh.loader.svn.arfljmvh-128/tmp4uusn8rz
swh-loader_1                        | [2022-01-04 21:07:59,771: INFO/ForkPoolWorker-1] Task swh.loader.svn.tasks.DumpMountAndLoadSvnRepository[4f2645f8-b3bd-4c00-9dbb-6f98f3a610f3] succeeded in 11831.774478806s: {'status': 'failed'}

There are no stack traces in the other services, so it looks like a Docker network issue.

Once the storage retry proxy was configured, I could fully load that svn repository.

Can you reproduce it outside Docker?

> Can you reproduce it outside Docker?

I did not see such errors in the Sentry reports for loaders, so I guess it is Docker-specific.

Does reducing content_bytes (e.g. making it 50MB) help? Any logs in the storage container? How about going through nginx instead of hitting gunicorn directly?
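For context on the content_bytes suggestion: the log above shows each content flush carrying about 159 MB (159073357 bytes) in one request. A hypothetical sketch of grouping objects into byte-bounded batches, which is roughly the effect a lower threshold would presumably have on request size; `batch_by_bytes` is an illustrative name and this is not the buffer proxy's actual algorithm:

```python
from typing import Iterable, Iterator, List


def batch_by_bytes(sizes: Iterable[int], max_bytes: int) -> Iterator[List[int]]:
    """Group object sizes into batches whose total stays <= max_bytes.

    An object larger than max_bytes on its own still gets its own batch,
    since it cannot be split.
    """
    batch: List[int] = []
    total = 0
    for size in sizes:
        # Start a new batch if adding this object would exceed the budget.
        if batch and total + size > max_bytes:
            yield batch
            batch, total = [], 0
        batch.append(size)
        total += size
    if batch:
        yield batch


MB = 1024 * 1024
sizes = [30 * MB, 30 * MB, 30 * MB, 30 * MB, 70 * MB, 120 * MB]
for batch in batch_by_bytes(sizes, max_bytes=100 * MB):
    print(len(batch), sum(batch) // MB, "MB")
# 3 90 MB
# 2 100 MB
# 1 120 MB
```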

@anlambert We use the retry proxy in production, though.

ardumont added a subscriber: ardumont.

> @anlambert We use the retry proxy in production, though.

Yes, which is enough to make me accept it.
This aligns docker with prod.

This revision is now accepted and ready to land. (Jan 6 2022, 12:21 PM)

Sure, but this also hides what is most likely a bug. localhost connections should not fail like this.

> Any logs in the storage container?

Unfortunately not.

> Does reducing content_bytes (e.g. making it 50MB) help? How about going through nginx instead of hitting gunicorn directly?

I can try, but aligning the docker configuration with the production one seems the right way to proceed.

>> Does reducing content_bytes (e.g. making it 50MB) help? How about going through nginx instead of hitting gunicorn directly?
>
> I can try, but aligning the docker configuration with the production one seems the right way to proceed.

All production RPC services are proxied through nginx before hitting gunicorn, hence my asking if doing that changes anything.

@olasd, so I tested the nginx approach. First, I configured the loader to reach the storage through the nginx proxy, but I then encountered the following error at the objstorage level:

swh-storage_1                       | Traceback (most recent call last):
swh-storage_1                       |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
swh-storage_1                       |     rv = self.dispatch_request()
swh-storage_1                       |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
swh-storage_1                       |     return self.view_functions[rule.endpoint](**req.view_args)
swh-storage_1                       |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 153, in newf
swh-storage_1                       |     return f.negotiator(*args, **kwargs)
swh-storage_1                       |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/negotiation.py", line 81, in __call__
swh-storage_1                       |     result = self.func(*args, **kwargs)
swh-storage_1                       |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 460, in _f
swh-storage_1                       |     return obj_meth(**kw)
swh-storage_1                       |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/metrics.py", line 24, in d
swh-storage_1                       |     return f(*a, **kw)
swh-storage_1                       |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/metrics.py", line 77, in d
swh-storage_1                       |     r = f(*a, **kw)
swh-storage_1                       |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/postgresql/storage.py", line 241, in content_add
swh-storage_1                       |     objstorage_summary = self.objstorage.content_add(contents)
swh-storage_1                       |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/objstorage.py", line 62, in content_add
swh-storage_1                       |     summary = self.objstorage.add_batch({cont.sha1: cont.data for cont in contents})
swh-storage_1                       |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/objstorage/api/client.py", line 47, in add_batch
swh-storage_1                       |     {"contents": contents, "check_presence": check_presence,},
swh-storage_1                       |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 272, in post
swh-storage_1                       |     **opts,
swh-storage_1                       |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 256, in raw_verb
swh-storage_1                       |     raise self.api_exception(e)
swh-storage_1                       | swh.objstorage.exc.ObjStorageAPIError: An unexpected error occurred in the api backend: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Then I also configured the storage to reach the objstorage through the nginx proxy; the network errors no longer occurred and I could load the large subversion repository without errors.

So it seems the following diff is the proper fix to the reported issue:

diff --git a/docker/conf/loader.yml b/docker/conf/loader.yml
index 2dd2c6e..5d9517a 100644
--- a/docker/conf/loader.yml
+++ b/docker/conf/loader.yml
@@ -8,9 +8,8 @@ storage:
         directory: 1000
         revision: 1000
     - cls: filter
-    - cls: retry
     - cls: remote
-      url: http://swh-storage:5002/
+      url: http://nginx:5080/storage
 
 celery:
   task_broker: amqp://guest:guest@amqp//
diff --git a/docker/conf/storage.yml b/docker/conf/storage.yml
index cd7d4dd..13d4ba3 100644
--- a/docker/conf/storage.yml
+++ b/docker/conf/storage.yml
@@ -3,7 +3,7 @@ storage:
   db: postgresql:///?service=swh-storage
   objstorage:
     cls: remote
-    url: http://swh-objstorage:5003/
+    url: http://nginx:5080/objstorage
   journal_writer:
     cls: kafka
     args:
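With this change, both hops (loader to storage, storage to objstorage) go through nginx instead of hitting the services directly. A sketch of the path-prefix routing this relies on, assuming the docker nginx.conf forwards `/storage` and `/objstorage` to the service URLs from the old side of the diff above and strips the prefix; the actual `location` blocks are not shown here, and `ROUTES`/`resolve` are hypothetical:

```python
from urllib.parse import urlsplit

# Hypothetical routing table: nginx path prefix -> upstream service
# (upstream URLs taken from the "before" side of the diff above).
ROUTES = {
    "/storage": "http://swh-storage:5002",
    "/objstorage": "http://swh-objstorage:5003",
}


def resolve(url: str) -> str:
    """Return the upstream URL a request to `url` would be forwarded to (sketch)."""
    path = urlsplit(url).path
    # Try the longest prefix first, similar to nginx prefix locations;
    # matching here is on path-segment boundaries, a simplification.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            rest = path[len(prefix):] or "/"
            return ROUTES[prefix] + rest
    raise LookupError(f"no route for {path}")


print(resolve("http://nginx:5080/storage"))  # http://swh-storage:5002/
print(resolve("http://nginx:5080/objstorage/"))  # http://swh-objstorage:5003/
```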

After more tests through nginx, I stumbled across the following error:

swh-loader_1                        | [2022-01-07 10:13:48,333: DEBUG/ForkPoolWorker-1] Flushing 1508 objects of type content (359031722 bytes)
swh-loader_1                        | [2022-01-07 10:13:48,713: ERROR/ForkPoolWorker-1] Loading failure, updating to `failed` status
swh-loader_1                        | Traceback (most recent call last):
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 328, in raise_for_status
swh-loader_1                        |     exception = pickle.loads(data)
swh-loader_1                        | TypeError: a bytes-like object is required, not 'str'
swh-loader_1                        | 
swh-loader_1                        | During handling of the above exception, another exception occurred:
swh-loader_1                        | 
swh-loader_1                        | Traceback (most recent call last):
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 339, in load
swh-loader_1                        |     self.store_data()
swh-loader_1                        |   File "/src/swh-loader-svn/swh/loader/svn/loader.py", line 486, in store_data
swh-loader_1                        |     self.storage.content_add(self._contents)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 159, in content_add
swh-loader_1                        |     return self.flush(["content"])
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 286, in flush
swh-loader_1                        |     stats = add_fn(list(batch))
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 58, in content_add
swh-loader_1                        |     [x for x in content if x.sha256 in contents_to_add]
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/api/client.py", line 45, in content_add
swh-loader_1                        |     return self.post("content/add", {"content": content})
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 278, in post
swh-loader_1                        |     return self._decode_response(response)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 354, in _decode_response
swh-loader_1                        |     self.raise_for_status(response)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/api/client.py", line 29, in raise_for_status
swh-loader_1                        |     super().raise_for_status(response)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 341, in raise_for_status
swh-loader_1                        |     raise RemoteException(payload=data, response=response)
swh-loader_1                        | swh.core.api.RemoteException: <html>
swh-loader_1                        | <head><title>413 Request Entity Too Large</title></head>
swh-loader_1                        | <body>
swh-loader_1                        | <center><h1>413 Request Entity Too Large</h1></center>
swh-loader_1                        | <hr><center>nginx/1.21.3</center>
swh-loader_1                        | </body>
swh-loader_1                        | </html>
swh-loader_1                        | 
swh-loader_1                        | [2022-01-07 10:13:48,743: DEBUG/ForkPoolWorker-1] Flushing 1508 objects of type content (359031722 bytes)
swh-loader_1                        | [2022-01-07 10:13:49,327: ERROR/ForkPoolWorker-1] Task swh.loader.svn.tasks.DumpMountAndLoadSvnRepository[aa981fcc-863b-4e79-8280-9681b0a6f7fa] raised unexpected: RemoteException('<html>\r\n<head><title>413 Request Entity Too Large</title></head>\r\n<body>\r\n<center><h1>413 Request Entity Too Large</h1></center>\r\n<hr><center>nginx/1.21.3</center>\r\n</body>\r\n</html>\r\n')
swh-loader_1                        | Traceback (most recent call last):
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 328, in raise_for_status
swh-loader_1                        |     exception = pickle.loads(data)
swh-loader_1                        | TypeError: a bytes-like object is required, not 'str'
swh-loader_1                        | 
swh-loader_1                        | During handling of the above exception, another exception occurred:
swh-loader_1                        | 
swh-loader_1                        | Traceback (most recent call last):
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 450, in trace_task
swh-loader_1                        |     R = retval = fun(*args, **kwargs)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/scheduler/task.py", line 55, in __call__
swh-loader_1                        |     result = super().__call__(*args, **kwargs)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/celery/app/trace.py", line 731, in __protected_call__
swh-loader_1                        |     return self.run(*args, **kwargs)
swh-loader_1                        |   File "/src/swh-loader-svn/swh/loader/svn/tasks.py", line 113, in load_svn_from_remote_dump
swh-loader_1                        |     return loader.load()
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 382, in load
swh-loader_1                        |     self.flush()
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 168, in flush
swh-loader_1                        |     self.storage.flush()
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 286, in flush
swh-loader_1                        |     stats = add_fn(list(batch))
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 58, in content_add
swh-loader_1                        |     [x for x in content if x.sha256 in contents_to_add]
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/api/client.py", line 45, in content_add
swh-loader_1                        |     return self.post("content/add", {"content": content})
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 278, in post
swh-loader_1                        |     return self._decode_response(response)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 354, in _decode_response
swh-loader_1                        |     self.raise_for_status(response)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/api/client.py", line 29, in raise_for_status
swh-loader_1                        |     super().raise_for_status(response)
swh-loader_1                        |   File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 341, in raise_for_status
swh-loader_1                        |     raise RemoteException(payload=data, response=response)
swh-loader_1                        | swh.core.api.RemoteException: <html>
swh-loader_1                        | <head><title>413 Request Entity Too Large</title></head>
swh-loader_1                        | <body>
swh-loader_1                        | <center><h1>413 Request Entity Too Large</h1></center>
swh-loader_1                        | <hr><center>nginx/1.21.3</center>
swh-loader_1                        | </body>
swh-loader_1                        | </html>

Bumping the client_max_body_size nginx configuration entry to the same value as in production (4G) fixed the issue.
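nginx answers 413 Request Entity Too Large when a request body exceeds `client_max_body_size`. A quick check of the failing flush from the log above against both limits, using a hypothetical `parse_nginx_size` helper that handles plain byte counts and the k/m/g suffixes (the accepted suffixes are an assumption based on nginx's documented size syntax):

```python
import re

# Multipliers for nginx size suffixes (k/K, m/M, g/G); no suffix means bytes.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_nginx_size(value: str) -> int:
    """Convert an nginx size such as '100M' or '4G' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value.strip())
    if match is None:
        raise ValueError(f"invalid nginx size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


# Content size from the 413 log line above (raw bytes only; the serialized
# request body is presumably somewhat larger).
payload = 359031722
for limit in ("100M", "4G"):
    print(limit, payload > parse_nginx_size(limit))
# 100M True
# 4G False
```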

So the correct diff to align the configuration with production is:

diff --git a/docker/conf/loader.yml b/docker/conf/loader.yml
index 2dd2c6e..5d9517a 100644
--- a/docker/conf/loader.yml
+++ b/docker/conf/loader.yml
@@ -8,8 +8,8 @@ storage:
         directory: 1000
         revision: 1000
     - cls: filter
     - cls: remote
-      url: http://swh-storage:5002/
+      url: http://nginx:5080/storage
 
 celery:
   task_broker: amqp://guest:guest@amqp//
diff --git a/docker/conf/nginx.conf b/docker/conf/nginx.conf
index b3bd430..1ee5a68 100644
--- a/docker/conf/nginx.conf
+++ b/docker/conf/nginx.conf
@@ -13,7 +13,7 @@ http {
   default_type       application/octet-stream;
   sendfile           on;
   keepalive_timeout  65;
-  client_max_body_size 100M;
+  client_max_body_size 4G;
 
   # Built-in Docker resolver. Needed to allow on-demand resolution of proxy
   # upstreams.
diff --git a/docker/conf/storage.yml b/docker/conf/storage.yml
index cd7d4dd..13d4ba3 100644
--- a/docker/conf/storage.yml
+++ b/docker/conf/storage.yml
@@ -3,7 +3,7 @@ storage:
   db: postgresql:///?service=swh-storage
   objstorage:
     cls: remote
-    url: http://swh-objstorage:5003/
+    url: http://nginx:5080/objstorage
   journal_writer:
     cls: kafka
     args:

Abandoning this revision in favor of D6891.