
git loader packfile size limit is poorly applied to HTTP(S) repositories

Description

Quoting T3544#76729:

Just to recap, the original (and pretty much only) problem that this hardcoding of the TCP transport was working around is that the dulwich HTTP(S) client fetches the full packfile into memory before streaming it to the user-provided do_pack function. This function is what enforces our packfile size limit.
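
For reference, here is a minimal sketch of what such a size-enforcing do_pack callback looks like (illustrative only, not the actual swh.loader.git code; the limit value and the in-memory buffer are assumptions). dulwich invokes it with successive chunks of pack data, so it can raise as soon as the configured limit is crossed:

```python
from io import BytesIO

PACK_SIZE_LIMIT = 4 * 1024 * 1024 * 1024  # hypothetical 4 GiB limit

pack_buffer = BytesIO()

def do_pack(data: bytes) -> None:
    """Accumulate pack chunks, aborting once the size limit is crossed."""
    if pack_buffer.tell() + len(data) > PACK_SIZE_LIMIT:
        raise IOError(
            f"Pack file exceeds the {PACK_SIZE_LIMIT} byte limit"
        )
    pack_buffer.write(data)
```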

We're not actually using any negotiation of objects to fetch in the current implementation of the git loader; we only send a static list of known object IDs.
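
A sketch of how the loader drives dulwich without real negotiation (the repository URL, known_heads, and the callback wiring are illustrative assumptions, not the actual loader code): determine_wants returns a static selection computed from the advertised refs, and the graph walker only advertises already-known heads, so there is no back-and-forth negotiation round:

```python
from dulwich.client import get_transport_and_path
from dulwich.object_store import ObjectStoreGraphWalker

known_heads = []  # SHAs of heads already loaded, if any

def determine_wants(refs):
    # Static selection: every advertised object id we do not already have.
    return [sha for sha in set(refs.values()) if sha not in set(known_heads)]

client, path = get_transport_and_path("https://example.org/repo.git")
result = client.fetch_pack(
    path,
    determine_wants=determine_wants,
    graph_walker=ObjectStoreGraphWalker(known_heads, lambda sha: []),
    pack_data=do_pack,  # the size-limiting callback sketched above
)
```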

Practically, this means that the dulwich HTTP client buffers the entire packfile response (via urllib3 (?), in RAM (?)) before passing it in chunks to do_pack, which is then able to enforce the limit. So we fetch the full response and hold it in RAM, only to drop it on the floor afterwards (rather than rejecting it mid-flight).

This is a dulwich limitation that we should investigate patching, to avoid wasting bandwidth and memory.
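
A hedged sketch of the direction such a patch could take (illustrative only, not dulwich's actual code; pkt-line/side-band demultiplexing of the smart HTTP response is elided, and the endpoint and function names are assumptions): stream the upload-pack response with urllib3 instead of preloading it, so the pack_data callback sees chunks mid-flight and can abort the transfer early:

```python
import urllib3

http = urllib3.PoolManager()

def fetch_pack_streaming(url, request_body, pack_data, chunk_size=65536):
    # preload_content=False keeps urllib3 from buffering the whole
    # response body (the packfile) in RAM before we see any of it.
    resp = http.request(
        "POST",
        url + "/git-upload-pack",
        body=request_body,
        headers={"Content-Type": "application/x-git-upload-pack-request"},
        preload_content=False,
    )
    try:
        for chunk in resp.stream(chunk_size):
            # pack_data may raise (e.g. on the size limit), which aborts
            # the download mid-flight instead of after the fact.
            pack_data(chunk)
    finally:
        resp.release_conn()
```

The essential change is preload_content=False plus chunked iteration; everything downstream (do_pack and its limit check) already operates on chunks.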