Hardcode the use of the tcp transport for GitHub origins

This commit no longer exists in the repository. It may have been part of a branch which was deleted.

Description

This change is necessary because of a shortcoming in the Dulwich HTTP
transport: even though the Dulwich API lets us process the packfile in
chunks as it's received, the HTTP transport implementation needs to
entirely allocate the packfile in memory *twice*, once in the HTTP
library, and once in a BytesIO managed by Dulwich, before passing it on
to us as a chunked reader. Overall this triples the memory usage before
we get any chance to interrupt the loader, so it can overrun its memory
limit first.
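
To make the buffering problem concrete, here is a minimal sketch that
mimics the pattern described above. It does not call Dulwich or any HTTP
library: fake_packfile_chunks and buffered_fetch are hypothetical names,
and the zero-filled chunks stand in for real packfile bytes. It only
illustrates that two full copies exist before the consumer sees a byte.

```python
import io


def fake_packfile_chunks(total_size, chunk_size=64 * 1024):
    """Yield zero-filled chunks standing in for packfile bytes off the wire."""
    remaining = total_size
    while remaining > 0:
        n = min(chunk_size, remaining)
        yield bytes(n)
        remaining -= n


def buffered_fetch(chunks):
    """Mimic the buffering pattern: a full body, then a full BytesIO copy.

    Returns the reader handed to the consumer and the number of bytes
    that were materialized before the consumer could read anything.
    """
    body = b"".join(chunks)   # copy 1: the whole response held in one bytes object
    buf = io.BytesIO()
    buf.write(body)           # copy 2: the same bytes written into a BytesIO
    resident = len(body) + buf.tell()
    buf.seek(0)               # only now can the consumer start "chunked" reads
    return buf, resident


reader, resident = buffered_fetch(fake_packfile_chunks(1_000_000))
print(resident)  # bytes held in memory before any size check can run
```

Any size check placed in the consumer runs only after both copies are
complete, which is too late to protect a memory limit.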

In contrast, the Dulwich TCP transport just gives us the read handle on
the underlying socket, doing no processing or copying of the bytes. We
can interrupt it as soon as we've received too many bytes.
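
As an illustration of interrupting a streamed read, the sketch below
counts bytes as they arrive from a file-like handle and aborts once a
budget is exceeded. It is not the loader's actual code: read_with_limit
and PackfileTooLarge are hypothetical names, and an io.BytesIO stands in
for the socket read handle.

```python
import io


class PackfileTooLarge(Exception):
    """Hypothetical error raised when the byte budget is exceeded."""


def read_with_limit(handle, limit, chunk_size=65536):
    """Yield chunks from handle, aborting once more than limit bytes arrive."""
    received = 0
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            return  # end of stream
        received += len(chunk)
        if received > limit:
            # Stop here: the rest of the stream is never read into memory.
            raise PackfileTooLarge(
                f"received {received} bytes, limit is {limit}"
            )
        yield chunk


# Stand-in for a socket handle delivering a 200,000-byte packfile.
handle = io.BytesIO(bytes(200_000))
try:
    for chunk in read_with_limit(handle, limit=100_000):
        pass  # a real consumer would process each chunk here
except PackfileTooLarge as exc:
    print(exc)
```

With this shape, the memory held at any time is bounded by one chunk, and
the overshoot past the limit is at most one chunk.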

Details

Provenance
olasd authored on Feb 25 2021, 3:59 PM
olasd pushed on Feb 25 2021, 6:46 PM
Differential Revision
D5148: Hardcode the use of the tcp transport for GitHub origins
Build Status
Buildable 19501
Build 30252: test-and-build (Jenkins)
