Diffusion Git loader 342f8fde2560

Hardcode the use of the tcp transport for GitHub origins

Description

Hardcode the use of the tcp transport for GitHub origins

This change is necessary because of a shortcoming in the Dulwich HTTP
transport: even though the Dulwich API lets us process the packfile in
chunks as it's received, the HTTP transport implementation has to hold
the entire packfile in memory *twice*, once in the HTTP library and once
in a BytesIO managed by Dulwich, before passing it on to us as a chunked
reader. Overall, this triples the memory usage before we even get a
chance to interrupt the loader ahead of its memory limit.
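
The chunked processing described above can be sketched as a size-limited callback. This is a minimal sketch under assumptions, not the loader's actual code: `SizeLimitedPackSink` and `PackTooLargeError` are hypothetical names, and an instance is assumed to be passed as the `pack_data` callback of Dulwich's `fetch_pack`, which calls it with each chunk of pack data. With the HTTP transport, chunks only reach such a callback after the whole packfile has already been buffered twice, so the check fires too late to save memory.

```python
import io


class PackTooLargeError(IOError):
    """Hypothetical error raised when the incoming packfile exceeds the limit."""


class SizeLimitedPackSink:
    """Hypothetical pack_data callback: buffer chunks, abort past a byte limit."""

    def __init__(self, size_limit: int) -> None:
        self.size_limit = size_limit
        self.buffer = io.BytesIO()

    def __call__(self, data: bytes) -> None:
        cur_size = self.buffer.tell()
        if cur_size + len(data) > self.size_limit:
            # Refuse the chunk that would overrun the limit; raising here
            # interrupts the fetch as soon as the limit is crossed.
            raise PackTooLargeError(
                f"pack file too big: limit is {self.size_limit} bytes, "
                f"already have {cur_size}, would write {len(data)}"
            )
        # Only chunks that fit within the limit are kept in memory.
        self.buffer.write(data)
```

The rejected chunk is never written, so the buffer never grows past `size_limit`; the cost of the check is a single `tell()` per chunk.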

In contrast, the Dulwich TCP transport just gives us the read handle on
the underlying socket, without processing or copying the bytes, so we
can interrupt the fetch as soon as we've received too many bytes.
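
One way to force the TCP transport for GitHub origins is to rewrite the origin URL before handing it to Dulwich, since `dulwich.client.get_transport_and_path` picks the transport from the URL scheme (a `git://` URL yields a `TCPGitClient`). This is a sketch assuming that URL-rewriting approach; `transport_url_for` is a hypothetical helper, not the code from this commit, whose diff is not shown here.

```python
GITHUB_HTTPS_PREFIX = "https://github.com/"


def transport_url_for(origin_url: str) -> str:
    """Hypothetical helper: return the URL to hand to Dulwich for an origin.

    GitHub HTTPS origins are rewritten to git:// so that Dulwich selects its
    TCP transport instead of the HTTP one; other origins are left unchanged.
    """
    if origin_url.startswith(GITHUB_HTTPS_PREFIX):
        # Keep the host and path, swap only the scheme.
        return "git://" + origin_url[len("https://"):]
    return origin_url
```

The rewritten URL would then be passed to `dulwich.client.get_transport_and_path`, which returns a `(client, path)` pair. GitHub has since disabled the unauthenticated `git://` protocol (in 2022), so this rewrite reflects the situation at the time of this commit.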

Details

Provenance

olasd	Authored on Feb 25 2021, 3:59 PM
olasd	Pushed on Feb 25 2021, 6:46 PM

Differential Revision

D5148: Hardcode the use of the tcp transport for GitHub origins

Parents

rDLDG61afbc56b035: Stop processing packfiles before sending objects

Build Status

Buildable 19501
Build 30252: test-and-build

Event Timeline

olasd committed rDLDG342f8fde2560: Hardcode the use of the tcp transport for GitHub origins (authored by olasd), Feb 25 2021, 6:39 PM

olasd added an edge: D5148: Hardcode the use of the tcp transport for GitHub origins, Feb 25 2021, 6:46 PM

Harbormaster completed building B19501: rDLDG342f8fde2560: Hardcode the use of the tcp transport for GitHub origins, Feb 25 2021, 6:50 PM

swh-public-ci mentioned this in D5254: Rename 'git_metadata' to 'extra_headers', Mar 16 2021, 11:40 AM

Changes (1)

swh/loader/git/loader.py