Spool large packfiles to disk instead of consuming tons of memory
Closed, Public

Authored by olasd on Apr 30 2021, 8:25 PM.

Details

Summary

This lowers memory consumption by writing packfiles above a given
threshold to disk. This reduces the memory pressure on workers (at the cost of
more disk churn), and also allows the git loader to be used on more
memory-constrained systems.

As there is a single temporary file which we hold open, we can use the default
Python tempfile feature which unlinks the temporary file directly, allowing the
file to be reaped as soon as the process disappears, even if the process gets
killed. This avoids the need for any manual tempfile cleanup.
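
For illustration, the behaviour described above can be sketched with the standard library's tempfile.SpooledTemporaryFile. This is a minimal sketch, not the code from this diff: the helper name spool_packfile and its chunks and cutoff parameters are hypothetical, and the immediate unlinking of the on-disk file is the behaviour of Python's temporary files on POSIX systems.

```python
import tempfile
from typing import Iterable


def spool_packfile(chunks: Iterable[bytes], cutoff: int) -> tempfile.SpooledTemporaryFile:
    """Collect packfile chunks, keeping at most `cutoff` bytes in memory.

    Hypothetical helper: the name and parameters are assumptions made for
    this sketch, not identifiers taken from the git loader.
    """
    # The buffer lives in memory until more than `cutoff` bytes have been
    # written, then rolls over to an anonymous temporary file on disk.
    # Note that max_size=0 would disable the rollover entirely.
    buf = tempfile.SpooledTemporaryFile(max_size=cutoff)
    for chunk in chunks:
        buf.write(chunk)
    # Rewind so the caller can read the packfile from the start.
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Two small chunks with a 100 MiB cutoff: everything stays in memory.
    pack = spool_packfile([b"PACK", b"\x00" * 1024], cutoff=100 * 1024 * 1024)
    print(len(pack.read()))
    # Closing the buffer releases the memory or the on-disk file; no
    # explicit cleanup of a named path is needed.
    pack.close()
```

Because the caller only ever sees a file-like object, the same reading code works whether or not the packfile crossed the threshold.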

Related to T3025

Test Plan

This has been exercised on large repositories (e.g. linux.git yields
a packfile that is almost 4GiB).

Diff Detail

Repository
rDLDG Git loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5657 (id=20210)

Rebasing onto 15e12fae18...

First, rewinding head to replay your work on top of it...
Applying: Spool large packfiles to disk instead of consuming tons of memory
Changes applied before test
commit 39692e66ded1bed94c530074455999366e7d2613
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Apr 26 20:57:36 2021 +0200

    Spool large packfiles to disk instead of consuming tons of memory

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/99/ for more details.

olasd requested review of this revision. Apr 30 2021, 8:28 PM
zack added a subscriber: zack.

nice hack/trade-off !

This revision is now accepted and ready to land. Apr 30 2021, 8:39 PM

Build is green

Patch application report for D5657 (id=20532)

Rebasing onto 15e12fae18...

Current branch diff-target is up to date.
Changes applied before test
commit 9823cd11e3b3b0cc965f38d8e7778250d88e1f22
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Apr 26 20:57:36 2021 +0200

    Spool large packfiles to disk instead of consuming tons of memory

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/100/ for more details.

douardda added a subscriber: douardda.

LGTM, any reason for not landing it?

Nope, I had just forgotten about it!