
loader, cvsclient: Read files line by line to reduce memory consumption
Closed, Public

Authored by anlambert on Oct 14 2022, 11:37 AM.

Details

Summary

Instead of calling the readlines method on file objects, which retrieves
all lines of a file and stores them in memory, read files line by line
using the lazy line iterator that file objects provide.

This significantly reduces loader memory consumption when processing
a large rlog output stored in a file.
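
To illustrate the difference described above, here is a minimal sketch, not the actual loader or cvsclient code. The function names and the "revision " prefix check are hypothetical and chosen only for illustration. Both functions return the same result, but the first holds every line in memory at once while the second holds only the current line.

```python
def count_revisions_eager(path: str) -> int:
    """Hypothetical helper: loads all lines into a list before scanning."""
    with open(path, "r") as f:
        lines = f.readlines()  # memory use grows with the file size
    return sum(1 for line in lines if line.startswith("revision "))


def count_revisions_lazy(path: str) -> int:
    """Hypothetical helper: iterates over the file object directly."""
    with open(path, "r") as f:
        # A file object is its own line iterator, so only one line
        # (plus the read buffer) is held in memory at a time.
        return sum(1 for line in f if line.startswith("revision "))
```

The lazy form only works while the file is still open, so the iteration has to stay inside the `with` block or the file has to be kept open by the caller.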

Diff Detail

Repository
rDLDCVS CVS Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8683 (id=31356)

Rebasing onto 965c3de498...

Current branch diff-target is up to date.
Changes applied before test
commit cfe7507a7366c52d92793830f5ce89063a2acca4
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Fri Oct 14 11:33:35 2022 +0200

    loader, cvsclient: Read files line by line to reduce memory consumption
    
    Instead of using the readlines method on file objects that retrieve all
    lines of a file and store them in memory, prefer to read files line
    by line by using the lazy generator of lines from file objects.
    
    This significantly reduce loader memory consumption when processing
    a large rlog output stored in a file.

See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/132/ for more details.

This revision is now accepted and ready to land. Oct 14 2022, 6:59 PM