buffer: add a threshold for the number of revision parents in one batch
Closed, Public

Authored by olasd on Oct 8 2021, 3:57 PM.

Details

Summary

The size of individual revisions is essentially unbounded. This means
that, when the buffer storage is used as a way of limiting memory use
for an ingestion process, it is still possible to go beyond the expected
memory use when adding a batch of revisions with extensive histories.

The duration of the database operation for revision_add is also
commensurate with the number of revision parents added in a batch, so
using the buffer proxy to limit the time individual database operations
take was not effective.

Adding a threshold on the cumulative number of revision parents per batch
makes this overuse of memory and of database transaction time much less
likely.
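
As a rough illustration of the idea (not the code in this diff), the sketch
below buffers revisions and flushes when either the revision count or the
cumulative parent count reaches its threshold. The class name, the
`flush_fn` callback, the tuple-based revision shape, and the default
threshold values are all assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical stand-in for a revision: (revision id, parent ids).
Revision = Tuple[str, Sequence[str]]


class RevisionBuffer:
    """Buffer revisions and flush when either threshold is reached.

    Illustrative only: names and defaults are assumptions, not the
    actual buffer proxy API.
    """

    def __init__(
        self,
        flush_fn: Callable[[List[Revision]], None],
        max_revisions: int = 1000,
        max_parents: int = 10000,
    ) -> None:
        self.flush_fn = flush_fn
        self.max_revisions = max_revisions
        self.max_parents = max_parents
        self._pending: Dict[str, Revision] = {}
        self._parents = 0  # cumulative number of parents in the pending batch

    def add(self, revisions: Iterable[Revision]) -> None:
        for rev in revisions:
            rev_id, parents = rev
            if rev_id in self._pending:
                continue  # already buffered, do not count its parents twice
            self._pending[rev_id] = rev
            self._parents += len(parents)
            if (
                len(self._pending) >= self.max_revisions
                or self._parents >= self.max_parents
            ):
                self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        batch = list(self._pending.values())
        self._pending.clear()
        self._parents = 0
        self.flush_fn(batch)
```

With a low `max_parents`, a handful of revisions with many parents triggers
a flush long before `max_revisions` is reached, which is the behaviour the
summary describes.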

Test Plan

New test added.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6445 (id=23409)

Rebasing onto 5edc0ba7ac...

Current branch diff-target is up to date.
Changes applied before test
commit 7c5b0ec15e40ce7cb91b8a50beefe29d6dc8faf7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Oct 8 15:13:59 2021 +0200

    buffer: add a threshold for the number of revision parents in one batch
    

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1446/ for more details.

olasd requested review of this revision. Oct 8 2021, 4:08 PM
This revision is now accepted and ready to land. Oct 8 2021, 4:10 PM