Software Heritage

buffer: add some debug logging for number of objects sent
Closed · Public

Authored by olasd on Oct 8 2021, 4:15 PM.

Details

Summary

Helpful to understand whether the thresholds we've set have been hit.
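As an illustration, the kind of per-flush debug logging this revision describes might look like the minimal sketch below. It assumes the buffer keeps one list of pending objects per object type. The names `flush`, `buffers`, `send`, the logger name `buffer_sketch` and the message format are hypothetical and are not taken from swh/storage/proxies/buffer.py.

```python
import logging
from typing import Callable, Dict, List

# Hypothetical logger name for this sketch; the real module may use another one.
logger = logging.getLogger("buffer_sketch")


def flush(
    buffers: Dict[str, List[object]],
    send: Callable[[str, List[object]], None],
) -> Dict[str, int]:
    """Send every non-empty buffered batch and log how many objects went out.

    Returns the per-type counts so callers can inspect them as well.
    """
    sent: Dict[str, int] = {}
    for object_type, batch in list(buffers.items()):
        if not batch:
            continue
        send(object_type, batch)
        sent[object_type] = len(batch)
        # One debug line per object type; comparing these counts with the
        # configured thresholds shows whether a threshold was hit.
        logger.debug("%d %s objects sent", len(batch), object_type)
        # Start a fresh list so the batch handed to send() is left untouched.
        buffers[object_type] = []
    return sent
```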

Test Plan

Ran a few git loads with this debug logging enabled.
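Reproducing this test plan requires enabling DEBUG output for the buffer proxy without turning it on everywhere. The sketch below uses only the standard library `logging` module. It assumes the proxy logs through a logger named after its module path (`swh.storage.proxies.buffer`), which is the usual `logging.getLogger(__name__)` convention but is not confirmed on this page; `enable_buffer_debug` is a hypothetical helper.

```python
import logging

# Assumption: the buffer proxy obtains its logger via logging.getLogger(__name__).
BUFFER_LOGGER = "swh.storage.proxies.buffer"


def enable_buffer_debug() -> logging.Logger:
    """Turn on DEBUG output for the buffer proxy only; other loggers stay quieter."""
    # Root handler at INFO (no-op if logging was already configured elsewhere).
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    logger = logging.getLogger(BUFFER_LOGGER)
    # Records from this logger pass its own level check and then propagate to
    # the root handler, which has no level of its own and so emits them.
    logger.setLevel(logging.DEBUG)
    return logger
```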

Event Timeline

Build is green

Patch application report for D6447 (id=23415)

Could not rebase; Attempt merge onto 5edc0ba7ac...

Updating 5edc0ba7..901c9a9a
Fast-forward
 swh/storage/proxies/buffer.py    | 124 ++++++++++++++++++++++++++++++++++++++-
 swh/storage/tests/test_buffer.py |  75 ++++++++++++++++++++++-
 2 files changed, 196 insertions(+), 3 deletions(-)
Changes applied before test
commit 901c9a9a10e7ef4dd6ad3f4941a6373cc2e8635f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Oct 8 15:55:29 2021 +0200

    buffer: add some debug logging for number of objects sent

commit 1db72a0e005c8201dddfca1806044659aa8f87c7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Oct 8 15:44:42 2021 +0200

    buffer: add a threshold for the estimated size of revision and release batches
    
    The size of individual revisions and releases is essentially unbounded.
    This means that, when the buffer storage is used as a way of limiting
    memory use for an ingestion process, it is still possible to go beyond
    the expected memory use when adding a batch of revisions or releases
    with large messages or other metadata.
    
    The duration of the database operations for revision_add or release_add is also
    commensurate to the size of the objects added in a batch, so
    using the buffer proxy to limit the time individual database operations
    take was not effective.
    
    Adding a threshold on estimated sizes for batches of revision and
    release objects makes this overuse of memory and of database transaction
    time much less likely.

commit 7c5b0ec15e40ce7cb91b8a50beefe29d6dc8faf7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Oct 8 15:13:59 2021 +0200

    buffer: add a threshold for the number of revision parents in one batch
    
    The size of individual revisions is essentially unbounded. This means
    that, when the buffer storage is used as a way of limiting memory use
    for an ingestion process, it is still possible to go beyond the expected
    memory use when adding a batch of revisions with extensive histories.
    
    The duration of the database operation for revision_add is also
    commensurate to the number of revision parents added in a batch, so
    using the buffer proxy to limit the time individual database operations
    take was not effective.
    
    Adding a threshold on the cumulative number of revision parents per batch
    makes this overuse of memory and of database transaction time much less
    likely.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1448/ for more details.
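The revision-parents threshold from commit 7c5b0ec can be illustrated with a small buffer that flushes when either the number of revisions or the cumulative number of parents reaches a limit. This is a sketch under assumptions: `Revision`, `RevisionBuffer`, `max_revisions`, `max_parents` and their default values are hypothetical, and the check runs after every added object, which may differ from how the actual proxy decides to flush.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Revision:
    """Hypothetical stand-in for a revision; only its parents matter here."""
    id: bytes
    parents: Tuple[bytes, ...] = ()


@dataclass
class RevisionBuffer:
    """Hypothetical buffer flushing on revision count OR cumulative parent count."""
    send: Callable[[List[Revision]], None]
    max_revisions: int = 100_000  # hypothetical default thresholds
    max_parents: int = 200_000
    _batch: List[Revision] = field(default_factory=list, init=False)
    _parents: int = field(default=0, init=False)

    def add(self, revisions: List[Revision]) -> None:
        for rev in revisions:
            self._batch.append(rev)
            self._parents += len(rev.parents)
            # Either limit triggers a flush, so a handful of revisions with
            # many parents cannot produce an oversized batch on their own.
            if (
                len(self._batch) >= self.max_revisions
                or self._parents >= self.max_parents
            ):
                self.flush()

    def flush(self) -> None:
        """Send the pending batch (if any) and reset both counters."""
        if self._batch:
            self.send(self._batch)
        self._batch = []
        self._parents = 0
```

Resetting the parent counter on every flush is what makes the limit apply per batch rather than over the lifetime of the buffer.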

olasd requested review of this revision. (Oct 8 2021, 4:23 PM)

And I assume they were helpful ;)

This revision is now accepted and ready to land. (Oct 8 2021, 4:49 PM)

Build has FAILED

Patch application report for D6447 (id=23421)

Could not rebase; Attempt merge onto 7c5b0ec15e...

Updating 7c5b0ec1..3441f689
Fast-forward
 swh/storage/proxies/buffer.py    | 112 +++++++++++++++++++++++++++++++++++++--
 swh/storage/tests/test_buffer.py |  52 +++++++++++++++++-
 2 files changed, 159 insertions(+), 5 deletions(-)
Changes applied before test
commit 3441f68985ae13c134c8f9f9bcccf3a541508d05
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Oct 8 15:55:29 2021 +0200

    buffer: add some debug logging for number of objects sent

commit b6040142fe723771f43ffef75b2e1fc778641a42
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Oct 8 15:44:42 2021 +0200

    buffer: add a threshold for the estimated size of revision and release batches
    
    The size of individual revisions and releases is essentially unbounded.
    This means that, when the buffer storage is used as a way of limiting
    memory use for an ingestion process, it is still possible to go beyond
    the expected memory use when adding a batch of revisions or releases
    with large messages or other metadata.
    
    The duration of the database operations for revision_add or release_add is also
    commensurate to the size of the objects added in a batch, so
    using the buffer proxy to limit the time individual database operations
    take was not effective.
    
    Adding a threshold on estimated sizes for batches of revision and
    release objects makes this overuse of memory and of database transaction
    time much less likely.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1450/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1450/console

Build is green

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1451/ for more details.
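The estimated-size threshold for revision and release batches can be sketched as a pure batching rule: close a batch before its estimated total size would exceed a budget. The actual buffer proxy accumulates objects across calls, so this only isolates the rule. `batches_by_estimated_size`, `RevisionLike` and the size heuristic (message length plus 20 bytes per parent id) are hypothetical; the real estimate may count other fields.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RevisionLike:
    """Hypothetical stand-in for a revision or release with unbounded fields."""
    message: bytes
    parents: Tuple[bytes, ...] = ()


def estimate_revision_size(obj: RevisionLike) -> int:
    """Hypothetical heuristic: message bytes plus 20 bytes per parent sha1_git."""
    return len(obj.message) + 20 * len(obj.parents)


def batches_by_estimated_size(
    objects: Iterable[T],
    estimate: Callable[[T], int],
    max_bytes: int,
) -> Iterator[List[T]]:
    """Yield consecutive batches whose estimated total size stays within max_bytes.

    An object whose own estimate exceeds max_bytes cannot be split, so it is
    sent alone in its own batch.
    """
    batch: List[T] = []
    size = 0
    for obj in objects:
        obj_size = estimate(obj)
        # Close the current batch before adding an object that would overflow it.
        if batch and size + obj_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(obj)
        size += obj_size
    if batch:
        yield batch
```

Closing the batch before it overflows keeps every multi-object batch within the budget; flushing once the running total reaches the threshold is simpler, but lets the last object push a batch past the limit.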