Software Heritage

backfill: only flush the journal writer on every batch
Closed, Public

Authored by olasd on Fri, Nov 13, 11:21 AM.

Details

Summary

This module's use of write_addition predated the introduction of reliable
writing in swh.journal. Since that change, the backfiller has been flushing
the Kafka writer after writing every single object, which caused a measured
3x slowdown when backfilling contents.
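A minimal sketch of the difference between flushing after each object and flushing once per batch. CountingJournalWriter, send, write_additions and the backfill_* helpers are hypothetical stand-ins, not the swh.journal or swh.storage API. The sketch assumes only that send() merely queues a message and that flush() is the expensive call that waits for delivery. It counts flushes rather than timing anything, so it does not reproduce the measured 3x figure.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, List


@dataclass
class CountingJournalWriter:
    """Hypothetical stand-in for a reliable Kafka journal writer.

    send() only queues a message; flush() stands in for the blocking wait
    on delivery acknowledgements, which is the costly step.
    """

    pending: List[Any] = field(default_factory=list)
    delivered: List[Any] = field(default_factory=list)
    flush_count: int = 0

    def send(self, obj: Any) -> None:
        self.pending.append(obj)

    def flush(self) -> None:
        # A real producer would block here until the broker acknowledges.
        self.delivered.extend(self.pending)
        self.pending.clear()
        self.flush_count += 1

    def write_addition(self, obj: Any) -> None:
        # Reliable single-object write: send, then wait for delivery.
        self.send(obj)
        self.flush()

    def write_additions(self, objs: Iterable[Any]) -> None:
        # Batched write: queue every object, then wait once.
        for obj in objs:
            self.send(obj)
        self.flush()


def backfill_per_object(writer: CountingJournalWriter, batches: List[List[Any]]) -> None:
    # Old behaviour: one flush per object.
    for batch in batches:
        for obj in batch:
            writer.write_addition(obj)


def backfill_per_batch(writer: CountingJournalWriter, batches: List[List[Any]]) -> None:
    # New behaviour: one flush per batch.
    for batch in batches:
        writer.write_additions(batch)


# Five batches of 1000 objects each.
batches = [list(range(i, i + 1000)) for i in range(0, 5000, 1000)]

slow = CountingJournalWriter()
backfill_per_object(slow, batches)

fast = CountingJournalWriter()
backfill_per_batch(fast, batches)

print(slow.flush_count, fast.flush_count)  # 5000 5
```

Both strategies deliver the same objects in the same order. Only the number of blocking waits changes, from one per object to one per batch.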

Test Plan

Ran tox, and tested in production on getty.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

olasd created this revision. Fri, Nov 13, 11:21 AM

Build is green

Patch application report for D4471 (id=15867)

Rebasing onto 248a04b5b8...

Current branch diff-target is up to date.
Changes applied before test
commit 20d3f8e7a6a0f7de102b94d143d04ae42ee2be53
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Nov 12 18:04:44 2020 +0100

    backfill: only flush the journal writer on every batch

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1066/ for more details.

This revision is now accepted and ready to land. Fri, Nov 13, 11:51 AM
This revision was automatically updated to reflect the committed changes.