
backfill: only flush the journal writer on every batch
Closed, Public

Authored by olasd on Nov 13 2020, 11:21 AM.

Details

Summary

This module's use of write_addition predates the introduction of reliable
writing in swh.journal. Since that introduction, the backfiller has been
flushing the Kafka writer after writing each single object, leading to a
measured 3x slowdown when backfilling contents.
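
To illustrate the difference, here is a minimal sketch of per-object versus per-batch flushing. It does not use the swh.journal or Kafka APIs: `CountingWriter`, `send`, `flush`, `write_one_by_one` and `write_batch` are hypothetical stand-ins, and the flush is only simulated, so it shows the number of blocking flushes rather than the measured slowdown.

```python
from typing import Iterable, List


class CountingWriter:
    """Hypothetical stand-in for a journal writer (not the swh.journal API)."""

    def __init__(self) -> None:
        self.buffer: List[str] = []
        self.sent: List[str] = []
        self.flush_count = 0

    def send(self, obj: str) -> None:
        # Queue the object without waiting for delivery.
        self.buffer.append(obj)

    def flush(self) -> None:
        # Simulate blocking until every queued object is delivered.
        self.sent.extend(self.buffer)
        self.buffer.clear()
        self.flush_count += 1


def write_one_by_one(writer: CountingWriter, batch: Iterable[str]) -> None:
    # Behaviour described above: one blocking flush per object.
    for obj in batch:
        writer.send(obj)
        writer.flush()


def write_batch(writer: CountingWriter, batch: Iterable[str]) -> None:
    # Behaviour after this change: queue the whole batch, then flush once.
    for obj in batch:
        writer.send(obj)
    writer.flush()


batch = [f"content-{i}" for i in range(1000)]

slow = CountingWriter()
write_one_by_one(slow, batch)

fast = CountingWriter()
write_batch(fast, batch)

print(slow.flush_count, fast.flush_count)
```

Both variants deliver the same objects in the same order; only the number of blocking flushes per batch differs.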

Test Plan

Ran tox and tested in production on getty.

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 17077
Build 26358: Phabricator diff pipeline on jenkins
Build 26357: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4471 (id=15867)

Rebasing onto 248a04b5b8...

Current branch diff-target is up to date.
Changes applied before test
commit 20d3f8e7a6a0f7de102b94d143d04ae42ee2be53
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Nov 12 18:04:44 2020 +0100

    backfill: only flush the journal writer on every batch
    
    This module's use of write_addition predated the introduction of reliable
    writing in swh.journal; Since this introduction, the backfiller has been
    flushing the kafka writer after writing each single object, leading to a 3x
    measured slowdown on backfilling contents.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1066/ for more details.

This revision is now accepted and ready to land. Nov 13 2020, 11:51 AM