
StreamingGraphView: Buffer lines before writing
ClosedPublic

Authored by vlorentz on Aug 10 2021, 12:04 PM.

Details

Summary

Most of the time is spent maxing out the CPU in the Python process.
This change has two effects:

  1. lines are joined before being encoded (instead of encoding them one-by-one)
  2. larger network packets are sent, instead of a single packet per line

I don't know which of the two effects accounts for the improvement, but
overall this is a consistent 25 to 35% speed-up of the total run time of
SimpleTraversalView.
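
The two effects listed above can be illustrated with a minimal sketch. It is not the code from this diff: the names `batched`, `write_unbuffered`, `write_buffered`, and `batch_size` are hypothetical, and an in-memory `io.BytesIO` stands in for the network stream that the real view writes to.

```python
import io
from typing import BinaryIO, Iterable, Iterator, List


def batched(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` lines (hypothetical helper)."""
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left once the input is exhausted.
        yield batch


def write_unbuffered(lines: Iterable[str], out: BinaryIO) -> None:
    """Baseline: one encode() call and one write() call per line."""
    for line in lines:
        out.write((line + "\n").encode())


def write_buffered(lines: Iterable[str], out: BinaryIO, batch_size: int = 1000) -> None:
    """Join each batch first, then encode and write it in a single call."""
    for batch in batched(lines, batch_size):
        out.write(("\n".join(batch) + "\n").encode())


# Both strategies must produce exactly the same bytes.
sample = [f"line-{i}" for i in range(2500)]
plain, buffered = io.BytesIO(), io.BytesIO()
write_unbuffered(sample, plain)
write_buffered(sample, buffered, batch_size=1000)
assert plain.getvalue() == buffered.getvalue()
```

With 2500 lines and a batch size of 1000, the buffered variant makes 3 `encode()` and `write()` calls instead of 2500. The batch size here is an arbitrary illustrative value, not one taken from the diff.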

Diff Detail

Repository
rDGRPH Compressed graph representation
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6072 (id=21986)

Rebasing onto a48b5be584...

Current branch diff-target is up to date.
Changes applied before test
commit b54ed982e2039e0bca87cbe17dd63aa667db6d40
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Aug 10 12:03:36 2021 +0200

    StreamingGraphView: Buffer lines before writing
    
    Most of the time is spent maxing out the CPU in the Python process.
    This change has two effects:
    
    1. lines are joined before being encoded (instead of encoding them one-by-one)
    2. larger network packets are sent, instead of a single packet per line
    
    I don't know which affects the performance, but overall, this is
    a consistent 25 to 35% speed-up to the overall run time of
    SimpleTraversalView.

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/131/ for more details.

This revision is now accepted and ready to land. Aug 12 2021, 1:44 AM