Page MenuHomeSoftware Heritage

writer: make calling flush() in write_addition(s)() optional
ClosedPublic

Authored by douardda on Oct 18 2022, 4:20 PM.

Details

Summary

using a constructor 'auto_flush' bookean argument.
The idea is that in a test session, each call to 'flush()' takes ~1s to
run, so having the test handling a single call to flush() when needed
(instead of n calls) make test execution significantly faster.

For example, swh-storage's test_replay.py execution went from ~140s to
~40s, and test_backfill.py from ~40s to ~15s.

Depends on D8697

Diff Detail

Repository
rDJNL Journal infrastructure
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D8705 (id=31427)

Could not rebase; Attempt merge onto 1f1b03cfbd...

Updating 1f1b03c..a0e2a57
Fast-forward
 swh/journal/pytest_plugin.py | 2 +-
 swh/journal/writer/kafka.py  | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)
Changes applied before test
commit a0e2a57813221d9a13d44a00a73c7df53d32180c
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Oct 18 15:34:31 2022 +0200

    writer: make calling flush() in write_addition(s)() optional
    
    using a constructor 'auto_flush' bookean argument.
    The idea is that in a test session, each call to 'flush()' takes ~1s to
    run, so having the test handling a single call to flush() when needed
    (instead of n calls) make test execution significantly faster.
    
    For example, swh-storage's test_replay.py execution went from ~140s to
    ~40s, and test_backfill.py from ~40s to ~15s.

commit eaf6c57a8407851d27f4acd94bd8bd01bc003841
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Oct 18 12:03:58 2022 +0200

    tests: make the kafka_server_base fixture a function scoped one
    
    this actually speeds up tests quite a bit (preventing a 30s timeout when
    the fixture is actually reused several times in a test session).

Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/223/
See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/223/console

Harbormaster returned this revision to the author for changes because remote builds failed.Oct 18 2022, 4:20 PM
Harbormaster failed remote builds in B32372: Diff 31427!

Build is green

Patch application report for D8705 (id=31499)

Could not rebase; Attempt merge onto 2c6d9e05ec...

Updating 2c6d9e0..887a93e
Fast-forward
 swh/journal/pytest_plugin.py | 2 +-
 swh/journal/writer/kafka.py  | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)
Changes applied before test
commit 887a93e99c5f85cf1169873ed3e98819fd644f00
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Oct 18 15:34:31 2022 +0200

    writer: make calling flush() in write_addition(s)() optional
    
    using a constructor 'auto_flush' bookean argument.
    The idea is that in a test session, each call to 'flush()' takes ~1s to
    run, so having the test handling a single call to flush() when needed
    (instead of n calls) make test execution significantly faster.
    
    For example, swh-storage's test_replay.py execution went from ~140s to
    ~40s, and test_backfill.py from ~40s to ~15s.

commit 04a84dba6a175a509d6f8d4ccde1441b8e352858
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Oct 18 12:03:58 2022 +0200

    tests: make the kafka_server_base fixture a function scoped one
    
    this actually speeds up tests quite a bit (preventing a 30s timeout when
    the fixture is actually reused several times in a test session).

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/227/ for more details.

If it's for test speedup only, don't you want to keep the actual behavior instead?
That is, make the auto_flush be False by default, and then you explicitely set it up to True in the calling tests.

olasd added a subscriber: olasd.

If it's for test speedup only, don't you want to keep the actual behavior instead?
That is, make the auto_flush be False by default, and then you explicitely set it up to True in the calling tests.

I believe that auto_flush=True is the current default, so the diff does what you're asking :-)

I'd be keen on having a warning, at least when we __del__ the object and some deliveries are still pending, because even if not perfect, that'd be the sign of a bug.

This revision is now accepted and ready to land.Oct 20 2022, 11:15 AM

If it's for test speedup only, don't you want to keep the actual behavior instead?
That is, make the auto_flush be False by default, and then you explicitely set it up to True in the calling tests.

I believe that auto_flush=True is the current default, so the diff does what you're asking :-)

I'd be keen on having a warning, at least when we __del__ the object and some deliveries are still pending, because even if not perfect, that'd be the sign of a bug.

ok, that'd also make sense... but then how come there is any speed improvments here?
I mean, I don't see any changes in tests saying then auto_flush=False in that case...

...ah... maybe, it's for other tests in another repository, right?

@vlorentz mentions that this is missing a docstring change

...ah... maybe, it's for other tests in another repository, right?

Yeah. swh-storage and swh-journal will be affected, as they call write_addition(s) on multiple object types one after the other, and have to wait for each round. And it seems that the mock broker adds a 1 second delay for flushes, for some reason.

Add the missing docstring entry

Build is green

Patch application report for D8705 (id=31537)

Could not rebase; Attempt merge onto 2c6d9e05ec...

Updating 2c6d9e0..1f607fb
Fast-forward
 swh/journal/pytest_plugin.py |  2 +-
 swh/journal/writer/kafka.py  | 11 +++++++++--
 2 files changed, 10 insertions(+), 3 deletions(-)
Changes applied before test
commit 1f607fb558659123db92b0a5c93efcdf0d64ab00
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Oct 18 15:34:31 2022 +0200

    writer: make calling flush() in write_addition(s)() optional
    
    using a constructor 'auto_flush' bookean argument.
    The idea is that in a test session, each call to 'flush()' takes ~1s to
    run, so having the test handling a single call to flush() when needed
    (instead of n calls) make test execution significantly faster.
    
    For example, swh-storage's test_replay.py execution went from ~140s to
    ~40s, and test_backfill.py from ~40s to ~15s.

commit 04a84dba6a175a509d6f8d4ccde1441b8e352858
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Oct 18 12:03:58 2022 +0200

    tests: make the kafka_server_base fixture a function scoped one
    
    this actually speeds up tests quite a bit (preventing a 30s timeout when
    the fixture is actually reused several times in a test session).

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/228/ for more details.

Build is green

Patch application report for D8705 (id=31539)

Could not rebase; Attempt merge onto 2c6d9e05ec...

Updating 2c6d9e0..7c83ee1
Fast-forward
 swh/journal/pytest_plugin.py |  2 +-
 swh/journal/writer/kafka.py  | 11 +++++++++--
 2 files changed, 10 insertions(+), 3 deletions(-)
Changes applied before test
commit 7c83ee11ecab206e082e06aaa0b8330ef6677d2c
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Oct 18 15:34:31 2022 +0200

    writer: make calling flush() in write_addition(s)() optional
    
    using a constructor 'auto_flush' bookean argument.
    The idea is that in a test session, each call to 'flush()' takes ~1s to
    run, so having the test handling a single call to flush() when needed
    (instead of n calls) make test execution significantly faster.
    
    For example, swh-storage's test_replay.py execution went from ~140s to
    ~40s, and test_backfill.py from ~40s to ~15s.

commit 04a84dba6a175a509d6f8d4ccde1441b8e352858
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Oct 18 12:03:58 2022 +0200

    tests: make the kafka_server_base fixture a function scoped one
    
    this actually speeds up tests quite a bit (preventing a 30s timeout when
    the fixture is actually reused several times in a test session).

See https://jenkins.softwareheritage.org/job/DJNL/job/tests-on-diff/229/ for more details.