Page MenuHomeSoftware Heritage

Copy the graph replayer component from swh-journal
ClosedPublic

Authored by douardda on Apr 10 2020, 3:37 PM.

Details

Summary

In test_replay, get rid of test_storage_play and
test_storage_play_with_collision because these tests do require a proper
kafka setup to run, which we do not want to require kafka in the
swh.storage package.

These 2 tests should be, if needed, rewriten somehow.

Depends on D3008.
Related to T2355.

Diff Detail

Event Timeline

Build is green

Patch application report for D3010 (id=10679)

Could not rebase; Attempt merge onto ddac3d27e3...

Updating ddac3d2..24cb5fd
Fast-forward
 mypy.ini                                   |   9 +
 swh/storage/backfill.py                    | 490 +++++++++++++++++++++++++++++
 swh/storage/cli.py                         |  99 ++++++
 swh/storage/fixer.py                       | 293 +++++++++++++++++
 swh/storage/replay.py                      | 128 ++++++++
 swh/storage/tests/conftest.py              |  47 ++-
 swh/storage/tests/test_api_client.py       |   7 +-
 swh/storage/tests/test_api_client_dicts.py |  28 +-
 swh/storage/tests/test_backfill.py         | 160 ++++++++++
 swh/storage/tests/test_replay.py           | 155 +++++++++
 swh/storage/tests/test_storage.py          | 239 ++++++++------
 swh/storage/tests/test_write_replay.py     | 166 ++++++++++
 12 files changed, 1697 insertions(+), 124 deletions(-)
 create mode 100644 swh/storage/backfill.py
 create mode 100644 swh/storage/fixer.py
 create mode 100644 swh/storage/replay.py
 create mode 100644 swh/storage/tests/test_backfill.py
 create mode 100644 swh/storage/tests/test_replay.py
 create mode 100644 swh/storage/tests/test_write_replay.py
Changes applied before test
commit 24cb5fd27ac55d802e31c792b37d827c5e3aa4b5
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Apr 9 17:45:55 2020 +0200

    Copy the graph replayer component from swh-journal
    
    In test_replay, get rid of `test_storage_play` and
    `test_storage_play_with_collision` because these tests do require a proper
    kafka setup to run, which we do not want to require kafka in the
    swh.storage package.
    
    These 2 tests should be, if needed, rewriten somehow.

commit 687fa2b3a4ba40ae58746b1504721c766dba5de9
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Apr 10 11:03:21 2020 +0200

    cli: rename the command 'backfiller' as 'backfill'
    
    for the sake of consistency.

commit 93ca058fc487aa13f6c3e4895be1ea82aea2408a
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Apr 9 16:56:44 2020 +0200

    Copy the backfiller component from swh-journal
    
    This componant makes more sense in the swh-storage package.

commit fd5ed546581506906dad640ba8c686a8ae754bbb
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Apr 7 16:17:39 2020 +0200

    test: update storage tests to (future) swh.journal 0.0.30
    
    which will handle swh.model objects everywhere instead of dicts.
    
    Also add a BW compat version os the InMemoryJournalWriter so tests
    will pass with current version of swh.journal (0.0.29).

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/88/ for more details.

In test_replay, get rid of test_storage_play and
test_storage_play_with_collision because these tests do require a proper
kafka setup to run, which we do not want to require kafka in the
swh.storage package.

hu ho...
i'm not fine with getting rid of test_storage_play*.

See for example D3015 which moves the hash collision policy within the retry proxy.
To ensure this continued to be ok, i've kept the test_storage_play_with_collision as is, only changed the test config with retry proxy and the test is green... (and it is, well locally because right now it's in diff).
My point is those are important tests.

In test_replay, get rid of test_storage_play and
test_storage_play_with_collision because these tests do require a proper
kafka setup to run, which we do not want to require kafka in the
swh.storage package.

hu ho...
i'm not fine with getting rid of test_storage_play*.

See for example D3015 which moves the hash collision policy within the retry proxy.
To ensure this continued to be ok, i've kept the test_storage_play_with_collision as is, only changed the test config with retry proxy and the test is green... (and it is, well locally because right now it's in diff).
My point is those are important tests.

I understand, however, these 2 tests, as currently written, are too far "integration tests" rather than "unit tests". And it makes the testing stack too complex.

So IMHO we want "unit" tests dedicated to parts of this stack that handle collisions 9typically without kafka being involved, swh-storage side) where there are useful, and integration tests (in the docker-dev project typically) to check the whole stack behaves properly.

In test_replay, get rid of test_storage_play and
test_storage_play_with_collision because these tests do require a proper
kafka setup to run, which we do not want to require kafka in the
swh.storage package.

hu ho...
i'm not fine with getting rid of test_storage_play*.

See for example D3015 which moves the hash collision policy within the retry proxy.
To ensure this continued to be ok, i've kept the test_storage_play_with_collision as is, only changed the test config with retry proxy and the test is green... (and it is, well locally because right now it's in diff).
My point is those are important tests.

I understand, however, these 2 tests, as currently written, are too far "integration tests" rather than "unit tests". And it makes the testing stack too complex.

So IMHO we want "unit" tests dedicated to parts of this stack that handle collisions 9typically without kafka being involved, swh-storage side) where there are useful, and integration tests (in the docker-dev project typically) to check the whole stack behaves properly.

Unless we can use the (new) testing stack that comes with confluent-kafka, as reported by @olasd, if I'm not mistaken.

Ah, I failed to submit the following comment on Friday:

I'm a bit uneasy about dropping the replayer tests outright.

Apparently, librdkafka (the lib that confluent-kafka-python uses) has a full implementation of a mocked kafka broker. The only feature missing for our tests is the admin interface, but that's only used for a single test and I believe that we can work around that. I'll submit a diff to that end, and the code-side integration tests using kafka should be able to stay.

Build is green

Patch application report for D3010 (id=10874)

Could not rebase; Attempt merge onto fe56005555...

Updating fe56005..bd1ac8c
Fast-forward
 mypy.ini                               |   9 +
 swh/storage/backfill.py                | 490 +++++++++++++++++++++++++++++++++
 swh/storage/cli.py                     | 141 +++++++++-
 swh/storage/fixer.py                   | 293 ++++++++++++++++++++
 swh/storage/replay.py                  | 128 +++++++++
 swh/storage/tests/conftest.py          |   2 -
 swh/storage/tests/test_backfill.py     | 160 +++++++++++
 swh/storage/tests/test_cli.py          | 193 +++++++++++++
 swh/storage/tests/test_replay.py       | 388 ++++++++++++++++++++++++++
 swh/storage/tests/test_write_replay.py | 112 ++++++++
 10 files changed, 1909 insertions(+), 7 deletions(-)
 create mode 100644 swh/storage/backfill.py
 create mode 100644 swh/storage/fixer.py
 create mode 100644 swh/storage/replay.py
 create mode 100644 swh/storage/tests/test_backfill.py
 create mode 100644 swh/storage/tests/test_cli.py
 create mode 100644 swh/storage/tests/test_replay.py
 create mode 100644 swh/storage/tests/test_write_replay.py
Changes applied before test
commit bd1ac8c758217eb734f33e1f67e24606ab307006
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Apr 9 17:45:55 2020 +0200

    Copy the graph replayer component from swh-journal
    
    The CLI command is included as well as `swh storage replay`.
    
    Copied test test_replay, in test_cli.py, should be identical.

commit e245a465589fa24288bb62bd57aaf925d665c353
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Apr 23 15:27:28 2020 +0200

    Deprecate the `config-path` argument of the `swh storage rpc-serve` command
    
    in favor of the standard `--config-file` option of `swh storage`.
    
    Attempt to write a couple of tests for the rpc-serve command.

commit cd32cf42538ff256bed718fd747e289c9a311ebf
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Apr 22 16:57:07 2020 +0200

    Normalize the configuration file handling in the `swh storage` CLI command
    
    Almost every other swh package handles the loading of the config file
    from the main click command group of the package.
    So we make storage behaves the same.

commit 698af8ca0028c5847326cf57f867608e533af9ed
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Apr 10 11:03:21 2020 +0200

    cli: rename the command 'backfiller' as 'backfill'
    
    for the sake of consistency.

commit b2bba450aaa808406b024800dcd50b4778313903
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Apr 9 16:56:44 2020 +0200

    Copy the backfiller component from swh-journal
    
    This componant makes more sense in the swh-storage package.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/122/ for more details.

rebase + add forgotten test_kafka_writer test file

i'm not fine with getting rid of test_storage_play*.
...
My point is those are important tests.

And they are back, picture me happy ;)

finishing reading through

swh/storage/tests/test_cli.py
8

unneeded extra lines?

52

oh that's an interesting way to do it.

Note for later: Some tests in test_retry should definitely do mock sleep that way, to get it out of the tests body.

Missing a requirements bump on latest swh-journal (v0.0.31) since we are using its pytest plugin fixture now.

This revision is now accepted and ready to land.Apr 24 2020, 11:44 AM

Build is green

Patch application report for D3010 (id=10877)

Could not rebase; Attempt merge onto fe56005555...

Updating fe56005..726dd9a
Fast-forward
 mypy.ini                               |   9 +
 swh/storage/backfill.py                | 490 +++++++++++++++++++++++++++++++++
 swh/storage/cli.py                     | 141 +++++++++-
 swh/storage/fixer.py                   | 293 ++++++++++++++++++++
 swh/storage/replay.py                  | 128 +++++++++
 swh/storage/tests/conftest.py          |   2 -
 swh/storage/tests/test_backfill.py     | 160 +++++++++++
 swh/storage/tests/test_cli.py          | 194 +++++++++++++
 swh/storage/tests/test_kafka_writer.py |  60 ++++
 swh/storage/tests/test_replay.py       | 388 ++++++++++++++++++++++++++
 swh/storage/tests/test_write_replay.py | 112 ++++++++
 11 files changed, 1970 insertions(+), 7 deletions(-)
 create mode 100644 swh/storage/backfill.py
 create mode 100644 swh/storage/fixer.py
 create mode 100644 swh/storage/replay.py
 create mode 100644 swh/storage/tests/test_backfill.py
 create mode 100644 swh/storage/tests/test_cli.py
 create mode 100644 swh/storage/tests/test_kafka_writer.py
 create mode 100644 swh/storage/tests/test_replay.py
 create mode 100644 swh/storage/tests/test_write_replay.py
Changes applied before test
commit 726dd9a6f5fd25bb43b395bc0c162c4068b7c9c8
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Apr 9 17:45:55 2020 +0200

    Copy the graph replayer component from swh-journal
    
    The CLI command is included as well as `swh storage replay`.
    
    Copied test test_replay, in test_cli.py, should be identical.

commit 5fd9b563298f1068636ba57be10d2f1975f09781
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Apr 23 15:27:28 2020 +0200

    Deprecate the `config-path` argument of the `swh storage rpc-serve` command
    
    in favor of the standard `--config-file` option of `swh storage`.
    
    Attempt to write a couple of tests for the rpc-serve command.

commit cd32cf42538ff256bed718fd747e289c9a311ebf
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Apr 22 16:57:07 2020 +0200

    Normalize the configuration file handling in the `swh storage` CLI command
    
    Almost every other swh package handles the loading of the config file
    from the main click command group of the package.
    So we make storage behaves the same.

commit 698af8ca0028c5847326cf57f867608e533af9ed
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Apr 10 11:03:21 2020 +0200

    cli: rename the command 'backfiller' as 'backfill'
    
    for the sake of consistency.

commit b2bba450aaa808406b024800dcd50b4778313903
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Apr 9 16:56:44 2020 +0200

    Copy the backfiller component from swh-journal
    
    This componant makes more sense in the swh-storage package.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/125/ for more details.

swh/storage/tests/test_cli.py
52

well, not mock, "monkeypatch" but the pytest way ;)

clean import statements in test_cli.py

swh/storage/tests/test_cli.py
52

Build is green

Patch application report for D3010 (id=10883)

Rebasing onto 5fd9b56329...

Current branch diff-target is up to date.
Changes applied before test
commit 4b7ba1fc6bccd18c391a841da636a950bc66111d
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Apr 9 17:45:55 2020 +0200

    Copy the graph replayer component from swh-journal
    
    The CLI command is included as well as `swh storage replay`.
    
    Copied test test_replay, in test_cli.py, should be identical.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/127/ for more details.