Details
- Reviewers
ardumont - Group Reviewers
Reviewers - Maniphest Tasks
- T2355: Make swh-journal independent from swh-storage or swh-objstorage
- Commits
- rDSTO4b7ba1fc6bcc: Copy the graph replayer component from swh-journal
Diff Detail
- Repository
- rDSTO Storage manager
- Branch
- t2355
- Lint
Lint Skipped - Unit
Unit Tests Skipped - Build Status
Buildable 12029 Build 18244: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 18243: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D3010 (id=10679)
Could not rebase; Attempt merge onto ddac3d27e3...
Updating ddac3d2..24cb5fd Fast-forward mypy.ini | 9 + swh/storage/backfill.py | 490 +++++++++++++++++++++++++++++ swh/storage/cli.py | 99 ++++++ swh/storage/fixer.py | 293 +++++++++++++++++ swh/storage/replay.py | 128 ++++++++ swh/storage/tests/conftest.py | 47 ++- swh/storage/tests/test_api_client.py | 7 +- swh/storage/tests/test_api_client_dicts.py | 28 +- swh/storage/tests/test_backfill.py | 160 ++++++++++ swh/storage/tests/test_replay.py | 155 +++++++++ swh/storage/tests/test_storage.py | 239 ++++++++------ swh/storage/tests/test_write_replay.py | 166 ++++++++++ 12 files changed, 1697 insertions(+), 124 deletions(-) create mode 100644 swh/storage/backfill.py create mode 100644 swh/storage/fixer.py create mode 100644 swh/storage/replay.py create mode 100644 swh/storage/tests/test_backfill.py create mode 100644 swh/storage/tests/test_replay.py create mode 100644 swh/storage/tests/test_write_replay.py
Changes applied before test
commit 24cb5fd27ac55d802e31c792b37d827c5e3aa4b5 Author: David Douard <david.douard@sdfa3.org> Date: Thu Apr 9 17:45:55 2020 +0200 Copy the graph replayer component from swh-journal In test_replay, get rid of `test_storage_play` and `test_storage_play_with_collision` because these tests do require a proper kafka setup to run, which we do not want to require kafka in the swh.storage package. These 2 tests should be, if needed, rewriten somehow. commit 687fa2b3a4ba40ae58746b1504721c766dba5de9 Author: David Douard <david.douard@sdfa3.org> Date: Fri Apr 10 11:03:21 2020 +0200 cli: rename the command 'backfiller' as 'backfill' for the sake of consistency. commit 93ca058fc487aa13f6c3e4895be1ea82aea2408a Author: David Douard <david.douard@sdfa3.org> Date: Thu Apr 9 16:56:44 2020 +0200 Copy the backfiller component from swh-journal This componant makes more sense in the swh-storage package. commit fd5ed546581506906dad640ba8c686a8ae754bbb Author: David Douard <david.douard@sdfa3.org> Date: Tue Apr 7 16:17:39 2020 +0200 test: update storage tests to (future) swh.journal 0.0.30 which will handle swh.model objects everywhere instead of dicts. Also add a BW compat version os the InMemoryJournalWriter so tests will pass with current version of swh.journal (0.0.29).
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/88/ for more details.
In test_replay, get rid of test_storage_play and
test_storage_play_with_collision because these tests do require a proper
kafka setup to run, which we do not want to require kafka in the
swh.storage package.
hu ho...
i'm not fine with getting rid of test_storage_play*.
See for example D3015 which moves the hash collision policy within the retry proxy.
To ensure this continued to be ok, i've kept the test_storage_play_with_collision as is, only changed the test config with retry proxy and the test is green... (and it is, well locally because right now it's in diff).
My point is those are important tests.
I understand, however, these 2 tests, as currently written, are too far "integration tests" rather than "unit tests". And it makes the testing stack too complex.
So IMHO we want "unit" tests dedicated to parts of this stack that handle collisions 9typically without kafka being involved, swh-storage side) where there are useful, and integration tests (in the docker-dev project typically) to check the whole stack behaves properly.
Unless we can use the (new) testing stack that comes with confluent-kafka, as reported by @olasd, if I'm not mistaken.
Ah, I failed to submit the following comment on Friday:
I'm a bit uneasy about dropping the replayer tests outright.
Apparently, librdkafka (the lib that confluent-kafka-python uses) has a full implementation of a mocked kafka broker. The only feature missing for our tests is the admin interface, but that's only used for a single test and I believe that we can work around that. I'll submit a diff to that end, and the code-side integration tests using kafka should be able to stay.
Build is green
Patch application report for D3010 (id=10874)
Could not rebase; Attempt merge onto fe56005555...
Updating fe56005..bd1ac8c Fast-forward mypy.ini | 9 + swh/storage/backfill.py | 490 +++++++++++++++++++++++++++++++++ swh/storage/cli.py | 141 +++++++++- swh/storage/fixer.py | 293 ++++++++++++++++++++ swh/storage/replay.py | 128 +++++++++ swh/storage/tests/conftest.py | 2 - swh/storage/tests/test_backfill.py | 160 +++++++++++ swh/storage/tests/test_cli.py | 193 +++++++++++++ swh/storage/tests/test_replay.py | 388 ++++++++++++++++++++++++++ swh/storage/tests/test_write_replay.py | 112 ++++++++ 10 files changed, 1909 insertions(+), 7 deletions(-) create mode 100644 swh/storage/backfill.py create mode 100644 swh/storage/fixer.py create mode 100644 swh/storage/replay.py create mode 100644 swh/storage/tests/test_backfill.py create mode 100644 swh/storage/tests/test_cli.py create mode 100644 swh/storage/tests/test_replay.py create mode 100644 swh/storage/tests/test_write_replay.py
Changes applied before test
commit bd1ac8c758217eb734f33e1f67e24606ab307006 Author: David Douard <david.douard@sdfa3.org> Date: Thu Apr 9 17:45:55 2020 +0200 Copy the graph replayer component from swh-journal The CLI command is included as well as `swh storage replay`. Copied test test_replay, in test_cli.py, should be identical. commit e245a465589fa24288bb62bd57aaf925d665c353 Author: David Douard <david.douard@sdfa3.org> Date: Thu Apr 23 15:27:28 2020 +0200 Deprecate the `config-path` argument of the `swh storage rpc-serve` command in favor of the standard `--config-file` option of `swh storage`. Attempt to write a couple of tests for the rpc-serve command. commit cd32cf42538ff256bed718fd747e289c9a311ebf Author: David Douard <david.douard@sdfa3.org> Date: Wed Apr 22 16:57:07 2020 +0200 Normalize the configuration file handling in the `swh storage` CLI command Almost every other swh package handles the loading of the config file from the main click command group of the package. So we make storage behaves the same. commit 698af8ca0028c5847326cf57f867608e533af9ed Author: David Douard <david.douard@sdfa3.org> Date: Fri Apr 10 11:03:21 2020 +0200 cli: rename the command 'backfiller' as 'backfill' for the sake of consistency. commit b2bba450aaa808406b024800dcd50b4778313903 Author: David Douard <david.douard@sdfa3.org> Date: Thu Apr 9 16:56:44 2020 +0200 Copy the backfiller component from swh-journal This componant makes more sense in the swh-storage package.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/122/ for more details.
i'm not fine with getting rid of test_storage_play*.
...
My point is those are important tests.
And they are back, picture me happy ;)
finishing reading through
swh/storage/tests/test_cli.py | ||
---|---|---|
8 | unneeded extra lines? | |
54 | oh that's an interesting way to do it. Note for later: Some tests in test_retry should definitely do mock sleep that way, to get it out of the tests body. |
Missing a requirements bump on latest swh-journal (v0.0.31) since we are using its pytest plugin fixture now.
Build is green
Patch application report for D3010 (id=10877)
Could not rebase; Attempt merge onto fe56005555...
Updating fe56005..726dd9a Fast-forward mypy.ini | 9 + swh/storage/backfill.py | 490 +++++++++++++++++++++++++++++++++ swh/storage/cli.py | 141 +++++++++- swh/storage/fixer.py | 293 ++++++++++++++++++++ swh/storage/replay.py | 128 +++++++++ swh/storage/tests/conftest.py | 2 - swh/storage/tests/test_backfill.py | 160 +++++++++++ swh/storage/tests/test_cli.py | 194 +++++++++++++ swh/storage/tests/test_kafka_writer.py | 60 ++++ swh/storage/tests/test_replay.py | 388 ++++++++++++++++++++++++++ swh/storage/tests/test_write_replay.py | 112 ++++++++ 11 files changed, 1970 insertions(+), 7 deletions(-) create mode 100644 swh/storage/backfill.py create mode 100644 swh/storage/fixer.py create mode 100644 swh/storage/replay.py create mode 100644 swh/storage/tests/test_backfill.py create mode 100644 swh/storage/tests/test_cli.py create mode 100644 swh/storage/tests/test_kafka_writer.py create mode 100644 swh/storage/tests/test_replay.py create mode 100644 swh/storage/tests/test_write_replay.py
Changes applied before test
commit 726dd9a6f5fd25bb43b395bc0c162c4068b7c9c8 Author: David Douard <david.douard@sdfa3.org> Date: Thu Apr 9 17:45:55 2020 +0200 Copy the graph replayer component from swh-journal The CLI command is included as well as `swh storage replay`. Copied test test_replay, in test_cli.py, should be identical. commit 5fd9b563298f1068636ba57be10d2f1975f09781 Author: David Douard <david.douard@sdfa3.org> Date: Thu Apr 23 15:27:28 2020 +0200 Deprecate the `config-path` argument of the `swh storage rpc-serve` command in favor of the standard `--config-file` option of `swh storage`. Attempt to write a couple of tests for the rpc-serve command. commit cd32cf42538ff256bed718fd747e289c9a311ebf Author: David Douard <david.douard@sdfa3.org> Date: Wed Apr 22 16:57:07 2020 +0200 Normalize the configuration file handling in the `swh storage` CLI command Almost every other swh package handles the loading of the config file from the main click command group of the package. So we make storage behaves the same. commit 698af8ca0028c5847326cf57f867608e533af9ed Author: David Douard <david.douard@sdfa3.org> Date: Fri Apr 10 11:03:21 2020 +0200 cli: rename the command 'backfiller' as 'backfill' for the sake of consistency. commit b2bba450aaa808406b024800dcd50b4778313903 Author: David Douard <david.douard@sdfa3.org> Date: Thu Apr 9 16:56:44 2020 +0200 Copy the backfiller component from swh-journal This componant makes more sense in the swh-storage package.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/125/ for more details.
swh/storage/tests/test_cli.py | ||
---|---|---|
54 | well, not mock, "monkeypatch" but the pytest way ;) |
Build is green
Patch application report for D3010 (id=10883)
Rebasing onto 5fd9b56329...
Current branch diff-target is up to date.
Changes applied before test
commit 4b7ba1fc6bccd18c391a841da636a950bc66111d Author: David Douard <david.douard@sdfa3.org> Date: Thu Apr 9 17:45:55 2020 +0200 Copy the graph replayer component from swh-journal The CLI command is included as well as `swh storage replay`. Copied test test_replay, in test_cli.py, should be identical.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/127/ for more details.