Details
- Reviewers: douardda
- Group Reviewers: Reviewers
- Maniphest Tasks: T2651: Make the indexer-storage publish its rows to Kafka
- Commits: rDCIDX340c73a19604: Make the indexer storage write to the journal.
Diff Detail
- Repository: rDCIDX Metadata indexer
- Branch: journal
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 16447
  - Build 25332: Phabricator diff pipeline on jenkins
  - Build 25331: arc lint + arc unit
Event Timeline
Build has FAILED
Patch application report for D4083 (id=14402)
Rebasing onto e92b931e47...
Current branch diff-target is up to date.
Changes applied before test
commit 842798f886f54f64a7dc6a0bb092a6fcc0f04b63
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 29 16:21:13 2020 +0200

    [WIP] start writing to the journal from the idx-storage
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/51/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/51/console
Build has FAILED
Patch application report for D4083 (id=15212)
Rebasing onto 82d935733b...
Current branch diff-target is up to date.
Changes applied before test
commit 27ca432c1a67d63d2a99c160d16ea1602ae8b8d0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 29 16:21:13 2020 +0200

    [WIP] start writing to the journal from the idx-storage
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/96/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/96/console
Build has FAILED
Patch application report for D4083 (id=15566)
Could not rebase; Attempt merge onto 300b307394...
Updating 300b307..c644012
Fast-forward
 swh/indexer/metadata.py | 41 +-----------
 swh/indexer/storage/__init__.py | 24 +------
 swh/indexer/storage/db.py | 22 -------
 swh/indexer/storage/in_memory.py | 54 ++++++----------
 swh/indexer/storage/interface.py | 32 ----------
 swh/indexer/storage/writer.py | 57 +++++++++++++++++
 swh/indexer/tests/storage/conftest.py | 6 +-
 swh/indexer/tests/storage/test_api_client.py | 6 +-
 swh/indexer/tests/storage/test_in_memory.py | 2 +-
 swh/indexer/tests/storage/test_storage.py | 95 +---------------------------
 swh/indexer/tests/test_origin_metadata.py | 26 --------
 11 files changed, 94 insertions(+), 271 deletions(-)
 create mode 100644 swh/indexer/storage/writer.py
Changes applied before test
commit c644012f1a653f6d3b1d4a2f53e66eca54bacc6c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 29 16:21:13 2020 +0200

    [WIP] start writing to the journal from the idx-storage

commit 94c825919320bf3d3e2608b823dc887ed6122413
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Nov 2 13:47:51 2020 +0100

    Remove metadata deletion endpoints and algorithms

    This was expected to be used in these two cases:

    1. if we remove mappings or file detection from a metadata indexer
    2. if an origin removes all its metadata files

    but:

    1. if we do so, then we should bump the indexer version, so the old
       metadata will be preserved anyway, as different indexer versions
       get different indexer_configuration_ids
    2. this should be a rather rare event, and even if it happens, we
       might want to keep the old metadata anyway rather than nothing
       (even if it's outdated), for search purposes.

    Additionally, this commit is motivated by:

    * fewer issues to deal with when writing to Kafka (the journal writer
      currently doesn't support deletion, and we would also have to add
      support for deletion in all consumers)
    * less code (~250 lines)
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/101/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/101/console
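The commit message above relies on rows being keyed by indexer configuration: bumping the indexer version yields a new `indexer_configuration_id`, so new results are stored next to the old ones instead of replacing them. A minimal in-memory sketch of that idea, assuming a table keyed by `(origin_url, indexer_configuration_id)`; the function name, URL and data are hypothetical and this is not the actual swh-indexer schema:

```python
from typing import Dict, Tuple

# Hypothetical table: one metadata row per (origin URL, indexer configuration id).
MetadataTable = Dict[Tuple[str, int], dict]


def origin_metadata_add(
    table: MetadataTable, origin_url: str, indexer_configuration_id: int, metadata: dict
) -> None:
    """Insert or overwrite the row for this origin *and this configuration* only."""
    table[(origin_url, indexer_configuration_id)] = metadata


table: MetadataTable = {}
origin_metadata_add(table, "https://example.org/repo", 1, {"name": "foo"})
# A new indexer version gets a new configuration id, so the old row is kept.
origin_metadata_add(table, "https://example.org/repo", 2, {"name": "foo", "license": "MIT"})
```

In this model, bumping the version never requires deleting a row, so no deletion messages would need to be published to the journal.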
Build has FAILED
Patch application report for D4083 (id=15659)
Could not rebase; Attempt merge onto e2835bfff6...
Updating e2835bf..e59868e
Fast-forward
 swh/indexer/storage/__init__.py | 26 ++++++++++++-
 swh/indexer/storage/db.py | 12 ++++++
 swh/indexer/storage/in_memory.py | 39 +++++++++++++------
 swh/indexer/storage/model.py | 13 +++++++
 swh/indexer/storage/writer.py | 57 ++++++++++++++++++++++++++++
 swh/indexer/tests/storage/conftest.py | 6 ++-
 swh/indexer/tests/storage/test_api_client.py | 29 +++++++++++---
 swh/indexer/tests/storage/test_in_memory.py | 2 +-
 swh/indexer/tests/storage/test_storage.py | 41 ++++++++++++++------
 9 files changed, 193 insertions(+), 32 deletions(-)
 create mode 100644 swh/indexer/storage/writer.py
Changes applied before test
commit e59868eb41f72fe3568de477fe4a0711269b375d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 29 16:21:13 2020 +0200

    Make the indexer storage write to the journal.

commit 8272bc90a367f2d9a9eb231505ad3ccc126c714f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 5 13:36:44 2020 +0100

    test_origin_intrinsic_metadata_add__deadlock: use more values, to make the test less likely to unexpectedly pass.

commit 5a5af91ac5aee172ff58e68f8b2121de179635e9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 5 13:37:43 2020 +0100

    test_origin_intrinsic_metadata_add__deadlock: Fix failure on nondeterministic order

    postgresql kindly returns the results in the order the test expected... most of the time.
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/114/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/114/console
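The last commit above addresses a common test pitfall: without an `ORDER BY`, PostgreSQL does not guarantee the order of returned rows, so a test comparing result lists directly can pass or fail depending on the query plan. A minimal sketch of an order-insensitive comparison, assuming rows are dicts with a hypothetical `id` key; it is not necessarily how the actual test was changed:

```python
from typing import Any, Dict, List


def sorted_rows(rows: List[Dict[str, Any]], key: str = "id") -> List[Dict[str, Any]]:
    """Return a copy of ``rows`` sorted on ``key``, so that assertions do not
    depend on the (unspecified) order in which the database returned them."""
    return sorted(rows, key=lambda row: row[key])


# Hypothetical rows; `actual` simulates a result set returned in another order.
expected = [{"id": "origin-1", "tool": 1}, {"id": "origin-2", "tool": 1}]
actual = [{"id": "origin-2", "tool": 1}, {"id": "origin-1", "tool": 1}]

assert actual != expected  # a plain list comparison is order-sensitive
assert sorted_rows(actual) == sorted_rows(expected)  # passes regardless of order
```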
Build is green
Patch application report for D4083 (id=15659)
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/115/ for more details.
Build is green
Patch application report for D4083 (id=15661)
Could not rebase; Attempt merge onto e2835bfff6...
Updating e2835bf..1fd7ae9
Fast-forward
 swh/indexer/storage/__init__.py | 26 ++++++++++++-
 swh/indexer/storage/db.py | 12 ++++++
 swh/indexer/storage/in_memory.py | 39 +++++++++++++------
 swh/indexer/storage/model.py | 13 +++++++
 swh/indexer/storage/writer.py | 56 ++++++++++++++++++++++++++++
 swh/indexer/tests/storage/conftest.py | 6 ++-
 swh/indexer/tests/storage/test_api_client.py | 29 +++++++++++---
 swh/indexer/tests/storage/test_in_memory.py | 2 +-
 swh/indexer/tests/storage/test_storage.py | 41 ++++++++++++++------
 9 files changed, 192 insertions(+), 32 deletions(-)
 create mode 100644 swh/indexer/storage/writer.py
Changes applied before test
commit 1fd7ae9261e8c6dce5a9cfbfa6c2bc758b833b22
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 29 16:21:13 2020 +0200

    Make the indexer storage write to the journal.

commit 8272bc90a367f2d9a9eb231505ad3ccc126c714f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 5 13:36:44 2020 +0100

    test_origin_intrinsic_metadata_add__deadlock: use more values, to make the test less likely to unexpectedly pass.

commit 5a5af91ac5aee172ff58e68f8b2121de179635e9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 5 13:37:43 2020 +0100

    test_origin_intrinsic_metadata_add__deadlock: Fix failure on nondeterministic order

    postgresql kindly returns the results in the order the test expected... most of the time.
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/116/ for more details.
Overall I'm OK with this, but I find it really lacks documentation and explanation of how it works, especially regarding the JournalWriter collaborator object.
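To illustrate the kind of collaborator the review asks to document, here is a minimal sketch of the general pattern only, assuming the storage owns a writer object that forwards each batch of added rows to an optional journal client and does nothing when no journal is configured. `InMemoryJournal`, `IndexerStorage.add` and `write_additions` are hypothetical names, not the actual swh-indexer API:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class InMemoryJournal:
    """Hypothetical stand-in for a journal (e.g. Kafka) client; records messages."""

    messages: List[Tuple[str, Any]] = field(default_factory=list)

    def write_additions(self, object_type: str, objects: List[Any]) -> None:
        for obj in objects:
            self.messages.append((object_type, obj))


class JournalWriter:
    """Collaborator the storage delegates journal publication to.

    When no journal is configured, every call is a no-op, so storage methods
    do not need to check whether publishing is enabled.
    """

    def __init__(self, journal: Optional[InMemoryJournal] = None) -> None:
        self.journal = journal

    def write_additions(self, object_type: str, rows: List[Any]) -> None:
        if self.journal is not None:
            self.journal.write_additions(object_type, rows)


class IndexerStorage:
    """Toy storage: persists rows and publishes them through its JournalWriter."""

    def __init__(self, journal: Optional[InMemoryJournal] = None) -> None:
        self.tables: Dict[str, List[Any]] = {}
        self.journal_writer = JournalWriter(journal)

    def add(self, object_type: str, rows: List[Any]) -> int:
        # In this sketch rows are published first, then stored.
        self.journal_writer.write_additions(object_type, rows)
        self.tables.setdefault(object_type, []).extend(rows)
        return len(rows)


journal = InMemoryJournal()
storage = IndexerStorage(journal)
storage.add("content_mimetype", [{"id": "abc", "mimetype": "text/plain"}])
```

Keeping publication in a separate object means the storage methods only call one method per batch, and tests can inject an in-memory journal to check what would have been published.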
swh/indexer/storage/__init__.py, line 128:
I know this docstring was outdated before this diff, but maybe it could be updated as part of it.
Build is green
Patch application report for D4083 (id=15797)
Rebasing onto 8272bc90a3...
Current branch diff-target is up to date.
Changes applied before test
commit 340c73a19604273467804c7c54b217b780a1677a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 29 16:21:13 2020 +0200

    Make the indexer storage write to the journal.
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/117/ for more details.
swh/indexer/storage/writer.py, line 64:
This will flush the journal writer on every message. Please avoid doing that.
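The concern about flushing: a Kafka producer's `flush()` blocks until every queued message has been delivered, so calling it after each message makes every write wait for a broker round-trip. A minimal sketch contrasting the two patterns, assuming a hypothetical `FakeProducer` whose `produce()`/`flush()` are loosely modelled on confluent_kafka's `Producer`; the topic name is made up and this is not the code in writer.py:

```python
from typing import Any, List, Tuple


class FakeProducer:
    """Hypothetical stand-in for a Kafka producer: produce() enqueues a message,
    flush() would block until queued messages are delivered. Here it only
    records messages and counts flush() calls."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, Any]] = []
        self.flush_count = 0

    def produce(self, topic: str, value: Any) -> None:
        self.sent.append((topic, value))

    def flush(self) -> None:
        self.flush_count += 1


def write_flushing_each(producer: FakeProducer, topic: str, objects: List[Any]) -> None:
    # Pattern flagged in the review: one blocking flush per message.
    for obj in objects:
        producer.produce(topic, obj)
        producer.flush()


def write_batch(producer: FakeProducer, topic: str, objects: List[Any]) -> None:
    # Enqueue the whole batch, then wait for delivery only once.
    for obj in objects:
        producer.produce(topic, obj)
    producer.flush()


per_message, batched = FakeProducer(), FakeProducer()
write_flushing_each(per_message, "example-topic", ["a", "b", "c"])
write_batch(batched, "example-topic", ["a", "b", "c"])
```

Both functions send the same messages; only the number of blocking flushes differs (one per message versus one per batch).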