Details
- Reviewers: douardda
- Group Reviewers: Reviewers
- Maniphest Tasks: T2651: Make the indexer-storage publish its rows to Kafka
- Commits: rDCIDX340c73a19604: Make the indexer storage write to the journal.
Diff Detail
- Repository: rDCIDX Metadata indexer
- Branch: journal
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 16881
  - Build 26044: Phabricator diff pipeline on jenkins
  - Build 26043: arc lint + arc unit
Event Timeline
Build has FAILED
Patch application report for D4083 (id=14402)
Rebasing onto e92b931e47...
Current branch diff-target is up to date.
Changes applied before test
commit 842798f886f54f64a7dc6a0bb092a6fcc0f04b63
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 29 16:21:13 2020 +0200
[WIP] start writing to the journal from the idx-storage
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/51/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/51/console
Build has FAILED
Patch application report for D4083 (id=15212)
Rebasing onto 82d935733b...
Current branch diff-target is up to date.
Changes applied before test
commit 27ca432c1a67d63d2a99c160d16ea1602ae8b8d0
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 29 16:21:13 2020 +0200
[WIP] start writing to the journal from the idx-storage
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/96/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/96/console
Build has FAILED
Patch application report for D4083 (id=15566)
Could not rebase; Attempt merge onto 300b307394...
Updating 300b307..c644012
Fast-forward
swh/indexer/metadata.py | 41 +-----------
swh/indexer/storage/__init__.py | 24 +------
swh/indexer/storage/db.py | 22 -------
swh/indexer/storage/in_memory.py | 54 ++++++----------
swh/indexer/storage/interface.py | 32 ----------
swh/indexer/storage/writer.py | 57 +++++++++++++++++
swh/indexer/tests/storage/conftest.py | 6 +-
swh/indexer/tests/storage/test_api_client.py | 6 +-
swh/indexer/tests/storage/test_in_memory.py | 2 +-
swh/indexer/tests/storage/test_storage.py | 95 +---------------------------
swh/indexer/tests/test_origin_metadata.py | 26 --------
11 files changed, 94 insertions(+), 271 deletions(-)
create mode 100644 swh/indexer/storage/writer.py
Changes applied before test
commit c644012f1a653f6d3b1d4a2f53e66eca54bacc6c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 29 16:21:13 2020 +0200
[WIP] start writing to the journal from the idx-storage
commit 94c825919320bf3d3e2608b823dc887ed6122413
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Mon Nov 2 13:47:51 2020 +0100
Remove metadata deletion endpoints and algorithms
This was expected to be used in these two cases:
1. if we remove mappings or file detection from a metadata indexer
2. if an origin removes all its metadata files
but:
1. if we do so, then we should bump the indexer version, so the
old metadata will be preserved anyway, as different indexer
versions get different indexer_configuration_ids
2. this should be a rather rare event, and even if it happens, we
might want to keep the old metadata anyway rather than
nothing (even if it's outdated), for search purposes.
Additionally, this commit is motivated by:
* that's less issues to deal with when writing to Kafka (the journal
writer currently doesn't support suppression; and we would also have
to add support for deletion in all consumers)
* less code (~250 lines)
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/101/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/101/console
Build has FAILED
Patch application report for D4083 (id=15659)
Could not rebase; Attempt merge onto e2835bfff6...
Updating e2835bf..e59868e
Fast-forward
swh/indexer/storage/__init__.py | 26 ++++++++++++-
swh/indexer/storage/db.py | 12 ++++++
swh/indexer/storage/in_memory.py | 39 +++++++++++++------
swh/indexer/storage/model.py | 13 +++++++
swh/indexer/storage/writer.py | 57 ++++++++++++++++++++++++++++
swh/indexer/tests/storage/conftest.py | 6 ++-
swh/indexer/tests/storage/test_api_client.py | 29 +++++++++++---
swh/indexer/tests/storage/test_in_memory.py | 2 +-
swh/indexer/tests/storage/test_storage.py | 41 ++++++++++++++------
9 files changed, 193 insertions(+), 32 deletions(-)
create mode 100644 swh/indexer/storage/writer.py
Changes applied before test
commit e59868eb41f72fe3568de477fe4a0711269b375d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 29 16:21:13 2020 +0200
Make the indexer storage write to the journal.
commit 8272bc90a367f2d9a9eb231505ad3ccc126c714f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 5 13:36:44 2020 +0100
test_origin_intrinsic_metadata_add__deadlock: use more values, to make the test less likely to unexpectedly pass.
commit 5a5af91ac5aee172ff58e68f8b2121de179635e9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 5 13:37:43 2020 +0100
test_origin_intrinsic_metadata_add__deadlock: Fix failure on nondeterministic order
postgresql kindly returns the results in the order the test expected... most of the time.
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/114/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/114/console
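The ordering fix described in the last commit message can be illustrated with a minimal sketch. It is not the actual test from the diff: the row shape and the `sorted_rows` helper are assumptions made for illustration. The idea is that a query without an explicit ordering may return rows in any order, so the test normalizes both sides before comparing.

```python
from typing import Any, Dict, List


def sorted_rows(rows: List[Dict[str, Any]], key: str = "id") -> List[Dict[str, Any]]:
    """Return rows in a stable order so comparisons ignore retrieval order.

    Hypothetical helper: the real test may normalize differently.
    """
    return sorted(rows, key=lambda row: row[key])


# Rows as the test expects them, and the same rows in another order,
# as a database might return them without an ORDER BY clause.
expected = [{"id": "a", "value": 1}, {"id": "b", "value": 2}]
actual = [{"id": "b", "value": 2}, {"id": "a", "value": 1}]

# A direct list comparison depends on order; the normalized one does not.
order_sensitive_match = actual == expected
order_insensitive_match = sorted_rows(actual) == sorted_rows(expected)
```

Comparing normalized lists keeps the assertion strict about content while staying independent of the order in which the database returns rows.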
Build is green
Patch application report for D4083 (id=15659)
Could not rebase; Attempt merge onto e2835bfff6...
Updating e2835bf..e59868e
Fast-forward
swh/indexer/storage/__init__.py | 26 ++++++++++++-
swh/indexer/storage/db.py | 12 ++++++
swh/indexer/storage/in_memory.py | 39 +++++++++++++------
swh/indexer/storage/model.py | 13 +++++++
swh/indexer/storage/writer.py | 57 ++++++++++++++++++++++++++++
swh/indexer/tests/storage/conftest.py | 6 ++-
swh/indexer/tests/storage/test_api_client.py | 29 +++++++++++---
swh/indexer/tests/storage/test_in_memory.py | 2 +-
swh/indexer/tests/storage/test_storage.py | 41 ++++++++++++++------
9 files changed, 193 insertions(+), 32 deletions(-)
create mode 100644 swh/indexer/storage/writer.py
Changes applied before test
commit e59868eb41f72fe3568de477fe4a0711269b375d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 29 16:21:13 2020 +0200
Make the indexer storage write to the journal.
commit 8272bc90a367f2d9a9eb231505ad3ccc126c714f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 5 13:36:44 2020 +0100
test_origin_intrinsic_metadata_add__deadlock: use more values, to make the test less likely to unexpectedly pass.
commit 5a5af91ac5aee172ff58e68f8b2121de179635e9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 5 13:37:43 2020 +0100
test_origin_intrinsic_metadata_add__deadlock: Fix failure on nondeterministic order
postgresql kindly returns the results in the order the test expected... most of the time.
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/115/ for more details.
Build is green
Patch application report for D4083 (id=15661)
Could not rebase; Attempt merge onto e2835bfff6...
Updating e2835bf..1fd7ae9
Fast-forward
swh/indexer/storage/__init__.py | 26 ++++++++++++-
swh/indexer/storage/db.py | 12 ++++++
swh/indexer/storage/in_memory.py | 39 +++++++++++++------
swh/indexer/storage/model.py | 13 +++++++
swh/indexer/storage/writer.py | 56 ++++++++++++++++++++++++++++
swh/indexer/tests/storage/conftest.py | 6 ++-
swh/indexer/tests/storage/test_api_client.py | 29 +++++++++++---
swh/indexer/tests/storage/test_in_memory.py | 2 +-
swh/indexer/tests/storage/test_storage.py | 41 ++++++++++++++------
9 files changed, 192 insertions(+), 32 deletions(-)
create mode 100644 swh/indexer/storage/writer.py
Changes applied before test
commit 1fd7ae9261e8c6dce5a9cfbfa6c2bc758b833b22
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 29 16:21:13 2020 +0200
Make the indexer storage write to the journal.
commit 8272bc90a367f2d9a9eb231505ad3ccc126c714f
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 5 13:36:44 2020 +0100
test_origin_intrinsic_metadata_add__deadlock: use more values, to make the test less likely to unexpectedly pass.
commit 5a5af91ac5aee172ff58e68f8b2121de179635e9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Thu Nov 5 13:37:43 2020 +0100
test_origin_intrinsic_metadata_add__deadlock: Fix failure on nondeterministic order
postgresql kindly returns the results in the order the test expected... most of the time.
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/116/ for more details.
Overall I'm OK with this, but I find it really lacks documentation explaining how this works, especially the JournalWriter collaborator object.
| File | Line | Comment |
|---|---|---|
| swh/indexer/storage/__init__.py | 128 | I know this docstring was outdated before this diff, but maybe it could be updated as part of it. |
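To make the "collaborator object" pattern mentioned in the review concrete, here is a minimal sketch. It does not reproduce the code in swh/indexer/storage/writer.py: the class `SketchStorage`, the method `add_rows`, and the `publish` callback are all hypothetical names chosen for illustration. The storage owns a writer object and hands it every batch of rows it adds, and the writer silently does nothing when no journal is configured.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class JournalWriter:
    """Hypothetical collaborator that forwards added rows to a journal.

    `publish` stands in for whatever actually sends a message (for
    example a Kafka producer); when it is None, writes are no-ops.
    """

    def __init__(self, publish: Optional[Callable[[str, Row], None]] = None) -> None:
        self._publish = publish

    def write_additions(self, object_type: str, rows: List[Row]) -> None:
        if self._publish is None:
            return  # journal not configured: nothing to do
        for row in rows:
            self._publish(object_type, row)


class SketchStorage:
    """Hypothetical storage that delegates journal writes to its collaborator."""

    def __init__(self, journal_writer: JournalWriter) -> None:
        self.rows: List[Row] = []
        self.journal_writer = journal_writer

    def add_rows(self, object_type: str, rows: List[Row]) -> None:
        # Publish first, then persist locally; the order is a choice
        # made for this sketch, not a statement about the diff.
        self.journal_writer.write_additions(object_type, rows)
        self.rows.extend(rows)
```

With this shape, the storage code never checks whether a journal is configured; that decision lives entirely in the collaborator, which also makes it easy to substitute a recording writer in tests.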
Build is green
Patch application report for D4083 (id=15797)
Rebasing onto 8272bc90a3...
Current branch diff-target is up to date.
Changes applied before test
commit 340c73a19604273467804c7c54b217b780a1677a
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue Sep 29 16:21:13 2020 +0200
Make the indexer storage write to the journal.
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/117/ for more details.
| File | Line | Comment |
|---|---|---|
| swh/indexer/storage/writer.py | 64 | This will flush the journal writer on every message. Please avoid doing that. |
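The concern raised in this comment can be sketched as follows. This is not the code under review: `FakeJournalWriter` and both functions are hypothetical stand-ins, assuming a writer that buffers messages in `send` and delivers them in `flush`. Since flushing typically blocks until the broker acknowledges the buffered messages, calling it once per batch rather than once per message avoids a round trip for every row.

```python
from typing import Any, Dict, Iterable, List

Message = Dict[str, Any]


class FakeJournalWriter:
    """Hypothetical stand-in for a journal writer with a local buffer."""

    def __init__(self) -> None:
        self.pending: List[Message] = []
        self.delivered: List[Message] = []
        self.flush_count = 0

    def send(self, message: Message) -> None:
        # Buffer the message; nothing is delivered until flush().
        self.pending.append(message)

    def flush(self) -> None:
        # Deliver everything buffered so far; in a real writer this is
        # the expensive, blocking step.
        self.delivered.extend(self.pending)
        self.pending.clear()
        self.flush_count += 1


def write_flushing_each(writer: FakeJournalWriter, rows: Iterable[Message]) -> None:
    """The pattern the reviewer asks to avoid: one flush per message."""
    for row in rows:
        writer.send(row)
        writer.flush()


def write_flushing_once(writer: FakeJournalWriter, rows: Iterable[Message]) -> None:
    """Send the whole batch, then flush a single time."""
    for row in rows:
        writer.send(row)
    writer.flush()
```

Both functions deliver the same messages; they differ only in how many times the blocking flush runs for a batch.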