Simpler code and less error-prone.
Details
- Reviewers: ardumont
- Group Reviewers: Reviewers
- Commits: rDCIDXcd42c667212a: indexer: Remove pagination logic using stream_results() instead.
Diff Detail
- Repository: rDCIDX Metadata indexer
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build is green
Patch application report for D4983 (id=17775)
Could not rebase; Attempt merge onto 3baf8bb919...
Updating 3baf8bb..cd42c66
Fast-forward
 swh/indexer/fossology_license.py  | 21 ++++++++--------
 swh/indexer/indexer.py            | 46 +++++++++++++++++-----------------
 swh/indexer/mimetype.py           | 23 +++++++++--------
 swh/indexer/tests/test_indexer.py | 52 ++++++++++++++++++++++++++++++++++++---
 4 files changed, 94 insertions(+), 48 deletions(-)
Changes applied before test
commit cd42c667212a8a37a080fb3aed915ade93704ca4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Feb 1 14:57:20 2021 +0100

    indexer: Remove pagination logic using stream_results() instead.

    Simpler code and less error-prone.

commit 4080b9ee931fe914a91addf1df2d160e56a2d8bb
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Feb 1 14:41:23 2021 +0100

    ContentPartitionIndexer: Do not index the same content multiple times at once.

    self._index_contents was called multiple times in a loop with the same
    arguments, except for the set of hashes to exclude.
    It means that, if there were N pages of hashes to exclude, each content
    was indexed N times; and the N-1 first iterations didn't even exclude
    all the hashes they had to exclude.
See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/144/ for more details.
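
The problem described in the second commit message, and the shape of the fix, can be illustrated with a small standalone sketch. This is not the code from swh/indexer: `fetch_excluded`, `stream_all`, `index_contents`, `index_per_page`, and `index_once` are hypothetical names, and `stream_all` is only a local stand-in for the idea behind stream_results(), whose actual signature does not appear in this report. The sketch assumes the exclusion set grew page by page, which is one reading of "the N-1 first iterations didn't even exclude all the hashes".

```python
from typing import Callable, Iterator, List, Optional, Set, Tuple

# (results, next_page_token); a token of None means "no more pages".
Page = Tuple[List[str], Optional[int]]

# Hypothetical data: hashes already indexed, and the candidates to index.
ALREADY_INDEXED = ["a", "b", "c"]
CANDIDATES = ["a", "b", "c", "d", "e"]


def fetch_excluded(page_token: Optional[int] = None, limit: int = 2) -> Page:
    """Hypothetical paginated endpoint listing already-indexed hashes."""
    start = page_token or 0
    chunk = ALREADY_INDEXED[start:start + limit]
    more = start + limit < len(ALREADY_INDEXED)
    return chunk, (start + limit if more else None)


def stream_all(fetch: Callable[..., Page]) -> Iterator[str]:
    """Stand-in helper: follow page tokens and yield every result once."""
    page_token = None
    while True:
        results, page_token = fetch(page_token=page_token)
        yield from results
        if page_token is None:
            return


def index_contents(candidates: List[str], excluded: Set[str]) -> List[str]:
    """Pretend to index every candidate that is not excluded."""
    return [h for h in candidates if h not in excluded]


def index_per_page(candidates: List[str]) -> List[str]:
    """Buggy shape: index once per page with a partial exclusion set."""
    indexed: List[str] = []
    excluded: Set[str] = set()
    page_token = None
    while True:
        page, page_token = fetch_excluded(page_token=page_token)
        excluded.update(page)
        # Called on every page, so N pages means N indexing passes.
        indexed.extend(index_contents(candidates, excluded))
        if page_token is None:
            return indexed


def index_once(candidates: List[str]) -> List[str]:
    """Fixed shape: drain all pages first, then index a single time."""
    excluded = set(stream_all(fetch_excluded))
    return index_contents(candidates, excluded)
```

With two pages of excluded hashes, `index_per_page(CANDIDATES)` indexes "d" and "e" twice and also indexes "c" during the first pass, before the second page has been read; `index_once(CANDIDATES)` indexes "d" and "e" exactly once.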