Details
- Reviewers
  - ardumont
- Group Reviewers
  - Reviewers
- Maniphest Tasks
  - T1307: Remove mock storages used in tests.
  - T1432: Remove mock storages from the indexers
- Commits
  - rDCIDXfb34e1aabb2a: rm ctags mocks + add ctags to idx db + fix doc.
Diff Detail
- Repository
  - rDCIDX Metadata indexer
- Lint
  - Automatic diff as part of commit; lint not applicable.
- Unit
  - Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/112/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/112/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/114/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/114/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/118/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/118/console
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/122/ for more details.
swh/indexer/storage/in_memory.py, line 169:
This is not pointless. So in the indexer storage, the function that adds that data simply ignores the conflicting data (which should be exactly the same as before). In the end, only read operations are expected when we process the same content again. Why would we be expected to process the same content again, you might ask? Since it's an implementation detail, in theory you could implement this however you wish here, as long as the tests pass ;)
swh/indexer/storage/in_memory.py, line 169:
ctags implementations are registered as tools, and rows from different tools do not conflict with each other, since the tool id (indexer_configuration_id) is part of the unique index:
create unique index on content_ctags(id, hash_sha1(name), kind, line, lang, indexer_configuration_id);
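A minimal Python sketch of why rows from different tools never collide, assuming an in-memory store keyed on the same columns as the unique index above. The helper `ctags_row_key` and the row dict layout are hypothetical illustrations, not the actual `in_memory.py` code.

```python
import hashlib


def ctags_row_key(row):
    """Hypothetical key mirroring the content_ctags unique index columns:
    (id, hash_sha1(name), kind, line, lang, indexer_configuration_id)."""
    name_sha1 = hashlib.sha1(row["name"].encode("utf-8")).hexdigest()
    return (
        row["id"],
        name_sha1,
        row["kind"],
        row["line"],
        row["lang"],
        row["indexer_configuration_id"],
    )


# Same content and same symbol, computed by two different ctags tools
# (e.g. universal vs exuberant) registered with different tool ids.
row_universal = {"id": b"\x01" * 20, "name": "main", "kind": "function",
                 "line": 12, "lang": "C", "indexer_configuration_id": 1}
row_exuberant = dict(row_universal, indexer_configuration_id=2)

keys = {ctags_row_key(row_universal), ctags_row_key(row_exuberant)}
print(len(keys))  # 2: the tool id keeps the two rows distinct
```

Only a second row from the same tool, for the same content and symbol, would map to an existing key.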
swh/indexer/storage/in_memory.py, line 169:
Yes, I did that. What I meant was that for the same tool and the same content, the computed data is the same. The supposed gain here is that this approach performs no write operations, so it should be faster (we'd need metrics to confirm that ;)), as opposed to what you proposed, which would always write. Hope this is clearer.
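To make the trade-off concrete, here is a sketch comparing the two strategies on a plain dict, assuming a second pass over identical rows. `add_ignore_conflicts`, `add_overwrite` and `simple_key` are hypothetical names; the sketch only counts writes and says nothing about real database cost, which, as noted above, would need metrics.

```python
def add_ignore_conflicts(store, rows, key):
    """Insert rows whose key is absent and skip the rest; return #writes."""
    writes = 0
    for row in rows:
        k = key(row)
        if k not in store:  # an already-present row only costs a lookup
            store[k] = row
            writes += 1
    return writes


def add_overwrite(store, rows, key):
    """Upsert every row unconditionally; return #writes."""
    for row in rows:
        store[key(row)] = row
    return len(rows)


def simple_key(row):
    # Hypothetical key: content id plus tool (indexer configuration) id.
    return (row["id"], row["indexer_configuration_id"])


rows = [{"id": i, "indexer_configuration_id": 1, "name": f"sym{i}"}
        for i in range(3)]

ignore_store, overwrite_store = {}, {}
ignore_passes = [add_ignore_conflicts(ignore_store, rows, simple_key)
                 for _ in range(2)]
overwrite_passes = [add_overwrite(overwrite_store, rows, simple_key)
                    for _ in range(2)]

print(ignore_passes)     # [3, 0]
print(overwrite_passes)  # [3, 3]
print(ignore_store == overwrite_store)  # True
```

Both strategies end in the same state only because the recomputed rows are identical, which is exactly the idempotency assumption made for a given tool and content.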
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/124/ for more details.
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/132/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/132/console
This is not pointless.
This is an implementation detail of the indexer storage.
I expected the multiple ctags implementations (universal, exuberant, etc.) to be idempotent in their computations (and I still do).
So in the indexer storage, the function that adds that data simply ignores conflicting data (which should be exactly the same as before). In the end, only read operations are expected when we process the same content again.
Why would we be expected to process the same content again, you might ask?
Because, not so long ago, the indexers formed a pipeline, so adding a new indexer would have triggered this behavior.
The orchestrator would have broadcast the same contents again to the new indexer and possibly to the other indexers as well.
Since it's an implementation detail, in theory you could implement this however you wish here, as long as the tests pass ;)
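The re-broadcast scenario described above can be simulated with a small sketch, assuming deterministic tools and a store that ignores conflicting rows. `InMemoryCtagsStore`, `add_ctags`, `make_ctags_tool` and `broadcast` are hypothetical stand-ins, not the swh indexer storage or orchestrator API.

```python
class InMemoryCtagsStore:
    """Hypothetical stand-in for an indexer storage (not the swh API)."""

    def __init__(self):
        self._rows = {}

    def add_ctags(self, rows):
        """Insert rows, silently ignoring any whose key already exists."""
        written = 0
        for row in rows:
            key = (row["id"], row["name"], row["indexer_configuration_id"])
            if key not in self._rows:
                self._rows[key] = row
                written += 1
        return written

    def snapshot(self):
        return dict(self._rows)


def make_ctags_tool(tool_id):
    """Hypothetical indexer: deterministic output per (content, tool)."""
    def compute(content_id):
        return [{"id": content_id, "name": f"sym_{content_id}",
                 "indexer_configuration_id": tool_id}]
    return compute


def broadcast(content_ids, indexers, store):
    """Hypothetical orchestrator: send every content to every indexer."""
    written = 0
    for compute in indexers:
        rows = [row for cid in content_ids for row in compute(cid)]
        written += store.add_ctags(rows)
    return written


store = InMemoryCtagsStore()
contents = ["c1", "c2"]
universal = make_ctags_tool(1)
first_writes = broadcast(contents, [universal], store)
before = store.snapshot()

# A new indexer is added: the same contents are re-broadcast to all indexers.
exuberant = make_ctags_tool(2)
second_writes = broadcast(contents, [universal, exuberant], store)
after = store.snapshot()

print(first_writes, second_writes)  # 2 2: only the new tool's rows are written
```

The existing tool's rows are left untouched on the second pass, which is the kind of property a storage test could assert regardless of how the storage is implemented.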