
"text"-indexers: Migrate to partition index instead of range
ClosedPublic

Authored by ardumont on Aug 6 2020, 9:50 AM.

Details

Summary

The deprecated storage.content-get-range api (not supported by the cassandra storage) was dropped from the storage
in favor of the more compliant storage.content-get-partition api [1] (supported by the cassandra storage).

The text indexers were the sole users of that deprecated api.
This diff migrates them to the same partition-based pattern.
This simplifies the setup, as no range needs to be computed (the api does it \o/).
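
As a rough illustration of the partition pattern, here is a minimal, self-contained sketch. `FakeStorage`, its first-byte partitioning rule, and the `iter_partition` helper are assumptions made up for this example and are not the indexer's or the storage's actual code; only the idea of a partition id, a number of partitions, and an opaque page token comes from the discussion in this diff.

```python
from dataclasses import dataclass, field
from typing import Generic, Iterator, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class PagedResult(Generic[T]):
    """Stand-in for a paged result: one page of items plus an opaque token."""

    results: List[T] = field(default_factory=list)
    next_page_token: Optional[str] = None


class FakeStorage:
    """Hypothetical in-memory storage, used only for this sketch."""

    def __init__(self, sha1s: List[bytes]) -> None:
        self._sha1s = sorted(sha1s)

    def content_get_partition(
        self,
        partition_id: int,
        nb_partitions: int,
        page_token: Optional[str] = None,
        limit: int = 2,
    ) -> PagedResult[bytes]:
        # Assumption: a hash lands in a partition according to its first byte.
        # A real backend is free to split the hash space differently.
        in_partition = [
            s for s in self._sha1s if s[0] % nb_partitions == partition_id
        ]
        # The token is opaque to callers; here it happens to encode an offset.
        start = int(page_token) if page_token is not None else 0
        page = in_partition[start : start + limit]
        next_start = start + limit
        token = str(next_start) if next_start < len(in_partition) else None
        return PagedResult(results=page, next_page_token=token)


def iter_partition(
    storage: FakeStorage, partition_id: int, nb_partitions: int
) -> Iterator[bytes]:
    """Yield every hash of one partition, following page tokens until None."""
    page_token: Optional[str] = None
    while True:
        page = storage.content_get_partition(
            partition_id, nb_partitions, page_token=page_token
        )
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:
            break
```

The caller never computes a start or end hash: it only passes the partition coordinates and hands back whatever token it received.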

The production impact: stop the current indexers, disable all their current tasks in the scheduler, and change their input in the scheduler db.
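
To give an idea of what the new task input could look like, here is a small sketch that builds one set of arguments per partition. The function name and the `partition_id` / `nb_partitions` keys are hypothetical and only mirror the partition api's vocabulary; the actual scheduler task schema may differ.

```python
from typing import Dict, List


def partition_task_arguments(nb_partitions: int) -> List[Dict[str, int]]:
    """Build one (hypothetical) task input per partition, replacing range bounds."""
    if nb_partitions < 1:
        raise ValueError("nb_partitions must be at least 1")
    return [
        {"partition_id": partition_id, "nb_partitions": nb_partitions}
        for partition_id in range(nb_partitions)
    ]
```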

Note that this is still wip: some tests still use the old interface and remain to be fixed.

Also, this fixes:

  • mistyped code following the migration to storage 0.12.0. This cannot be untangled from the partition migration though.
  • the build [2]

[1] D3712 D3713

[2] https://jenkins.softwareheritage.org/job/DCIDX/job/tests/1084/console

Related to T645

Test Plan

tox

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 14348
Build 22061: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 22060: arc lint + arc unit

Unit Tests: Failed

Time    Test
4 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.indexer.tests.test_fossology_license.TestFossologyLicensePartitionIndexer::test__index_contents
        self = <swh.indexer.tests.test_fossology_license.TestFossologyLicensePartitionIndexer testMethod=test__index_contents> def test__index_contents(self):
4 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.indexer.tests.test_fossology_license.TestFossologyLicensePartitionIndexer::test__index_contents_with_indexed_data
        self = <swh.indexer.tests.test_fossology_license.TestFossologyLicensePartitionIndexer testMethod=test__index_contents_with_indexed_data> def test__index_contents_with_indexed_data(self):
4 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.indexer.tests.test_mimetype.TestMimetypePartitionIndexer::test__index_contents
        self = <swh.indexer.tests.test_mimetype.TestMimetypePartitionIndexer testMethod=test__index_contents> def test__index_contents(self):
4 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.indexer.tests.test_mimetype.TestMimetypePartitionIndexer::test__index_contents_with_indexed_data
        self = <swh.indexer.tests.test_mimetype.TestMimetypePartitionIndexer testMethod=test__index_contents_with_indexed_data> def test__index_contents_with_indexed_data(self):
3 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.indexer.codemeta::swh.indexer.codemeta.merge_values

Full test results: 4 Failed · 310 Passed · 15 Skipped

Event Timeline

Build has FAILED

Patch application report for D3718 (id=13098)

Rebasing onto 62d73ed83d...

Current branch diff-target is up to date.
Changes applied before test
commit 465fbe023f0751ef666384f025e5772a06b771b1
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Aug 6 09:48:50 2020 +0200

    textual-indexers: Migrate to partition index instead of range
    
    This also fixes mistyped codes following the migration to storage 0.12.0. This
    cannot be untangled from the partition migration though.
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/30/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/30/console

ardumont retitled this revision from wip: textual-indexers: Migrate to partition index instead of range to wip: "text"-indexers: Migrate to partition index instead of range. Aug 6 2020, 10:03 AM
ardumont edited the summary of this revision.
ardumont added inline comments.
swh/indexer/indexer.py
286

on indexer partition

376

to fix... PagedResult[Sha1]

455

doc to fix

478

doc to fix.

swh/indexer/mimetype.py
147

doc fix needed.

swh/indexer/storage/__init__.py
178

here because exposing it through the interface did not work for some reason.

swh/indexer/storage/in_memory.py
106

doc fix needed.

124

line.

swh/indexer/storage/interface.py
51

doc fix.

291

doc fix.

305

line.

swh/indexer/tests/storage/test_storage.py
1038

@vlorentz and now we are really using opaque token id! ;)

(mentioning it because our first discussion about those started in the indexer a long time ago; I did not completely grasp what you said at the time, now I understand ;)

Fix multiple docstring issues

ardumont planned changes to this revision. Edited Aug 6 2020, 10:23 AM

still wip (4 tests left to fix: they use the old apis and need to be migrated).

Build has FAILED

Patch application report for D3718 (id=13100)

Rebasing onto 62d73ed83d...

Current branch diff-target is up to date.
Changes applied before test
commit a04ebc7253d925887a6c80b77f3a07c169c4815e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Aug 6 09:48:50 2020 +0200

    textual-indexers: Migrate to partition index instead of range
    
    This also fixes mistyped codes following the migration to storage 0.12.0. This
    cannot be untangled from the partition migration though.
    
    Related to T645

Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/31/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/31/console

Build is green

Patch application report for D3718 (id=13108)

Rebasing onto 62d73ed83d...

Current branch diff-target is up to date.
Changes applied before test
commit 3aedf90c9cb9c4f5be4d25dde1e03456045c9128
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Aug 6 09:48:50 2020 +0200

    textual-indexers: Migrate to partition index instead of range
    
    This also fixes mistyped codes following the migration to storage 0.12.0. This
    cannot be untangled from the partition migration though.
    
    Related to T645

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/32/ for more details.

swh/indexer/storage/__init__.py
178

"here" as in the docstring is here.

ardumont retitled this revision from wip: "text"-indexers: Migrate to partition index instead of range to "text"-indexers: Migrate to partition index instead of range. Aug 6 2020, 11:32 AM
ardumont edited the summary of this revision.
This revision was not accepted when it landed; it landed in state Needs Review. Aug 6 2020, 1:08 PM
This revision was automatically updated to reflect the committed changes.