Software Heritage

tests: Use production backends within the indexer tests
Closed, Public

Authored by ardumont on Dec 1 2020, 3:45 PM.

Details

Summary

This surfaced a few paper cuts in the db.py module (caught by CLI tests that
no longer use the in-memory storage).

The main goal is to reduce friction when actually deploying indexer-related
services (backend, indexers, ...).

The pg backend tests should still be reasonably fast since they use the
swh.core.db.pytest_plugin (which truncates tables between tests).

Indeed, a quick look at the Jenkins diff UI [1] shows that the overall execution
time is about the same as before.
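The truncate-between-tests approach can be sketched roughly as follows. This is not the code of swh.core.db.pytest_plugin: it is a minimal, self-contained analogue that assumes sqlite3 as a stand-in for PostgreSQL and uses hypothetical table names. The schema is created once, and after each test every table is emptied instead of recreating the database, which is what keeps the per-test cost low.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator

# Hypothetical schema; the real indexer schema lives in PostgreSQL.
SCHEMA = """
CREATE TABLE indexer_configuration (id INTEGER PRIMARY KEY, tool_name TEXT);
CREATE TABLE content_mimetype (id BLOB, mimetype TEXT, indexer_configuration_id INTEGER);
"""


def create_test_db() -> sqlite3.Connection:
    """Create the database and its schema once (the expensive step)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def truncate_all_tables(conn: sqlite3.Connection) -> None:
    """Empty every table but keep the schema (the cheap per-test step)."""
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    for table in tables:
        # SQLite has no TRUNCATE statement; DELETE without WHERE empties the table.
        conn.execute(f'DELETE FROM "{table}"')
    conn.commit()


@contextmanager
def clean_db(conn: sqlite3.Connection) -> Iterator[sqlite3.Connection]:
    """Hand the shared connection to one test, then reset its data."""
    try:
        yield conn
    finally:
        truncate_all_tables(conn)
```

In a real pytest setup the same split would typically be a session-scoped fixture for the schema and a function-scoped one for the cleanup.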

One of the next steps would be to improve the current journal client tests
in the indexer [2].

Related to T2821

[1] https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/

[2] D4640

Test Plan

tox

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D4638 (id=16459)

Rebasing onto 28ae49da34...

Current branch diff-target is up to date.
Changes applied before test
commit 394576b95688b867a86a5bcb4da30534640a0577
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Dec 1 15:40:01 2020 +0100

    tests: Use production backends within the indexer tests
    
    This detected some paper cuts within cli tests for example.
    
    The main goal is to decrease friction when actually deploying indexer related
    services (backend, indexers, ...).
    
    The pg backends tests should still be reasonably fast as it's using the
    swh.core.db.pytest_plugin (which truncate tables in between tests).
    
    Related to T2821

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/131/ for more details.

This looks alright, thanks.

The fixture stacking between indexer_config and storage/scheduler/objstorage feels a bit backwards, but I can understand why you'd want to keep the full configuration in a single fixture instead of breaking it down into a bunch of them.

I've made a few comments inline about stuff I'm not sure about.
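To picture the fixture stacking mentioned in the review: one builder holds the whole indexer configuration, and the storage, scheduler and objstorage settings are then read out of it rather than the configuration being assembled from separate per-backend fixtures. This is a sketch only; the function names, keys, "cls" values and DSNs below are hypothetical and not taken from the diff.

```python
from typing import Any, Dict

Config = Dict[str, Any]


def make_indexer_config(indexer_db_dsn: str, scheduler_db_dsn: str) -> Config:
    """Build the complete configuration in one place (hypothetical keys/classes)."""
    return {
        "indexer_storage": {"cls": "postgresql", "db": indexer_db_dsn},
        "scheduler": {"cls": "postgresql", "db": scheduler_db_dsn},
        "storage": {"cls": "memory"},
        "objstorage": {"cls": "memory"},
        "tools": {"name": "some-tool", "version": "0.0.1", "configuration": {}},
    }


def component_config(config: Config, component: str) -> Config:
    """Derive one backend's settings from the single full configuration."""
    try:
        # Shallow copy so a caller tweaking its section cannot mutate the shared config.
        return dict(config[component])
    except KeyError:
        raise KeyError(f"no {component!r} section in indexer configuration") from None
```

The trade-off is the one the reviewer points out: the dependency arrows run from the big config to the backends, which reads backwards, but the full configuration stays visible in a single place.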

conftest.py
19–25

The comment needs an update.

swh/indexer/storage/db.py
490–493

I believe you could use list unconditionally here.
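The lines in question are not shown here, so the following is only a guess at the shape of the suggestion, with hypothetical function names: a conditional conversion can usually be replaced by a plain list() call, since list() accepts any iterable.

```python
from typing import Iterable, List


def to_list_conditional(ids: Iterable[bytes]) -> List[bytes]:
    # Hypothetical "before": only convert when the input is not already a list.
    return ids if isinstance(ids, list) else list(ids)  # type: ignore[return-value]


def to_list_unconditional(ids: Iterable[bytes]) -> List[bytes]:
    # Hypothetical "after": list() handles lists, tuples and generators alike,
    # so the isinstance check is unnecessary.
    return list(ids)
```

One behavioral difference: list() always builds a new list, so the result no longer aliases the caller's list. For values passed on to a query that is normally harmless.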

swh/indexer/tests/test_cli.py
75–86

Does this really need to be a fixture now?

swh/indexer/tests/test_origin_head.py
89–90

spurious comma

119–120

same pesky comma
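The commented lines are not reproduced here. Assuming the comma trails an expression (a common slip in Python tests), it silently wraps the value in a one-element tuple. The function names and values below are hypothetical.

```python
def expected_with_comma():
    # Spurious trailing comma: the right-hand side becomes a 1-tuple, not a dict.
    expected = {"origin": "https://example.org/repo", "revision": b"\x00" * 20},
    return expected


def expected_fixed():
    expected = {"origin": "https://example.org/repo", "revision": b"\x00" * 20}
    return expected
```

If the comma instead ends an argument list or a multi-line literal, it is harmless and the remark is purely stylistic.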

swh/indexer/tests/test_origin_metadata.py
29

There's a plural/singular inconsistency here; is the name of the config entry buggy?

I get why you want to use pg in tests to be closer to production, but it makes them noticeably slower; that's why we didn't use pg in the first place.

> Indeed, a rapid look in the jenkins diff ui [1] shows that the overall execution
> time is reasonably as fast as before.

Alright then.

This revision is now accepted and ready to land. (Dec 2 2020, 5:20 PM)
ardumont added inline comments.
swh/indexer/tests/test_cli.py
75–86

Well, I just find it clearer this way.
It also matches what's done in other swh modules for the CLI tests.

I guess we could also instantiate it in each test, but unless you push back harder, I'll keep it as is :)

swh/indexer/tests/test_origin_metadata.py
29

Yes, I believe so: we can have one or more tools.

ardumont marked an inline comment as done.

Adapt according to review:

  • adapt one comment
  • drop spurious commas
  • use list unconditionally
ardumont added inline comments.
swh/indexer/tests/test_origin_metadata.py
29

... but that's named "tools" either way.
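Since the entry is called "tools" whether it holds one tool or several, a consumer can normalize both shapes to a list. A minimal sketch; the helper name get_tools and the tool values are hypothetical, not taken from swh.indexer.

```python
from typing import Any, Dict, List, Union

ToolConfig = Dict[str, Any]


def get_tools(config: Dict[str, Any]) -> List[ToolConfig]:
    """Return the configured tools as a list (hypothetical helper).

    The "tools" entry may hold a single tool mapping or a list of them;
    both shapes are normalized to a list here.
    """
    tools: Union[ToolConfig, List[ToolConfig]] = config["tools"]
    if isinstance(tools, dict):
        return [tools]
    return list(tools)
```

Usage: get_tools({"tools": {"name": "tool-a"}}) and get_tools({"tools": [{"name": "tool-a"}]}) both return [{"name": "tool-a"}].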

Build is green

Patch application report for D4638 (id=16502)

Rebasing onto 28ae49da34...

Current branch diff-target is up to date.
Changes applied before test
commit 2e73f3be6a1cc9d66c88a37e817e042aa60db0f5
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Dec 1 15:40:01 2020 +0100

    tests: Use production backends within the indexer tests
    
    This detected some paper cuts within cli tests for example.
    
    The main goal is to decrease friction when actually deploying indexer related
    services (backend, indexers, ...).
    
    The pg backends tests should still be reasonably fast as it's using the
    swh.core.db.pytest_plugin (which truncate tables in between tests).
    
    Related to T2821

See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/134/ for more details.