Page MenuHomeSoftware Heritage

Migrate tests to pytest and cleanup
ClosedPublic

Authored by anlambert on Nov 8 2019, 3:42 PM.

Details

Summary

Migrate all swh-web unit tests from unittest to pure pytest which implies the following:

  • all tests are turned from a class method to a simple function
  • replace previous assertions from unittest.TestCase to simple calls to Python assert
  • previous features offered by the former TestCase classes are turned into pytest fixtures
  • use pytest-mock fixture instead of decorator from unittest.patch
  • use requests_mock_datadir fixture from swh-core simplifying the setup of mocked HTTP requests
  • expose some Django assertions to be used with pytest
  • remove the swh.web.tests.testcase module

Some cleanups have also been performed on the tests implementation, notably:

  • Better formatting of string litterals
  • Removal of tests related to origin ids

WARNING: This diff is huge and most of its content is not really interesting to review as it
mainly consists of turning test methods to simple functions and replace self.assert* calls
by simple calls to Python assert.

I pushed it to validate the pytest fixtures use. The following ones are used in the tests:

  • client: Django test client fixture offered by pytest-django
  • api_client: Fixture to get the Django REST framework test client (defined in conftest.py)
  • test_archive: Fixture to manipulate the data from the sample archive used in the tests (defined in conftest.py). Previously, the features it offers were defined as methods in the former WebTestCase class.
  • mocker: Fixture wrapping the patching API provided by the unittest.mock module offered by pytest-mock

Kudos to the great unittest2pytest package as its use was a real time saver here when migrating the tests.

After that migration, the tests running time is decreased by a factor of 5-6x (around 35s when using unittest, around 7s when using pytest).

Related to T2062

Test Plan

tox

Diff Detail

Event Timeline

vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/web/tests/admin/test_origin_save.py
33

fixture names should be nouns, not verbs. eg. populated_db

swh/web/tests/api/test_apidoc.py
18–67

why is this changed?

swh/web/tests/api/views/test_content.py
112

assert 'Link' not in rv

136

same

swh/web/tests/browse/views/test_directory.py
92–131

what happened to these tests?

swh/web/tests/common/test_highlightjs.py
48–121

this could be a list of 2-tuples and a for loop.

swh/web/tests/common/test_throttling.py
71

is this used?

swh/web/tests/conftest.py
103–113

Why does this have both methods for the idx storage and the storage? I think it should be split in two.

238–240

Where is this used?

swh/web/tests/django_asserts.py
17–21

These names should probably be in snake_case

This revision now requires changes to proceed.Nov 13 2019, 4:11 PM

Thanks for the review @vlorentz. I will slightly rework the fixtures added so far and handle your other comments before updating that diff again.

swh/web/tests/admin/test_origin_save.py
33

ack

swh/web/tests/api/test_apidoc.py
18–67

I tried to better respect flake8 formatting when refactoring the tests.

Nevertheless, as the api endpoint docstrings are parsed by the swh.web.api.apidoc module
to produce HTML doc, some constraints have to be respected like not putting a line
break in :http:get:* roles. So the weird formatting sometimes.

I managed to improve that docstring in order to remove the #noqa marker, this will
be updated on next diff push.

swh/web/tests/browse/views/test_directory.py
92–131

They are still here, I simply moved private helper functions to the end of the file and put test functions at the beginning.

swh/web/tests/common/test_highlightjs.py
48–121

Ack, will update

swh/web/tests/conftest.py
103–113

Oh right, good idea.

The fixtures could then be named storage and idx_storage, it makes more sense.

238–240

Some tests do not take the test_archive fixture as parameter but still requires the test archive to be initialized to get expected data.

But I agree this calls for a cleaner way to initialize those tests data, I will try to improve that.

swh/web/tests/django_asserts.py
17–21

ack

Update: Adress @vlorentz comments and split the previous test_archive fixture into two:

  • storage_json: Wraps an in-memory storage and override some methods to get data in JSON serializable format
  • idx_storage_json: Wraps an in-memory indexer storage abnd override some methods to get data in JSON serializable format
swh/web/tests/common/test_throttling.py
71

Nope and it should have been removed before this refactoring, well seen

Update: Ensure both storages get overriden when initializing related fixtures

storage_json and idx_storage_json are too similar to the actual storage and idx_storage, even their names are. I find this very confusing.

I think that:

  1. they should be renamed to make it explicit they are test data, not storages.
  2. get methods should be removed, and replaced with dictionaries
  3. add methods on idx_storage_json should not have names like content_add_metadata, but rather run_metadata_idx, so it's clearer that they run indexers instead of just adding data passed to them as argument
This revision now requires changes to proceed.Nov 15 2019, 1:42 PM
  1. get methods should be removed, and replaced with dictionaries

What do you mean here ?

How about renaming the fixtures to archive_data and indexer_data ?

storage_json and idx_storage_json are too similar to the actual storage and idx_storage, even their names are. I find this very confusing.

The fact that those have the same interface than storages is totally intended here. Basically, it enables to inject data in the test archive
while running the tests and get the archive data in the same form they are displayed in HTML pages or api JSON responses.

Keeping the same method names is simpler from my point of view.

  1. get methods should be removed, and replaced with dictionaries

What do you mean here ?

I meant doing archive_data.contents[id] instead of archive_data.content_get(id)

storage_json and idx_storage_json are too similar to the actual storage and idx_storage, even their names are. I find this very confusing.

The fact that those have the same interface than storages is totally intended here. Basically, it enables to inject data in the test archive
while running the tests and get the archive data in the same form they are displayed in HTML pages or api JSON responses.

Keeping the same method names is simpler from my point of view.

Ok, now I get it. Just a rename to archive_data would be ok then.

But I'm still not a big fan of all the magic / global variables that's happening in those tests...

Update: Rename storage_json fixture to archive_data and idx_storage_json fixture to indexer_data

This revision is now accepted and ready to land.Nov 15 2019, 3:31 PM
This revision was automatically updated to reflect the committed changes.