Software Heritage

tests: Turn origin* hypothesis strategies into pytest fixtures
ClosedPublic

Authored by anlambert on Tue, Oct 12, 1:49 PM.

Diff Detail

Repository
rDWAPPS Web applications
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6458 (id=23466)

Could not rebase; Attempt merge onto d7c16735a4...

Updating d7c16735..c2d10c90
Fast-forward
 swh/web/tests/api/test_apiresponse.py          |   4 +-
 swh/web/tests/api/test_utils.py                |  17 +-
 swh/web/tests/api/views/test_content.py        |  12 -
 swh/web/tests/api/views/test_directory.py      |   7 -
 swh/web/tests/api/views/test_graph.py          |   6 +-
 swh/web/tests/api/views/test_identifiers.py    |  10 +-
 swh/web/tests/api/views/test_origin.py         |   5 +-
 swh/web/tests/api/views/test_release.py        |   4 +-
 swh/web/tests/api/views/test_revision.py       |   4 +-
 swh/web/tests/api/views/test_snapshot.py       |  13 +-
 swh/web/tests/api/views/test_vault.py          |  17 +-
 swh/web/tests/browse/test_snapshot_context.py  |  37 ++-
 swh/web/tests/browse/views/test_content.py     | 190 +++++++--------
 swh/web/tests/browse/views/test_directory.py   |  67 +++---
 swh/web/tests/browse/views/test_identifiers.py |  14 +-
 swh/web/tests/browse/views/test_origin.py      |  86 +++----
 swh/web/tests/browse/views/test_release.py     |  14 +-
 swh/web/tests/browse/views/test_revision.py    |   5 +-
 swh/web/tests/common/test_archive.py           |  47 +---
 swh/web/tests/common/test_identifiers.py       | 113 +++++----
 swh/web/tests/common/test_origin_visits.py     |  21 +-
 swh/web/tests/conftest.py                      | 307 ++++++++++++++++++++++++-
 swh/web/tests/misc/test_badges.py              |  10 +-
 swh/web/tests/misc/test_iframe.py              |  12 +-
 swh/web/tests/strategies.py                    | 265 +--------------------
 25 files changed, 601 insertions(+), 686 deletions(-)
Changes applied before test
commit c2d10c90c3bd6f5fa3634fa456475c3fa92ae194
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Oct 12 13:48:57 2021 +0200

    tests: Turn origin* hypothesis strategies into pytest fixtures
    
    Related to T3603

commit 81486f517526f7cb94e0ca2ed49433ebd7c3d1b3
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Tue Oct 12 11:45:43 2021 +0200

    tests: Turn directory* hypothesis strategies into pytest fixtures
    
    Related to T3603

commit 23a8dc33453fdcf3fb0037ba1c0c76752d40c1c4
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Mon Oct 11 15:17:58 2021 +0200

    tests: Turn content* hypothesis strategies into pytest fixtures
    
    Using such hypothesis strategies has several drawbacks, so let's
    turn them into pytest fixtures as that feels like a better way to do it.
    
    Related to T3603

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1164/ for more details.

Thanks for working on reducing the number of hypothesis fixtures!

I'm a bit concerned about the reproducibility of test results, given fixtures that pull random list elements, with no control on the sequence of test executions and on the seed of the python random module when the fixture is called. (Now that I've looked at swh.web.tests.data, I'm even more concerned :-))

I don't have an answer about "what to use?", unfortunately, except just going for exhaustive tests (i.e. running the test functions for *all* values of the origins in the test data set), which doesn't sound very compelling unless the size of the sample dataset is small, which doesn't look to be the case.
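For the record, the exhaustive approach would amount to parametrizing each test over the whole dataset. A minimal sketch (the `ORIGINS` list and test body are purely illustrative stand-ins for the real test data, not the actual swh.web fixtures):

```python
# Illustrative only: ORIGINS stands in for the real sample dataset.
import pytest

ORIGINS = [{"url": f"https://example.org/repo{i}"} for i in range(3)]

@pytest.mark.parametrize("origin", ORIGINS)
def test_origin_view(origin):
    # runs once per origin, so every dataset entry is covered --
    # deterministic, but runtime grows linearly with the dataset size
    assert origin["url"].startswith("https://")
```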

https://github.com/pytest-dev/pytest/issues/5463 has some background about concerns with respect to random seeding in tests.

Apart from that, I see that some of the function-level fixtures are doing "heavy" querying on the test data for information that is, in effect, static (e.g. the list of origins with more than two visits, etc.). I wonder if it would be possible to extract this logic to only run it once on initialization of the test data?

I initially wrote: we may want to initialize a single, module scoped seed_storage fixture with all data inserted, and make the storage fixture used by tests a function-scoped fixture which would clone this seed storage instance - I assume some tests have to *write* to the storage, so you can't just have one global read only storage fixture - but I now see that's what swh.web.tests.data does. Maybe _init_tests_data could be turned into that seed_storage module-scoped pytest fixture, instead of the current ad-hoc logic? This would also help us control the random seed used for generating the test data (allowing us to override it to reproduce test results)?
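To make the idea concrete, here is a rough sketch of that seed/clone split, assuming a storage object that can be deep-copied; the names (`seed_storage`, `storage`, `InMemoryStorage`) are illustrative, not the actual swh.web fixtures:

```python
# Sketch of a module-scoped seed storage cloned per test.
import copy
import pytest

class InMemoryStorage:
    """Stand-in for the real storage backend."""
    def __init__(self):
        self.origins = []

@pytest.fixture(scope="module")
def seed_storage():
    # expensive data loading runs once per module
    storage = InMemoryStorage()
    storage.origins.append("https://example.org/seed")
    return storage

@pytest.fixture
def storage(seed_storage):
    # each test gets a writable clone; the seed instance stays pristine
    return copy.deepcopy(seed_storage)
```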

@olasd Could you open a task, so anlambert can land this stack of diffs now before we discuss the next step?

In D6458#167702, @olasd wrote:

Thanks for working on reducing the number of hypothesis fixtures!

I'm a bit concerned about the reproducibility of test results, given fixtures that pull random list elements, with no control on the sequence of test executions and on the seed of the python random module when the fixture is called. (Now that I've looked at swh.web.tests.data, I'm even more concerned :-))

I don't have an answer about "what to use?", unfortunately, except just going for exhaustive tests (i.e. running the test functions for *all* values of the origins in the test data set), which doesn't sound very compelling unless the size of the sample dataset is small, which doesn't look to be the case.

https://github.com/pytest-dev/pytest/issues/5463 has some background about concerns with respect to random seeding in tests.

Apart from that, I see that some of the function-level fixtures are doing "heavy" querying on the test data for information that is, in effect, static (e.g. the list of origins with more than two visits, etc.). I wonder if it would be possible to extract this logic to only run it once on initialization of the test data?

I initially wrote: we may want to initialize a single, module scoped seed_storage fixture with all data inserted, and make the storage fixture used by tests a function-scoped fixture which would clone this seed storage instance - I assume some tests have to *write* to the storage, so you can't just have one global read only storage fixture - but I now see that's what swh.web.tests.data does. Maybe _init_tests_data could be turned into that seed_storage module-scoped pytest fixture, instead of the current ad-hoc logic? This would also help us control the random seed used for generating the test data (allowing us to override it to reproduce test results)?

Thanks for pointing that out. I was also wondering how to make failing tests reproducible when random input data are involved.

I think one possible solution would be to:

  • define a function-scoped fixture that sets the random seed to the current time before test execution and resets the random state on test teardown
  • implement custom pytest reporting to display the seed used by each failing test
  • add a new pytest option to explicitly set the random seed from the CLI in order to reproduce any failing test

Regarding static test data, we could use functools.lru_cache to avoid re-executing code that returns the same result multiple times.
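A minimal illustration of that caching idea; the function below is a stand-in for one of the "heavy" queries on the test data, not real swh.web code:

```python
# functools.lru_cache memoizes the first call; later calls are free.
from functools import lru_cache

CALLS = []

@lru_cache(maxsize=None)
def origins_with_multiple_visits():
    CALLS.append(1)  # track how often the expensive query actually runs
    # stand-in for an expensive scan over the test dataset; a tuple is
    # returned so the cached value cannot be mutated by callers
    return tuple(f"https://example.org/repo{i}" for i in range(3))

origins_with_multiple_visits()
origins_with_multiple_visits()
assert len(CALLS) == 1  # the query ran once; the second call hit the cache
```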

I will handle these improvements in other diffs; in the meantime we can land this pile of diffs, as the previous test behavior remains unchanged.

@olasd Could you open a task, so anlambert can land this stack of diffs now before we discuss the next step?

Yeah, sure, I don't have a problem with that.

In D6458#167771, @olasd wrote:

Yeah, sure, I don't have a problem with that.

(That is, I don't have a problem with these changes landing first, as long as we make sure that eventually we have a proper way of reproducing test failures that have come out of "random" fixtures.)

In D6458#167772, @olasd wrote:
In D6458#167771, @olasd wrote:

Yeah, sure, I don't have a problem with that.

(That is, I don't have a problem with these changes landing first, as long as we make sure that eventually we have a proper way of reproducing test failures that have come out of "random" fixtures.)

I might have a proper solution to reproduce test failures involving random fixtures. I am going to land that pile of diffs and submit a new one for reproducibility afterwards.

In D6458#167772, @olasd wrote:
In D6458#167771, @olasd wrote:

Yeah, sure, I don't have a problem with that.

(That is, I don't have a problem with these changes landing first, as long as we make sure that eventually we have a proper way of reproducing test failures that have come out of "random" fixtures.)

I might have a proper solution to reproduce test failures involving random fixtures. I am going to land that pile of diffs and submit a new one for reproducibility afterwards.

SGTM, thanks!

This revision is now accepted and ready to land.
Thu, Oct 14, 11:03 AM