Software Heritage

Add a test data generator module
Closed, Public

Authored by douardda on Oct 29 2019, 2:37 PM.

Details

Summary

Currently provides two main generators:

  • gen_origins()
  • gen_contents()
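Here is a minimal sketch of what such generators could look like, for readers without the diff at hand. It is not the code under review: the value pools, default sizes, content statuses and dict keys are assumptions chosen for illustration.

```python
import hashlib
import itertools
import os
import random
from typing import Any, Dict, List

# Hypothetical value pools; the module under review defines its own.
PROTOCOLS = ["git", "http", "https"]
DOMAINS = ["example.com", "forge.example.org"]
PATHS = ["/", "/project", "/group/project.git"]
CONTENT_STATUS = ["visible", "hidden"]


def gen_origins(n: int = 10) -> List[Dict[str, str]]:
    """Return up to n origin dicts, each with a distinct URL."""
    all_urls = [
        f"{proto}://{domain}{path}"
        for proto, domain, path in itertools.product(PROTOCOLS, DOMAINS, PATHS)
    ]
    # Sampling without replacement guarantees URL uniqueness.
    return [{"url": url} for url in random.sample(all_urls, min(n, len(all_urls)))]


def gen_content(max_size: int = 4096) -> Dict[str, Any]:
    """Return one content dict holding random data and its hashes."""
    data = os.urandom(random.randint(1, max_size))
    git_header = b"blob %d\x00" % len(data)  # git hashes blobs with this header
    return {
        "data": data,
        "length": len(data),
        "status": random.choice(CONTENT_STATUS),
        "sha1": hashlib.sha1(data).digest(),
        "sha1_git": hashlib.sha1(git_header + data).digest(),
        "sha256": hashlib.sha256(data).digest(),
        "blake2s256": hashlib.blake2s(data).digest(),
    }


def gen_contents(n: int = 10) -> List[Dict[str, Any]]:
    """Return n content dicts."""
    return [gen_content() for _ in range(n)]
```

Sampling from the full product keeps URLs unique without a retry loop, at the cost of capping n at the number of combinations (18 with these pools).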

Depends on D2190

Diff Detail

Repository
rDMOD Data model
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

douardda created this revision. Oct 29 2019, 2:37 PM
vlorentz requested changes to this revision. Oct 29 2019, 2:45 PM
vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/model/tests/generate_testdata.py, line 22

The data model accepts any signed 64-bit integer as a timestamp.
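For context on the range involved: a signed 64-bit timestamp runs from -2**63 to 2**63 - 1 seconds, far beyond what Python's datetime can represent. The sketch below (all function names and the 1970 to 2038 window are hypothetical) contrasts a full-range draw, which has to stay a raw integer, with a narrower draw that converts cleanly to a timezone-aware datetime.

```python
import random
from datetime import datetime, timezone

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def gen_timestamp_full_range() -> int:
    """Any signed 64-bit value, including negative and very large ones."""
    return random.randint(INT64_MIN, INT64_MAX)


def gen_datetime_plausible() -> datetime:
    """A UTC datetime between 1970-01-01 and 2038-01-19 (hypothetical window)."""
    seconds = random.randint(0, 2 ** 31 - 1)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def fits_in_datetime(seconds: int) -> bool:
    """True if the timestamp can be turned into a datetime on this platform."""
    try:
        datetime.fromtimestamp(seconds, tz=timezone.utc)
    except (OverflowError, OSError, ValueError):
        # The exact exception type depends on the platform and the value.
        return False
    return True
```

This is the trade-off at stake: the full range exercises the model's validation, while anything that converts timestamps to datetime only copes with the narrower window.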

swh/model/tests/test_generate_testdata.py, lines 34–36

The test should check URL uniqueness. Currently, your test only checks the uniqueness of (type, url) pairs.

Lines 50–55

This set should check the uniqueness of each of the hashes.
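A per-hash check could look like this minimal sketch. It assumes each generated content is a dict keyed by hash name (the key names are assumptions here) and tests each hash on its own rather than the tuple of all hashes, so a collision on any single hash is caught.

```python
import hashlib
from typing import Dict, List

HASH_NAMES = ("sha1", "sha1_git", "sha256", "blake2s256")


def hash_content(data: bytes) -> Dict[str, bytes]:
    """Compute the hashes of a blob (sha1_git uses git's blob header)."""
    git_header = b"blob %d\x00" % len(data)
    return {
        "sha1": hashlib.sha1(data).digest(),
        "sha1_git": hashlib.sha1(git_header + data).digest(),
        "sha256": hashlib.sha256(data).digest(),
        "blake2s256": hashlib.blake2s(data).digest(),
    }


def check_hashes_unique(contents: List[Dict[str, bytes]]) -> None:
    """Fail if any single hash value appears in more than one content."""
    for name in HASH_NAMES:
        values = {content[name] for content in contents}
        assert len(values) == len(contents), f"duplicate {name} values"


# Sixteen distinct one-byte blobs: every hash must be distinct.
check_hashes_unique([hash_content(bytes([i])) for i in range(16)])
```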

This revision now requires changes to proceed. Oct 29 2019, 2:45 PM
douardda updated this revision to Diff 7469. Oct 29 2019, 2:48 PM

Rebase and add the missing dependency on pytz.

douardda added inline comments. Oct 29 2019, 2:52 PM
swh/model/tests/generate_testdata.py, line 22

Sure, but do we need such a wide range of values?

swh/model/tests/test_generate_testdata.py, lines 34–36

right, type is still around!

douardda updated this revision to Diff 7470. Oct 29 2019, 2:56 PM

Update test_gen_origins_x according to vlorentz's comment.

vlorentz requested changes to this revision. Oct 29 2019, 4:18 PM
vlorentz added inline comments.
swh/model/tests/generate_testdata.py, line 22

Yes, because we want tests to check that negative and big values are supported too.

swh/model/tests/test_generate_testdata.py, lines 34–36

models = {Origin.from_dict(d).url for d in origins}
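Spelled out as a complete check, the suggestion could look like this. Origin is a hypothetical stand-in for swh.model.model.Origin with only a url field, so the snippet runs without swh.model installed; check_origin_urls_unique is likewise an illustrative name.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Origin:
    """Hypothetical stand-in for swh.model.model.Origin."""
    url: str

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "Origin":
        # Extra keys such as "type" are ignored, like a url-only model.
        return cls(url=d["url"])


def check_origin_urls_unique(origins: List[Dict[str, Any]]) -> None:
    # Only the URL counts: two dicts differing just by "type" are duplicates.
    urls = {Origin.from_dict(d).url for d in origins}
    assert len(urls) == len(origins), "duplicate origin URLs"


check_origin_urls_unique(
    [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}]
)
```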

This revision now requires changes to proceed. Oct 29 2019, 4:18 PM
douardda added inline comments. Oct 29 2019, 4:27 PM
swh/model/tests/generate_testdata.py, line 22

IMHO, this is something to test specifically; it is not that useful for 'general-purpose' generated datasets.

If we want/need to "check that negative and big values are supported", we need specifically crafted tests rather than relying on random draws being nice to us.
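A specifically crafted test of this kind might list the boundary values explicitly instead of relying on random draws to hit them. is_valid_timestamp is a hypothetical stand-in for the data model's own check; only the signed 64-bit bounds come from this thread.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1

# Boundary values a uniform random draw would almost never produce.
VALID_EDGES = [INT64_MIN, INT64_MIN + 1, -1, 0, 1, INT64_MAX - 1, INT64_MAX]
INVALID_EDGES = [INT64_MIN - 1, INT64_MAX + 1]


def is_valid_timestamp(seconds: int) -> bool:
    """Hypothetical stand-in for the data model's timestamp validation."""
    return INT64_MIN <= seconds <= INT64_MAX


def test_timestamp_boundaries() -> None:
    for seconds in VALID_EDGES:
        assert is_valid_timestamp(seconds), seconds
    for seconds in INVALID_EDGES:
        assert not is_valid_timestamp(seconds), seconds


test_timestamp_boundaries()
```

With pytest, the same lists would typically feed @pytest.mark.parametrize so that each boundary value is reported as its own test case.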

Also, note that hypothesis_strategies.contents() currently does not generate the ctime field at all.

douardda added inline comments. Oct 29 2019, 4:28 PM
swh/model/tests/test_generate_testdata.py, lines 34–36

makes sense, thx

douardda updated this revision to Diff 7478. Oct 29 2019, 4:30 PM

Simplify a couple of tests a bit, as pointed out by vlorentz.

vlorentz added inline comments. Oct 29 2019, 4:34 PM
swh/model/tests/generate_testdata.py, line 22

The same could be said of using different PROTOCOLS, having an IRI in DOMAINS, having # and ? in PATHS, having different CONTENT_STATUSes, etc.

If it's cheap to explore a large range of values, why not do it?
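As an illustration of how cheap wider exploration can be, the sketch below enumerates every combination of some hypothetical pools (including an IRI domain and paths containing ? and #) and checks that urllib.parse.urlsplit still recovers the expected scheme and host. The pool contents are assumptions, not the module's actual values.

```python
import itertools
from typing import List
from urllib.parse import urlsplit

# Hypothetical pools; the real PROTOCOLS, DOMAINS and PATHS may differ.
PROTOCOLS = ["git", "http", "https", "svn"]
DOMAINS = ["example.com", "bücher.example", "192.0.2.1"]
PATHS = ["", "/", "/repo.git", "/search?q=x", "/page#section", "/a%20b"]


def all_origin_urls() -> List[str]:
    """Every protocol/domain/path combination: 4 * 3 * 6 = 72 URLs here."""
    return [
        f"{proto}://{domain}{path}"
        for proto, domain, path in itertools.product(PROTOCOLS, DOMAINS, PATHS)
    ]


# Each URL must still split into the scheme and host it was built from.
for url in all_origin_urls():
    parts = urlsplit(url)
    assert parts.scheme in PROTOCOLS, url
    assert parts.netloc in DOMAINS, url
```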

vlorentz added inline comments. Oct 29 2019, 4:37 PM
requirements.txt, line 8 (on Diff #7478)

This should be in requirements-tests.txt.

vlorentz requested changes to this revision. Oct 29 2019, 4:39 PM

Oh, and also, it would be nice for these gen_ functions to return model objects instead of dicts (as we want to get rid of these dicts as much as possible).
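One way to move in that direction without breaking callers that still expect dicts is sketched below: keep a dict-returning generator and layer a model-returning variant on top. Origin is a hypothetical frozen dataclass standing in for a real model class, and all function names are illustrative.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Origin:
    """Hypothetical stand-in for a model class such as swh.model.model.Origin."""
    url: str


def gen_origin_dicts(n: int) -> List[Dict[str, str]]:
    """Hypothetical dict-returning generator, like the ones in this diff."""
    return [{"url": f"https://example.com/repo{i}"} for i in range(n)]


def gen_origin_models(n: int) -> List[Origin]:
    """Model-returning variant layered on top of the dict generator."""
    return [Origin(**d) for d in gen_origin_dicts(n)]


def to_dicts(models: List[Origin]) -> List[Dict[str, str]]:
    """Convert back for callers that still expect plain dicts."""
    return [asdict(m) for m in models]
```

Frozen dataclasses are hashable, so uniqueness checks on model objects reduce to comparing len(set(models)) with len(models).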

This revision now requires changes to proceed. Oct 29 2019, 4:39 PM

> Oh, and also, it would be nice for these gen_ functions to return model objects instead of dicts (as we want to get rid of these dicts as much as possible).

Maybe; I'm not sure we really want this (at least not right now).

swh/model/tests/generate_testdata.py, line 22

Possibly, but I want to be able to fix swh.storage (!)

douardda updated this revision to Diff 7482. Oct 29 2019, 4:52 PM

Move the pytz dependency to the -tests requirements.

Updating D2191: Add a test data generator module

vlorentz accepted this revision. Oct 29 2019, 5:10 PM
This revision is now accepted and ready to land. Oct 29 2019, 5:10 PM
This revision was automatically updated to reflect the committed changes.