Add a test data generator module
Closed, Public

Authored by douardda on Tue, Oct 29, 2:37 PM.

Details

Summary

The module currently provides two main generators:

  • gen_origins()
  • gen_contents()

Depends on D2190

Diff Detail

Repository
rDMOD Data Model
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

douardda created this revision. Tue, Oct 29, 2:37 PM
vlorentz requested changes to this revision. Tue, Oct 29, 2:45 PM
vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/model/tests/generate_testdata.py
22

The data model accepts any signed 64-bit number as a timestamp.
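For reference, the range this comment refers to can be sketched as below. This is only an illustration, assuming timestamps are plain integer seconds; `random_timestamp` is a hypothetical helper and not part of swh.model.

```python
import random

# Bounds of a signed 64-bit integer.
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def random_timestamp(rng: random.Random) -> int:
    """Draw a timestamp uniformly from the full signed 64-bit range.

    Hypothetical helper, shown only to make the range concrete.
    """
    return rng.randint(INT64_MIN, INT64_MAX)


rng = random.Random(42)  # fixed seed so the generated data is reproducible
samples = [random_timestamp(rng) for _ in range(1000)]
```

With a uniform draw over this range, roughly half of the samples are negative, which is the kind of value the comment wants the generated data to include.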

swh/model/tests/test_generate_testdata.py
34–36

The test should check URL uniqueness. Currently, your test only checks uniqueness of (type, url).

50–55

This set should check the uniqueness of each of the hashes.
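A minimal sketch of the per-hash check requested here, assuming contents are dicts carrying one digest per algorithm. The hash names, `make_content`, and `duplicated_hashes` are assumptions made for this illustration and are not taken from the diff.

```python
import hashlib

# Hash names and algorithms are assumptions for this illustration.
HASHERS = {
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
    "blake2s256": lambda data: hashlib.blake2s(data, digest_size=32),
}


def make_content(data: bytes) -> dict:
    """Build a content-like dict holding one digest per hash algorithm."""
    return {name: hasher(data).digest() for name, hasher in HASHERS.items()}


def duplicated_hashes(contents: list) -> list:
    """Return the names of the hashes whose values are not all distinct."""
    return [
        name
        for name in HASHERS
        if len({c[name] for c in contents}) != len(contents)
    ]


contents = [make_content(bytes([i])) for i in range(10)]

# Craft a partial collision: same sha1 as contents[0], other hashes differ.
partial = dict(make_content(b"other"), sha1=contents[0]["sha1"])
```

A set built from whole tuples of hashes would still see `partial` as distinct from every other entry, while the per-hash check reports the repeated `sha1`.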

This revision now requires changes to proceed. Tue, Oct 29, 2:45 PM
douardda updated this revision to Diff 7469. Tue, Oct 29, 2:48 PM

rebase + add missing dep on pytz

douardda added inline comments. Tue, Oct 29, 2:52 PM
swh/model/tests/generate_testdata.py
22

sure but do we need such a wide space?

swh/model/tests/test_generate_testdata.py
34–36

right, type is still around!

douardda updated this revision to Diff 7470. Tue, Oct 29, 2:56 PM

update test_gen_origins_x according to vlorentz's comment

vlorentz requested changes to this revision. Tue, Oct 29, 4:18 PM
vlorentz added inline comments.
swh/model/tests/generate_testdata.py
22

Yes, because we want tests to check that negative and big values are supported too.

swh/model/tests/test_generate_testdata.py
34–36

models = {Origin.from_dict(d).url for d in origins}

This revision now requires changes to proceed. Tue, Oct 29, 4:18 PM
douardda added inline comments. Tue, Oct 29, 4:27 PM
swh/model/tests/generate_testdata.py
22

IMHO this is something to test specifically, not that useful for 'general-purpose' generated datasets.

If we want/need to "check that negative and big values are supported", we need specifically crafted tests rather than relying on random being nice to us.

Also, note that hypothesis_strategies.contents() currently does not generate a ctime field at all.
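The "specifically crafted tests" mentioned in this comment could look roughly like the sketch below. The `roundtrip` function is a hypothetical stand-in for whatever code is under test; here it simply packs the value as a signed 64-bit integer and reads it back. The list of boundary values is an assumption chosen for illustration.

```python
import struct

# Explicit edge cases instead of hoping a random generator hits them.
BOUNDARY_TIMESTAMPS = [
    -(2 ** 63),   # smallest signed 64-bit value
    -1,           # just below the epoch
    0,            # the epoch itself
    2 ** 31 - 1,  # largest signed 32-bit value
    2 ** 31,      # first value that overflows 32 bits
    2 ** 63 - 1,  # largest signed 64-bit value
]


def roundtrip(ts: int) -> int:
    """Hypothetical stand-in for the code under test.

    Serializes the timestamp as a little-endian signed 64-bit integer
    and deserializes it again.
    """
    return struct.unpack("<q", struct.pack("<q", ts))[0]


def check_boundaries() -> None:
    """Fail with the offending value if any boundary timestamp is altered."""
    for ts in BOUNDARY_TIMESTAMPS:
        assert roundtrip(ts) == ts, ts
```

Listing the edge cases explicitly makes the test deterministic, which is the point being argued: coverage of negative and large values does not depend on what the random generator happens to produce.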

douardda added inline comments. Tue, Oct 29, 4:28 PM
swh/model/tests/test_generate_testdata.py
34–36

makes sense, thx

douardda updated this revision to Diff 7478. Tue, Oct 29, 4:30 PM

simplify a bit a couple of tests

as pointed out by vlorentz

vlorentz added inline comments. Tue, Oct 29, 4:34 PM
swh/model/tests/generate_testdata.py
22

The same could be said of using different PROTOCOLS, having an IRI in DOMAINS, having # and ? in PATHS, having different CONTENT_STATUSes, etc.

If it's cheap to explore a large range of values, why not do it?
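To illustrate the argument, here is a sketch of how varied URLs can be produced cheaply by combining small pools of values. The names PROTOCOLS, DOMAINS and PATHS come from the comment above, but the values listed here and the `gen_urls` helper are invented for this example and do not reflect the actual module.

```python
import random
from itertools import product

# Illustrative values only; the real lists in the diff may differ.
PROTOCOLS = ["git", "http", "https", "svn"]
DOMAINS = ["example.com", "caf\u00e9.example"]  # second entry is an IRI-style domain
PATHS = ["project", "group/project", "repo?ref=main", "repo#readme"]


def gen_urls(n: int, seed: int = 0) -> list:
    """Return n distinct URLs sampled from every protocol/domain/path combination.

    Hypothetical helper; raises ValueError if n exceeds the number of combinations.
    """
    all_urls = [
        f"{proto}://{domain}/{path}"
        for proto, domain, path in product(PROTOCOLS, DOMAINS, PATHS)
    ]
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    return rng.sample(all_urls, n)


urls = gen_urls(10)
```

Sampling without replacement from the full product keeps the URLs unique, and adding one entry to any pool multiplies the number of available combinations.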

vlorentz added inline comments. Tue, Oct 29, 4:37 PM
requirements.txt
8 ↗(On Diff #7478)

should be in requirements-tests.txt

vlorentz requested changes to this revision. Tue, Oct 29, 4:39 PM

Oh and also, it would be nice for these gen_ functions to return model objects instead of dicts (as we want to get rid of these dicts as much as possible).
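A rough sketch of what this suggestion could mean in practice. The `Origin` class below is a minimal dataclass stand-in and not the real swh.model class; `gen_origin_dicts` and `gen_origin_objects` are hypothetical names, and the generated URLs are placeholders.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Origin:
    """Minimal stand-in for a model class (not the real swh.model class)."""

    url: str
    type: str

    @classmethod
    def from_dict(cls, d: dict) -> "Origin":
        return cls(**d)

    def to_dict(self) -> dict:
        return asdict(self)


def gen_origin_dicts(n: int) -> list:
    """Dict-based generator, in the style the diff currently uses (hypothetical)."""
    return [{"url": f"https://example.com/repo{i}", "type": "git"} for i in range(n)]


def gen_origin_objects(n: int) -> list:
    """Same data, returned as model objects instead of dicts."""
    return [Origin.from_dict(d) for d in gen_origin_dicts(n)]


origins = gen_origin_objects(3)
```

Callers that still need dicts can call `to_dict()` on each object, so returning objects does not prevent dict-based consumers from using the generator.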

This revision now requires changes to proceed. Tue, Oct 29, 4:39 PM

> Oh and also, it would be nice for these gen_ functions to return model objects instead of dicts (as we want to get rid of these dicts as much as possible).

Maybe, not sure we really want this (right now at least).

swh/model/tests/generate_testdata.py
22

possibly but I want to be able to fix swh.storage (!)

douardda updated this revision to Diff 7482. Tue, Oct 29, 4:52 PM

move pytz dep in -tests req

Updating D2191: Add a test data generator module

vlorentz accepted this revision. Tue, Oct 29, 5:10 PM
This revision is now accepted and ready to land. Tue, Oct 29, 5:10 PM
This revision was automatically updated to reflect the committed changes.