Add a test data generator module
Closed, Public

Authored by douardda on Oct 29 2019, 2:37 PM.

Details

Summary

This module currently provides two main generators:

  • gen_origins()
  • gen_contents()
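
The generators themselves are not shown on this page. As a rough sketch only, assuming each one returns a list of plain dicts, they might look like the following. The pool contents, the `n` and `seed` parameters, and the exact dict keys are assumptions made for illustration, not the module's actual interface.

```python
import hashlib
import random

# Hypothetical value pools; the real module's constants may differ.
PROTOCOLS = ["git", "http", "https"]
DOMAINS = ["example.com", "example.org"]
PATHS = ["project", "user/repo", "some/deep/path"]


def gen_origins(n=10, seed=0):
    """Return n origin dicts whose URLs are pairwise distinct."""
    rng = random.Random(seed)  # seeded so that test runs are reproducible
    urls = set()
    while len(urls) < n:
        # The numeric suffix keeps the URL space far larger than n.
        urls.add(
            "{}://{}/{}/{}".format(
                rng.choice(PROTOCOLS),
                rng.choice(DOMAINS),
                rng.choice(PATHS),
                rng.randrange(10**9),
            )
        )
    return [{"url": url} for url in sorted(urls)]


def gen_contents(n=10, seed=0):
    """Return n content dicts built from pairwise distinct random payloads."""
    rng = random.Random(seed)
    payloads = set()
    while len(payloads) < n:
        length = rng.randint(8, 64)
        payloads.add(bytes(rng.getrandbits(8) for _ in range(length)))
    return [
        {
            "data": data,
            "length": len(data),
            "sha1": hashlib.sha1(data).digest(),
            "sha256": hashlib.sha256(data).digest(),
            "status": "visible",  # assumed status value
        }
        for data in sorted(payloads)
    ]
```

Collecting values in a set before building the dicts guarantees distinct URLs and payloads by construction, instead of relying on the random draws not colliding.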

Depends on D2190

Diff Detail

Repository: rDMOD Data model
Branch: master
Lint: Lint Skipped
Unit: Unit Tests Skipped
Build Status: Buildable 8684
  Build 12643: tox-on-jenkins (Jenkins)
  Build 12642: arc lint + arc unit

Event Timeline

vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/model/tests/generate_testdata.py
21

the data model accepts any signed 64-bit number as timestamp

swh/model/tests/test_generate_testdata.py
33–35

The test should check URL uniqueness. Currently, your test only checks the uniqueness of (type, url).

49–54

This set should check the uniqueness of each of the hashes.
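
To illustrate the difference between checking tuples and checking each field, here is a small sketch. The helper `keys_with_duplicates` and the sample data are hypothetical and are not taken from the test file under review.

```python
import hashlib


def keys_with_duplicates(items, keys):
    """Return the keys for which at least two items share a value."""
    return [
        key
        for key in keys
        if len({item[key] for item in items}) != len(items)
    ]


# Two origins that differ as (type, url) pairs but share the same URL.
origins = [
    {"type": "git", "url": "https://example.com/repo"},
    {"type": "svn", "url": "https://example.com/repo"},
]
pairs = {(o["type"], o["url"]) for o in origins}
pairs_are_unique = len(pairs) == len(origins)  # the pair-level check passes
url_duplicates = keys_with_duplicates(origins, ["url"])  # the URL check fails

# Checking every hash separately, rather than one set of whole records.
contents = [
    {"sha1": hashlib.sha1(d).digest(), "sha256": hashlib.sha256(d).digest()}
    for d in (b"foo", b"bar", b"baz")
]
hash_duplicates = keys_with_duplicates(contents, ["sha1", "sha256"])
```

A set of (type, url) tuples can be duplicate-free while the URLs alone are not, which is why a per-field check is the stricter one.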

This revision now requires changes to proceed. (Oct 29 2019, 2:45 PM)

rebase + add missing dep on pytz

swh/model/tests/generate_testdata.py
21

sure but do we need such a wide space?

swh/model/tests/test_generate_testdata.py
33–35

right, type is still around!

update test_gen_origins_x according to vlorentz's comment

vlorentz added inline comments.
swh/model/tests/generate_testdata.py
21

Yes, because we want tests to check that negative and big values are supported too.

swh/model/tests/test_generate_testdata.py
33–35

models = {Origin.from_dict(d).url for d in origins}

This revision now requires changes to proceed. (Oct 29 2019, 4:18 PM)
swh/model/tests/generate_testdata.py
21

IMHO this is something to test specifically, not that useful for 'general-purpose' generated datasets.

If we want/need to "check that negative and big values are supported", we need specifically crafted tests rather than relying on random being nice to us.
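
A specifically crafted test of this kind could look roughly like the sketch below. It assumes the signed 64-bit range mentioned earlier in the thread, and it uses a `struct` round trip as a stand-in for whatever component actually has to store the timestamp; the function names are hypothetical.

```python
import struct

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# Explicit edge cases, so coverage does not depend on random draws.
BOUNDARY_TIMESTAMPS = [INT64_MIN, -1, 0, 1, INT64_MAX]


def roundtrip_int64(value):
    """Stand-in for a storage round trip: pack as signed 64-bit, then unpack."""
    return struct.unpack(">q", struct.pack(">q", value))[0]


def test_boundary_timestamps_roundtrip():
    for ts in BOUNDARY_TIMESTAMPS:
        assert roundtrip_int64(ts) == ts


def test_out_of_range_timestamps_rejected():
    for ts in (INT64_MIN - 1, INT64_MAX + 1):
        try:
            roundtrip_int64(ts)
        except (struct.error, OverflowError):
            continue
        raise AssertionError("expected an error for {}".format(ts))
```

Listing the boundaries explicitly makes the test deterministic: the negative and very large values are exercised on every run.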

Also, notice that hypothesis_strategies.contents() currently does not generate the ctime field at all.

swh/model/tests/test_generate_testdata.py
33–35

makes sense, thx

simplify a bit a couple of tests

as pointed out by vlorentz

swh/model/tests/generate_testdata.py
21

The same could be said of using different PROTOCOLS, having an IRI in DOMAINS, having # and ? in PATHS, having different CONTENT_STATUSes, etc.

If it's cheap to explore a large range of values, why not do it?
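
One inexpensive way to cover such variations is to enumerate every combination of the pools instead of sampling from them. The sketch below assumes pools named PROTOCOLS, DOMAINS and PATHS as mentioned above; their contents here are made up for illustration.

```python
import itertools

# Hypothetical pool contents, including an IRI domain and paths with ? and #.
PROTOCOLS = ["git", "https", "svn"]
DOMAINS = ["example.com", "bücher.example"]
PATHS = ["/repo", "/repo?branch=main", "/repo#readme"]


def all_origin_urls():
    """Build one URL per (protocol, domain, path) combination."""
    return [
        "{}://{}{}".format(protocol, domain, path)
        for protocol, domain, path in itertools.product(PROTOCOLS, DOMAINS, PATHS)
    ]
```

With small pools the full product stays tiny (3 × 2 × 3 = 18 URLs here), so every unusual value is guaranteed to appear in the dataset.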

requirements.txt
8

should be in requirements-tests.txt

Oh, and also, it would be nice for these gen_ functions to return model objects instead of dicts (as we want to get rid of these dicts as much as possible).

This revision now requires changes to proceed. (Oct 29 2019, 4:39 PM)

> Oh and also, it would be nice of these gen_ functions to return model objects instead of dicts (as we want to get rid of these dicts as much as possible)

Maybe, not sure we really want this (right now at least).

swh/model/tests/generate_testdata.py
21

possibly but I want to be able to fix swh.storage (!)

move pytz dep to requirements-tests.txt

Updating D2191: Add a test data generator module

This revision is now accepted and ready to land. (Oct 29 2019, 5:10 PM)
This revision was automatically updated to reflect the committed changes.