
Add module to generate relevant data as tests input
ClosedPublic

Authored by anlambert on Dec 14 2018, 6:15 PM.

Details

Summary

First diff exposing the work I have done so far on improving swh-web tests.
This one is about test data generation by populating an in-memory archive.

In order to avoid hardcoding test input data and to get closer to real-world data,
populate a test archive by loading a couple of lightweight git repositories into it.

The ids of the objects in this test archive (contents, directories, revisions, ...)
will then be provided as test input in order to retrieve their associated data
from the in-memory storages. Proceeding like this will allow us to remove a
lot of mocks from the test implementations.
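
A minimal sketch of the idea described above, not the code from this diff: `InMemoryArchive`, `populate_test_archive` and the sample byte strings are hypothetical stand-ins, and the real swh.storage API may differ. It only illustrates how ids collected while populating an in-memory archive can serve as test input, so that tests read real data back instead of relying on mocks.

```python
import hashlib


class InMemoryArchive:
    """Hypothetical stand-in for an in-memory storage (not the swh.storage API)."""

    def __init__(self):
        self._contents = {}

    def content_add(self, data: bytes) -> str:
        # Identify the content by the SHA-1 of its bytes and store it.
        obj_id = hashlib.sha1(data).hexdigest()
        self._contents[obj_id] = data
        return obj_id

    def content_get(self, obj_id: str) -> bytes:
        return self._contents[obj_id]


def populate_test_archive(files):
    """Load sample files into a fresh archive; return the archive and the ids."""
    archive = InMemoryArchive()
    ids = [archive.content_add(data) for data in files]
    return archive, ids


archive, content_ids = populate_test_archive([b"hello\n", b"print('hi')\n"])

# A test receives ids as input and reads the real data back, with no mock.
for obj_id in content_ids:
    data = archive.content_get(obj_id)
    assert hashlib.sha1(data).hexdigest() == obj_id
```

In the diff itself the archive is populated by loading git repositories rather than raw byte strings; the sketch only keeps the "populate once, then hand ids to the tests" shape.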

Related T1271

Diff Detail

Repository
rDWAPPS Web applications
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

anlambert created this revision. Dec 14 2018, 6:15 PM
vlorentz requested changes to this revision. Dec 14 2018, 6:57 PM
vlorentz added a subscriber: vlorentz.

The two zips you add are about 200KB in size. Could you find smaller ones?

swh/web/tests/data.py
156

storage is an instance of swh.storage.in_memory.Storage; it shouldn't be in a config dict.

157

That's a pretty tricky side-effect. It should be documented and explained, both here and in swh.web.common.service.

199–223

This can be factorized:

indexers = {}
for idx_name, idx_class in (
        ('mimetype', _MimetypeIndexer),
        ('language', _LanguageIndexer),
        ('license', _FossologyLicenseIndexer),
        ('ctags', _CtagsIndexer)):
    idx = idx_class()
    idx.storage = storage
    idx.objstorage = storage.objstorage
    idx.idx_storage = idx_storage
    indexers[idx_name] = idx

Then use **indexers in the returned dict

This revision now requires changes to proceed. Dec 14 2018, 6:57 PM

The two zips you add are about 200KB in size. Could you find smaller ones?

That's not really big either ... Those repos enable me to capture a lot of test cases (notably ctags, releases, non-linear revision history),
so I would prefer to continue using them.

That sounds pretty neat for the next steps!
\m/

swh/web/tests/data.py
157

That's the only way to share the in-memory storage consistently, I think.
So yes, explaining why we do that would be great.

vlorentz added inline comments. Dec 15 2018, 10:20 AM
swh/web/tests/data.py
157

That's the only way to share the in-memory storage consistently, I think.

You can use unittest.mock.patch('swh.storage.in_memory.Storage'): https://forge.softwareheritage.org/source/swh-indexer/browse/master/swh/indexer/tests/test_origin_metadata.py$98
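
A minimal sketch of that approach, with assumptions stated: `Storage` and `fake_module` are hypothetical stand-ins for swh.storage.in_memory.Storage and its module, and `patch.object` is used instead of the string-target form `patch('swh.storage.in_memory.Storage')` only so that the example runs on its own. While the patch is active, every instantiation returns the same shared instance.

```python
import types
from unittest.mock import patch


class Storage:
    """Hypothetical stand-in for an in-memory storage class."""

    def __init__(self):
        self.objects = {}


# Stand-in for the module exposing the Storage class.
fake_module = types.SimpleNamespace(Storage=Storage)


def get_storage():
    # Code under test instantiates the class through the module attribute.
    return fake_module.Storage()


shared = Storage()
shared.objects['abc'] = b'data'

# return_value is forwarded to the MagicMock that replaces the class,
# so calling the patched class hands back the shared instance.
with patch.object(fake_module, 'Storage', return_value=shared):
    first = get_storage()
    second = get_storage()

# Outside the with block the original class is restored.
after = get_storage()
```

The same patcher can also be applied as a decorator on a test method, which is the per-test form discussed in this thread.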

anlambert retitled this revision from swh-web: Add module to generate relevant data as tests input to Add module to generate relevant data as tests input. Dec 17 2018, 10:42 AM
anlambert marked 3 inline comments as done. Dec 17 2018, 11:17 AM
anlambert added inline comments.
swh/web/tests/data.py
157

The idea here is to patch the storage instances globally, in order to avoid using decorators in every test to do so.

But I agree this operation should not be performed in that module, which should be dedicated only to the generation
of test data. I will move it to the swh.web.tests.testcase module.
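
One possible way to get this effect without per-test decorators is a shared base test case that starts the patch in `setUp`. This is only a sketch under assumptions: `Storage`, `fake_module`, `SHARED_STORAGE` and `PatchedStorageTestCase` are hypothetical names, and the actual swh.web.tests.testcase module may do this differently.

```python
import types
import unittest
from unittest.mock import patch


class Storage:
    """Hypothetical stand-in for an in-memory storage class."""


# Stand-in for the module exposing the Storage class.
fake_module = types.SimpleNamespace(Storage=Storage)

# Single instance shared by every test, created once at import time.
SHARED_STORAGE = Storage()


class PatchedStorageTestCase(unittest.TestCase):
    """Base class: subclasses get the patched storage without decorators."""

    def setUp(self):
        patcher = patch.object(
            fake_module, 'Storage', return_value=SHARED_STORAGE)
        patcher.start()
        # Undo the patch after each test, even when the test fails.
        self.addCleanup(patcher.stop)


class ExampleTest(PatchedStorageTestCase):
    def test_uses_shared_storage(self):
        self.assertIs(fake_module.Storage(), SHARED_STORAGE)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
result = unittest.TestResult()
suite.run(result)
```

Keeping the patching in the base test case leaves the data-generation module free of side effects, which is the separation agreed on in this thread.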

199–223

Indeed, thanks.

anlambert updated this revision to Diff 2669. Dec 17 2018, 4:34 PM

Update:

  • address vlorentz comments
  • move storage patching out of this module
  • bump swh-loader-git version
vlorentz accepted this revision. Dec 17 2018, 5:02 PM
This revision is now accepted and ready to land. Dec 17 2018, 5:02 PM
This revision was automatically updated to reflect the committed changes.