Page MenuHomeSoftware Heritage

Use the in-mem storage instead of mocks to test the git loader.
ClosedPublic

Authored by vlorentz on Nov 20 2018, 3:59 PM.

Diff Detail

Repository
rDLDG Git loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz retitled this revision from Use the in-mem storage instead of mocks. to Use the in-mem storage instead of mocks to test the git loader..
vlorentz added inline comments.
swh/loader/git/tests/test_loader.py
175

Note: that's an unrelated change.

ardumont added a subscriber: ardumont.
ardumont added inline comments.
swh/loader/git/tests/test_loader.py
116

I do not like doing this...

The GitLoader and the BulkUpdater allows to pass the config dict instead (do not like it either [1] but it's better than mutating the loader instance).

That would also avoid the side effect of loading the configuration file from disk...

[1] because it's been opened solely for the tests...

175

CONTENT1 is probably badly named then since i see that it's not one content but many ;)

This revision now requires changes to proceed.Nov 20 2018, 5:25 PM

Restarted the build.
BUILD has failed

Now that looks like real tests failure.

vlorentz added inline comments.
swh/loader/git/tests/test_loader.py
116

Agreed. As we talked earlier today, this will need a change to swh.storage.get_storage (D689).

175

It's the contents, at "version 1" of the DB (version 0 = before the loader runs, version 1 = after it runs for the first time, version 2 = after it runs for the second time).

swh/loader/git/tests/test_loader.py
175

CONTENTS_v{0,1,2} etc... would have been more appropriate...
Let's keep this on the side for now.

  • Override the loader's config to avoid initializing a remote storage.
  • Actually override the storage config.

I'm supposing the build failure is due to not yet integrated dependent diff.

swh/loader/git/tests/test_loader.py
109

Do you need to?
That's the part reading the disk for configuration file.

So, that could create divergence on dev box if they have some configuration files in the right places ({/etc/softwareheritage/,~/.swh,~/.config/swh}/loader/git-{loader,updater}.yml))

vlorentz added inline comments.
swh/loader/git/tests/test_loader.py
109

I need a way to obtain content_packet_size and its many friends.

swh/loader/git/tests/test_loader.py
109

In other modules, we duplicate (for now) that information in the TestLoader (TestGitLoader for example).

Side-effects/configuration duplication trade-offs.

  • Fix arguments to __init__().
  • Don't read config files from the tests.
This revision is now accepted and ready to land.Nov 27 2018, 7:10 PM
anlambert added inline comments.
swh/loader/git/tests/test_loader.py
107

Instead of duplicating base configuration, I would rather use this implementation (see D768)

def parse_config_file(self, *args, **kwargs):
        config = super().parse_config_file(*args, **kwargs)
        config['storage'] = {'cls': 'memory', 'args': {}}
        return config
swh/loader/git/tests/test_loader.py
107

I'm the source of disagreement here.
I'd like to avoid the call to the original parse_config_file method here to reduce the disk reading part.
(I've said as much in D768 indeed).

If you take a look at the discussion just below this remark, you'll see the detail ;)

This revision was automatically updated to reflect the committed changes.