Page MenuHomeSoftware Heritage

Objects that implicitely load configuration are a nightmare to test
Closed, MigratedEdits Locked

Description

A lot of the configuration in different objects is loaded using swh.core.config either implicitely (using CONFIG_BASE_FILENAME and the SWHConfig mixin) or explicitely (config.load_named_config) are very difficult to test.

Since they usually load their configuration and initialize their objects given this configuration in __init__(), you have to either:

  • mock.patch the config.load_named_config function
  • subclass and to override the __init__ method of the class
  • move the initialization behavior in another method
  • hope that the bogus default config (or any locally installed config) won't crash the initialization, and *then* mock the attributes to replace them with proper values

All of this is tied to the fact that classes shouldn't be allowed to load their config explicitely. The config should be loaded in the entry points of the program, and then passed around as state to the objects that need to be initialized.

Event Timeline

I disagree with the characterization of the default config as bogus. The default configuration is expected to work without change on a "developer" environment (e.g. point to a remote storage listening on localhost:5002, and to the "-dev" variants of any needed local databases).

The current "framework" for implicit loading of configuration was written this way because there is no hook to control when or where from a celery worker loads its configuration, as Task classes are loaded implicitly by the celery worker with a somewhat sanitized environment.

I agree that the implicit loading mechanism should be moved to the celery task classes rather than the base "worker" classes, which can be used and should be tested separately, with an instrumented configuration.

I think there's value in a class-based approach to managing config in code, as it would allow for subclassing (e.g. a specific lister subclasses its configuration from the base lister, adding its own attributes)

Pseudocodeish interface for a "Config" object that would allow doing so

class SWHConfig(dict):
    # XXX should get renamed as this attribute also contains type info
    DEFAULT_CONFIG = {}
    CONFIG_BASEPATH = None

    def __init__(self):
        super().__init__()
        self.init_from_defaults()
        self.loaded_from = []

    def get_default_config(self):
        """Get the default configuration; useful when extending a base config class"""
        return self.DEFAULT_CONFIG

    def get_config_basepath(self):
        """Get the base path from where we'll load configuration"""
        return self.CONFIG_BASEPATH

    def get_default_paths(self):
        """Get default configuration paths using get_config_basepath"""
        pass

    def check_config(self, config):
        """Typecheck config, try opening databases, etc."""
        return True

    def load_from_dict(self, config):
        """Load the configuration from an explicit dictionary"""
        self.check_config(config)
        self.update(config)
        self.loaded_from.append('<dict>')

    def load_from_file(self, file):
        """Load configuration from an explicit file"""
        config = read_config_file(file)
        self.check_config(config)
        self.update(config)
        self.loaded_from.append(file)

    def load_from_default_paths(self):
        """Load configuration from the system paths"""
        config = None
        for path in self.get_default_paths():
            # try loading config from path
            if successful:
                self.check_config(config)
                self.update(config)
                self.loaded_from.append(file)
                return
            if permission error:
                raise exception

We can then make sure worker classes (Loaders, Listers etc.) take a config parameter, which would be an instance of the config class. We can then adapt celery task classes to instantiate that config from default paths.

I disagree with the characterization of the default config as bogus. The default configuration is expected to work without change on a "developer" environment (e.g. point to a remote storage listening on localhost:5002, and to the "-dev" variants of any needed local databases).

My comment was about using these classes in a testing context. For the general intented usage, I agree that the default configuration should work without any change from the developer. The problems of the default / locally installed config are only when writing unit tests for those components.

I completely agree with the rest of your comment.

Closing now since this has been superseded:

  1. T1410 tries to summarize our current use in the goal to design a better configuration system.
  2. SWHMixin is currently being removed through T1532
  3. pytest fixtures helps a lot in bootstraping configuration in testing now. It's now mostly used everywhere.
  4. What's described in the description in details is no longer true though the gist of it is (but like i said, see 1.).
ardumont claimed this task.