Page MenuHomeSoftware Heritage

config: Deprecate SWHConfig in favor of load_from_envvar function
ClosedPublic

Authored by ardumont on Sep 16 2020, 3:22 PM.

Details

Summary

That new function declares exactly what happens today during our configuration
loading (for loader/lister/indexer/... and rpc services in general not for the
CLIs which will be dealt with later).

Loads and parses the yaml configuration file out of the SWH_CONFIG_FILENAME
environment variable. Allowing eventual dict enrichment with default
configuration if provided.

Everything docker/staging/production related is setting the SWH_CONFIG_FILENAME in the environment [1-5]
(including the tests now as well).

This currently impacts (which I will deal with soon):

$SWH_ENVIRONMENT_HOME/swh-indexer/swh/indexer/indexer.py:from swh.core.config import SWHConfig
$SWH_ENVIRONMENT_HOME/swh-indexer/swh/indexer/indexer.py:class BaseIndexer(SWHConfig, metaclass=abc.ABCMeta):
$SWH_ENVIRONMENT_HOME/swh-indexer/swh/indexer/rehash.py:from swh.core.config import SWHConfig
$SWH_ENVIRONMENT_HOME/swh-indexer/swh/indexer/rehash.py:class RecomputeChecksums(SWHConfig):
$SWH_ENVIRONMENT_HOME/swh-lister/swh/lister/core/lister_base.py:class ListerBase(abc.ABC, config.SWHConfig):
$SWH_ENVIRONMENT_HOME/swh-loader-core/swh/loader/core/loader.py:class BaseLoader(config.SWHConfig, metaclass=ABCMeta):
$SWH_ENVIRONMENT_HOME/swh-loader-core/swh/loader/package/loader.py:from swh.core.config import SWHConfig
$SWH_ENVIRONMENT_HOME/swh-loader-core/swh/loader/package/loader.py:        self.config = SWHConfig.parse_config_file()

[1] prod/staging: https://forge.softwareheritage.org/source/puppet-swh-site/browse/production/?grep=SWH_CONFIG_FILENAME
[2] docker: https://forge.softwareheritage.org/source/swh-environment/browse/master/?grep=SWH_CONFIG_FILENAME

As sample:
[3] loader/lister: https://forge.softwareheritage.org/source/puppet-swh-site/browse/production/site-modules/profile/templates/swh/deploy/worker/swh-worker@.service.erb$10
[4] scheduler: https://forge.softwareheritage.org/source/swh-environment/browse/master/docker/docker-compose.yml$109
[5] api/rpc servers: https://forge.softwareheritage.org/source/puppet-swh-site/browse/production/site-modules/profile/manifests/swh/deploy/rpc_server.pp$104

Related to T1532
Related to T1410

Test Plan

tox

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
swh/core/config.py
303–304
WARNING: mutable value in class definition. As with function definitions, this will only be eval'd once. I think this class is meant to be instanciated, we don't want to share config between instances. In that case, use factory like attrs ones or write a constructor. For this, we better think of a strategy that does not involves user discipline but works automatically.
305

A default config, albeit empty, is defined (as dict), and merge_config has no input domain restriction. So this conditional is not needed.

swh/core/config.py
303–304

alternatively, we can just make it None

Build is green

Patch application report for D3965 (id=13983)

Could not rebase; Attempt merge onto 66ac6ed3aa...

Updating 66ac6ed..cf4f7e5
Fast-forward
 swh/core/config.py            | 92 +++++++++++++++----------------------------
 swh/core/tests/test_config.py | 41 +++++++++++++++++++
 2 files changed, 73 insertions(+), 60 deletions(-)
Changes applied before test
commit cf4f7e5d942d83c05f505012a6695e94f796ee1e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 16 15:25:52 2020 +0200

    swh.core.config: Simplify SWHConfig mixin to its actual use
    
    Related to T1532

commit 22e77212f1f7704a9d138da12b75304269b1884c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 16 15:26:25 2020 +0200

    swh.core.config: Add types

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/54/ for more details.

swh/core/config.py
303–304

No, often a bad idea to have nullable types where it is not necessary, for reliability reasons. In this mixin, only a declaration, which conveniently serves as a spec btw, and in user class only an initialization is needed. In fact it could be seen as a lazy constant, which is very safe (reliable) construct.

Build is green

Patch application report for D3965 (id=14025)

Rebasing onto 28d61c8cf1...

Current branch diff-target is up to date.
Changes applied before test
commit 4658518084d4bbe769afcdcf87290f03319a5fe9
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 16 15:25:52 2020 +0200

    swh.core.config: Simplify SWHConfig mixin to its actual use
    
    Related to T1532

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/62/ for more details.

I think you should not remove this code as it has one use case (filename is provided as parameter, from CLI or not) but have a different function name for the "prod" use case (e.g. parse_config_file_envvar).
Some parts in the old code are not needed anymore, but not all. In another diff we will cover the other use case.
For now better to have 2 functions.

swh/core/config.py
305

I repeat, this branch will always match, so no need to keep it.

I think you should not remove this code as it has one use case (filename is provided as parameter, from CLI or not)
but have a different function name for the "prod" use case (e.g. parse_config_file_envvar).

I am not sure I completely follow. It's not used right now, better remove it so maintenance is clearer.

Some parts in the old code are not needed anymore, but not all.

Currently, it is needed by rpc services, not the CLIs.
Those are using functions from swh.core.config but not the mixin iirc.

In another diff we will cover the other use case.

Yes, in another diff, when the specification you are working on is clear on what it is we must do.
In the mean time, better align the code with production use.

Otherwise, that's still code we have to follow through even though it's not used...

For now better to have 2 functions.

Can you please develop that part?

Thanks for the feedback.
Cheers,

swh/core/config.py
303–304

WARNING
...

I'm opened to an example of what's implied by that suggestion.

Please, for example, wrap the snippet in tiple backquote

<sample-code>

so it's nicely rendered here ;)

305

yes, I did not adapt it yet since i'm unclear as to what i need to do for the DEFAULT_CONFIG field remark first...

  • Drop redundant conditional
  • Rebase on latest master
ardumont marked 3 inline comments as done.

Build is green

Patch application report for D3965 (id=14112)

Rebasing onto 56d505f34a...

Current branch diff-target is up to date.
Changes applied before test
commit e48d8b59633205b5af760a0169450e22fd654cd1
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 16 15:25:52 2020 +0200

    swh.core.config: Simplify SWHConfig mixin to its actual use
    
    Related to T1532

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/72/ for more details.

This revision is now accepted and ready to land.Sep 22 2020, 10:05 AM

After discussing with @tenma, I'll simplify this to a function and finally get rid of that mixin.

Drop SWHMixin in favor of new function load_from_envvar

This revision is now accepted and ready to land.Sep 22 2020, 11:13 AM
ardumont retitled this revision from swh.core.config: Simplify SWHConfig mixin to its actual use to config: Drop SWHConfig mixin in favor of the load_from_envvar function.Sep 22 2020, 11:14 AM
ardumont edited the summary of this revision. (Show Details)

Build is green

Patch application report for D3965 (id=14117)

Rebasing onto 56d505f34a...

Current branch diff-target is up to date.
Changes applied before test
commit c4ee5db89f0f1490813741937f7fe3dd1e35be30
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 16 15:25:52 2020 +0200

    config: Drop SWHConfig mixin in favor of the load_from_envvar function
    
    That new function declares exactly what happens today during our configuration
    loading use. Loads and parses the yaml configuration file out of the
    SWH_CONFIG_FILENAME environment variable. Allowing eventual dict enrichment
    with default configuration if provided.
    
    Related to T1532

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/73/ for more details.

ardumont marked 3 inline comments as done.
swh/core/config.py
301

assert "SWH_CONFIG_FILENAME" in os.environ, "SWH_CONFIG_FILENAME environment variable is undefined." est encore plus clair, ça dit "vous avez pas suivi le contrat"

ardumont edited the summary of this revision. (Show Details)

Avoid empty dict and use None instead

Build is green

Patch application report for D3965 (id=14118)

Rebasing onto 56d505f34a...

Current branch diff-target is up to date.
Changes applied before test
commit d9d4ef0f742ac40e8e01a8cfe89605a87381a264
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 16 15:25:52 2020 +0200

    config: Drop SWHConfig mixin in favor of the load_from_envvar function
    
    That new function declares exactly what happens today during our configuration
    loading use. Loads and parses the yaml configuration file out of the
    SWH_CONFIG_FILENAME environment variable. Allowing eventual dict enrichment
    with default configuration if provided.
    
    Related to T1532

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/74/ for more details.

Build is green

Patch application report for D3965 (id=14119)

Rebasing onto 56d505f34a...

Current branch diff-target is up to date.
Changes applied before test
commit 1e90d17646d1067e1ada078fa1ea8fdfe7e8824e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 16 15:25:52 2020 +0200

    config: Drop SWHConfig mixin in favor of the load_from_envvar function
    
    That new function declares exactly what happens today during our configuration
    loading use. Loads and parses the yaml configuration file out of the
    SWH_CONFIG_FILENAME environment variable. Allowing eventual dict enrichment
    with default configuration if provided.
    
    Related to T1532

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/75/ for more details.

Deprecate SWHConfig mixin in favor of the load_from_envvar function

This will allow to be incremental in the change without failing any builds.
This removal will happen once every impacted clients of this change are
handled.

ardumont retitled this revision from config: Drop SWHConfig mixin in favor of the load_from_envvar function to config: Deprecate SWHConfig mixin in favor of the load_from_envvar function.Sep 22 2020, 1:16 PM

Build is green

Patch application report for D3965 (id=14127)

Rebasing onto 56d505f34a...

Current branch diff-target is up to date.
Changes applied before test
commit 99cd309c1e616849a3010db80cd017294c94c062
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 16 15:25:52 2020 +0200

    config: Deprecated SWHConfig in favor of load_from_envvar function
    
    That new function declares exactly what happens today during our configuration
    loading (for loader/lister/indexer/... and rpc services in general not for the
    CLIs which will be dealt with later).
    
    Loads and parses the yaml configuration file out of the SWH_CONFIG_FILENAME
    environment variable. Allowing eventual dict enrichment with default
    configuration if provided.
    
    Related to T1532

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/76/ for more details.

Fix one-char indent off (by mistake)

Build is green

Patch application report for D3965 (id=14129)

Rebasing onto 56d505f34a...

Current branch diff-target is up to date.
Changes applied before test
commit 0e03218c4c3b5562223b3f310825b5a367ef8a0d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 16 15:25:52 2020 +0200

    config: Deprecated SWHConfig in favor of load_from_envvar function
    
    That new function declares exactly what happens today during our configuration
    loading (for loader/lister/indexer/... and rpc services in general not for the
    CLIs which will be dealt with later).
    
    Loads and parses the yaml configuration file out of the SWH_CONFIG_FILENAME
    environment variable. Allowing eventual dict enrichment with default
    configuration if provided.
    
    Related to T1532

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/77/ for more details.

swh/core/config.py
303–304

to be complete, here is @tenma's suggestion:

@attr.s
class Config:
     conf = attr.ib(factory=dict)
This revision is now accepted and ready to land.Oct 2 2020, 9:41 AM

Build is green

Patch application report for D3965 (id=14525)

Rebasing onto d230cb3df3...

Current branch diff-target is up to date.
Changes applied before test
commit 56cd03554a757192372bbec1c4d179c4f1030b25
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 16 15:25:52 2020 +0200

    config: Deprecated SWHConfig in favor of load_from_envvar function
    
    That new function declares exactly what happens today during our configuration
    loading (for loader/lister/indexer/... and rpc services in general not for the
    CLIs which will be dealt with later).
    
    Loads and parses the yaml configuration file out of the SWH_CONFIG_FILENAME
    environment variable. Allowing eventual dict enrichment with default
    configuration if provided.
    
    Related to T1532

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/88/ for more details.

I think I see a way forward for loaders (which is an annoyingly and hard-to-follow inheritance scaffolding... especially for the dvcs loaders, not so much for package loaders)
Use in their constructor the load_from_envvar call to boostrap the config (with DEFAULT_CONFIG explicitely defined in the module).
In the dvcs loaders subclass (svn, git, mercurial), plainly call the super constructor and then merge explicitely the self.config with the DEFAULT_CONFIG they will define in their module as well.

And there you go:

  • drop SWHConfig
  • centralized call nonetheless to the same function which does the loading from the env variable
  • more explicit configuration

I'll add diffs as proposal D4124, D4125 are the first steps (which are failing right now because they need this diff).

ardumont retitled this revision from config: Deprecate SWHConfig mixin in favor of the load_from_envvar function to config: Deprecate SWHConfig in favor of load_from_envvar function.Oct 2 2020, 10:48 AM

I think I see a way forward for loaders

diffs opened, to summarize, here goes:

diffmodulenote
D4125package.loader: Migrate away from SWHConfig mixinleaf, Depends on this diff D3965
D4124core.loader: Migrate away from SWHConfig mixinroot diff for dvcs loaders below, Depends on this diff D3965
D4128git.loader: Migrate away from SWHConfig mixinleaf, depends on D4124
D4127mercurial.loader: Migrate away from SWHConfig mixinleaf, depends on D4124
D4126svn.loader: Migrate away from SWHConfig mixinleaf, depends on D4124

Update the deprecated version to mention the correct release 0.3.2

Build is green

Patch application report for D3965 (id=14548)

Rebasing onto d230cb3df3...

Current branch diff-target is up to date.
Changes applied before test
commit 82a47667b2b8c6d07674ba97742bb634f3acaf0b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Sep 16 15:25:52 2020 +0200

    config: Deprecated SWHConfig in favor of load_from_envvar function
    
    That new function declares exactly what happens today during our configuration
    loading (for loader/lister/indexer/... and rpc services in general not for the
    CLIs which will be dealt with later).
    
    Loads and parses the yaml configuration file out of the SWH_CONFIG_FILENAME
    environment variable. Allowing eventual dict enrichment with default
    configuration if provided.
    
    Related to T1532

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/89/ for more details.