T1410 Kill implicit configuration: new configuration scheme

Status	Assigned	Task
Migrated	gitlab-migration	T1410 Kill implicit configuration: new configuration scheme
Migrated	gitlab-migration	T1386 Refactor indexers' initialization step
Migrated	gitlab-migration	T826 Objects that implicitely load configuration are a nightmare to test
Migrated	gitlab-migration	T1532 Cleanup deprecated configuration code in swh modules
Migrated	gitlab-migration	T1531 Cleanup manifests once deployment is done
Migrated	gitlab-migration	T1525 Deploy all swh modules' latest version

tenma renamed this task from Kill implicit configuration to Kill implicit configuration : new configuration scheme.Sep 17 2020, 10:50 AM

tenma claimed this task.

tenma updated the task description. (Show Details)

tenma removed a project: Sprint 2018 12.

tenma added subscribers: ardumont, anlambert.

olasd added a project: Core & foundations.Sep 17 2020, 10:53 AM

ardumont renamed this task from Kill implicit configuration : new configuration scheme to Kill implicit configuration: new configuration scheme.Sep 17 2020, 4:13 PM

tenma renamed this task from Kill implicit configuration: new configuration scheme to Kill implicit configuration : new configuration scheme.Sep 17 2020, 5:20 PM

tenma updated the task description. (Show Details)

tenma updated the task description. (Show Details)Sep 17 2020, 6:12 PM

For the configuration part, I see 2 use cases:

rpc servers, loaders, listers, ... (swh services really): They need a simple configuration file to parse and load, possibly with checks for missing keys. When a key is missing, the service should fail early with a clear message about what's missing (so we can fix it fast and restart the service). [1]

cli: The need is more complex (from what I gathered of your IRC discussion, which I did not follow entirely). As I understood it, some common parts need to be shared across multiple subcommands of the swh cli (typically the auth-token part), which calls for a merge policy. The fail-fast property is also good here ;)

[1]
Implementation-wise, that's where SWHConfig kind of came from. And afaik, it's
not used by the swh cli. That's why the D3965 proposal, which dropped a lot of
the unused code in the first place, sounded sensible to me. There should be no
impact on those services with that diff (well, aside from the fact that I need
to update their respective configuration with everything they need, so no more
implicit configuration ;)
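
The fail-early behaviour described for services above could be sketched as follows. This is a minimal illustration, not the swh.core.config code: `check_config`, `MissingConfigKeys` and the list of required keys are hypothetical names chosen for the example.

```python
from typing import Any, Iterable, Mapping


class MissingConfigKeys(ValueError):
    """Raised when a service configuration lacks required keys (hypothetical)."""


def check_config(config: Mapping[str, Any], required: Iterable[str]) -> None:
    """Fail early, naming every missing key in a single message."""
    missing = sorted(key for key in required if key not in config)
    if missing:
        raise MissingConfigKeys(
            "Missing configuration key(s): " + ", ".join(missing)
        )


# Hypothetical loader configuration in which one key was forgotten.
config = {"storage": {"cls": "memory"}}
try:
    check_config(config, required=["storage", "max_content_size"])
except MissingConfigKeys as exc:
    error_message = str(exc)
```

Reporting all missing keys at once, rather than stopping at the first, means one fix-and-restart cycle instead of several.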

tenma updated the task description. (Show Details)Sep 21 2020, 2:20 PM

ardumont added a revision: D3965: config: Deprecate SWHConfig in favor of load_from_envvar function.Sep 22 2020, 11:35 AM

ardumont added a revision: D4007: loader*: Migrate to swh.core.config.load_from_envvar.Sep 22 2020, 1:51 PM

rpc servers, loaders, listers, ... (swh services really): They need a simple
configuration file to parse and load, possibly with checks for missing keys.
When a key is missing, the service should fail early with a clear message
about what's missing (so we can fix it fast and restart the service). [1]

I completely overlooked the current complexity of our loader declarations...

For example, for the original dvcs loaders, we have the following inheritance
chain:

(SWHConfig <-) BaseLoader <- DVCSLoader <- {GitLoader,GitLoaderFromDisk,GitLoaderFromArchive,HgBundle20Loader,...}

and we also have:

(SWHConfig <-) BaseLoader <- SvnLoader <- {SvnLoaderFromDumpArchive,SvnLoaderFromRemoteDump}

Reading: A <- {B, C, ...} means "B, C, ... inherit from A".

What is complex here is that each layer adds its own subset of default configuration, which is used to fill in whatever is not provided.
So maybe, in the end, the need is shared across the CLIs and the rest.
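
To make the layering problem concrete, here is a rough sketch of how per-class defaults could stack up along such an inheritance chain. It is an assumption about the general pattern, not the actual loader code: the `DEFAULTS` attribute, the `merge` helper and the `pack_size_bytes` key are all hypothetical.

```python
from typing import Any, Dict


def merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` wins; nested dicts merge recursively."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


class BaseLoader:
    # Defaults contributed by this layer only (hypothetical attribute).
    DEFAULTS: Dict[str, Any] = {"storage": {"cls": "memory"}, "save_data": False}

    def __init__(self, config: Dict[str, Any]):
        effective: Dict[str, Any] = {}
        # Walk the inheritance chain from the most generic class down,
        # so each subclass layer can add to or override its parents.
        for klass in reversed(type(self).__mro__):
            effective = merge(effective, vars(klass).get("DEFAULTS", {}))
        # Explicitly provided values win over every layer of defaults.
        self.config = merge(effective, config)


class DVCSLoader(BaseLoader):
    DEFAULTS = {"max_content_size": 100 * 1024 * 1024}


class GitLoader(DVCSLoader):
    DEFAULTS = {"pack_size_bytes": 4 * 1024**3}  # hypothetical key


loader = GitLoader({"save_data": True})
```

The effective configuration of `GitLoader` is only known after walking three classes, which is the kind of implicit behaviour this task wants to remove.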

ardumont added a revision: D4124: core.loader: Migrate away from SWHConfig mixin.Oct 2 2020, 10:00 AM

ardumont added a revision: D4125: package.loader: Migrate away from SWHConfig mixin.

ardumont added a revision: D4126: svn.loader: Migrate away from SWHConfig mixin.Oct 2 2020, 10:24 AM

ardumont added a revision: D4127: mercurial.loader: Migrate away from SWHConfig mixin.Oct 2 2020, 10:32 AM

ardumont added a revision: D4128: git.loader: Migrate away from SWHConfig mixin.Oct 2 2020, 10:42 AM

ardumont renamed this task from Kill implicit configuration : new configuration scheme to Kill implicit configuration: new configuration scheme.Oct 2 2020, 10:54 AM

Should have said as much here instead...
Nonetheless, a way forward for simplifying the loaders is described in [1].
The implementation proposal is summarized in [2] (with the implementation diffs referenced in the summary).

[1] D3965#102053

[2] D3965#102120

ardumont added a commit: rDCORE82a47667b2b8: config: Deprecated SWHConfig in favor of load_from_envvar function.Oct 2 2020, 11:41 AM

ardumont added a commit: rDLDBASEb81bcf4b4c40: core.loader: Migrate away from SWHConfig mixin.Oct 2 2020, 1:17 PM

ardumont added a commit: rDLDBASE5e4fd64d1061: package.loader: Migrate away from SWHConfig mixin.

ardumont added a commit: rDLDSVN87d4a7e22f63: svn.loader: Migrate away from SWHConfig mixin.Oct 2 2020, 2:15 PM

ardumont added a commit: rDLDHG00e010a15ae7: mercurial.loader: Migrate away from SWHConfig mixin.

ardumont added a commit: rDLDGca7556c35604: git.loader: Migrate away from SWHConfig mixin.Oct 2 2020, 2:17 PM

tenma updated the task description. (Show Details)Oct 2 2020, 3:35 PM

tenma updated the task description. (Show Details)Oct 2 2020, 3:48 PM

Suggestions from irc discussion summary:

Fall back to a default config when the currently required SWH_CONFIG_FILENAME is not set (instead of failing as it does today).

A service without a configuration should be able to run by itself from a REPL (the default config should target swh services that run in memory).

Move the configuration file loading out of the services themselves (loader, lister, indexer, ...) and into the scope of the entrypoints instead (swh cli, celery task, gunicorn wsgi, etc.).

Maybe starting a pad/hackmd document would be easier at this point?
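
As a starting point for that discussion, the first suggestion (falling back to an in-memory default when SWH_CONFIG_FILENAME is unset) might look roughly like this. It is only a sketch under assumptions: `load_config` is a hypothetical helper, JSON stands in for YAML to keep the example stdlib-only, and the "remote" storage entry is illustrative.

```python
import json
import os
import tempfile
from typing import Any, Dict, Mapping

# Hypothetical defaults: everything runs in memory.
DEFAULT_CONFIG: Dict[str, Any] = {"storage": {"cls": "memory"}}


def load_config(environ: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read the file named by SWH_CONFIG_FILENAME, or fall back to defaults."""
    path = environ.get("SWH_CONFIG_FILENAME")
    if not path:
        # No file configured: run fully in memory instead of failing.
        return dict(DEFAULT_CONFIG)
    with open(path) as f:
        # JSON is an assumption made for this sketch; top-level keys
        # from the file replace the corresponding defaults.
        return {**DEFAULT_CONFIG, **json.load(f)}


# Without the variable set, the defaults are used.
fallback = load_config({})

# With the variable set, the file content takes precedence.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "loader.json")
    with open(path, "w") as f:
        json.dump({"storage": {"cls": "remote", "url": "http://localhost:5002"}}, f)
    from_file = load_config({"SWH_CONFIG_FILENAME": path})
```

Passing the environment as a parameter keeps the helper testable without touching the real process environment.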

tenma updated the task description. (Show Details)Oct 2 2020, 4:18 PM

In T1410#50054, @ardumont wrote:

Suggestions from irc discussion summary:

Move the configuration file loading out of the services themselves (loader, lister, indexer, ...) and into the scope of the entrypoints instead (swh cli, celery task, gunicorn wsgi, etc.).

In my mind, this means giving most of our "implementation" classes (e.g. FooLoader, BarLister, etc.) either an explicit "config" parameter or, even better, a set of properly named and typed parameters matching the contents of said configuration.

We should make the entry points parse the configuration file, and populate the arguments to the classes. This also allows overriding some configuration parameters with command line arguments, which we're kinda doing but inconsistently in some CLIs.
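
A minimal sketch of that split, assuming a hypothetical `FooLoader` and a hypothetical `--max-content-size` option (neither is taken from an existing swh module): the class only sees explicit, typed parameters, and the entry point is the only place that knows about the parsed configuration and the command line.

```python
import argparse
from typing import Any, Dict, List, Optional


class FooLoader:
    """Hypothetical loader taking explicit, typed parameters."""

    def __init__(
        self,
        storage: Dict[str, Any],
        max_content_size: int = 100 * 1024 * 1024,
        save_data: bool = False,
    ) -> None:
        self.storage = storage
        self.max_content_size = max_content_size
        self.save_data = save_data


def main(config: Dict[str, Any], argv: Optional[List[str]] = None) -> FooLoader:
    """Entry point: combine the parsed config file with CLI overrides."""
    parser = argparse.ArgumentParser(prog="foo-loader")
    parser.add_argument("--max-content-size", type=int, default=None)
    args = parser.parse_args(argv)

    params = dict(config)
    if args.max_content_size is not None:
        # A command line argument overrides the value from the file.
        params["max_content_size"] = args.max_content_size
    return FooLoader(**params)


# `file_config` stands for the already-parsed configuration file.
file_config = {"storage": {"cls": "memory"}, "save_data": True}
loader = main(file_config, ["--max-content-size", "1024"])
```

In a test, the loader can then be built directly with `FooLoader(storage={"cls": "memory"})`, with no temporary file or environment variable involved.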

This is also consistent with how we're currently using the get_storage, get_objstorage, ... functions: we pass them cls and args directly taken from the contents of the configuration file of the component that's currently being run.
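
For readers unfamiliar with that factory pattern, here is a rough stand-in. It is not the swh.storage implementation: the registry, the two storage classes and the exact signature are assumptions made for illustration.

```python
from typing import Any, Callable, Dict


class InMemoryStorage:
    """Hypothetical in-memory backend."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}


class RemoteStorage:
    """Hypothetical client for a storage RPC server."""

    def __init__(self, url: str) -> None:
        self.url = url


# Maps the `cls` value found in the configuration to an implementation.
STORAGE_IMPLEMENTATIONS: Dict[str, Callable[..., Any]] = {
    "memory": InMemoryStorage,
    "remote": RemoteStorage,
}


def get_storage(cls: str, **kwargs: Any) -> Any:
    """Stand-in factory: instantiate the backend named by `cls`."""
    try:
        factory = STORAGE_IMPLEMENTATIONS[cls]
    except KeyError:
        raise ValueError(
            f"Unknown storage class {cls!r}; "
            f"expected one of {sorted(STORAGE_IMPLEMENTATIONS)}"
        ) from None
    return factory(**kwargs)


# The "storage" section of the configuration is passed through as-is.
config = {"storage": {"cls": "remote", "url": "http://localhost:5002"}}
storage = get_storage(**config["storage"])
```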

Moving this way would drastically reduce the "cognitive overhead" of test fixtures for all our components, which currently have to: create a tempdir, write yaml to a tempfile, and override an environment variable. This also makes it explicit which bit of the config affects which component.

Fall back to a default config when the currently required SWH_CONFIG_FILENAME is not set (instead of failing as it does today).

A service without a configuration should be able to run by itself from a REPL (the default config should target swh services that run in memory).

I'm not sure we really need these two to happen if the "configuration" of our components ends up being a set of explicit arguments to their initialization function.

ardumont mentioned this in D4133: git.loader*: Open configuration passing from constructor.Oct 2 2020, 5:05 PM

Fall back to a default config when the currently required SWH_CONFIG_FILENAME is not set (instead of failing as it does today).
A service without a configuration should be able to run by itself from a REPL (the default config should target swh services that run in memory).

I'm not sure we really need these two to happen if the "configuration" of our components ends up being a set of explicit arguments to their initialization function.

Yes, now that I understand the full extent of what you foresaw, I tend to agree.

I don't yet know whether we still need that default config with the previous suggestion.

Still, another suggestion which I found reasonable is to add the required swh services back into the default config (they are no longer present).
But those should default to the in-memory implementation, serving mainly as documentation.

Something like:

from typing import Any, Dict

DEFAULT_CONFIG: Dict[str, Any] = {
    "max_content_size": 100 * 1024 * 1024,  # 100 MiB
    "save_data": False,
    "save_data_path": "",
    "storage": {"cls": "memory"},  # before, when we had no in-memory implementation, we had localhost:5002
}

ardumont added a revision: D4140: indexer*: Migrate away from SWHConfig mixin.Oct 5 2020, 11:54 AM

ardumont added a revision: D4141: lister*: Migrate away from SWHConfig mixin.Oct 5 2020, 12:53 PM

ardumont mentioned this in T826: Objects that implicitely load configuration are a nightmare to test.Oct 5 2020, 1:39 PM

ardumont closed subtask T826: Objects that implicitely load configuration are a nightmare to test as Resolved.

ardumont added a commit: rDCIDX1eb521c98c43: indexer*: Migrate away from SWHConfig mixin.Oct 5 2020, 7:15 PM

ardumont added a commit: rDLS56f08b73f6b0: lister*: Migrate away from SWHConfig mixin.

ardumont closed subtask T1532: Cleanup deprecated configuration code in swh modules as Resolved.Oct 7 2020, 2:04 PM

haltode mentioned this in T2678: Have a default location for the configuration file.Oct 9 2020, 2:14 PM

ardumont mentioned this in D4245: deposit.cli.admin: Add types and coverage on module.Oct 14 2020, 11:48 AM

ardumont added a revision: D4272: swh.indexer.storage: Unify get_indexer_storage function with others.Oct 15 2020, 3:46 PM

ardumont added a commit: rDCIDX07c96743d726: swh.indexer.storage: Unify get_indexer_storage function with others.Oct 15 2020, 6:15 PM

tenma updated the task description. (Show Details)Oct 16 2020, 10:12 AM

tenma updated the task description. (Show Details)

ardumont added a revision: D4284: scheduler: Type and unify get_scheduler factory with other factories.Oct 16 2020, 1:13 PM

ardumont added a revision: D4294: swh.vault: Unify get_vault factory function with other factories.Oct 16 2020, 7:33 PM

ardumont added a commit: rDSCH13dcaddbed81: scheduler: Type and unify get_scheduler factory with other factories.Oct 19 2020, 9:21 AM

haltode mentioned this in T2710: swh-fuse: fails with "'TypeError: Cannot merge a <class 'dict'> with a <class 'NoneType'>" when conffile is empty or commented out.Oct 19 2020, 9:46 AM

ardumont added a commit: rDVAU8e4026a5e8c9: swh.vault: Unify get_vault factory function with other factories.Oct 19 2020, 1:58 PM

tenma added a revision: D4731: WIP Configuration system.Dec 11 2020, 7:22 PM

tenma updated the task description. (Show Details)Dec 16 2020, 10:48 AM

ardumont added a revision: D5071: Unify loader instantiation.Feb 12 2021, 5:51 PM

ardumont added a revision: D5076: loader: Expect visit_date as an optional date in constructors.Feb 15 2021, 6:00 PM

ardumont added a revision: D5075: Rework loader instantiation logic according to loader core api.Feb 15 2021, 6:01 PM

ardumont added a revision: D5077: Rework loader instantiation logic according to loader core api.Feb 15 2021, 6:02 PM

ardumont added a revision: D5078: Rework loader instantiation logic according to loader core api.Feb 15 2021, 6:12 PM

ardumont added a commit: rDLDBASE7116bb75897a: Unify loader instantiation.Feb 17 2021, 12:03 PM

ardumont added a revision: D5092: tests: Fix loader-git instantiation.Feb 17 2021, 1:29 PM

ardumont added a revision: D5093: tests: Fix loader-git instantiation.Feb 17 2021, 1:49 PM

ardumont added a commit: rDLDGb14c06e70846: Rework loader instantiation logic according to loader core api.Feb 17 2021, 3:06 PM

ardumont added a commit: rDVAU2de8869c291f: tests: Fix loader-git instantiation.Feb 17 2021, 3:14 PM

ardumont added a commit: rDWAPPS2e0ccea86a61: tests: Fix loader-git instantiation.Feb 17 2021, 3:19 PM

ardumont added a commit: rDLDHGd8f28b70fa77: Rework loader instantiation logic according to loader core api.Feb 17 2021, 4:08 PM

ardumont added a commit: rDLDSVN2c54129bcf48: Rework loader instantiation logic according to loader core api.Feb 17 2021, 6:19 PM

ardumont added a commit: rDLDSVNeead508f0afe: loader: Expect visit_date as an optional date in constructors.

ardumont added a revision: D5102: Separate loader-deposit from other loaders.Feb 17 2021, 6:40 PM

ardumont mentioned this in rSPSITE8af0c2850c19: git: Drop no longer supported keys.Feb 18 2021, 10:11 AM

ardumont added a commit: rDENV467abd4d11b4: Separate loader-deposit from other loaders.Feb 18 2021, 11:13 AM

vsellier removed tenma as the assignee of this task.Apr 16 2021, 11:48 AM

vsellier added a subscriber: tenma.

vsellier mentioned this in T3254: Outboarding of tenma.Apr 16 2021, 11:51 AM

gitlab-migration changed the status of subtask T826: Objects that implicitely load configuration are a nightmare to test from Resolved to Migrated.Jan 8 2023, 4:23 PM

gitlab-migration changed the status of subtask T1532: Cleanup deprecated configuration code in swh modules from Resolved to Migrated.Jan 8 2023, 4:26 PM

This task has been migrated to GitLab.

gitlab-migration changed the status of subtask T1386: Refactor indexers' initialization step from Wontfix to Migrated.Jan 8 2023, 9:58 PM

rDLDG Git loader
		D5077	rDLDGb14c06e70846 Rework loader instantiation logic according to loader core api
		D4128	rDLDGca7556c35604 git.loader: Migrate away from SWHConfig mixin
rDSCH Scheduling utilities
		D4284	rDSCH13dcaddbed81 scheduler: Type and unify get_scheduler factory with other factories
rDWAPPS Web applications
		D5093	rDWAPPS2e0ccea86a61 tests: Fix loader-git instantiation
rDLDHG Mercurial loader
		D5078	rDLDHGd8f28b70fa77 Rework loader instantiation logic according to loader core api
		D4127	rDLDHG00e010a15ae7 mercurial.loader: Migrate away from SWHConfig mixin
rDENV Development environment
		D5102	rDENV467abd4d11b4 Separate loader-deposit from other loaders
rDLDBASE Generic VCS/Package Loader
	Abandoned		D4007 loader*: Migrate to swh.core.config.load_from_envvar
		D5071	rDLDBASE7116bb75897a Unify loader instantiation
		D4125	rDLDBASE5e4fd64d1061 package.loader: Migrate away from SWHConfig mixin
		D4124	rDLDBASEb81bcf4b4c40 core.loader: Migrate away from SWHConfig mixin
rDCIDX Metadata indexer
		D4272	rDCIDX07c96743d726 swh.indexer.storage: Unify get_indexer_storage function with others
		D4140	rDCIDX1eb521c98c43 indexer*: Migrate away from SWHConfig mixin
rDLS Listers
		D4141	rDLS56f08b73f6b0 lister*: Migrate away from SWHConfig mixin
rDVAU Software Heritage Vault
		D5092	rDVAU2de8869c291f tests: Fix loader-git instantiation
		D4294	rDVAU8e4026a5e8c9 swh.vault: Unify get_vault factory function with other factories
rDCORE Foundations and core functionalities
	Needs Review		D4731 WIP Configuration system
		D3965	rDCORE82a47667b2b8 config: Deprecated SWHConfig in favor of load_from_envvar function
rDLDSVN Subversion (SVN) loader
		D5076	rDLDSVNeead508f0afe loader: Expect visit_date as an optional date in constructors
		D5075	rDLDSVN2c54129bcf48 Rework loader instantiation logic according to loader core api
		D4126	rDLDSVN87d4a7e22f63 svn.loader: Migrate away from SWHConfig mixin

Event Timeline

	douardda
	Dec 4 2018, 10:27 AM