Page MenuHomeSoftware Heritage

WIP Configuration system
Needs ReviewPublic

Authored by tenma on Dec 11 2020, 7:22 PM.

Details

Summary

Related to T1410

Test Plan

tox -e sphinx-dev

Event Timeline

Build is green

Patch application report for D4731 (id=16756)

Rebasing onto 777ea186fa...

First, rewinding head to replay your work on top of it...
Applying: WIP Configuration system
Changes applied before test
commit 2cbf5764b5f4595da8e94df74c3662d572d1b3b8
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 11 19:19:01 2020 +0100

    WIP Configuration system

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/183/ for more details.

Cleaned and wrote up several sections

Build is green

Patch application report for D4731 (id=16804)

Rebasing onto 777ea186fa...

First, rewinding head to replay your work on top of it...
Applying: WIP Configuration system
Changes applied before test
commit 3f8808a0a84f8b77883f98e3d9a74098dbb48718
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 11 19:19:01 2020 +0100

    WIP Configuration system

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/184/ for more details.

docs/config.md
5

Can you please use some kind of formatter which cut the line at 88 chars.
It's most pleasant to the eyes.

See D4746 for the deposit documentation which does that for example

Added a TODO about a section on guidelines

Build is green

Patch application report for D4731 (id=16806)

Rebasing onto 777ea186fa...

First, rewinding head to replay your work on top of it...
Applying: WIP Configuration system
Changes applied before test
commit e3dff437b3ba239c8e37f8a4db082f078969d011
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 11 19:19:01 2020 +0100

    WIP Configuration system

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/185/ for more details.

Reflow the whole source to 88 chars lines

Build is green

Patch application report for D4731 (id=16808)

Rebasing onto 777ea186fa...

First, rewinding head to replay your work on top of it...
Applying: WIP Configuration system
Changes applied before test
commit 21b471d17e17a60d56119daa1b7752f8811620b1
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 11 19:19:01 2020 +0100

    WIP Configuration system

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/186/ for more details.

Reflow the whole source to 88 chars lines

Thanks ;)

I'm confused about a few things:

Until now, configuration handling was implemented independently for each service.

What do you mean by that? It has always been handled by SWHConfig in swh.core.

Based on YAML:

  • only YAML primitive types (includes mappings and lists), like JSON
  • restriction on document structure (see grammar below)
  • addition of a reference system where we control resolution to avoid duplicated definitions

This (and the grammar) sounds like it disallows YAML references, but I see one in an example (&celery). So are they allowed?

Either way, why add our own type of references instead of relying on a feature YAML already provides?

Singletons are ad-hoc objects that are defined like instances but with no associated type.

What does this mean? Can't we do that with instances?

  1. Could you list all the changes between this and what we are currently doing, either at the beginning or the end? I see two (instances + merging all conf files in a single one) but I may be missing some.

First, thanks for the feedback, every help is welcome because I am still pretty the newbie in the project as a whole, and this is a more complex topic that initially anticipated.

Until now, configuration handling was implemented independently for each service.

What do you mean by that? It has always been handled by SWHConfig in swh.core.

Well I did not search the history before sept 2020, but at the time even if most components (but several did not) used SWHConfig until we moved away from it recently, there still is significant complementary configuration handling (notably in CLIs and validating routines) code duplicated in components across the codebase. This scheme intends to have much more coverage on every aspect of configuration.

I will update this sentence, what do you think would be the key point to include?

This (and the grammar) sounds like it disallows YAML references, but I see one in an example (&celery). So are they allowed?

Either way, why add our own type of references instead of relying on a feature YAML already provides?

Yes, this point needs update, and you may have a solution on this issue:

  • we seemingly cannot control the resolution of YAML references, so a YAML load would hand us the whole definition with the referenced elements already duplicated in the intended places. But for components, we wanted to instantiate every instance only one time and pass it to the other instances that needed them; to do so we need to handle resolution ourselves. With no more traces of a reference when we process the configuration, we would get a different instance (Python object) at each former reference even if it pointed to the same definition, so different identity semantics.
  • until the last meeting, we saw no use of keeping the YAML reference, but it makes sense to refer to anything other than component instances, be it singleton instances or mere attributes, where duplication does not matter to us. I forgot to update everything to reflect the fact the we keep it.

I admit it is not exactly simple, and it would be very relevant to ask: "do we really need that?". Because we did not took much time to discuss it, we just agreed we would have to do this in the flow of the conversation.
So it boils down to the following question I think: "are there situations where we have multiple instances that depend on a shared instance?" where instance may be of any SWH component such as a client, a server, etc. which answer is not clear to me.

To sum up on this one, there is no question on keeping YAML references (so I will update), but there is an open one on adding our own reference system.

Singletons are ad-hoc objects that are defined like instances but with no associated type.

What does this mean? Can't we do that with instances?

There is no syntactic difference but a semantic one. Singletons (*needs better name*) are verbatim in a way, they will not be instantiated into a "business" object (component) but as an "untyped" dict.
I'm having issues defining it more clearly.

  1. Could you list all the changes between this and what we are currently doing, either at the beginning or the end? I see two (instances + merging all conf files in a single one) but I may be missing some.

"merging all conf files" is now possible with instances (= configuration alternatives), but there will still be one tailored for each service in production. But if running all in the same host, you can have the whole config in one file. This one is a minor feature.

I would say: instance definition, references (flexibility and DRY), a loading API for each use case, provide framework for instantiation/injection/validation (thus uniformity and some level of inversion of control), standard environment handling.

Will update.

But for components, we wanted to instantiate every instance only one time

What is the benefit of doing that (vs. the cost of the extra complexity)?

There is no syntactic difference but a semantic one. Singletons (*needs better name*) are verbatim in a way, they will not be instantiated into a "business" object (component) but as an "untyped" dict.
I'm having issues defining it more clearly.

What about "dict literals"? or "literal dicts"?

But for components, we wanted to instantiate every instance only one time

What is the benefit of doing that (vs. the cost of the extra complexity)?

It is not a matter of benefit but necessity (or not). As I said, I think we need it if the answer is yes to the following: "are there situations where we have multiple instances that depend on a shared instance [state]?", IOW wherever we depend on a shared mutable state of a component which will be instantiated by the config framework (with create_component or whatever the name). Question that I can't answer with my limited experience of the codebase.

There is no syntactic difference but a semantic one. Singletons (*needs better name*) are verbatim in a way, they will not be instantiated into a "business" object (component) but as an "untyped" dict.

What about "dict literals"? or "literal dicts"?

Thought about that, rather as "literal objects" than "literal dicts" which IMO captures the meaning better. Will see what the others think. It is one of of a set of pending terminology questions raised originally in the pad.

Take into account feedback from ardumont and vlorentz

  • do not overly abbreviate some terms notably identifiers
  • update remark about existing config handling
  • be more explicit on the relation to YAML, notably regarding references
  • rephrase presentation of singletons
  • be more explicit on what the new system provides (updates the

rationale which aims to present that)

Build is green

Patch application report for D4731 (id=16917)

Rebasing onto 777ea186fa...

First, rewinding head to replay your work on top of it...
Applying: WIP Configuration system
Applying: Take into account feedback from ardumont and vlorentz
Changes applied before test
commit c4740136fa5bfcede8c35c7ccd80e4c057a4499b
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Dec 22 13:10:58 2020 +0100

    Take into account feedback from ardumont and vlorentz
    
    To be squashed.
    
    - do not overly abbreviate some terms notably identifiers
    - update remark about existing config handling
    - be more explicit on the relation to YAML, notably regarding references
    - rephrase presentation of singletons
    - be more explicit on what the new system provides (updates the
    rationale which aims to present that)

commit 2d54939ce2f3424074a25799a520aa6713ac2044
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 11 19:19:01 2020 +0100

    WIP Configuration system

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/188/ for more details.

docs/config.md
176–180

you should expand these acronyms now

Be more consistent in naming and abbreviating

Build is green

Patch application report for D4731 (id=16918)

Rebasing onto 777ea186fa...

First, rewinding head to replay your work on top of it...
Applying: WIP Configuration system
Applying: Take into account feedback from ardumont and vlorentz
Changes applied before test
commit 7c14516b9a6662bda78a82996bbff9d02e9d367a
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Dec 22 13:10:58 2020 +0100

    Take into account feedback from ardumont and vlorentz
    
    To be squashed.
    
    - do not overly abbreviate some terms and be more consistent
    - update remark about existing config handling
    - be more explicit on the relation to YAML, notably regarding references
    - rephrase presentation of singletons
    - be more explicit on what the new system provides (updates the
    rationale which aims to present that)

commit e944b06111ef5e5dafe7028f978f1ffc5d4cf108
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 11 19:19:01 2020 +0100

    WIP Configuration system

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/189/ for more details.

tenma marked an inline comment as done.
douardda requested changes to this revision.Jan 5 2021, 2:42 PM

not finished, but first burst of comments

docs/config.md
42

what's IO stream in this context? is this something existing? needed? or just "possible"?

144

I's add "custom" (add a custom reference system) to make it crystal clear

146

I'd write it in pseudo YAML:

type:
  instance:
    attribute: value

without the "Attribute is a [...]" below

151

1 instance is associated to N attributes. seems unnecessary to me.

155

I find this phrase above pretty confusing since we still have no clue what types refers to exactly, and the "identifier" may be confused with the previous one (the instance level one).

157

To make it easier to understand, maybe add something like "(for example configurations to several swh-storage services, typically "prod' and 'staging')"

161

I'm still very reluctant to the singleton term here: it has a very specific meaning for everyone doing code, and we are not using it to describe something related to this meaning).

164

a bit confusing also: above is written that instance contains attributes, now we say it an arbitrary nested structure... I know both these declarations are actually correct, but it is confusing. It should be said that attribute values can be any structure or something similar, to match the way this is presented a few lines before.

168

It would be much easier to read with a example snippet

178

this "double definition" of the id looks wrong to me: the second one is actually the declaration of type_id and friends...
ABNF says:

rule = definition ;

I'd rather stick to a proper ABNF notation actually.

186

grammar definitions are usually presented the other way around (from config_tree to id here)

190

Very surprised we reach this point without telling the type ID for so called singletons is _... Introducing this earlier would make these special cases 'aparté' for singleton (mostly?) unnecessary.

210

this is very confusing

224

Not sure this is the moment to describe implementation details like this here...

This revision now requires changes to proceed.Jan 5 2021, 2:42 PM

@douardda: taking into account your suggestions.

As for "singleton", we still have to agree to a better term.
There was initially "ad-hoc object", vlorentz suggested "literal dict" (which I would prefer as "literal object").

vlorentz raised another important question, which I find difficult to answer definitively by myself:
What is the benefit of having our custom reference system (vs. the cost of the extra complexity)?
See supra my answer to vlorentz.

Aside from that, what do you think of the parts before language description? Is the rationale better written-up or in notes (less verbose, but would need update to match information contained in the written-up version)?

  • Take into account feedback from ardumont and vlorentz
  • Take into account feedback from douardda
  • still TODO: rename "singletons"

Build is green

Patch application report for D4731 (id=17104)

Rebasing onto 40018f236e...

Current branch diff-target is up to date.
Changes applied before test
commit d33457e3ab4b4d7e042bec1e6979dd94b06a563e
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Jan 8 17:17:42 2021 +0100

    Take into account feedback from douardda
    
    To be squashed.
    
    - compress the syntactic overview, notably removing mostly redundant
    grammar elements wrt YAML
    - remove or rephrase confusing elements about types, instances and singletons

commit 05afea666d37fd66fbbf26a97ef2a0afdfab43c1
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Dec 22 13:10:58 2020 +0100

    Take into account feedback from ardumont and vlorentz
    
    To be squashed.
    
    - do not overly abbreviate some terms and be more consistent
    - update remark about existing config handling
    - be more explicit on the relation to YAML, notably regarding references
    - rephrase presentation of singletons
    - be more explicit on what the new system provides (updates the
    rationale which aims to present that)

commit 6f7015b087dcad0799b113882f8d22b549004bc9
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 11 19:19:01 2020 +0100

    WIP Configuration system

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/190/ for more details.

tenma marked 12 inline comments as done.Jan 26 2021, 6:45 PM

Rename "singleton" to "record"

Build is green

Patch application report for D4731 (id=17646)

Rebasing onto 76d2a0033f...

Current branch diff-target is up to date.
Changes applied before test
commit 68dae3cf8a71b898f570501ed1721b0f692d1608
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Jan 26 18:46:53 2021 +0100

    config.spec: rename "singleton" to "record"

commit ff15871a628a0ab5194295562c27798f3d16d6cc
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Jan 8 17:17:42 2021 +0100

    Take into account feedback from douardda
    
    To be squashed.
    
    - compress the syntactic overview, notably removing mostly redundant
    grammar elements wrt YAML
    - remove or rephrase confusing elements about types, instances and singletons

commit 05bfcdc5e89f6ade6383eea0b290447689ddfe14
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Dec 22 13:10:58 2020 +0100

    Take into account feedback from ardumont and vlorentz
    
    To be squashed.
    
    - do not overly abbreviate some terms and be more consistent
    - update remark about existing config handling
    - be more explicit on the relation to YAML, notably regarding references
    - rephrase presentation of singletons
    - be more explicit on what the new system provides (updates the
    rationale which aims to present that)

commit 637b81f317563a2803beae989413f0847397fdde
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 11 19:19:01 2020 +0100

    WIP Configuration system

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/191/ for more details.

docs/config.md
67

lacks

68

component

72

configurations

99

please, drop those ^ ?

144

duplicated

docs/config.md
280

instantiated

305

cumbersome

405

I'd suggest to make it raise with an explicit failure message.
So it fails early enough. That way, we have a chance to prevent
something going wrong because of misconfiguration.

Do we need to enter into this much details though?

  • config.spec: Submit first draft
  • config.spec: Adress remarks from @ardumont, @vlorentz
  • config.spec: Address remarks from @douardda
  • config.spec: Rename "singleton" to "record"
  • config.spec: Address @ardumont remarks, strip down some sections

Build is green

Patch application report for D4731 (id=17994)

Rebasing onto 76d2a0033f...

Current branch diff-target is up to date.
Changes applied before test
commit 99850ebb5d097dc6593096c874974ec02ecb0056
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Feb 9 11:57:34 2021 +0100

    config.spec: Address @ardumont remarks, strip down some sections

commit 930bb7ede25bdd70f2d84b3e1507097d7e587514
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Jan 26 18:46:53 2021 +0100

    config.spec: Rename "singleton" to "record"

commit 627a0e1e0f77ddd54427fce6ab442e477be8cff3
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Jan 8 17:17:42 2021 +0100

    config.spec: Address remarks from @douardda
    
    - compress the syntactic overview, notably removing mostly redundant
    grammar elements wrt YAML
    - remove or rephrase confusing elements about types, instances and singletons

commit c3ed1e8b999a51b1e976364a81f0a276e5ec24d3
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Dec 22 13:10:58 2020 +0100

    config.spec: Adress remarks from @ardumont, @vlorentz
    
    - do not overly abbreviate some terms and be more consistent
    - update remark about existing config handling
    - be more explicit on the relation to YAML, notably regarding references
    - rephrase presentation of singletons
    - be more explicit on what the new system provides (updates the
    rationale which aims to present that)

commit e3c989760fe31bae378a525340c8dfeb8c294cfb
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 11 19:19:01 2020 +0100

    config.spec: Submit first draft

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/194/ for more details.

Build is green

Patch application report for D4731 (id=18023)

Rebasing onto 76d2a0033f...

Current branch diff-target is up to date.
Changes applied before test
commit 75c27e937f15472caa6db7c34750dc52e4cede31
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Feb 9 11:57:34 2021 +0100

    config.spec: Address @ardumont remarks, strip down some sections

commit 930bb7ede25bdd70f2d84b3e1507097d7e587514
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Jan 26 18:46:53 2021 +0100

    config.spec: Rename "singleton" to "record"

commit 627a0e1e0f77ddd54427fce6ab442e477be8cff3
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Jan 8 17:17:42 2021 +0100

    config.spec: Address remarks from @douardda
    
    - compress the syntactic overview, notably removing mostly redundant
    grammar elements wrt YAML
    - remove or rephrase confusing elements about types, instances and singletons

commit c3ed1e8b999a51b1e976364a81f0a276e5ec24d3
Author: tenma <tenma+swh@mailbox.org>
Date:   Tue Dec 22 13:10:58 2020 +0100

    config.spec: Adress remarks from @ardumont, @vlorentz
    
    - do not overly abbreviate some terms and be more consistent
    - update remark about existing config handling
    - be more explicit on the relation to YAML, notably regarding references
    - rephrase presentation of singletons
    - be more explicit on what the new system provides (updates the
    rationale which aims to present that)

commit e3c989760fe31bae378a525340c8dfeb8c294cfb
Author: tenma <tenma+swh@mailbox.org>
Date:   Fri Dec 11 19:19:01 2020 +0100

    config.spec: Submit first draft

See https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/195/ for more details.