Page MenuHomeSoftware Heritage

Add get_storage_pipeline utility function for more readable configurations.
ClosedPublic

Authored by vlorentz on Nov 14 2019, 2:19 PM.

Details

Summary

This would allow writing configurations like:

storage:
  cls: pipeline
  steps:
    - cls: filter
    - cls: buffer
    - cls: remote
      url: http://swh-storage:5002/

or

storage:
  cls: filter
  storage:
    cls: buffer
    storage:
      cls: remote
      url: http://swh-storage:5002/

instead of:

storage:
  cls: filter
  args:
    storage:
      cls: buffer
      args:
        storage:
          cls: remote
          args:
            url: http://swh-storage:5002/

(example from D2275)

I think I gives more readable configs, but on the other hand it adds
more code to maintain. So I'm not sure if this is a good idea.

Comments welcome.

Diff Detail

Repository
rDSTO Storage manager
Branch
get_storage_pipeline
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 9110
Build 13321: tox-on-jenkinsJenkins
Build 13320: arc lint + arc unit

Event Timeline

I like the way that looks.

I guess the lowest friction way of wiring this up to the rest of our infra would be adding a 'pipeline' class to STORAGE_IMPLEMENTATION, which would make the top-level config look like

storage:
  cls: pipeline
  args:
    steps:
    - filter
    - buffer
    - cls: remote
      args:
        url: http://foo/

Which is a bit less terse, but has the advantage of being fully backwards-compatible.

What do you mean by "backwards-compatible"? get_storage_pipeline accepts a superset of get_storage's syntax.

I'm ok with either syntax though.

I'm not convinced this really is a good idea, it makes it difficult to document. Why not just replace get_xxx(cls, args): by get_xxx(cls, **args), then we can write this yaml as (with a little bit of manipulation on the yaml->arguments transformation):

storage:
  - filter
  - storage:
    - buffer
    - storage:
      - remote
      - url: http://swh-storage:5002/

or (with no under-the-carpet transformation):

storage:
  cls:filter
  storage:
    cls: buffer
    storage:
      cls: remote
      url: http://swh-storage:5002/

which looks a reasonable compromise between specific argument manipulation and yaml readability.

vlorentz edited the summary of this revision. (Show Details)

! In D2276#53517, @douardda wrote:
I'm not convinced this really is a good idea, it makes it difficult to document. Why not just replace get_xxx(cls, args): by get_xxx(cls, **args), then we can write this yaml as (with a little bit of manipulation on the yaml->arguments transformation):

storage:
  - filter
  - storage:
    - buffer
    - storage:
      - remote
      - url: http://swh-storage:5002/

I don't like this, notably for what is going to happen for classes with several arguments. Do I need to add a dash or not?

or (with no under-the-carpet transformation):

storage:
  cls:filter
  storage:
    cls: buffer
    storage:
      cls: remote
      url: http://swh-storage:5002/

which looks a reasonable compromise between specific argument manipulation and yaml readability.

I agree that we should make the **args implicit. We can even do it in a backwards compatible way by checking for args['args'] at the beginning of get_storage.

I still think there's *also* value in the pipeline proposal.

@douardda Your first proposition is essentially moving to positional arguments, which is step backward.

The second one looks indeed like a good compromise between code readability and config readability.

In D2276#53524, @olasd wrote:

I still think there's *also* value in the pipeline proposal.

By combining the two proposals, you can remove the special casing for strings in the pipeline steps and get to:

storage:
  cls: pipeline
  steps:
  - cls: filter
  - cls: buffer
  - cls: remote
    url: http://foo/

Which looks pretty neat, and fairly simple to understand/document to me

vlorentz edited the summary of this revision. (Show Details)

improve example in commit message

vlorentz edited the summary of this revision. (Show Details)

fix example in commit message

ardumont added a subscriber: ardumont.

And now we need to align this in other modules ;)

Thanks.

This revision is now accepted and ready to land.Nov 15 2019, 10:29 AM
This revision was landed with ongoing or failed builds.Nov 19 2019, 4:17 AM
This revision was automatically updated to reflect the committed changes.