consistently document the configuration option of each module
Open, Normal, Public

Description

[ forked off T917 ]

While we are at it, it is nowadays probably pointless to document configuration in README files. Rather, we should consistently store configuration information in the Python documentation of each module, so that we can cross-reference from one module to another. For instance, instead of (re-)documenting the storage configuration requirements in the Git loader module, we should just point to the corresponding configuration documentation of the storage module.

How about a top-level "configuration" document in each module?

Event Timeline

zack triaged this task as Normal priority. May 25 2019, 5:14 PM
zack created this task.
twitu added a subscriber: twitu. Edited Jun 29 2019, 2:41 PM

This is related to T1388.

I found configuration-related information in dev-info.rst in two modules, namely swh-indexer and swh-deposit. There, dev-info.rst has a "Development configuration" section and a "Production configuration" section. Is the goal of this task to move both of these sections into a separate "configuration" document?

twitu added a comment. Jul 2 2019, 12:55 PM

I can begin working on it once I understand what is required. Is my interpretation of the task correct?

zack added a subscriber: douardda. Edited Jul 2 2019, 1:00 PM
In T1758#34452, @twitu wrote:

This is related to T1388.

I'll let @douardda confirm, but I think they are in fact just duplicates of each other.

I found configuration-related information in dev-info.rst in two modules, namely swh-indexer and swh-deposit. There, dev-info.rst has a "Development configuration" section and a "Production configuration" section. Is the goal of this task to move both of these sections into a separate "configuration" document?

Yes, that's the idea: a top-level configuration.rst Sphinx file for each module, containing reference documentation listing all available options, their meaning, as well as the configuration file name.
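To illustrate the shape such a reference page could take, here is a minimal sketch that renders a configuration.rst body from a table of options. Everything specific in it is an assumption made for the example: the `Option` record, the `render_configuration_rst` helper, the module name `swh-example`, the file name `example.yml`, and the `storage_url` option are hypothetical and are not taken from any existing module.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Option:
    """One configuration option (hypothetical record, for illustration only)."""
    name: str
    value_type: str
    default: str
    meaning: str


def render_configuration_rst(module: str, config_file: str,
                             options: Sequence[Option]) -> str:
    """Return reStructuredText listing the configuration file name and options."""
    title = f"Configuration reference for {module}"
    lines = [
        title,
        "=" * len(title),  # RST title underline, same length as the title
        "",
        f"Configuration file name: ``{config_file}``",
        "",
    ]
    for opt in options:
        # Each option is a definition-list entry with a small field list.
        lines += [
            f"``{opt.name}``",
            f"    :Type: {opt.value_type}",
            f"    :Default: {opt.default}",
            "",
            f"    {opt.meaning}",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical module, file name, and option; not real settings.
    print(render_configuration_rst(
        "swh-example",
        "example.yml",
        [Option("storage_url", "str", "(none)",
                "Hypothetical URL of the storage service to connect to.")],
    ))
```

Keeping the options in one table per module means the reference page always lists the file name, each option, and its meaning in the same layout, which is what makes cross-referencing between modules practical.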

twitu added a comment. Jul 2 2019, 4:45 PM

I looked at how the docs look after being built; take, for example, swh-indexer at https://docs.softwareheritage.org/devel/swh-indexer/dev-info.html. It seems that the configuration information, along with the instructions to run and test the module, is best suited to this page. Have you considered adding comments about the configuration parameters to this page itself, rather than making a top-level file? Only someone hacking on swh-indexer would be interested in the configuration.

For most modules, like swh-loader-git, that have no dev-info.rst, a top-level configuration.rst linking to other documents will be great.

To add the comments explaining each parameter, I will go through the source code to understand how it is used. However, apart from the given example, I do not know which modules can refer to which.

zack added a comment. Jul 2 2019, 5:50 PM

@twitu: there are two different types of documents that should not be conflated.

  • dev-info is for sample configurations that will enable a contributor to get started hacking on the code.
  • configuration.rst will be about, as I wrote in my previous message, complete reference information listing all configuration options.

Neither is a substitute for the other.

This task is about configuration.rst.

twitu added a comment. Jul 2 2019, 7:36 PM

After re-reading the documentation, I realized that the configuration files given in swh-indexer are actually used by swh-scheduler and swh-storage, indicating that these two modules are at the root of the dependency tree.

However, after exploring the code base of all three modules, I can't find any mention of orchestrator.yml or mimetype.yml in the code. Where are these files being read? Can you give me an entry point in the code, so that I can understand how the parameters are being used?

zack added subscribers: vlorentz, ardumont. Edited Jul 2 2019, 10:24 PM
In T1758#34595, @twitu wrote:

However, after exploring the code base of all three modules, I can't find any mention of orchestrator.yml or mimetype.yml in the code. Where are these files being read? Can you give me an entry point in the code, so that I can understand how the parameters are being used?

This is for @vlorentz and/or @ardumont.

(I couldn't find them with rgrep/ack either, but it might need a more sophisticated regexp than what I've used)
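A broader search than a literal match on the file name can be sketched as follows. It rests on an assumption that the thread does not confirm: that the code may refer to a configuration file by its base name without the extension (for example `mimetype` rather than `mimetype.yml`). The helper name `find_config_references` is hypothetical, and matching on bare stems can be noisy.

```python
import re
from pathlib import Path


def find_config_references(root, config_names, suffixes=(".py",)):
    """Return (path, line number, line) for lines mentioning a config file stem.

    Assumption: the extension may be missing in the code, so only the stem
    of each name (e.g. "mimetype" for "mimetype.yml") is searched for.
    """
    stems = sorted({Path(name).stem for name in config_names})
    if not stems:
        return []
    pattern = re.compile("|".join(re.escape(stem) for stem in stems))
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Search the current directory for the two file names from the thread.
    for path, lineno, line in find_config_references(
            ".", ["orchestrator.yml", "mimetype.yml"]):
        print(f"{path}:{lineno}: {line}")
```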

mimetype.yml's content is used in swh/indexer/mimetype.py, and orchestrator.yml is no longer used.