Page MenuHomeSoftware Heritage

plugins: add support for scheduler's task-type declaration
ClosedPublic

Authored by douardda on Sep 3 2019, 3:15 PM.

Details

Summary

By default, the db-init cli tool will now create missing task-type entries in the
scheduler according to:

  • only create missing task-types (do not update them), but check that the backend_name field is consistent,
  • each SWHTask-based task declared in a module listed in the 'task_modules' plugin registry field will be checked and added if needed; tasks which name start wit an underscore will not be added,
  • added task-type will have:
    • the 'type' field is derived from the task's function name (with underscores replaced with dashes),
    • the description field is the first line of that function's docstring,
    • default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with a simple pattern matching to have decent default values for full/incremental tasks),
    • these default values can be overloaded via the 'task_type' plugin registry entry.

For this, we had to rename all tasks names (eg. cran_lister -> list_cran).

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz added inline comments.
swh/lister/npm/__init__.py
12–18

why a specific value for npm?
There should be at least a comment about this.

ardumont added inline comments.
swh/lister/cli.py
139

I like this but should this be called in the db-init subcommand?

That will work for the docker-dev environment but what about production?

144

key does not convey much ;)
I had to scratch my head a bit to understand it was the lister's "listing property" (full, incremental)...
I don't know how to call it though.

swh/lister/npm/__init__.py
12–18

Probably because that matches the current production setup.

               type                |                         description                          |                     backend_name                      | default_interval | min_interval | max_interval | backoff_factor | max_queue_length$
-----------------------------------+--------------------------------------------------------------+-------------------------------------------------------+------------------+--------------+--------------+----------------+-----------------$
 list-npm-full                     | Full npm lister                                              | swh.lister.npm.tasks.NpmListerTask                    | 7 days           | 7 days       | 7 days       |              1 |                 $

Thanks /me likes!

I just have an issue about how we integrate this for production (we might not change a thing on our m.o., i'm just unsure ;)

swh/lister/npm/__init__.py
12–18

And why was it done in production?

swh/lister/npm/__init__.py
12–18

I don't really know (i must have kept the existing ones which we now find somehow consistently in the scheduling db).

Same goes for the existing other setup, why 90 (list-full), 64 (loaders), 1 (list-incremental) day(s)?
That probably sounded reasonable at some point in time.

swh/lister/npm/__init__.py
12–18

Probably because that matches the current production setup.

yep!

swh/lister/cli.py
139

I am not yet sure where I want for this new feature/command to live.

It might make sense to make it a dedicated command indeed (possibly called from db-init).

For the production system, I guess one would prefer this 'upgrade' task to be done by hand, so I won't make it executed automagically somehow.

144

agreed, could use something like pattern (but it's not a regex) of suffix maybe?

swh/lister/cli.py
139

It might make sense to make it a dedicated command indeed (possibly called from db-init).

Yes (i thought of that \m/ ;)

subcommand: register-task or something?

For the production system, I guess one would prefer this 'upgrade' task to be done by hand, so I won't make it executed automagically somehow.

yes, i'd prefer that the call be manual (at least at first ;)

(Also, you made it idempotent already so that should be fine to use)

144

yes, suffix sounds good.

Might be adding a comment to explicit the suffix lives in a fixed set of {'full', 'incremental'} (as of now ;)

To be clear, i'm fine with the diff's goal and how it's implemented.

It'd be great to have what we exchanged about but that could be iterated over in another diff.

Also, the ci build failure seems to be irrelevant to the diff (patch appliance failure so meh, maybe a missing rebase or some such).

This revision is now accepted and ready to land.Sep 3 2019, 5:11 PM

attempt to answer ardumont and vlorents queries/comments

ardumont added inline comments.
swh/lister/cli.py
141

According to

163

you could make that a """ docstring """ ;)

refactor a bit the ensure_task_type function and write a docstring

and fix a typo repported by ardumont

Add tests, and (as a consequence) a fix

This revision was automatically updated to reflect the committed changes.