Software Heritage

Generic scheduler task creation according to task type
Open, Wishlist, Public

Description

As usage of the scheduler API grows, we need a way to generically create a task for a given type.

Each task type (listing, loading, cooking, checking, ...) is called in a different way, which creates many combinations.
It is difficult to keep track of every variant, since this logic is currently spread across different modules.
That could be a pain to maintain in the long run.

As of today, this is "hard-coded" in:

  • swh-web (in progress) ~> will be multiple loading types (svn, hg, git, maybe more)
  • swh-updater (ghtorrent) ~> only 1 type so far (loading)
  • swh-deposit ~> only 2 types so far (checking, loading)
  • swh-vault ~> only 1 type so far (cooking)
  • swh-lister ~> loading types

I think that, ideally, the client of the scheduler API should not be concerned with those details.
It should just call the creation API to schedule a task of a given tool_type (lister, loader, checker, cooker, ...), for a given origin type (git, svn, hg, tar, deposit, dir, ...), possibly with supplementary information about the nature of the task (I see incremental and full so far; I don't really know what to call this one).
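A minimal Python sketch of what the client side could look like, assuming the scheduler resolves the (tool_type, origin_type, extra) triple to a task type itself. `TaskTypeKey`, `TASK_TYPES`, `resolve_task_type` and the task type names are hypothetical illustrations, not the actual swh-scheduler API; the in-memory dict only stands in for the task_type table.

```python
from typing import Dict, NamedTuple, Optional


class TaskTypeKey(NamedTuple):
    """Hypothetical lookup key: what the client knows about the task it wants."""
    tool_type: str               # lister, loader, checker, cooker, ...
    origin_type: str             # git, svn, hg, tar, deposit, dir, ...
    extra: Optional[str] = None  # incremental, full, or None


# Hypothetical registry; in the proposal this would be rows of the
# scheduler's task_type table rather than a hard-coded dict.
TASK_TYPES: Dict[TaskTypeKey, str] = {
    TaskTypeKey("loader", "git", "incremental"): "load-git-incremental",
    TaskTypeKey("loader", "svn"): "load-svn",
    TaskTypeKey("checker", "deposit"): "check-deposit",
    TaskTypeKey("cooker", "dir"): "cook-dir",
}


def resolve_task_type(tool_type: str, origin_type: str,
                      extra: Optional[str] = None) -> str:
    """Return the task type name registered for the given triple."""
    key = TaskTypeKey(tool_type, origin_type, extra)
    try:
        return TASK_TYPES[key]
    except KeyError:
        # Fail with an explicit message instead of a bare KeyError.
        raise ValueError(f"no task type registered for {key}") from None
```

With something like this, swh-web, swh-deposit, swh-vault and the others would only pass the triple (for instance `resolve_task_type("loader", "svn")`) instead of each hard-coding task type names.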

I see one possible way of implementing this, since it is somewhat already the case in the scheduler's task_type table.
Slightly reworking the task_type table to add these new columns should do it:

  • tool_type: lister, loader, checker, cooker, ...
  • origin_type: git, svn, hg, tar, deposit, dir, ...
  • tool_type_extra: incremental, full, none ...
  • template_arguments: json like structure for each type

The last one is the important one: it allows task creation to be templated without bothering the scheduler API client.
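A sketch of how template_arguments could drive task creation, assuming the template simply lists which client parameters go into positional and keyword arguments, and assuming a task is stored as a type name plus `{"args": [...], "kwargs": {...}}`. `TaskType`, `build_task`, the `load-svn` name and the example URL are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class TaskType:
    """Hypothetical shape of a reworked task_type row."""
    type: str                                  # task type name
    tool_type: str                             # lister, loader, checker, cooker, ...
    origin_type: str                           # git, svn, hg, tar, deposit, dir, ...
    tool_type_extra: Optional[str]             # incremental, full, or None
    template_arguments: Dict[str, List[str]]   # which params go where


def build_task(task_type: TaskType, params: Dict[str, Any]) -> Dict[str, Any]:
    """Fill the task type's argument template from the client's parameters."""
    template = task_type.template_arguments
    args = [params[name] for name in template.get("args", [])]
    kwargs = {name: params[name] for name in template.get("kwargs", [])}
    return {"type": task_type.type, "arguments": {"args": args, "kwargs": kwargs}}


svn_loader = TaskType(
    type="load-svn",
    tool_type="loader",
    origin_type="svn",
    tool_type_extra=None,
    # Parsed from JSON, as it might be stored in a JSON column of the table.
    template_arguments=json.loads('{"args": ["origin_url"], "kwargs": ["visit_date"]}'),
)

task = build_task(svn_loader, {"origin_url": "https://svn.example.org/repo",
                               "visit_date": "2018-07-20"})
```

The client only supplies named parameters; how they are laid out for the worker lives in the table, so changing a loader's calling convention would mean updating one row rather than every calling module.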

Or something.

Event Timeline

ardumont triaged this task as Wishlist priority. Jul 20 2018, 11:58 AM
ardumont created this task.
ardumont renamed this task from Generci scheduler task creation according to task type to Generic scheduler task creation according to task type. Jul 20 2018, 12:00 PM

whiteboard session with anlambert: F3219764

Schema legend:

  • upper left corner: schema model upgrade (new column task, scope).
  • center: current module representation and their dependencies on the scheduler

That solution would also make it possible to add checks on the task creation input (we do not have any today, and if we did, we would probably end up duplicating them).
As things stand, with wrong inputs, a PostgreSQL exception would probably leak without a clear message about what the actual error is.
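Assuming each task type carries an argument template listing its expected parameters (as proposed for the template_arguments column), input could be checked once, in the scheduler, before any insert is attempted. `TaskInputError` and `check_task_input` are hypothetical names; the point is to fail with a message naming the bad parameters rather than letting a database error surface.

```python
from typing import Any, Dict, List


class TaskInputError(ValueError):
    """Hypothetical error raised before anything reaches the database."""


def check_task_input(task_type_name: str,
                     template_arguments: Dict[str, List[str]],
                     params: Dict[str, Any]) -> None:
    """Reject missing or unexpected parameters with a readable message."""
    expected = set(template_arguments.get("args", [])) | set(
        template_arguments.get("kwargs", []))
    given = set(params)
    missing = sorted(expected - given)
    unexpected = sorted(given - expected)
    problems = []
    if missing:
        problems.append(f"missing parameter(s): {', '.join(missing)}")
    if unexpected:
        problems.append(f"unexpected parameter(s): {', '.join(unexpected)}")
    if problems:
        raise TaskInputError(f"invalid input for task type {task_type_name!r}: "
                             + "; ".join(problems))
```

For example, a typo such as `visit_dat` instead of `visit_date` would be reported as both a missing and an unexpected parameter, before any SQL is sent.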