Automatic registration of tasks in the scheduler
Closed, Migrated, Edits Locked

Description

Right now, registering a task type in the scheduler requires a manual INSERT query against the scheduler's database. This is problematic because it makes local testing pretty hard: you need to import the task types from the production database, since there aren't any default values that can be used easily. This is especially hard for newcomers who don't necessarily have access to the production database: they have to reverse-engineer the different task types and a useful default configuration, as we have no way to provide a default configuration for these tasks.

The proposed solution would be to have the workers automatically register the tasks they are able to execute, using a default configuration/policy with sane values that is stored in the code.

When a worker starts, it will send an RPC request to the scheduler to say that it's able to execute a task with a given name, and will give its local default configuration. If the scheduler doesn't already have this task registered, it will register it with the given configuration; otherwise it will just return the current configuration.

After the RPC call returns, if the configuration returned by the scheduler doesn't match the configuration that was sent for the registration, we will issue a warning that the configuration "in code" doesn't match the one in the database, and that manual intervention is required. This is to avoid the code defaults and the stored configuration drifting too far apart, as the two should eventually converge.
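
The registration handshake described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `InMemoryScheduler`, `register_task_type`, `register_on_startup` and the configuration keys are hypothetical names, not the scheduler's actual API, and the RPC is replaced by a plain method call.

```python
import logging

logger = logging.getLogger(__name__)


class InMemoryScheduler:
    """Hypothetical stand-in for the scheduler side of the RPC."""

    def __init__(self):
        self._task_types = {}

    def register_task_type(self, name, default_config):
        """Store default_config if name is unknown; return the stored config."""
        # setdefault only inserts when the task type is not registered yet,
        # so an existing (possibly hand-edited) configuration is never overwritten.
        stored = self._task_types.setdefault(name, dict(default_config))
        return dict(stored)  # return a copy so callers cannot mutate the store


def register_on_startup(scheduler, name, default_config):
    """Worker side: register a task type, warn if the scheduler disagrees."""
    current = scheduler.register_task_type(name, default_config)
    in_sync = current == default_config
    if not in_sync:
        logger.warning(
            "Task type %r: configuration in code %r does not match the one "
            "in the database %r; manual intervention is required",
            name, default_config, current,
        )
    return current, in_sync
```

On a first start, `register_on_startup(scheduler, "example-task", defaults)` would store the defaults; on later starts with different defaults in the code, the stored configuration wins and only a warning is logged.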

This automatic registration lets us keep the flexibility of editing the policies of already-registered tasks without restarting the services, while allowing a "drop-in" scheduling mechanism that will make life easier for newcomers and make writing new services much more elegant.

Event Timeline

seirl updated the task description.

That's good! Thanks for that.

Right now, registering a task type in the scheduler requires a manual INSERT query against the scheduler's database. This is problematic because it makes local testing pretty hard: you need to import the task types from the production database, since there aren't any default values that can be used easily. This is especially hard for newcomers who don't necessarily have access to the production database: they have to reverse-engineer the different task types and a useful default configuration, as we have no way to provide a default configuration for these tasks.

Well, since rDSCHf08d9e7b75c8cd93d640b527802e4508bca2a974 from last week, when you use our make rebuild-testdata routine from swh-environment, sql/swh-data.sql is run and inserts these values into the local db (which is how the other dbs are set up, at least the main db).

That makes local testing easier on my side (svn and deposit so far).

There are no vault values in that file yet; it's a commit away though.

After thinking a bit about it, I think there's an additional component to this whole thing that would be useful: a way to synchronize the database with the defaults.

The idea is this: if we're going to do the automatic synchronization thing, it probably means that all our workers will share a common entry point, so that the registration can be done without rewriting the registration code in each worker. We could therefore add a command line switch to this entry point (maybe --interactive or --noninteractive for services; I like having interactive as the default since that's what you want locally when you type the command manually). If interactive mode is on, instead of issuing a warning when the defaults and the db are not synchronized, it will ask with an input() whether you want to synchronize right now or run the service anyway and solve the issue manually later. That way you don't have to work out how to update the task type by hand to match the defaults. Thoughts?
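
A minimal sketch of what such a switch might look like, assuming a shared entry point built on argparse. The names `parse_args` and `resolve_mismatch`, the prompt text, and the `ask` parameter (there only so the prompt can be replaced in tests) are all hypothetical.

```python
import argparse
import logging

logger = logging.getLogger(__name__)


def parse_args(argv=None):
    """Parse the options of a hypothetical common worker entry point."""
    parser = argparse.ArgumentParser(description="hypothetical worker entry point")
    # Interactive is the default; services would pass --noninteractive.
    parser.add_argument(
        "--noninteractive",
        dest="interactive",
        action="store_false",
        help="only warn when code defaults and database differ",
    )
    return parser.parse_args(argv)


def resolve_mismatch(code_config, db_config, interactive, ask=input):
    """Return the configuration that should be in the database afterwards."""
    if code_config == db_config:
        return db_config
    if not interactive:
        # Service mode: keep running with the stored configuration.
        logger.warning("Code defaults and database differ; fix manually later")
        return db_config
    answer = ask("Code defaults differ from the database. Synchronize now? [y/N] ")
    if answer.strip().lower() in ("y", "yes"):
        return code_config
    return db_config
```

Defaulting the prompt to "no" keeps an accidental Enter from overwriting a configuration that was edited on purpose in the database.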

@ardumont oh, interesting. Since it's for testdata, wouldn't it be simpler to use replace into here?

Maybe, I don't know that instruction (and I can't seem to find references to it for postgres ;)
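
For context: REPLACE INTO is MySQL/SQLite syntax and PostgreSQL does not have it; the closest PostgreSQL equivalent (9.5 and later) is INSERT ... ON CONFLICT ... DO UPDATE. The sketch below shows that upsert form. It is run against an in-memory SQLite database purely so that it is self-contained (SQLite 3.24+ accepts the same clause), and the table and column names are simplified, hypothetical ones rather than the real scheduler schema.

```python
import sqlite3

# Hypothetical, simplified table; not the actual scheduler schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_type (type TEXT PRIMARY KEY, description TEXT)")

# Insert the row, or update it in place if the primary key already exists.
# With a PostgreSQL driver such as psycopg2 the placeholders would be %s.
UPSERT = """
INSERT INTO task_type (type, description) VALUES (?, ?)
ON CONFLICT (type) DO UPDATE SET description = excluded.description
"""


def upsert_task_type(connection, task_type, description):
    """Insert a task type, or overwrite its description if already present."""
    connection.execute(UPSERT, (task_type, description))


upsert_task_type(conn, "example-task", "first description")
upsert_task_type(conn, "example-task", "updated description")
rows = conn.execute("SELECT type, description FROM task_type").fetchall()
```

Running the same test-data file twice would then update existing rows instead of failing on a duplicate key.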