Indexers: Make orchestrators use swh-scheduler for scheduling
Closed, MigratedEdits Locked

Authored By: ardumont, Oct 3 2018, 12:04 PM

Description

At the moment, the orchestrator 'schedules' tasks directly...

Related Objects

Status	Assigned	Task
Migrated	gitlab-migration	T439 Indexers: compute (and maintain up-to-date) the filetype of all blobs
Migrated	gitlab-migration	T1385 Monitor output of metadata indexers
Migrated	gitlab-migration	T359 Indexers: batch content analyzer infrastructure
Migrated	gitlab-migration	T1227 General improvments of the indexer: Schedule indexer tasks
Migrated	gitlab-migration	T1229 Indexers: Make orchestrators use swh-scheduler for scheduling
Migrated	gitlab-migration	T1290 Indexers: Use swh.scheduler instead of directly relying on Celery

Event Timeline

ardumont triaged this task as Normal priority. Oct 3 2018, 12:04 PM

ardumont created this task.

vlorentz claimed this task. Oct 22 2018, 2:05 PM

@vlorentz @olasd Beware though, the orchestrator is supposed to do the following for each indexer (according to its configuration, e.g. P235):

receive data (input)
split it (according to indexer configuration)
optionally filter it (according to indexer configuration)
forward the result (objects to index) to indexer

That's why there is still analysis needed.
I'm not entirely sure we can remove it.
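
To make the four steps above concrete, here is a minimal sketch of such a per-indexer pipeline. It is not the actual orchestrator code: the configuration keys (`batch_size`, `filter`), the indexer names and the `forward` callback are all assumptions made for illustration.

```python
# Hypothetical per-indexer configuration; keys and names are illustrative only.
INDEXER_CONFIG = {
    "filetype": {"batch_size": 2, "filter": None},
    "metadata": {"batch_size": 3, "filter": "skip_empty"},
}

# Hypothetical named filters that a configuration entry can refer to.
FILTERS = {"skip_empty": lambda obj_id: bool(obj_id)}


def split(ids, batch_size):
    """Split the received ids into batches of at most batch_size."""
    ids = list(ids)
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


def orchestrate(ids, config, forward):
    """Receive ids, split them, optionally filter them, forward them."""
    for name, conf in config.items():
        keep = FILTERS.get(conf.get("filter"))  # None means "no filtering"
        for batch in split(ids, conf["batch_size"]):
            if keep is not None:
                batch = [obj_id for obj_id in batch if keep(obj_id)]
            if batch:
                forward(name, batch)  # hand the objects to index to the indexer
```

Here `forward` stands for whatever delivers a batch to an indexer, for instance sending a message or, as discussed below, adding a task to a scheduler.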

A more middle-ground approach could be to change its implementation to remove the immediate message sending to the indexers ("immediate scheduling" of sorts).
Instead, use the scheduling API to add new tasks to the scheduler.
And let the scheduler do its job (schedule ;)
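
A rough sketch of that middle-ground idea: instead of sending a message to the indexer right away, the orchestrator records a task and a scheduler picks it up later. `InMemoryScheduler`, its methods and the task dictionary layout are hypothetical stand-ins for illustration, not the swh-scheduler API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class InMemoryScheduler:
    """Hypothetical stand-in for a scheduler backend (illustration only)."""
    pending: list = field(default_factory=list)

    def create_tasks(self, tasks):
        # Record the tasks; nothing is sent to any indexer at this point.
        self.pending.extend(tasks)
        return tasks

    def grab_ready_tasks(self, now):
        # Return the tasks that are due and drop them from the pending list.
        ready = [t for t in self.pending if t["next_run"] <= now]
        self.pending = [t for t in self.pending if t["next_run"] > now]
        return ready


def enqueue_for_indexer(scheduler, task_type, batch, next_run=None):
    """Add a task for one batch instead of messaging the indexer directly."""
    task = {
        "type": task_type,  # hypothetical task type name
        "arguments": {"args": [batch], "kwargs": {}},
        "next_run": next_run or datetime.now(timezone.utc),
    }
    scheduler.create_tasks([task])
    return task
```

With this split, the orchestrator only decides what should be indexed, and when and where it runs is left to the scheduler.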

My plan is to move the logic from the orchestrator into the indexers themselves

> My plan is to move the logic from the orchestrator into the indexers themselves

I do not think that's a reasonable approach (it's currently running in production and it works; I'd like to keep it that way ;).

The idea of the intermediate layer (orchestrator) is to be flexible in configuration without changing runtime code.
We want to keep that configuration flexibility.

Currently, if we want to add or remove an indexer, we can just change the configuration and no new deployment is needed (well, except for new indexer code, but the rest stands: we just need to change the relevant orchestrator's configuration).
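
As an illustration of that flexibility, the sketch below derives the routing table purely from configuration, so adding or removing an indexer only means editing the configuration text. The JSON layout, the `enabled` flag and the task names are assumptions for the example; the real orchestrator configuration may look quite different.

```python
import json


def load_routes(config_text):
    """Build an {indexer name: task name} table from configuration only."""
    config = json.loads(config_text)
    return {
        name: entry["task"]
        for name, entry in config["indexers"].items()
        if entry.get("enabled", True)  # hypothetical flag, defaults to enabled
    }


# Hypothetical configuration before and after adding a second indexer.
BEFORE = '{"indexers": {"filetype": {"task": "index_filetype"}}}'
AFTER = (
    '{"indexers": {"filetype": {"task": "index_filetype"},'
    ' "metadata": {"task": "index_metadata"}}}'
)
```

Calling `load_routes(BEFORE)` and then `load_routes(AFTER)` yields different routing tables without any change to the function itself.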

Indexers can be made configurable as well

> Indexers can be made configurable as well

Yes, and they already are, but only for what they are supposed to do: indexing.
With what you propose, I'm afraid we would lose the existing separation of concerns.

vlorentz removed vlorentz as the assignee of this task. Oct 25 2018, 3:58 PM

vlorentz added a subscriber: vlorentz.

ardumont renamed this task from "Indexers: Analysis further the possiblity to remove the orchestrator layer to the benefit of the scheduler" to "Indexers: Make orchestrators use swh-scheduler for scheduling". Oct 26 2018, 8:05 PM

ardumont updated the task description.

So this task will be solved by D606, right?

vlorentz closed subtask T1290: Indexers: Use swh.scheduler instead of directly relying on Celery as Resolved. Oct 29 2018, 9:55 AM

Yes, it should.

ardumont mentioned this in rSPSITE7d26a68fba40: data/defaults: Indexer needs a scheduler configured now. Oct 29 2018, 10:59 AM

So in the end, we will remove the orchestrator anyway.
That will simplify the indexer scheduling (which could not easily be done in the current state).

So opening a new task for it.

ardumont mentioned this in rDSNIPdd47f4556785: volatile-scheduler: indexer no longer needs multiple checks. Nov 22 2018, 7:31 PM

This task has been migrated to GitLab.

gitlab-migration changed the status of subtask T1290: Indexers: Use swh.scheduler instead of directly relying on Celery from Resolved to Migrated. Jan 8 2023, 9:58 PM
