Description

This task is meant to start a discussion on whether to replace part or all of our (celery) task-based architecture for handling background processing of the archive.

The first thing that comes to mind is replacing the task-based indexers with (kafka) journal consumers. But we should consider the whole scheduler/celery/task model as well.

Also, we should not restrict ourselves to kafka for now. Let's keep in mind all the reasonable alternatives, with the pros and cons of each.
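
To make the idea of a journal consumer more concrete, here is a minimal sketch in plain Python. It does not use kafka, celery, or any real client library. `Journal`, `JournalConsumer`, `index_origin` and the message fields are hypothetical names introduced only for illustration. The point it tries to show: instead of a scheduler pushing one task per object, each consumer reads an append-only log at its own pace and tracks its own offset.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Journal:
    """Hypothetical in-memory stand-in for an append-only journal topic."""
    messages: List[dict] = field(default_factory=list)

    def append(self, message: dict) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the message just written


@dataclass
class JournalConsumer:
    """Hypothetical consumer that reads the journal from its own offset."""
    journal: Journal
    handler: Callable[[dict], None]
    offset: int = 0

    def poll(self, max_messages: int = 10) -> int:
        batch = self.journal.messages[self.offset:self.offset + max_messages]
        for message in batch:
            self.handler(message)
        # The offset only advances once the whole batch has been handled,
        # so a handler that raises leaves the batch to be read again.
        self.offset += len(batch)
        return len(batch)


# Illustrative "indexer": records the type of each origin it sees.
indexed: Dict[str, str] = {}


def index_origin(message: dict) -> None:
    indexed[message["url"]] = message["type"]


journal = Journal()
journal.append({"url": "https://example.org/repo.git", "type": "git"})
journal.append({"url": "https://example.org/pkg", "type": "pypi"})

consumer = JournalConsumer(journal, index_origin)
processed = consumer.poll()
```

In this model a second indexer would simply be another `JournalConsumer` with its own offset on the same journal, without any extra scheduling step. A real backend (kafka or one of the alternatives) would add persistence, partitioning and consumer groups, none of which is modelled here.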

Related Objects

Status	Assigned	Task
Migrated	gitlab-migration	T2217 Plumbings
Migrated	gitlab-migration	T2218 Orchestration
Migrated	gitlab-migration	T2063 Investigate stream-based execution model as a replacement for current scheduler/celery/task based approach
Migrated	gitlab-migration	T1521 Explore Faust as a possible kafka-based replacement for celery

Event Timeline

douardda created this task. Nov 6 2019, 10:09 AM

A few notes

NATS

NATS is an open-source messaging system written in Go. In the context of this discussion, we want to focus on NATS Streaming, which covers some or most of our basic requirements.

See for example:

https://storageos.com/nats-good-gotchas-awesome-features/
https://danielwertheim.se/nats-what-a-beautiful-protocol/

There are also a number of other tools using NATS as a backbone:

nRPC: an RPC framework like gRPC, but for NATS.
Xbus: a high-level application messaging system built on top of NATS (written mostly by Christophe Devienne, whom I know quite well).

douardda mentioned this in T1547: Scheduler runner is slow to run tasks. Nov 6 2019, 10:30 AM

Some related work: https://faust.readthedocs.io/en/latest/ (a Python stream processing library inspired by Kafka Streams)

zack triaged this task as Normal priority. Nov 6 2019, 1:36 PM

zack added a project: Scheduling utilities.

vlorentz added a subtask: T1521: Explore Faust as a possible kafka-based replacement for celery. Nov 6 2019, 3:47 PM

douardda mentioned this in T2218: Orchestration. Jan 20 2020, 2:27 PM

vlorentz added a parent task: T2218: Orchestration. Jan 22 2020, 4:37 PM

vlorentz mentioned this in T2073: Index extrinsic metadata from the journal in swh-search/Elasticsearch. Sep 10 2021, 4:05 PM

This task has been migrated to GitLab.

Investigate stream-based execution model as a replacement for current scheduler/celery/task based approach (Closed, Migrated)