Investigate stream-based execution model as a replacement for current scheduler/celery/task based approach
Open, Normal, Public

Description

This task is meant to start a discussion on whether to replace part or all of our (celery) task-based architecture for handling background processing of the archive.

The first thing that comes to mind is replacing the task-based indexers with (kafka) journal consumers. But we should consider the whole scheduler/celery/task model as well.

Also, we should not commit to kafka for now. Let's keep in mind all the reasonable alternatives, with the pros and cons of each.
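
To make the contrast with the task-based model concrete, here is a minimal in-memory sketch of what a journal-consumer indexer could look like. It is only an illustration: `Journal`, `run_indexer` and the indexer group names are hypothetical stand-ins, not existing Software Heritage, kafka or celery APIs. The point is that each indexer tracks its own read position in an append-only log, so nobody has to schedule one task per object.

```python
from collections import defaultdict


class Journal:
    """Append-only in-memory log (hypothetical stand-in for a journal topic)."""

    def __init__(self):
        self._entries = []
        # consumer group name -> offset of the next entry to read
        self._offsets = defaultdict(int)

    def append(self, obj):
        self._entries.append(obj)

    def poll(self, group, max_items=10):
        """Return (start offset, next batch) for the group, without committing."""
        start = self._offsets[group]
        return start, self._entries[start:start + max_items]

    def commit(self, group, offset):
        """Record that the group has processed everything before `offset`."""
        self._offsets[group] = offset


def run_indexer(journal, group, index_fn, batch_size=10):
    """Consume until caught up, committing only after a batch is processed."""
    results = []
    while True:
        start, batch = journal.poll(group, batch_size)
        if not batch:
            return results
        results.extend(index_fn(obj) for obj in batch)
        journal.commit(group, start + len(batch))


journal = Journal()
for i in range(5):
    journal.append({"type": "content", "id": i})

# A first indexer catches up with the journal.
seen = run_indexer(journal, "mimetype-indexer", lambda o: o["id"])

# New objects arrive; the same indexer resumes from its committed offset.
journal.append({"type": "content", "id": 5})
seen_later = run_indexer(journal, "mimetype-indexer", lambda o: o["id"])

# An indexer added later starts from the beginning, independently of the first.
other = run_indexer(journal, "license-indexer", lambda o: o["id"])
```

In this sketch, adding a new indexer or replaying an existing one only requires choosing a starting offset, whereas in a task-based model something has to (re)create a task for each object to process.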

Event Timeline

douardda created this task. Nov 6 2019, 10:09 AM

A few notes

NATS

NATS is an open-source messaging system written in Go. In the context of this discussion, we want to focus on NATS Streaming, which implements some, if not most, of our basic requirements.

See for example:

https://storageos.com/nats-good-gotchas-awesome-features/
https://danielwertheim.se/nats-what-a-beautiful-protocol/

A number of other tools also use NATS as their backbone:

  • nRPC: an RPC framework like gRPC, but for NATS.
  • Xbus: a high-level application messaging system on top of NATS (written mostly by Christophe Devienne, whom I know quite well).

olasd added a comment. Nov 6 2019, 1:33 PM

Some related work: https://faust.readthedocs.io/en/latest/ (a Python stream processing library inspired by Kafka Streams)

zack triaged this task as Normal priority. Nov 6 2019, 1:36 PM
zack added a project: Scheduling utilities.