
Find a way to properly open the kafka brokers to the internet
Started, Work in Progress, High, Public


The Kafka brokers need to be accessible from the internet, so that our mirrors can subscribe to the topics and process messages.

We need to figure out:

  • frontend/proxying
  • TLS
  • authentication
  • authorization

For reference:

There's a strong chance that the journal code will need to be adapted to allow passing the proper settings to the Kafka libraries.
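As a rough illustration of what "passing the proper settings" could involve on the client side, here is a minimal sketch. It assumes the journal client accepts librdkafka-style configuration keys (as the confluent-kafka Python bindings do), and that the brokers end up exposing SASL over TLS with the SCRAM-SHA-512 mechanism. The function name `build_consumer_config`, its parameters, and the mechanism choice are hypothetical and not taken from the journal code.

```python
from typing import Dict, Optional


def build_consumer_config(
    brokers: str,
    group_id: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
    ca_location: Optional[str] = None,
) -> Dict[str, str]:
    """Build a librdkafka-style consumer configuration (hypothetical helper)."""
    config = {
        "bootstrap.servers": brokers,
        "group.id": group_id,
    }

    # Without credentials, keep the plain configuration used on an
    # internal, unauthenticated cluster.
    if username is None and password is None:
        return config

    if username is None or password is None:
        raise ValueError("username and password must be provided together")

    # Assumption: the public brokers use SASL authentication over TLS
    # with the SCRAM-SHA-512 mechanism.
    config.update(
        {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "SCRAM-SHA-512",
            "sasl.username": username,
            "sasl.password": password,
        }
    )

    # Optional: path to the CA bundle used to verify the broker certificates.
    if ca_location is not None:
        config["ssl.ca.location"] = ca_location

    return config
```

If the journal client is built on confluent-kafka, a dictionary like this could be handed to the consumer constructor unchanged; with another library, the key names would need to be mapped to its own parameters.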

Event Timeline

olasd triaged this task as High priority. Jun 18 2019, 4:02 PM
olasd created this task.
olasd changed the task status from Open to Work in Progress. Aug 23 2019, 6:45 PM

A new Kafka cluster has been spun up on Azure virtual machines: six machines, each with 8 TB of storage available.

A Kafka MirrorMaker has been set up by hand on getty to pull the data from the cluster in Rocquencourt to the cluster on Azure (only the content topic for now).

My working theory for now is to use the Rocquencourt cluster as a low-latency buffer in front of the Azure cluster.

The next step is to lock this cluster down with SASL authentication, and to give it public IP addresses and a TLS setup so that it can be opened to the internet.

olasd added a comment. Aug 26 2019, 8:57 AM

The content topic has fully replicated to the new cluster over the weekend.

I've now added replication for the other topics.