staging/journal: Declare a new kafka node to migrate journal0
ClosedPublic

Authored by vsellier on Oct 14 2021, 4:07 PM.

Details

Summary

This is the first of three steps. The next ones will be:

  • Migrate the data from journal0 to storage1 with the kafka-reassign-partitions.sh command
  • Remove the journal0 references from the kafka/zookeeper configurations

Related to T3630
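
For the data migration step, kafka-reassign-partitions.sh reads a JSON plan listing the target replicas of each partition. The following is only a sketch of how such a plan could be generated. The topic name, the partition count and the broker id 2 for storage1 are assumptions made for the illustration; they are not taken from this revision.

```python
import json


def build_reassignment(partitions_by_topic, target_broker_ids):
    """Build a plan in the JSON layout read by
    kafka-reassign-partitions.sh --reassignment-json-file."""
    entries = []
    for topic, partition_count in sorted(partitions_by_topic.items()):
        for partition in range(partition_count):
            entries.append(
                {
                    "topic": topic,
                    "partition": partition,
                    # Every partition gets the same replica list.
                    "replicas": list(target_broker_ids),
                }
            )
    return {"version": 1, "partitions": entries}


# Hypothetical values: one topic with 2 partitions, and broker id 2
# standing for the new storage1 broker.
plan = build_reassignment({"swh.journal.objects.origin": 2}, [2])
print(json.dumps(plan, indent=2))
```

The resulting file would then be passed to kafka-reassign-partitions.sh with --reassignment-json-file and --execute, and checked afterwards with --verify.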

Test Plan
  • journal0:
diff origin/production/journal0.internal.staging.swh.network current/journal0.internal.staging.swh.network
*******************************************
  Archive[/var/tmp/kafka/kafka_2.13-2.6.0.tgz] =>
   parameters =>
     source =>
      - https://mirrors.ircam.fr/pub/apache/kafka/2.6.0/kafka_2.13-2.6.0.tgz
      + https://archive.apache.org/dist/kafka/2.6.0/kafka_2.13-2.6.0.tgz
*******************************************
  File[/etc/default/prometheus-kafka-consumer-group-exporter/rocquencourt_staging] =>
   parameters =>
     content =>
      @@ -4,4 +4,4 @@
       # changes will be lost
      _
      -BOOTSTRAP_SERVERS=journal0.internal.staging.swh.network
      +BOOTSTRAP_SERVERS=journal0.internal.staging.swh.network,storage1.internal.staging.swh.network
       PORT=9208
*******************************************
  File[/etc/zookeeper/conf/zoo.cfg] =>
   parameters =>
     content =>
      @@ -28,4 +28,5 @@
       #server.3=zookeeper3:2888:3888
       server.1=journal0.internal.staging.swh.network:2888:3888
      +server.2=storage1.internal.staging.swh.network:2888:3888
      _
       # To avoid seeks ZooKeeper allocates space in the transaction log file in
*******************************************
  File[/opt/kafka/config/server.properties] =>
   parameters =>
     content =>
      @@ -25,5 +25,5 @@
       ssl.keystore.location=/opt/kafka/config/broker.ks
       ssl.keystore.password=9KJKfG1QhZUJHL2s
      -super.users=User:broker-journal0.internal.staging.swh.network;User:swh-admin-olasd;User:ANONYMOUS
      -zookeeper.connect=journal0.internal.staging.swh.network:2181/kafka/softwareheritage
      +super.users=User:broker-journal0.internal.staging.swh.network;User:broker-storage1.internal.staging.swh.network;User:swh-admin-olasd;User:ANONYMOUS
      +zookeeper.connect=journal0.internal.staging.swh.network:2181,storage1.internal.staging.swh.network:2181/kafka/softwareheritage
       zookeeper.session.timeout.ms=18000
*******************************************
*** End octocatalog-diff on journal0.internal.staging.swh.network
  • storage1:
diff origin/production/storage1.internal.staging.swh.network current/storage1.internal.staging.swh.network
*******************************************
+ Anchor[java::begin:]
*******************************************
+ Anchor[java::end]
*******************************************
+ Anchor[zookeeper::end]
*******************************************
+ Anchor[zookeeper::install::begin]
*******************************************
+ Anchor[zookeeper::install::end]
*******************************************
+ Anchor[zookeeper::install::intermediate]
*******************************************
+ Anchor[zookeeper::start]
*******************************************
+ Archive[/var/tmp/kafka/kafka_2.13-2.6.0.tgz] =>
   parameters =>
     "cleanup": true,
     "creates": "/opt/kafka-2.13-2.6.0/config",
     "ensure": "present",
     "extract": true,
     "extract_command": "tar xfz %s --strip-components=1",
     "extract_path": "/opt/kafka-2.13-2.6.0",
     "group": "kafka",
     "source": "https://archive.apache.org/dist/kafka/2.6.0/kafka_2.13-2.6.0.tgz"...
     "user": "kafka"
*******************************************
+ Concat_file[profile::cron::kafka] =>
   parameters =>
     "group": "root",
     "mode": "0644",
     "owner": "root",
     "path": "/etc/puppet-cron.d/kafka",
     "tag": "profile::cron::kafka"
*******************************************
+ Concat_fragment[profile::cron::kafka-purge-logs] =>
   parameters =>
     "content": "# Cron snippet kafka-purge-logs\n33 2 * * * root find /var/log/k...
     "order": "10",
     "tag": "profile::cron::kafka",
     "target": "profile::cron::kafka"
*******************************************
+ Concat_fragment[profile::cron::kafka-zip-logs] =>
   parameters =>
     "content": "# Cron snippet kafka-zip-logs\n29 3 * * * root find /var/log/kaf...
     "order": "10",
     "tag": "profile::cron::kafka",
     "target": "profile::cron::kafka"
*******************************************
+ Concat_fragment[profile::cron::kafka::_header] =>
   parameters =>
     "content": "# Managed by puppet (module profile::cron), manual changes will ...
     "order": "00",
     "tag": "profile::cron::kafka",
     "target": "profile::cron::kafka"
*******************************************
+ Exec[create /srv/kafka/logdir] =>
   parameters =>
     "command": "mkdir -p /srv/kafka/logdir",
     "creates": "/srv/kafka/logdir",
     "path": [
       "/bin",
       "/usr/bin",
       "/sbin",
       "/usr/sbin"
     ]
*******************************************
+ Exec[kafka-reload-tls:EXTERNAL] =>
   parameters =>
     "command": "/opt/kafka/bin/kafka-configs.sh --bootstrap-server storage1.inte...
     "refreshonly": true
*******************************************
+ Exec[kafka-reload-tls:INTERNAL] =>
   parameters =>
     "command": "/opt/kafka/bin/kafka-configs.sh --bootstrap-server storage1.inte...
     "refreshonly": true
*******************************************
+ Exec[update-java-alternatives] =>
   parameters =>
     "command": "update-java-alternatives --set java-1.11.0-openjdk-amd64 --jre-h...
     "path": "/usr/bin:/usr/sbin:/bin:/sbin",
     "unless": "test /etc/alternatives/java -ef '/usr/lib/jvm/java-1.11.0-openjdk...
*******************************************
+ File[/etc/cron.d/puppet-kafka] =>
   parameters =>
     "ensure": "link",
     "target": "/etc/puppet-cron.d/kafka"
*******************************************
+ File[/etc/init.d/kafka] =>
   parameters =>
     "ensure": "absent"
*******************************************
  File[/etc/softwareheritage/journal/backfill.yml] =>
   parameters =>
     content =>
      @@ -10,4 +10,5 @@
           brokers:
           - journal0.internal.staging.swh.network
      +    - storage1.internal.staging.swh.network
           prefix: swh.journal.objects
           client_id: swh.storage.journal_writer.storage1
*******************************************
  File[/etc/softwareheritage/storage/indexer.yml] =>
   parameters =>
     content =>
      @@ -8,4 +8,5 @@
           brokers:
           - journal0.internal.staging.swh.network
      +    - storage1.internal.staging.swh.network
           prefix: swh.journal.indexed
           client_id: swh.idx_storage.journal_writer.storage1
*******************************************
  File[/etc/softwareheritage/storage/storage.yml] =>
   parameters =>
     content =>
      @@ -14,4 +14,5 @@
               brokers:
               - journal0.internal.staging.swh.network
      +        - storage1.internal.staging.swh.network
               prefix: swh.journal.objects
               client_id: swh.storage.journal_writer.storage1
*******************************************
+ File[/etc/ssl/certs/letsencrypt/storage1.internal.staging.swh.network/cert.pem] =>
   parameters =>
     "ensure": "present",
     "group": "root",
     "mode": "0644",
     "owner": "root",
     "source": "puppet:///le_certs/storage1.internal.staging.swh.network/cert.pem...
*******************************************
+ File[/etc/ssl/certs/letsencrypt/storage1.internal.staging.swh.network/chain.pem] =>
   parameters =>
     "ensure": "present",
     "group": "root",
     "mode": "0644",
     "owner": "root",
     "source": "puppet:///le_certs/storage1.internal.staging.swh.network/chain.pe...
*******************************************
+ File[/etc/ssl/certs/letsencrypt/storage1.internal.staging.swh.network/fullchain.pem] =>
   parameters =>
     "ensure": "present",
     "group": "root",
     "mode": "0644",
     "owner": "root",
     "source": "puppet:///le_certs/storage1.internal.staging.swh.network/fullchai...
*******************************************
+ File[/etc/ssl/certs/letsencrypt/storage1.internal.staging.swh.network/privkey.pem] =>
   parameters =>
     "ensure": "present",
     "group": "root",
     "mode": "0600",
     "owner": "root",
     "source": "puppet:///le_certs/storage1.internal.staging.swh.network/privkey....
*******************************************
+ File[/etc/ssl/certs/letsencrypt/storage1.internal.staging.swh.network] =>
   parameters =>
     "ensure": "directory",
     "group": "root",
     "mode": "0755",
     "owner": "root"
*******************************************
+ File[/etc/ssl/certs/letsencrypt] =>
   parameters =>
     "ensure": "directory",
     "group": "root",
     "mode": "0755",
     "owner": "root",
     "purge": true,
     "recurse": true
*******************************************
+ File[/etc/systemd/system/kafka.service.d/exitcode.conf] =>
   parameters =>
     "content": "[Service]\nSuccessExitStatus=143\n",
     "ensure": "file",
     "group": "root",
     "mode": "0444",
     "notify": [
       "Class[Systemd::Systemctl::Daemon_reload]"
     ],
     "owner": "root",
     "selinux_ignore_defaults": false,
     "show_diff": true
*******************************************
+ File[/etc/systemd/system/kafka.service.d/restart.conf] =>
   parameters =>
     "content": "[Service]\nRestart=on-failure\nRestartSec=5\n",
     "ensure": "file",
     "group": "root",
     "mode": "0444",
     "notify": [
       "Class[Systemd::Systemctl::Daemon_reload]"
     ],
     "owner": "root",
     "selinux_ignore_defaults": false,
     "show_diff": true
*******************************************
+ File[/etc/systemd/system/kafka.service.d/stop-timeout.conf] =>
   parameters =>
     "content": "[Service]\nTimeoutStopSec=infinity\n",
     "ensure": "file",
     "group": "root",
     "mode": "0444",
     "notify": [
       "Class[Systemd::Systemctl::Daemon_reload]"
     ],
     "owner": "root",
     "selinux_ignore_defaults": false,
     "show_diff": true
*******************************************
+ File[/etc/systemd/system/kafka.service.d] =>
   parameters =>
     "ensure": "directory",
     "group": "root",
     "owner": "root",
     "purge": true,
     "recurse": true,
     "selinux_ignore_defaults": false
*******************************************
+ File[/etc/systemd/system/kafka.service] =>
   parameters =>
     "content": "[Unit]\nDescription=Apache Kafka server (broker)\nDocumentation=...
     "ensure": "file",
     "mode": "0644",
     "notify": [
       "Exec[systemctl-daemon-reload]"
     ]
*******************************************
+ File[/etc/zookeeper/conf/environment] =>
   parameters =>
     "content": "NAME=zookeeper\nZOOCFGDIR=/etc/zookeeper/conf\n\n# TODO this is ...
     "group": "zookeeper",
     "mode": "0644",
     "notify": [
       "Service[zookeeper]"
     ],
     "owner": "zookeeper"
*******************************************
+ File[/etc/zookeeper/conf/log4j.properties] =>
   parameters =>
     "content": "# Copyright 2012 The Apache Software Foundation\n#\n# Licensed t...
     "group": "zookeeper",
     "mode": "0644",
     "notify": [
       "Service[zookeeper]"
     ],
     "owner": "zookeeper"
*******************************************
+ File[/etc/zookeeper/conf/myid] =>
   parameters =>
     "content": "2\n",
     "ensure": "file",
     "group": "zookeeper",
     "mode": "0644",
     "notify": [
       "Service[zookeeper]"
     ],
     "owner": "zookeeper"
*******************************************
+ File[/etc/zookeeper/conf/zoo.cfg] =>
   parameters =>
     "content": "# http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin...
     "group": "zookeeper",
     "mode": "0644",
     "notify": [
       "Service[zookeeper]"
     ],
     "owner": "zookeeper"
*******************************************
+ File[/etc/zookeeper/conf] =>
   parameters =>
     "ensure": "directory",
     "group": "zookeeper",
     "mode": "0644",
     "owner": "zookeeper",
     "recurse": true
*******************************************
+ File[/opt/kafka-2.13-2.6.0] =>
   parameters =>
     "ensure": "directory",
     "group": "kafka",
     "mode": "0755",
     "owner": "kafka"
*******************************************
+ File[/opt/kafka/config/server.properties] =>
   parameters =>
     "content": "#\n# Note: This file is managed by Puppet.\n#\n# See: http://kaf...
     "ensure": "present",
     "group": "kafka",
     "mode": "0644",
     "notify": "Service[kafka]",
     "owner": "root"
*******************************************
+ File[/opt/kafka/config] =>
   parameters =>
     "ensure": "directory",
     "group": "root",
     "owner": "root"
*******************************************
+ File[/opt/kafka] =>
   parameters =>
     "ensure": "link",
     "target": "/opt/kafka-2.13-2.6.0"
*******************************************
+ File[/opt/prometheus-jmx-exporter/jmx_prometheus_javaagent-0.11.0.jar] =>
   parameters =>
     "ensure": "present",
     "group": "root",
     "owner": "root",
     "source": "https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_j...
*******************************************
+ File[/opt/prometheus-jmx-exporter/kafka.yml] =>
   parameters =>
     "content": "# Fetched from https://github.com/prometheus/jmx_exporter exampl...
     "group": "root",
     "mode": "0644",
     "owner": "root"
*******************************************
+ File[/opt/prometheus-jmx-exporter] =>
   parameters =>
     "ensure": "directory",
     "group": "root",
     "mode": "0644",
     "owner": "root"
*******************************************
+ File[/srv/kafka/logdir] =>
   parameters =>
     "ensure": "directory",
     "group": "kafka",
     "mode": "0750",
     "owner": "kafka"
*******************************************
+ File[/var/lib/zookeeper/myid] =>
   parameters =>
     "ensure": "link",
     "target": "/etc/zookeeper/conf/myid"
*******************************************
+ File[/var/lib/zookeeper] =>
   parameters =>
     "ensure": "directory",
     "group": "zookeeper",
     "mode": "0644",
     "owner": "zookeeper",
     "recurse": false
*******************************************
+ File[/var/log/kafka] =>
   parameters =>
     "ensure": "directory",
     "group": "kafka",
     "owner": "kafka"
*******************************************
+ File[/var/log/zookeeper] =>
   parameters =>
     "ensure": "directory",
     "group": "zookeeper",
     "mode": "0644",
     "notify": [
       "Service[zookeeper]"
     ],
     "owner": "zookeeper",
     "recurse": false
*******************************************
+ File[/var/tmp/kafka] =>
   parameters =>
     "ensure": "directory",
     "group": "kafka",
     "owner": "kafka"
*******************************************
+ File_line[java-home-environment] =>
   parameters =>
     "line": "JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64/",
     "match": "JAVA_HOME=",
     "path": "/etc/environment"
*******************************************
+ Group[kafka] =>
   parameters =>
     "ensure": "present",
     "system": false
*******************************************
+ Group[zookeeper] =>
   parameters =>
     "ensure": "present",
     "system": false
*******************************************
+ Java_ks[kafka:broker] =>
   parameters =>
     "certificate": "/etc/ssl/certs/letsencrypt/storage1.internal.staging.swh.net...
     "ensure": "latest",
     "name": "storage1.internal.staging.swh.network",
     "notify": [
       "Exec[kafka-reload-tls:EXTERNAL]",
       "Exec[kafka-reload-tls:INTERNAL]"
     ],
     "password": "LBXrf6suU4cedMtM",
     "private_key": "/etc/ssl/certs/letsencrypt/storage1.internal.staging.swh.net...
     "target": "/opt/kafka/config/broker.ks",
     "trustcacerts": true
*******************************************
+ Package[java-common] =>
   parameters =>
     "ensure": "present"
*******************************************
+ Package[java] =>
   parameters =>
     "ensure": "present",
     "name": "openjdk-11-jre-headless"
*******************************************
+ Package[zookeeper] =>
   parameters =>
     "ensure": "present"
*******************************************
+ Package[zookeeperd] =>
   parameters =>
     "ensure": "present"
*******************************************
+ Profile::Cron::D[kafka-purge-logs] =>
   parameters =>
     "command": "find /var/log/kafka -type f -name *.gz -a -ctime +60 -exec rm {}...
     "hour": 2,
     "minute": "fqdn_rand",
     "target": "kafka",
     "unique_tag": "kafka-purge-logs",
     "user": "root"
*******************************************
+ Profile::Cron::D[kafka-zip-logs] =>
   parameters =>
     "command": "find /var/log/kafka -type f -name *.log.* -a -not -name *.gz -a ...
     "hour": 3,
     "minute": "fqdn_rand",
     "target": "kafka",
     "unique_tag": "kafka-zip-logs",
     "user": "root"
*******************************************
+ Profile::Cron::File[kafka] =>
   parameters =>
     "target": "kafka"
*******************************************
+ Profile::Letsencrypt::Certificate[storage1.internal.staging.swh.network] =>
   parameters =>
     "basename": "storage1.internal.staging.swh.network",
     "privkey_group": "root",
     "privkey_mode": "0600",
     "privkey_owner": "root"
*******************************************
+ Profile::Prometheus::Export_scrape_config[kafka] =>
   parameters =>
     "job": "kafka",
     "labels": {
       "cluster": "rocquencourt_staging"
     },
     "target": "192.168.130.41:7071"
*******************************************
+ Service[kafka] =>
   parameters =>
     "enable": true,
     "ensure": "running",
     "hasrestart": true,
     "hasstatus": true
*******************************************
+ Service[zookeeper] =>
   parameters =>
     "enable": true,
     "ensure": "running",
     "hasrestart": true,
     "hasstatus": true
*******************************************
+ Systemd::Dropin_file[kafka/exitcode.conf] =>
   parameters =>
     "content": "[Service]\nSuccessExitStatus=143\n",
     "daemon_reload": "lazy",
     "ensure": "present",
     "filename": "exitcode.conf",
     "group": "root",
     "mode": "0444",
     "owner": "root",
     "path": "/etc/systemd/system",
     "selinux_ignore_defaults": false,
     "show_diff": true,
     "unit": "kafka.service"
*******************************************
+ Systemd::Dropin_file[kafka/restart.conf] =>
   parameters =>
     "content": "[Service]\nRestart=on-failure\nRestartSec=5\n",
     "daemon_reload": "lazy",
     "ensure": "present",
     "filename": "restart.conf",
     "group": "root",
     "mode": "0444",
     "owner": "root",
     "path": "/etc/systemd/system",
     "selinux_ignore_defaults": false,
     "show_diff": true,
     "unit": "kafka.service"
*******************************************
+ Systemd::Dropin_file[kafka/stop-timeout.conf] =>
   parameters =>
     "content": "[Service]\nTimeoutStopSec=infinity\n",
     "daemon_reload": "lazy",
     "ensure": "present",
     "filename": "stop-timeout.conf",
     "group": "root",
     "mode": "0444",
     "owner": "root",
     "path": "/etc/systemd/system",
     "selinux_ignore_defaults": false,
     "show_diff": true,
     "unit": "kafka.service"
*******************************************
+ User[kafka] =>
   parameters =>
     "ensure": "present",
     "shell": "/bin/bash",
     "system": false
*******************************************
+ User[zookeeper] =>
   parameters =>
     "comment": "Zookeeper",
     "ensure": "present",
     "gid": "zookeeper",
     "home": "/var/lib/zookeeper",
     "shell": "/bin/false",
     "system": false
*******************************************
*** End octocatalog-diff on storage1.internal.staging.swh.network
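
The zookeeper changes in the two catalogs above follow a regular pattern: one server.N line per ensemble member in zoo.cfg, and a zookeeper.connect value where the chroot path appears once, after the last host. A minimal sketch reproducing both strings is shown here. The helper names are hypothetical and this is not the puppet template used by the repository.

```python
def zookeeper_connect(hosts, chroot, port=2181):
    """Join host:port pairs with commas and append the chroot once."""
    servers = ",".join(f"{host}:{port}" for host in hosts)
    return f"{servers}{chroot}"


def zoo_cfg_servers(hosts_by_id, peer_port=2888, election_port=3888):
    """Return the server.N lines of zoo.cfg, ordered by server id."""
    return [
        f"server.{server_id}={host}:{peer_port}:{election_port}"
        for server_id, host in sorted(hosts_by_id.items())
    ]


# Host names and ids as they appear in the catalog diff above.
ensemble = {
    1: "journal0.internal.staging.swh.network",
    2: "storage1.internal.staging.swh.network",
}
connect = zookeeper_connect(list(ensemble.values()), "/kafka/softwareheritage")
server_lines = zoo_cfg_servers(ensemble)
print(connect)
print("\n".join(server_lines))
```

The id used in each server.N line has to match the content of the myid file on the corresponding host, which is why storage1 gets a myid of 2 in its catalog.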

Diff Detail

Repository
rSPSITE puppet-swh-site
Branch
staging
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 24437
Build 38136: arc lint + arc unit

Event Timeline

vsellier created this revision.
olasd added a subscriber: olasd.

Looks good, except for a missing new TLS certificate, I think.

data/deployments/staging/common.yaml
115

You will need to create a letsencrypt certificate for storage1.internal.staging.swh.network, with a couple of subjectAltNames (journal1.internal.staging.swh.network and broker1.journal.staging.swh.network)
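
As an illustration of the requirement above, a small check can confirm that a certificate's name list covers the host FQDN and both aliases. This is only a sketch: the function name is hypothetical, and the list of names would have to be read from the issued certificate by other means.

```python
def missing_names(required, subject_alt_names):
    """Return the required names absent from the certificate names."""
    present = {name.lower().rstrip(".") for name in subject_alt_names}
    return [
        name for name in required
        if name.lower().rstrip(".") not in present
    ]


# Names requested in the review comment above.
required = [
    "storage1.internal.staging.swh.network",
    "journal1.internal.staging.swh.network",
    "broker1.journal.staging.swh.network",
]

# Hypothetical certificate that only carries the host FQDN.
missing = missing_names(required, ["storage1.internal.staging.swh.network"])
print(missing)
```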

This revision is now accepted and ready to land. Oct 14 2021, 4:13 PM

Ah, now that I read through this again; would it make sense for the zookeeper server to be called using the CNAME instead of the host FQDN ?

Yes, sure, it would be better to hide the storage role of the server; let's try to use the CNAME first.
I wondered which one to use and forgot about it once it started to work locally.

data/deployments/staging/common.yaml
115

Thanks, I forgot this even though I had created one to test locally :| rSENV5650b21f5abc0de430d77573e2e6b2cc8593ca3d

  • use journal1.i.s.s.n as the zookeeper name (only there, because the rest of the kafka configuration mainly uses the fqdn of the server)
  • declare a new letsencrypt certificate for storage1.i.s.s.n
  • exclude the kafka log directory from the backup