
Clean up rancher clusters
Closed, Migrated, Edits Locked

Description

Clusters to clean up:

  • P1434: deployment-internship
  • staging-workers
  • production-cassandra
  • poc-argoworkflow
  • gitlab-rocq
  • P1435: cluster-graphql3
  • cluster-argo: kept as a CI cluster for now; we'll see later whether to create a dedicated CI cluster

For each cluster:

  • Destroy cluster (e.g. P1434)
  • Decommission the associated nodes in puppet (e.g. T4456#89591)
  • Update the inventory to set the nodes' status to "offline" [1]

Plan: https://hedgedoc.softwareheritage.org/v5oz5CQXRdqOv4x-xWJ7-Q?edit

[1] https://inventory.internal.admin.swh.network/virtualization/virtual-machines/

Event Timeline

ardumont changed the task status from Open to Work in Progress. Aug 23 2022, 5:06 PM
ardumont triaged this task as Normal priority.
ardumont created this task.
ardumont updated the task description.
ardumont updated the task description.
ardumont moved this task from Backlog to in-progress on the System administration board.
root@pergamon:~# swh-puppet-master-decommission rancher-node-intership0.internal.staging.swh.network rancher-node-intership1.internal.staging.swh.network rancher-node-intership2.internal.staging.swh.network
+ puppet node deactivate rancher-node-intership0.internal.staging.swh.network rancher-node-intership1.internal.staging.swh.network rancher-node-intership2.internal.staging.swh.network
Submitted 'deactivate node' for rancher-node-intership0.internal.staging.swh.network with UUID 6b8bd570-606d-4efd-8dac-82ef8696c5c1
Submitted 'deactivate node' for rancher-node-intership1.internal.staging.swh.network with UUID 2fd99ce6-14fa-4909-8d79-abf04d059258
Submitted 'deactivate node' for rancher-node-intership2.internal.staging.swh.network with UUID 4368bdb7-0faa-43bd-9067-88b965dee7d8
+ puppet node clean rancher-node-intership0.internal.staging.swh.network rancher-node-intership1.internal.staging.swh.network rancher-node-intership2.internal.staging.swh.network
Notice: Revoked certificate with serial 305
Notice: Removing file Puppet::SSL::Certificate rancher-node-intership0.internal.staging.swh.network at '/var/lib/puppet/ssl/ca/signed/rancher-node-intership0.internal.staging.swh.network.pem'
Notice: Revoked certificate with serial 306
Notice: Removing file Puppet::SSL::Certificate rancher-node-intership1.internal.staging.swh.network at '/var/lib/puppet/ssl/ca/signed/rancher-node-intership1.internal.staging.swh.network.pem'
Notice: Revoked certificate with serial 307
Notice: Removing file Puppet::SSL::Certificate rancher-node-intership2.internal.staging.swh.network at '/var/lib/puppet/ssl/ca/signed/rancher-node-intership2.internal.staging.swh.network.pem'
rancher-node-intership0.internal.staging.swh.network
rancher-node-intership1.internal.staging.swh.network
rancher-node-intership2.internal.staging.swh.network
+ puppet cert clean rancher-node-intership0.internal.staging.swh.network rancher-node-intership1.internal.staging.swh.network rancher-node-intership2.internal.staging.swh.network
Warning: `puppet cert` is deprecated and will be removed in a future release.
   (location: /usr/lib/ruby/vendor_ruby/puppet/application.rb:370:in `run')
Notice: Revoked certificate with serial 305
Notice: Revoked certificate with serial 306
Notice: Revoked certificate with serial 307
+ systemctl restart apache2
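
The trace above suggests that the wrapper runs four commands in a fixed order. Below is a minimal dry-run sketch of that sequence, assuming the trace shows everything the script does. The function name `decommission_commands` is hypothetical, and the real `swh-puppet-master-decommission` script is not shown in this task and may do more.

```python
from typing import List


def decommission_commands(nodes: List[str]) -> List[List[str]]:
    """Build (without running) the commands that appear in the trace above.

    Assumption: the order and arguments are taken only from the `+` lines
    of the trace; this is not the actual script.
    """
    if not nodes:
        raise ValueError("at least one node FQDN is required")
    return [
        ["puppet", "node", "deactivate", *nodes],
        ["puppet", "node", "clean", *nodes],
        # The trace warns that `puppet cert` is deprecated.
        ["puppet", "cert", "clean", *nodes],
        ["systemctl", "restart", "apache2"],
    ]


if __name__ == "__main__":
    # Print the commands in the same `+ cmd` style as the trace.
    for argv in decommission_commands(["node0.example.org", "node1.example.org"]):
        print("+", " ".join(argv))
```

Printing the commands first makes it easy to review the node list before anything is revoked.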
vsellier updated the task description.
vsellier added a subscriber: vsellier.

poc-argoworkflow was created manually, so the cluster was removed manually:

  • deleted from rancher
  • the 3 VMs poc-argoworkflow-[1-3] were removed from uffizi
  • the provisioned IPs were deleted in the inventory

It's the same for the gitlab-rocq cluster:

  • gitlab-rocq deleted in the rancher UI
  • VMs poc-k8s-gitlab0[1-3] deleted in proxmox
  • IPs deleted in the inventory
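
The VM names above use a bracket shorthand such as `poc-k8s-gitlab0[1-3]`. When checking the inventory entry by entry, it can help to expand that shorthand into explicit names. This is a small sketch, assuming the brackets hold a single inclusive numeric range; the helper name `expand_hosts` is hypothetical.

```python
import re
from typing import List

# Matches one "[start-end]" numeric range, e.g. "[1-3]".
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(pattern: str) -> List[str]:
    """Expand the first `[start-end]` range in a host pattern.

    A pattern without a range is returned unchanged as a one-item list.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"empty range in pattern: {pattern!r}")
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    return [f"{prefix}{i}{suffix}" for i in range(start, end + 1)]


if __name__ == "__main__":
    for name in expand_hosts("poc-argoworkflow-[1-3]") + expand_hosts("poc-k8s-gitlab0[1-3]"):
        print(name)
```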

Production cassandra cluster cleanup: P1438

Apply complete! Resources: 0 added, 0 changed, 4 destroyed.

Puppet decommissioning:

root@pergamon:/usr/local/sbin# swh-puppet-master-decommission rancher-node-cassandra1.internal.softwareheritage.org rancher-node-cassandra2.internal.softwareheritage.org rancher-node-cassandra3.internal.softwareheritage.org
+ puppet node deactivate rancher-node-cassandra1.internal.softwareheritage.org rancher-node-cassandra2.internal.softwareheritage.org rancher-node-cassandra3.internal.softwareheritage.org
Submitted 'deactivate node' for rancher-node-cassandra1.internal.softwareheritage.org with UUID 833fc63e-9bfa-4936-818b-422c37da9849
Submitted 'deactivate node' for rancher-node-cassandra2.internal.softwareheritage.org with UUID a77406f1-f151-4dbb-9aa9-14f4db597e4f
Submitted 'deactivate node' for rancher-node-cassandra3.internal.softwareheritage.org with UUID 7639aac7-b165-4161-8da2-1aa4cf1b0d45
+ puppet node clean rancher-node-cassandra1.internal.softwareheritage.org rancher-node-cassandra2.internal.softwareheritage.org rancher-node-cassandra3.internal.softwareheritage.org
Notice: Revoked certificate with serial 313
Notice: Removing file Puppet::SSL::Certificate rancher-node-cassandra1.internal.softwareheritage.org at '/var/lib/puppet/ssl/ca/signed/rancher-node-cassandra1.internal.softwareheritage.org.pem'
Notice: Revoked certificate with serial 311
Notice: Removing file Puppet::SSL::Certificate rancher-node-cassandra2.internal.softwareheritage.org at '/var/lib/puppet/ssl/ca/signed/rancher-node-cassandra2.internal.softwareheritage.org.pem'
Notice: Revoked certificate with serial 312
Notice: Removing file Puppet::SSL::Certificate rancher-node-cassandra3.internal.softwareheritage.org at '/var/lib/puppet/ssl/ca/signed/rancher-node-cassandra3.internal.softwareheritage.org.pem'
rancher-node-cassandra1.internal.softwareheritage.org
rancher-node-cassandra2.internal.softwareheritage.org
rancher-node-cassandra3.internal.softwareheritage.org
+ puppet cert clean rancher-node-cassandra1.internal.softwareheritage.org rancher-node-cassandra2.internal.softwareheritage.org rancher-node-cassandra3.internal.softwareheritage.org
Warning: `puppet cert` is deprecated and will be removed in a future release.
   (location: /usr/lib/ruby/vendor_ruby/puppet/application.rb:370:in `run')
Notice: Revoked certificate with serial 313
Notice: Revoked certificate with serial 311
Notice: Revoked certificate with serial 312
+ systemctl restart apache2
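
With several nodes per run, it is easy to miss one in the output. The sketch below checks a captured transcript like the one above, assuming the two message formats seen in it ("Submitted 'deactivate node' for ..." and "Notice: Removing file Puppet::SSL::Certificate ...") are stable. The function name `unfinished_nodes` is hypothetical.

```python
import re
from typing import Iterable, List, Set

# Line formats copied from the transcript above.
_DEACTIVATED = re.compile(r"^Submitted 'deactivate node' for (\S+) with UUID \S+$")
_CERT_REMOVED = re.compile(r"^Notice: Removing file Puppet::SSL::Certificate (\S+) at ")


def unfinished_nodes(nodes: Iterable[str], log: str) -> List[str]:
    """Return the nodes missing a deactivation or a certificate removal line."""
    deactivated: Set[str] = set()
    cert_removed: Set[str] = set()
    for raw in log.splitlines():
        line = raw.strip()
        found = _DEACTIVATED.match(line)
        if found:
            deactivated.add(found.group(1))
            continue
        found = _CERT_REMOVED.match(line)
        if found:
            cert_removed.add(found.group(1))
    # Keep the caller's order so the result is easy to compare by eye.
    return [n for n in nodes if n not in deactivated or n not in cert_removed]
```

An empty result means every requested node shows both steps in the transcript. It does not prove that the node is gone from PuppetDB.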
vsellier claimed this task.
vsellier moved this task from in-progress to done on the System administration board.

The staging cluster was reorganized to have one dedicated management node running only etcd and the control plane. The other nodes are workers only.
We will have to check whether we want to add more rpc nodes for service resiliency, and more etcd nodes to make the cluster itself resilient.
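
To make the etcd part of that question concrete: etcd needs a majority of its members (a quorum) to accept writes, so a single etcd node tolerates no failure. This short sketch computes the arithmetic, assuming the standard majority rule; the function names are illustrative only.

```python
def etcd_quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def etcd_fault_tolerance(members: int) -> int:
    """How many members can be lost while a quorum still remains."""
    return members - etcd_quorum(members)


if __name__ == "__main__":
    for size in (1, 2, 3, 5):
        print(size, etcd_quorum(size), etcd_fault_tolerance(size))
```

Under this rule, going from 1 to 3 etcd nodes is the first step that survives the loss of a node. An even member count adds no tolerance over the odd count just below it.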