
Aug 8 2022

vsellier added inline comments to D8209: docker: update the list of created topics.
Aug 8 2022, 1:34 PM
vsellier accepted D8210: docker: use named volumes for kafka, storage and objstorage.
Aug 8 2022, 1:27 PM
vsellier accepted D8211: docker: add iputils-ping package in the container image.
Aug 8 2022, 1:22 PM

Jul 13 2022

vsellier committed rSKCONFff5707a7c44b: Bootstrap manual cassandra configuration (authored by vsellier).
Bootstrap manual cassandra configuration
Jul 13 2022, 8:01 PM
vsellier committed rSKCONF1aeb6d54ed98: non working k8ssandra configuration (authored by vsellier).
non working k8ssandra configuration
Jul 13 2022, 8:01 PM
vsellier committed rSKCONF8a0f4d990de1: bootstrap the cassandra cluster (authored by vsellier).
bootstrap the cassandra cluster
Jul 13 2022, 8:01 PM
vsellier committed rSPRE944ad7507847: cassandra: update the cluster configuration to deploy cassadra manually (authored by vsellier).
cassandra: update the cluster configuration to deploy cassadra manually
Jul 13 2022, 11:12 AM
vsellier closed D8116: Deploy the cassandra operator on the production cassandra cluster.
Jul 13 2022, 11:12 AM
vsellier committed rSPRE16feb3dda32c: Deploy the cassandra operator on the production cassandra cluster (authored by vsellier).
Deploy the cassandra operator on the production cassandra cluster
Jul 13 2022, 11:12 AM
vsellier added a comment to T4373: [cassandra] Test the new hardware.

Unfortunately, the operator test failed because the operator does not offer enough configuration options:

  • non-blocker: the init containers are OOM-killed during startup; this can be solved by editing the cassandra statefulset created by the operator to raise the limits
  • blocker: it's not possible to configure the commitlog_directory explicitly; it defaults to /var/lib/cassandra/commitlog
    • it's not easy to propagate the host mounts to use 2 mountpoints, /srv/cassandra and /srv/cassandra/commitlog, without tweaking the kernel / rancher configuration
    • it's not possible to add a second volume to the pod description created by the operator
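
For reference, a minimal sketch of the cassandra.yaml settings that a manual (non-operator) deployment could set to place the commitlog on its own mountpoint. The two keys are standard cassandra.yaml options; the /srv/cassandra/data subdirectory is an assumption for illustration and does not come from the task.

```yaml
# cassandra.yaml fragment (sketch, not the deployed configuration)
data_file_directories:
  - /srv/cassandra/data                        # assumed subdirectory of the /srv/cassandra mountpoint
commitlog_directory: /srv/cassandra/commitlog  # second mountpoint mentioned above
```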
Jul 13 2022, 10:25 AM · Storage manager, System administration

Jul 12 2022

vsellier renamed T4391: [cassandra] deploy the replaying stack from [cassandra] deploy the replying stack to [cassandra] deploy the replaying stack.
Jul 12 2022, 3:29 PM · System administration
vsellier updated the diff for D8116: Deploy the cassandra operator on the production cassandra cluster.

declare the extra volume binding (no impact on the tfstate)

Jul 12 2022, 3:18 PM
vsellier added a revision to T4373: [cassandra] Test the new hardware: D8116: Deploy the cassandra operator on the production cassandra cluster.
Jul 12 2022, 3:01 PM · Storage manager, System administration
vsellier requested review of D8116: Deploy the cassandra operator on the production cassandra cluster.
Jul 12 2022, 3:01 PM
vsellier committed rSPREd081816a9ad3: cassandra - match node roles (authored by vsellier).
cassandra - match node roles
Jul 12 2022, 2:44 PM
vsellier committed rSPRE6381c215ea0c: cassandra - match kubernes version (authored by vsellier).
cassandra - match kubernes version
Jul 12 2022, 2:38 PM
vsellier committed rSKCONFf89d3a7a6bc4: Add the argo workflow poc configuration (authored by vsellier).
Add the argo workflow poc configuration
Jul 12 2022, 12:38 PM
vsellier triaged T4391: [cassandra] deploy the replaying stack as Normal priority.
Jul 12 2022, 12:13 PM · System administration
vsellier changed the status of T4374: [cassandra] Test basic topology, a subtask of T4379: [cassandra] create etcd / controlplane servers, from Open to Work in Progress.
Jul 12 2022, 12:10 PM · Storage manager, System administration
vsellier changed the status of T4374: [cassandra] Test basic topology from Open to Work in Progress.
Jul 12 2022, 12:10 PM · Storage manager, System administration
vsellier closed T4389: [cassandra] Configure the monitoring of the cluster as Resolved.

The mountpoint needs to be declared on the kubelet container to be reachable by the pods:

--- /tmp/cluster-orig.yaml	2022-07-12 11:27:27.169509573 +0200
+++ /tmp/cluster.yaml	2022-07-12 11:26:54.865395186 +0200
@@ -58,6 +58,8 @@
       service_node_port_range: 30000-32767
     kube-controller: {}
     kubelet:
+      extra_binds:
+        - '/srv/prometheus:/srv/prometheus'
       fail_swap_on: false
       generate_serving_certificate: false
     kubeproxy: {}
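
As an illustration only, a pod could then reach the directory through a hostPath volume. This is a hedged sketch: the pod name, image, and container mount path are hypothetical, and only the /srv/prometheus host path comes from the diff above.

```yaml
# Hypothetical pod spec consuming the host directory bound into the kubelet container
apiVersion: v1
kind: Pod
metadata:
  name: prometheus-example        # hypothetical name
spec:
  containers:
    - name: prometheus
      image: prom/prometheus      # assumed image, not taken from the task
      volumeMounts:
        - name: prometheus-data
          mountPath: /prometheus  # assumed data path inside the container
  volumes:
    - name: prometheus-data
      hostPath:
        path: /srv/prometheus     # host path declared in extra_binds
        type: Directory           # the directory must already exist on the node
```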
Jul 12 2022, 12:09 PM · System administration
vsellier closed T4389: [cassandra] Configure the monitoring of the cluster, a subtask of T4373: [cassandra] Test the new hardware, as Resolved.
Jul 12 2022, 12:09 PM · Storage manager, System administration
vsellier committed rSKCONF18f544855355: Bootstrap cassandra cluster configuration (authored by vsellier).
Bootstrap cassandra cluster configuration
Jul 12 2022, 11:56 AM
vsellier committed rSKCONF2d4826b67ca5: Specify the repository organization (authored by vsellier).
Specify the repository organization
Jul 12 2022, 11:47 AM
vsellier committed R261:8249ffa761b1: Add the kubernetes clusters configuration repository (authored by vsellier).
Add the kubernetes clusters configuration repository
Jul 12 2022, 11:43 AM

Jul 11 2022

vsellier closed D8094: Declare the kubernetes cluster and management nodes for cassandra.
Jul 11 2022, 10:02 PM
vsellier committed rSPREbd8e7bc4223d: Declare the kubernetes cluster and management nodes for cassandra (authored by vsellier).
Declare the kubernetes cluster and management nodes for cassandra
Jul 11 2022, 10:02 PM
vsellier updated the diff for D8094: Declare the kubernetes cluster and management nodes for cassandra.

add tfstate file

Jul 11 2022, 10:02 PM
vsellier added a comment to T4389: [cassandra] Configure the monitoring of the cluster.

Configure the data directory:

root@pergamon:~# clush -b -w @cassandra-mgmt hostname
---------------
rancher-node-cassandra1
---------------
rancher-node-cassandra1
---------------
rancher-node-cassandra2
---------------
rancher-node-cassandra2
---------------
rancher-node-cassandra3
---------------
rancher-node-cassandra3
Jul 11 2022, 5:01 PM · System administration
vsellier renamed T4389: [cassandra] Configure the monitoring of the cluster from Configure the monitoring of the cluster to [cassandra] Configure the monitoring of the cluster.
Jul 11 2022, 4:49 PM · System administration
vsellier moved T4389: [cassandra] Configure the monitoring of the cluster from Backlog to in-progress on the System administration board.
Jul 11 2022, 4:49 PM · System administration
vsellier changed the status of T4389: [cassandra] Configure the monitoring of the cluster from Open to Work in Progress.
Jul 11 2022, 4:48 PM · System administration
vsellier closed T4379: [cassandra] create etcd / controlplane servers as Resolved.

Finally, the cluster is up.
I'm not sure what unblocked the node registration, but I suspect a node with all the roles is needed to bootstrap the cluster.
I tried this initially and it didn't work, but I'm not sure what state the cluster was in at the time.

Jul 11 2022, 4:33 PM · Storage manager, System administration
vsellier closed T4379: [cassandra] create etcd / controlplane servers, a subtask of T4373: [cassandra] Test the new hardware, as Resolved.
Jul 11 2022, 4:33 PM · Storage manager, System administration
vsellier committed rSPSITE1ae1e5a7cf2f: Force to use pergamon as dns (authored by vsellier).
Force to use pergamon as dns
Jul 11 2022, 2:29 PM
vsellier committed rSENV85ef675e7a8e: update facts (authored by vsellier).
update facts
Jul 11 2022, 2:27 PM
vsellier committed rSENV333381e88e93: Declare the cassandra nodes (authored by vsellier).
Declare the cassandra nodes
Jul 11 2022, 2:27 PM
vsellier committed rSPSITE02794d0df963: Install zfs and docker on the cassandra node to prepare the cass operator tests (authored by vsellier).
Install zfs and docker on the cassandra node to prepare the cass operator tests
Jul 11 2022, 2:19 PM
vsellier closed D8105: Install zfs and docker on the cassandra node to prepare the cass operator tests.
Jul 11 2022, 2:19 PM
vsellier added inline comments to D8105: Install zfs and docker on the cassandra node to prepare the cass operator tests.
Jul 11 2022, 2:18 PM
vsellier updated the diff for D8105: Install zfs and docker on the cassandra node to prepare the cass operator tests.

update dns configuration to use pergamon directly

Jul 11 2022, 2:13 PM
vsellier committed R260:ae5601358904: fake release (authored by vsellier).
fake release
Jul 11 2022, 12:08 PM
vsellier committed R260:c9a8881ffd16: bootstrap environment's values (authored by vsellier).
bootstrap environment's values
Jul 11 2022, 12:08 PM
vsellier updated the task description for T4387: Scrubber processes getting killed by OOM killer.
Jul 11 2022, 9:47 AM · System administration, Datastore Scrubber
vsellier updated the task description for T4387: Scrubber processes getting killed by OOM killer.
Jul 11 2022, 9:46 AM · System administration, Datastore Scrubber
vsellier created T4387: Scrubber processes getting killed by OOM killer.
Jul 11 2022, 9:43 AM · System administration, Datastore Scrubber
vsellier requested review of D8105: Install zfs and docker on the cassandra node to prepare the cass operator tests.
Jul 11 2022, 9:33 AM
vsellier added a revision to T4373: [cassandra] Test the new hardware: D8105: Install zfs and docker on the cassandra node to prepare the cass operator tests.
Jul 11 2022, 9:33 AM · Storage manager, System administration

Jul 7 2022

vsellier added a comment to T4379: [cassandra] create etcd / controlplane servers.

The management nodes were correctly created, but rancher seems to be having some issues registering them in the cluster.

Jul 7 2022, 6:52 PM · Storage manager, System administration
vsellier closed T4359: Update rancher cluster to kubernetes 1.22 as Resolved.

The kubernetes upgrade was launched through the azure portal (it's also possible to trigger it with the az command line).
Everything looks fine:

  • A new node with version 1.22.6 was created
kubectl get pods -o wide; echo; kubectl get nodes -o wide
NAME                               READY   STATUS    RESTARTS      AGE   IP            NODE                              NOMINATED NODE   READINESS GATES
debian                             1/1     Running   1 (23m ago)   27m   10.244.0.63   aks-default-36212332-vmss000000   <none>           <none>
rancher-59f4c74c6f-5vlq6           1/1     Running   0             91m   10.244.0.59   aks-default-36212332-vmss000000   <none>           <none>
rancher-59f4c74c6f-92txx           1/1     Running   0             90m   10.244.0.60   aks-default-36212332-vmss000000   <none>           <none>
rancher-59f4c74c6f-cfshs           1/1     Running   0             91m   10.244.0.58   aks-default-36212332-vmss000000   <none>           <none>
rancher-webhook-6958cfcddf-2gjwn   1/1     Running   0             85d   10.244.0.26   aks-default-36212332-vmss000000   <none>           <none>
Jul 7 2022, 6:37 PM · System administration
vsellier closed T4359: Update rancher cluster to kubernetes 1.22, a subtask of T4358: Upgrade AKS versions, as Resolved.
Jul 7 2022, 6:37 PM · System administration
vsellier changed the status of T4359: Update rancher cluster to kubernetes 1.22 from Open to Work in Progress.
Jul 7 2022, 6:22 PM · System administration
vsellier changed the status of T4359: Update rancher cluster to kubernetes 1.22, a subtask of T4358: Upgrade AKS versions, from Open to Work in Progress.
Jul 7 2022, 6:22 PM · System administration
vsellier updated the diff for D8094: Declare the kubernetes cluster and management nodes for cassandra.

rebase

Jul 7 2022, 2:56 PM
vsellier accepted D8089: Provision thanos query node.
Jul 7 2022, 2:51 PM
vsellier requested review of D8094: Declare the kubernetes cluster and management nodes for cassandra.
Jul 7 2022, 12:11 PM
vsellier added a revision to T4379: [cassandra] create etcd / controlplane servers: D8094: Declare the kubernetes cluster and management nodes for cassandra.
Jul 7 2022, 12:11 PM · Storage manager, System administration
vsellier changed the status of T4379: [cassandra] create etcd / controlplane servers, a subtask of T4373: [cassandra] Test the new hardware, from Open to Work in Progress.
Jul 7 2022, 11:56 AM · Storage manager, System administration
vsellier changed the status of T4379: [cassandra] create etcd / controlplane servers from Open to Work in Progress.
Jul 7 2022, 11:56 AM · Storage manager, System administration
vsellier added a comment to D8089: Provision thanos query node.

I have no idea whether the cpu/memory/disk specs are large enough; I couldn't find this information in the thanos documentation.

Jul 7 2022, 11:52 AM
vsellier requested changes to D8089: Provision thanos query node.
Jul 7 2022, 11:52 AM

Jul 5 2022

vsellier removed a parent task for T4374: [cassandra] Test basic topology: T4373: [cassandra] Test the new hardware.
Jul 5 2022, 5:50 PM · Storage manager, System administration
vsellier removed a subtask for T4373: [cassandra] Test the new hardware: T4374: [cassandra] Test basic topology.
Jul 5 2022, 5:50 PM · Storage manager, System administration
vsellier removed a subtask for T4373: [cassandra] Test the new hardware: T4375: [cassandra] One cassandra per data disk.
Jul 5 2022, 5:50 PM · Storage manager, System administration
vsellier removed a parent task for T4375: [cassandra] One cassandra per data disk: T4373: [cassandra] Test the new hardware.
Jul 5 2022, 5:50 PM · Storage manager, System administration
vsellier added a parent task for T4374: [cassandra] Test basic topology: T4379: [cassandra] create etcd / controlplane servers.
Jul 5 2022, 5:49 PM · Storage manager, System administration
vsellier added a subtask for T4379: [cassandra] create etcd / controlplane servers: T4374: [cassandra] Test basic topology.
Jul 5 2022, 5:49 PM · Storage manager, System administration
vsellier added a parent task for T4375: [cassandra] One cassandra per data disk: T4379: [cassandra] create etcd / controlplane servers.
Jul 5 2022, 5:49 PM · Storage manager, System administration
vsellier added a subtask for T4379: [cassandra] create etcd / controlplane servers: T4375: [cassandra] One cassandra per data disk.
Jul 5 2022, 5:49 PM · Storage manager, System administration
vsellier triaged T4379: [cassandra] create etcd / controlplane servers as Normal priority.
Jul 5 2022, 5:47 PM · Storage manager, System administration
vsellier changed the status of T4373: [cassandra] Test the new hardware from Open to Work in Progress.
Jul 5 2022, 5:41 PM · Storage manager, System administration
vsellier renamed T4359: Update rancher cluster to kubernetes 1.22 from Update AKS cluster to kubernetes 1.22 to Update rancher cluster to kubernetes 1.22.
Jul 5 2022, 5:38 PM · System administration
vsellier accepted D8064: swh-graph: rename services (now production-ready, no longer dev).

Please also merge this into the staging branch and notify the sysadm irc room when it's pushed; we will need to deploy it manually to clean up the previous services.

Jul 5 2022, 3:55 PM
vsellier triaged T4375: [cassandra] One cassandra per data disk as Normal priority.
Jul 5 2022, 9:52 AM · Storage manager, System administration
vsellier triaged T4374: [cassandra] Test basic topology as Normal priority.
Jul 5 2022, 9:43 AM · Storage manager, System administration
vsellier triaged T4373: [cassandra] Test the new hardware as Normal priority.
Jul 5 2022, 9:36 AM · Storage manager, System administration
vsellier requested changes to D8064: swh-graph: rename services (now production-ready, no longer dev).
Jul 5 2022, 8:49 AM

Jun 30 2022

vsellier committed rCJSWHb0e07c673ec0: wip - add a forge to host the local Changes (authored by vsellier).
wip - add a forge to host the local Changes
Jun 30 2022, 11:25 PM
vsellier committed rCJSWH3d562f112c91: wip - poc the swh-apps pipeline (authored by vsellier).
wip - poc the swh-apps pipeline
Jun 30 2022, 5:13 PM
vsellier committed rCJSWH278f12b744b1: wip - poc the swh-apps pipeline (authored by vsellier).
wip - poc the swh-apps pipeline
Jun 30 2022, 5:10 PM
vsellier closed D8062: fix a typo on the production objstorage vhost.
Jun 30 2022, 4:48 PM
vsellier committed rSPSITE57fb33253ca7: fix a typo on the production objstorage vhost (authored by vsellier).
fix a typo on the production objstorage vhost
Jun 30 2022, 4:47 PM
vsellier requested review of D8062: fix a typo on the production objstorage vhost.
Jun 30 2022, 4:40 PM
vsellier closed D8057: Add a docker environment to test the job-builder inside jenkins.
Jun 30 2022, 4:09 PM
vsellier committed rCJSWH21e47db56cb5: Add a docker environment to test the job-builder inside jenkins (authored by vsellier).
Add a docker environment to test the job-builder inside jenkins
Jun 30 2022, 4:09 PM
vsellier updated the diff for D8057: Add a docker environment to test the job-builder inside jenkins.

rebase

Jun 30 2022, 4:09 PM
vsellier updated the diff for D8057: Add a docker environment to test the job-builder inside jenkins.

fix the readme name

Jun 30 2022, 4:08 PM
vsellier added inline comments to D8057: Add a docker environment to test the job-builder inside jenkins.
Jun 30 2022, 4:07 PM
vsellier requested review of D8057: Add a docker environment to test the job-builder inside jenkins.
Jun 30 2022, 12:48 PM

Jun 29 2022

vsellier triaged T4360: Update gitlab kubernetes cluster to 1.22 as Normal priority.
Jun 29 2022, 9:55 AM · System administration
vsellier triaged T4359: Update rancher cluster to kubernetes 1.22 as Normal priority.
Jun 29 2022, 9:55 AM · System administration
vsellier added a comment to T4358: Upgrade AKS versions.

It seems the rancher cluster can be updated to any version.
From the rancher documentation (https://rancher.com/docs/rancher/v2.6/en/installation/install-rancher-on-k8s/):

Rancher can be installed on any Kubernetes cluster. This cluster can use upstream Kubernetes, or it can use one of Rancher’s Kubernetes distributions, or it can be a managed Kubernetes cluster from a provider such as Amazon EKS.

It's also confirmed by the suse rke compatibility matrix: https://www.suse.com/assets/EN-Rancherv2.6.4-150422-0151-56.pdf

Jun 29 2022, 9:54 AM · System administration
vsellier moved T4358: Upgrade AKS versions from Backlog to in-progress on the System administration board.
Jun 29 2022, 9:45 AM · System administration
vsellier changed the status of T4358: Upgrade AKS versions from Open to Work in Progress.
Jun 29 2022, 9:45 AM · System administration

Jun 28 2022

vsellier closed T4340: swh-graph timeouts as Wontfix.

It will be solved by D7890

Jun 28 2022, 6:49 PM · Compressed graph service
vsellier closed T4313: [provenance] some process are oom killed as Resolved.
Jun 28 2022, 6:48 PM · System administration, Provenance database
vsellier committed R259:c84dcaac46d9: swh-provenance-client: update requirements-frozen.txt (authored by vsellier).
swh-provenance-client: update requirements-frozen.txt
Jun 28 2022, 6:04 PM
vsellier committed R259:920615f59e89: swh-provenance-client: update requirements-frozen.txt (authored by vsellier).
swh-provenance-client: update requirements-frozen.txt
Jun 28 2022, 5:36 PM
vsellier closed D8040: Limit the number of entries in the cache.
Jun 28 2022, 10:17 AM
vsellier committed rDPROVf5f741366383: Limit the number of entries in the cache (authored by vsellier).
Limit the number of entries in the cache
Jun 28 2022, 10:17 AM