Software Heritage

gitlab migration reset state routine is flaky
Closed, Migrated

Description

After resetting the state "as usual":

kubectl delete gitlab gitlab
kubectl get pvc | grep 'gitlab' | awk '{print $1}' | xargs kubectl delete pvc
kubectl get pv | grep 'gitlab' | awk '{print $1}' | xargs kubectl delete pv
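The three commands above can be wrapped into a slightly more defensive helper. This is a hypothetical sketch, not the routine actually used here. It assumes everything lives in the `gitlab-system` namespace (as in the pod listing below) and that every GitLab PVC/PV line matches `gitlab`; the function names `gitlab_names` and `reset_gitlab` are made up for illustration. `--no-headers` keeps the header line out of the pipeline, and `xargs -r` skips the `delete` call entirely when nothing matches.

```shell
#!/bin/sh
# Hypothetical reset helper; assumes the GitLab install lives in gitlab-system.
NS="${NS:-gitlab-system}"
KUBECTL="${KUBECTL:-kubectl}"

# Read `kubectl get ... --no-headers` output on stdin and print the first
# column (the object name) of every line that mentions "gitlab".
gitlab_names() {
  grep 'gitlab' | awk '{print $1}'
}

reset_gitlab() {
  # Delete the GitLab custom resource first so the operator stops reconciling it.
  "$KUBECTL" -n "$NS" delete gitlab gitlab
  # PVCs are namespaced and matched by name.
  "$KUBECTL" -n "$NS" get pvc --no-headers | gitlab_names | xargs -r "$KUBECTL" -n "$NS" delete pvc
  # PVs are cluster-scoped; their names are usually generated, so they are
  # matched through the CLAIM column (e.g. gitlab-system/repo-data-...).
  "$KUBECTL" get pv --no-headers | gitlab_names | xargs -r "$KUBECTL" delete pv
}
```

After sourcing the file, `reset_gitlab` runs the same three steps; setting `NS` beforehand targets another namespace.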

The deployment started as usual, but the GitLab controller manager is now looping over the following error:

$ kubectl logs -f gitlab-controller-manager-788d4dbb78-45shw manager
...
2022-06-22T07:44:20.774Z        INFO    controllers.GitLab      Reconciling GitLab      {"gitlab": "gitlab-system/gitlab"}
2022-06-22T07:44:20.775Z        DEBUG   controllers.GitLab      version information     {"gitlab": "gitlab-system/gitlab", "upgrade": false, "current version": "", "desired version": "6.0.0"}
2022-06-22T07:44:20.837Z        INFO    controllers.GitLab      self-signed certificates job skipped, not needed per configuration      {"gitlab": "gitlab.gitlab-system"}
2022-06-22T07:44:20.853Z        DEBUG   controllers.GitLab      creating ingress        {"gitlab": "gitlab.gitlab-system", "namespace": "gitlab-system", "ingress": "gitlab-minio"}
2022-06-22T07:44:20.915Z        ERROR   controller.gitlab       Reconciler error        {"reconciler group": "apps.gitlab.com", "reconciler kind": "GitLab", "name": "gitlab", "namespace": "gitlab-system", "error": "resourceVersion should not be set on objects to be created"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.1/pkg/internal/controller/controller.go:227
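To check whether the operator is still stuck in this loop (for instance after the restart described further down), one can count how often this reconciler error shows up in recent logs. A minimal sketch: the function name is hypothetical, and the `--since=10m` window in the usage comment is an arbitrary choice. Reading logs through `deployment/gitlab-controller-manager` avoids hard-coding the pod name, which changes on every restart.

```shell
#!/bin/sh
# Read operator log lines on stdin and print how many are "Reconciler error"
# lines carrying the resourceVersion message (prints 0 when there are none).
count_reconcile_errors() {
  grep 'Reconciler error' | grep -c 'resourceVersion should not be set on objects to be created'
}

# Usage against the live operator (container name "manager", as above):
#   kubectl -n gitlab-system logs deployment/gitlab-controller-manager -c manager --since=10m \
#     | count_reconcile_errors
```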

Pods deployed so far (some are missing, e.g. webservice...):

$ kubectl get pods -n gitlab-system
NAME                                                   READY   STATUS      RESTARTS   AGE
gitlab-controller-manager-788d4dbb78-45shw             2/2     Running     1          6d23h
gitlab-gitaly-0                                        1/1     Running     0          10m
gitlab-minio-67d958644f-cz46c                          1/1     Running     0          10m
gitlab-minio-create-buckets-1-vwb7m                    0/1     Completed   0          10m
gitlab-nginx-ingress-controller-795897994-4kj8s        1/1     Running     0          11m
gitlab-nginx-ingress-controller-795897994-z8kbf        1/1     Running     0          11m
gitlab-nginx-ingress-defaultbackend-6c86ccf4c4-vl4bp   1/1     Running     0          11m
gitlab-postgresql-0                                    2/2     Running     0          10m
gitlab-redis-master-0                                  2/2     Running     0          10m
gitlab-shared-secrets-1-qf9-4j79f                      0/1     Completed   0          11m
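To see at a glance which components are still missing, the pod names can be compared against an expected list. A sketch only: the `EXPECTED` list is an assumption (only `webservice` is named in this report; `sidekiq` in particular is a guess) and should be adapted to the configuration actually deployed. The helper name `missing_components` is hypothetical.

```shell
#!/bin/sh
# Hypothetical list of components expected in a full deployment (assumption).
EXPECTED="${EXPECTED:-gitaly minio postgresql redis webservice sidekiq}"

# Read `kubectl get pods --no-headers` output on stdin and print every expected
# component for which no pod name starts with "gitlab-<component>".
missing_components() {
  pods=$(awk '{print $1}')
  # $EXPECTED is left unquoted on purpose so it splits into words.
  for c in $EXPECTED; do
    printf '%s\n' "$pods" | grep -q "^gitlab-$c" || echo "$c"
  done
}
# Usage: kubectl -n gitlab-system get pods --no-headers | missing_components
```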

cert-manager says:

$ kubectl -n cert-manager logs -f cert-manager-b4d6fd99b-gxrkc
...
E0622 07:38:50.490878       1 controller.go:166] cert-manager/certificates-readiness "msg"="re-queuing item due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"gitlab-minio-tls\": StorageError: invalid object, Code: 4, Key: /registry/cert-manager.io/certificates/gitlab-system/gitlab-minio-tls, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 600e9f1c-d477-4b51-a612-772e7ae322f6, UID in object meta: " "key"="gitlab-system/gitlab-minio-tls"
I0622 07:38:51.254249       1 setup.go:309] cert-manager/issuers "msg"="verified existing registration with ACME server" "related_resource_kind"="Secret" "related_resource_name"="gitlab-acme-key" "related_resource_namespace"="gitlab-system" "resource_kind"="Issuer" "resource_name"="gitlab-issuer" "resource_namespace"="gitlab-system" "resource_version"="v1"
I0622 07:38:51.254489       1 conditions.go:95] Setting lastTransitionTime for Issuer "gitlab-issuer" condition "Ready" to 2022-06-22 07:38:51.254296328 +0000 UTC m=+602713.793195880
I0622 07:38:51.295977       1 setup.go:202] cert-manager/issuers "msg"="skipping re-verifying ACME account as cached registration details look sufficient" "related_resource_kind"="Secret" "related_resource_name"="gitlab-acme-key" "related_resource_namespace"="gitlab-system" "resource_kind"="Issuer" "resource_name"="gitlab-issuer" "resource_namespace"="gitlab-system" "resource_version"="v1"
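The first cert-manager line is a precondition failure on the `gitlab-minio-tls` Certificate: the update carried UID `600e9f1c-d477-4b51-a612-772e7ae322f6`, while the object meta UID came back empty. One way to check whether cert-manager is acting on a stale copy of that Certificate is to compare the UID from the log with the UID of the live object. A minimal sketch; the function name is hypothetical, and reading a mismatch as "the object was deleted and recreated during the reset" is an interpretation, not something confirmed in this task.

```shell
#!/bin/sh
# Read cert-manager log lines on stdin and print the UID found after
# "UID in precondition: " (lines without it print nothing).
precondition_uid() {
  sed -n 's/.*UID in precondition: \([0-9a-f-]*\).*/\1/p'
}

# Compare with the live object (namespace and name taken from the log above):
#   kubectl -n gitlab-system get certificate gitlab-minio-tls -o jsonpath='{.metadata.uid}'
```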

Event Timeline

ardumont created this task.

It looks like something is wrong in the operator's state management.
From what I found on the internet, this could be related to the cert-manager version, but that should already be fixed. See for example: https://gitlab.com/gitlab-org/cloud-native/gitlab-operator/-/issues/315
(The current cert-manager version in the cluster is 1.8.0)
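To confirm which cert-manager version is actually running, the image tag of the controller deployment can be read back. A sketch assuming the deployment is named `cert-manager` in the `cert-manager` namespace (the pod name above suggests so) and that its image is tagged with the release version; `image_tag` is a hypothetical helper.

```shell
#!/bin/sh
# Print the tag of an image reference read on stdin (text after the last ':').
# Assumes the reference ends in a tag, not an @sha256 digest.
image_tag() {
  sed 's/.*://'
}

# Usage:
#   kubectl -n cert-manager get deployment cert-manager \
#     -o jsonpath='{.spec.template.spec.containers[0].image}' | image_tag
```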

I've just restarted the operator controller, and the cluster came back up correctly:

kubectl rollout restart deployment/gitlab-controller-manager
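If this workaround has to be repeated after each reset, the restart can be followed by a wait so that a script only continues once the new controller pod is up. A sketch; the explicit `-n gitlab-system`, the five-minute timeout and the `restart_operator` name are assumptions, and it only automates the workaround without addressing the underlying state issue.

```shell
#!/bin/sh
NS="${NS:-gitlab-system}"
KUBECTL="${KUBECTL:-kubectl}"

# Restart the operator deployment, then block until the rollout completes
# (or fails after the timeout).
restart_operator() {
  "$KUBECTL" -n "$NS" rollout restart deployment/gitlab-controller-manager &&
    "$KUBECTL" -n "$NS" rollout status deployment/gitlab-controller-manager --timeout=5m
}
```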
ardumont changed the task status from Open to Work in Progress. Jun 22 2022, 12:52 PM
ardumont moved this task from Backlog to in-progress on the System administration board.
ardumont claimed this task.
ardumont moved this task from deployed/landed/monitoring to done on the System administration board.