In order to prepare the gitlab migration, we should look at the different ways to install and manage a gitlab instance on our own.
This page lists different installation methods:
https://docs.gitlab.com/ee/install/
Related changes:

rSPRE sysadm-provisioning:
D7551 | rSPRE5292d8c407a5 | rancher: Create an aks cluster dedicated to rancher
D7551 | rSPRE0241d0e4514c | aks: Allow to disable the public ip provisioning
D7544 | rSPRE2630acf0415a | gitlab: create a storage container per application
D7539 | rSPREb6f1b781e59b | gitlab: Create a storage account to store the assets
D7488 | rSPREc7097d2af599 | aks: ensure there is a static ip assigned to the cluster
D7419 | rSPRE3b97e0d571cc | azure: allow to configure a kubernetes cluster for a gitlab hosting

rSPSITE puppet-swh-site:
D7562 | rSPSITE435306667318 | Declare the rancher manager in the dns entries
The following installation methods were tested:
1/ and 2/ work well, but they install a lot of re-packaged software (nginx, postgresql, redis, prometheus, grafana).
This would force us to implement some logic in our puppet code to manage the configuration, since these components are managed the gitlab way (see the sketch below).
Zero-downtime upgrades are not possible with a single-node deployment and must be managed manually if a multi-node deployment is configured.
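For reference, assuming 1/ and 2/ refer to the Omnibus-style installs (Linux package and Docker image), the bundled services are configured through a single gitlab.rb file and applied with GitLab's own tooling; any puppet integration would essentially have to drive these commands:

```
# Omnibus-style management: edit the single gitlab.rb file, then let
# GitLab's tooling reconfigure the bundled nginx/postgresql/redis/...
sudo editor /etc/gitlab/gitlab.rb
sudo gitlab-ctl reconfigure   # re-applies the whole bundled stack
sudo gitlab-ctl status        # per-service status, managed by runit rather than systemd
```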
3/ The deployed components are finely tunable and only the activated components are deployed in the cluster.
Zero-downtime upgrades are not yet implemented with this kind of deployment [1].
Upgrades are completely managed through the helm chart and boil down to a one-line command.
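As an illustration, a minimal sketch of that one-liner with the official chart; the release name, namespace, chart version and values file below are placeholders, not our final configuration:

```
# One-time setup: register the official GitLab chart repository
helm repo add gitlab https://charts.gitlab.io
helm repo update

# Install or upgrade to a given chart version in a single command
helm upgrade --install gitlab gitlab/gitlab \
  --namespace gitlab --create-namespace \
  --version <chart-version> \
  -f gitlab-values.yaml
```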
4/ The gitlab operator is not yet production ready but is under active development. The viable version [2] is scheduled for May, so it should be available before our final migration.
The gitlab operator uses the helm chart but manages the whole lifecycle of the deployed components.
The biggest advantage is that upgrades are fully managed by the operator and are performed with zero downtime.
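For illustration, with the operator the whole instance is declared as a single custom resource and an upgrade is triggered by bumping the chart version in it. The sketch below is based on the operator documentation; the namespace, domain and email are placeholder values:

```
# Declare the instance; the operator reconciles all components and
# drives the upgrade when spec.chart.version changes
kubectl apply -n gitlab-system -f - <<EOF
apiVersion: apps.gitlab.com/v1beta1
kind: GitLab
metadata:
  name: gitlab
spec:
  chart:
    version: "<chart-version>"
    values:
      global:
        hosts:
          domain: <base-domain>
      certmanager-issuer:
        email: <admin-email>
EOF
```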
According to the tests with an AKS cluster, a minimal configuration with enough redundancy to allow the upgrades should cost around ~$300/month, excluding storage (probably ~$50/month more).
The next step is to test a deployment through terraform and to evaluate what kind of monitoring we could implement.
[1] https://docs.gitlab.com/charts/installation/upgrade.html
[2] https://about.gitlab.com/direction/maturity/
The test instance can be reached at https://gitlab-staging.swh.network
This is a draft for the manual installation: https://hedgedoc.softwareheritage.org/ynL9z-6JRnGYhW8vrLDl-w#
Several points still need to be fixed.
Status update:
The health check token can be found on the dedicated gitlab page: https://gitlab-staging.swh.network/admin/health_check
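For example, the built-in health endpoints can be probed with that token as a first monitoring hook; this assumes the token query parameter is still supported by the deployed version (otherwise these endpoints rely on an IP allowlist):

```
# $TOKEN is the value copied from /admin/health_check
curl "https://gitlab-staging.swh.network/-/readiness?token=$TOKEN"
curl "https://gitlab-staging.swh.network/-/liveness?token=$TOKEN"
```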
All the configuration / deployment steps are listed on the hedgedoc document: https://hedgedoc.softwareheritage.org/ynL9z-6JRnGYhW8vrLDl-w#
The next step is to test the backups and the restoration on our infrastructure
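With a chart or operator based deployment, such a test could look like the following sketch, using the chart's toolbox pod and its backup utility; the namespace, deployment name and backup timestamp are placeholders:

```
# Create a backup of the running instance; it is uploaded to the
# object storage bucket configured for backups
kubectl exec -n gitlab deploy/gitlab-toolbox -- backup-utility

# Restore a given backup on the target instance
kubectl exec -n gitlab deploy/gitlab-toolbox -- backup-utility --restore -t <backup-timestamp>
```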
A restoration of the azure instance on our infrastructure was successfully performed [1].
Everything was imported correctly: users, repositories, issues, ...
Using a quick-and-dirty longhorn storage backend seems to make the instance slower than on azure, but performance was not the goal of this POC.
I used this test to initialize a rancher instance that will allow us to manage our future internal kubernetes clusters [2].
T4144 will be used to test whether the clusters and their nodes can be managed with terraform.
[1] https://hedgedoc.softwareheritage.org/ynL9z-6JRnGYhW8vrLDl-w?both#Migration-to-another-kubernetes-cluster
[2] https://hedgedoc.softwareheritage.org/ynL9z-6JRnGYhW8vrLDl-w?both#Rancher-installation