staging: Deploy graphql service
Closed, Migrated (Edits Locked)

Description

  • (vse) create a dedicated repository sysadm-environment/k8s-clusters-config:argo-workflows:/poc-argoworkflow (repo:branch:/directory)
  • (vse) Publish workflow code in ^ directory: "/workflows" [1]
  • (vse) Deployment doc on argoWorkflow
  • (ath) Deployment doc on argoCD
  • (ath) Publish argocd deployment code related to graphql in sysadm-environment/k8s-clusters-config:argocd:/
  • (ard) Deploy graphql locally (using minikube as cluster)
    • Create dockerhub image softwareheritage/graphql:latest
    • Deploy it from the swh-charts/swh-graphql helm chart
  • (ard) Deploy staging kubernetes cluster
    • Inventory planning for the new nodes and IPs
    • D8152: Prepare puppet manifest to add correct role to new cluster nodes
    • D8155: Prepare terraform manifest for cluster and nodes creation
    • Deploy new graphql rancher cluster ^
  • (ard) Deploy graphql on the cluster
    • Manually deploy graphql from the swh-charts/swh-graphql helm chart
    • D8159: Expose graphql instance through reverse proxy to one cluster member
    • 703bd22: Fix the varnish configuration so the graphql application is reachable [2]
    • T4135#88666: Fight with graphql-worker{1,2} so they properly register on the rancher cluster
  • (ard/ath) T4413 Deploy argocd on admin vlan
  • (ard/ath) Decommission cluster-graphql in favor of cluster-graphql3
    • Register graphql-worker{3,2,1} on cluster-graphql3
    • Unregister graphql-worker0 from cluster-graphql (and clean up docker leftovers)
    • Register graphql-worker0 to cluster-graphql3
    • Destroy "cluster-graphql" (terraform)
  • rSPSITE0921c2b: Remove the currently configured http basic auth, which does not work [4]
  • Install load-balancer to access the graphql backend instead of a hard-coded member of the ingress (implementation needs to be determined first)
  • D8429: Monitoring / Alerting: Add a puppet icinga check to raise if the service is down
  • D8423: Update the documentation with new service url
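The monitoring item above (an icinga check that raises when the service is down) can be illustrated with a minimal plugin-style sketch. This is not the check from D8429, which is managed through puppet; it only shows the standard Nagios/Icinga exit-code convention such a check relies on. The function names `evaluate` and `probe` are hypothetical, and treating any 2xx/3xx answer as healthy is an assumption.

```python
import urllib.error
import urllib.request

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(http_status):
    """Map an HTTP status (or None when unreachable) to (exit_code, message)."""
    if http_status is None:
        return CRITICAL, "CRITICAL - graphql service unreachable"
    if 200 <= http_status < 400:
        # Assumption: any 2xx/3xx answer counts as "service is up".
        return OK, f"OK - graphql answered HTTP {http_status}"
    return CRITICAL, f"CRITICAL - graphql answered HTTP {http_status}"


def probe(url, timeout=10):
    """Return the HTTP status of a GET on url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (4xx/5xx).
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, TLS error, ...
        return None
```

A plugin entry point would call `evaluate(probe(url))` with the service url, print the message on one line and exit with the returned code, which is what icinga uses to decide whether to raise an alert.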

[1] rSKCONFf89d3a7a6bc43290829b62cf139ebb54f31a1873

[2] https://graphql.staging.swh.network/

[3] https://argo-worker01.internal.admin.swh.network/

[4] The main page is served fine (with the proper credentials). After that, however,
the actual queries to the backend, sent through POST without authentication, fail
to fetch the data (so graphql is pretty much useless with that http auth). Currently,
puppet is deactivated on rp0.staging, with a hot-patched varnish deactivating http
auth for graphql.
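To make the failure mode in [4] concrete, here is a minimal sketch of the POST request a GraphQL client issues, with and without an http basic auth header. It only builds the requests and does not send them. Assumptions: the endpoint is the root URL from [2], the credentials are placeholders, and `build_graphql_request` is a hypothetical helper, not something from the deployment.

```python
import base64
import json
import urllib.request

# Assumption: the GraphQL endpoint is served at the root URL given in [2].
GRAPHQL_URL = "https://graphql.staging.swh.network/"


def build_graphql_request(query, username=None, password=None):
    """Build (but do not send) the POST request a GraphQL client would issue."""
    body = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if username is not None and password is not None:
        # HTTP basic auth: base64("user:password") in the Authorization header.
        credentials = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


# `{ __typename }` is a query that is valid against any GraphQL schema.
anonymous = build_graphql_request("{ __typename }")
# Placeholder credentials, for illustration only.
authenticated = build_graphql_request("{ __typename }", "user", "secret")
```

The `anonymous` request has the shape described in [4]: a POST carrying no `Authorization` header, which a reverse proxy enforcing basic auth would reject even though the initial page load succeeded with credentials.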

Event Timeline

ardumont raised the priority of this task from Low to Normal. Jul 20 2022, 2:51 PM
ardumont updated the task description.

Fight with graphql-worker{1,2} so they properly register on the rancher cluster

Status on the graphql cluster mess.
Only one node registered properly within the "cluster-graphql".
No logs whatsoever help in determining why the others are stuck registering, doing nothing but waiting.

I've created another cluster "cluster-graphql2" (through terraform).
After cleaning up the graphql-worker{1,2,3} nodes, I've tried to register them to that new cluster.
Again, same result, without many clues to go on for debugging.

I've then tried to remove that cluster altogether and that failed badly as well.
It just plainly refuses to be destroyed, be it through terraform or through the rancher ui.

So in the end, fed up with banging my head against the wall, I've removed that cluster's state from terraform [1]

[1]

terraform state rm rancher2_cluster.cluster-graphql2
Removed rancher2_cluster.cluster-graphql2
Successfully removed 1 resource instance(s).

(w/ amadou) We've finally managed to create a new graphql cluster "cluster-graphql3" with a specific kubernetes version (matching the one from the internship) [1]
(commits on their way)

We'll decommission the previous stuck "cluster-graphql" and migrate the node attached to that cluster to the one that works.
^ Once we are done with the argocd installation.

[1] kubernetes_version = "v1.21.12-rancher1-1"

ArgoCD deployed graphql to staging [1]

TODO:

  • Install argocd through the admin rp (instead of the opened rule to allow access to it through vpn)
  • Make argocd deal with itself now that the bootstrap is done
  • Install a load-balancer to access the graphql backend instead of a hard-coded member of the ingress (actually unsure whether this is necessary or not). [2]

[1]

[2] https://forge.softwareheritage.org/source/puppet-swh-site/browse/production/data/deployments/staging/common.yaml$152

ardumont changed the status of subtask T4413: Deploy argocd on admin vlan from Open to Work in Progress. Jul 29 2022, 6:20 PM
ardumont claimed this task.
ardumont moved this task from deployed/landed/monitoring to done on the System administration board.