Staging first:
- [x] D7606: Reuse the Rancher cluster (used for our in-house GitLab experiment)
- [x] D7600: Elastic worker nodes need a specific role with Docker prepared
- [x] D7624, P1342: Upgrade the Proxmox VM template
- [x] D7625, P1343: Declare a new VM template with the ZFS dependency ready (so the automation does not require a reboot in the middle)
- [x] D7607: Register the VMs to the Rancher cluster
- [x] Build a more recent image (softwareheritage/loaders:2022-04-27) [3]
- [x] Push it to the softwareheritage Docker Hub registry (no CI just yet)
- [x] Correctness: the VMs run Docker containers from the lister/loader images [4]
- [x] Properly declare the VMs to run the Docker images of the lister/loader services
- [x] Monitor services: install Prometheus and Grafana [2]
- [x] R260:4c8a3a64e725c15a16f95c454df2a0b63647cb02: Make elastic workers report their statsd metrics to Prometheus
- [x] R260:fedd973b26a44c5470eeb0781feefcfab9e69724: Scrape the Prometheus exporter metrics so the proper swh metrics show up (see the scrape sketch after this list)
- [x] D8381: Make the "archive-staging" cluster's Prometheus write its metrics to an Azure bucket via a Thanos sidecar process (see the object-storage sketch after this list)
- [x] D8385: Expose the Thanos store gateway service which reads that Azure bucket
- [x] D8385: Update the Thanos query service to read from that gateway
- [x] Determine whether metrics are actually queryable
- [x] D8398, D8400: Integrate the lister into the archive-staging cluster
- [ ] Make the "elastic" services push their logs to the main swh log infrastructure (journalbeat, logstash, ...; see the journalbeat sketch after this list)
- [x] ~~(Optional) Rework the Puppet manifest to actually run the registration command [1]~~
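The statsd-to-Prometheus wiring (the R260 commits above) boils down to: workers emit statsd metrics, a prometheus-statsd-exporter translates them, and the cluster's rancher-monitoring Prometheus scrapes that exporter. A minimal ServiceMonitor sketch, assuming the exporter sits behind a service labelled `app: prometheus-statsd-exporter` exposing a `metrics` port (names are illustrative, not the actual manifests):
```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: swh-statsd-exporter           # hypothetical name
  namespace: cattle-monitoring-system
spec:
  namespaceSelector:
    matchNames: [default]             # namespace running the workers/exporter (assumed)
  selector:
    matchLabels:
      app: prometheus-statsd-exporter # assumed service label
  endpoints:
    - port: metrics                   # assumed port name on the exporter service
      interval: 30s
```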
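For the Thanos part (D8381, D8385), the sidecar and the store gateway share an object-storage configuration pointing at the Azure bucket; a minimal sketch of that config, with placeholder account and container names:
```
# objstore.yml, mounted as a secret by both the Thanos sidecar and the store gateway
type: AZURE
config:
  storage_account: <redacted>         # Azure storage account (placeholder)
  storage_account_key: <redacted>
  container: thanos-archive-staging   # container name is an assumption
```
The query service then only needs the store gateway registered as an additional store, e.g. `--store=<store-gateway-host>:10901` (gRPC).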
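For the remaining log-shipping item, the intent is the same pipeline as the rest of the infrastructure: journald on the nodes, shipped by journalbeat to logstash and ultimately kibana. A minimal journalbeat sketch, with an assumed logstash endpoint:
```
# journalbeat.yml (sketch; the logstash host and port are assumptions)
journalbeat.inputs:
  - paths: []       # empty list = read the local systemd journal
    seek: cursor    # resume where we left off across restarts
output.logstash:
  hosts: ["logstash.internal.softwareheritage.org:5044"]
```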
End goal:
- listing and loading happen
- resulting logs are pushed to our standard Kibana logs infrastructure
- stats results are pushed to our standard Grafana
Annex:
- CI builds the swh Docker images (we can reuse existing ones at first)
[1] It's currently Proxmox-based, but later we'll have to do it without Proxmox, on bare-metal machines
[2] https://rancher.euwest.azure.internal.softwareheritage.org/k8s/clusters/c-t85mz/api/v1/namespaces/cattle-monitoring-system/services/http:rancher-monitoring-grafana:80/proxy/d/rancher-home-1/home?orgId=1&from=1651067146437&to=1651070746437
[3] Built out of swh-environment's swh/stack image for now (and simply retagged as loaders)
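For reference, the manual build-and-push behind [3] amounts to something like the following sketch (the build directory is an assumption and the actual build goes through the swh-environment tooling; only the retag/push part is the point):
```
$ cd $SWH_ENVIRONMENT_HOME/docker                           # assumed location of the swh/stack image build
$ docker build -t swh/stack .                               # sketch; real builds use the swh-environment tooling
$ docker tag swh/stack softwareheritage/loaders:2022-04-27
$ docker push softwareheritage/loaders:2022-04-27
```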
[4]
```
$ cd $SWH_ENVIRONMENT_HOME/snippets/sysadmin/T3592-elastic-workers
$ cat loader-pypi.staging.values.yaml
# Default values for worker.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
amqp:
  username: <redacted>
  password: <redacted>
  host: scheduler0.internal.staging.swh.network
  queue_threshold: 10  # spawn worker per increment of `value` messages
  queues:
    - swh.loader.package.pypi.tasks.LoadPyPI
storage:
  host: storage1.internal.staging.swh.network
swh:
  loader:
    image: softwareheritage/loaders
    version: latest
$ helm install -f ./loader-pypi.staging.values.yaml workers ./worker
NAME: workers
LAST DEPLOYED: Wed Apr 27 18:57:03 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ kubectl get pods -w
NAME                       READY   STATUS    RESTARTS   AGE
loaders-6bf6ddd897-gjh2w   1/1     Running   0          10m
loaders-6bf6ddd897-kkb7s   1/1     Running   0          10m
loaders-6bf6ddd897-lxl6b   1/1     Running   0          10m
loaders-6bf6ddd897-sjl26   1/1     Running   0          10m
loaders-6bf6ddd897-t59p7   1/1     Running   0          10m
...
$ kubectl logs loaders-6bf6ddd897-t59p7 | tail
[2022-04-27 17:05:00,462: INFO/MainProcess] Task swh.loader.package.pypi.tasks.LoadPyPI[a54e5fb2-8bc0-4a42-a586-58b5f8d3ebc1] received
[2022-04-27 17:05:04,767: INFO/MainProcess] sync with celery@loaders-6bf6ddd897-kkb7s
[2022-04-27 17:05:04,773: INFO/MainProcess] sync with celery@loaders-6bf6ddd897-jflck
[2022-04-27 17:05:27,170: INFO/MainProcess] missed heartbeat from celery@loaders-6bf6ddd897-jflck
[2022-04-27 17:06:01,504: INFO/ForkPoolWorker-1] Task swh.loader.package.pypi.tasks.LoadPyPI[b1d7bc2f-1294-47fd-8c8b-00775fe6a990] succeeded in 61.043102499999804s: {'status': 'eventful', 'snapshot_id': '7ca9564774a0fc2bfc2cf1234c8816c5193e33c2'}
[2022-04-27 17:06:01,514: INFO/MainProcess] Task swh.loader.package.pypi.tasks.LoadPyPI[186abb71-8595-4e97-a26c-830fc472a5dc] received
[2022-04-27 17:06:56,132: INFO/ForkPoolWorker-1] Task swh.loader.package.pypi.tasks.LoadPyPI[a54e5fb2-8bc0-4a42-a586-58b5f8d3ebc1] succeeded in 54.61585931099944s: {'status': 'eventful', 'snapshot_id': 'f2eaeb4d4d729bb4dcdb26eadd48e6ead2af5c9b'}
[2022-04-27 17:06:57,748: INFO/MainProcess] Task swh.loader.package.pypi.tasks.LoadPyPI[89e482b0-7d46-477c-9737-ba286dca5f31] received
[2022-04-27 17:10:20,229: INFO/ForkPoolWorker-2] Task swh.loader.package.pypi.tasks.LoadPyPI[186abb71-8595-4e97-a26c-830fc472a5dc] succeeded in 202.43778917999953s: {'status': 'eventful', 'snapshot_id': '52565afbd0c3483cb76782f1d57303f6c02e52ed'}
[2022-04-27 17:10:20,242: INFO/MainProcess] Task swh.loader.package.pypi.tasks.LoadPyPI[6418b3f5-f8e3-41f2-91b8-a19d0220b746] received
```
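To iterate on the deployment afterwards, the standard helm lifecycle applies (sketch):
```
$ helm upgrade -f ./loader-pypi.staging.values.yaml workers ./worker   # apply values changes
$ helm uninstall workers                                               # tear the workers back down
```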