Reduce Azure cost: change workers to 'b2ms' VMs (the current 'ds2v2' VMs are underused and costly)
Plan:
- [x] Reasoning: https://hedgedoc.softwareheritage.org/0_eK1R3iSFmMWxwHDQfqOw?edit
- [x] Provision vault-worker[01-02] as b2ms (terraform; see the sketch after this list)
- [x] Decommission worker13
- [x] Check vault workers are doing their job [1]
- [x] Decommission worker[11-12]
- [x] Adapt the puppet manifests to the fqdn changes ^ and deploy
- [x] Provision indexer-worker[01-02] as b2ms (terraform)
- [x] Check everything is fine ^ (a firewall rule had to be edited to allow the connection)
- [x] Decommission ds2v2 worker[07-10]
- [x] Provision indexer-worker[03-06] as b2ms (terraform)
- [x] Decommission remaining ds2v2 worker[03-06]
- [x] Update firewall rule + alias
- [x] Update inventory with vms and network interfaces according to ^
- [x] ~~Kept worker[01-02] for now (so they finish their current jobs consuming the old queue messages)~~ [2]
- [x] Clean up old oneshot tasks related to ^ [4]
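For reference, the b2ms provisioning boils down to bumping the VM size in the azure terraform configuration and applying it. A minimal sketch, assuming the standard terraform CLI workflow; the directory, plan file name and `vm_size` variable are assumptions, the actual SWH terraform layout may differ:
```
# Hypothetical workflow: only the terraform commands themselves are standard.
cd azure/
# edit the worker definitions so the relevant VMs use vm_size = "Standard_B2ms"
terraform init
terraform plan -out=workers-b2ms.plan
terraform apply workers-b2ms.plan
```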
Note:
- This talks about `worker*.euwest.azure` nodes
- Decommission means deleting the node, then removing references to it on the puppet master, then updating the inventory (see the sketch below)
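A minimal sketch of those decommission steps, assuming terraform-managed VMs and the stock puppet/puppetserver CLI; the node name is only an example and the inventory step depends on the inventory tooling in use:
```
fqdn=worker13.euwest.azure.internal.softwareheritage.org   # example node

# 1. Delete the node: drop its definition from the terraform configuration, then
terraform plan && terraform apply

# 2. Remove references to it on the puppet master
sudo puppet node deactivate "$fqdn"             # deactivate it in puppetdb
sudo puppet node clean "$fqdn"                  # drop its cached facts/reports
sudo puppetserver ca clean --certname "$fqdn"   # revoke and remove its certificate

# 3. Update the inventory: delete the VM and its network interface entries
```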
[1]
```
Jul 18 14:45:59 vault-worker01 python3[2648]: [2022-07-18 14:45:59,239: INFO/MainProcess] vault_cooker@vault-worker01.euwest.azure.internal.softwareheritage.org ready.
Jul 18 14:58:49 vault-worker01 python3[2648]: [2022-07-18 14:58:49,852: INFO/MainProcess] Received task: swh.vault.cooking_tasks.SWHCookingTask[a3c95ae7-4256-4231-bca7-d3224a9149ce]
Jul 18 14:58:54 vault-worker01 python3[2670]: [2022-07-18 14:58:54,821: INFO/ForkPoolWorker-16] Task swh.vault.cooking_tasks.SWHCookingTask[a3c95ae7-4256-4231-bca7-d3224a9149ce] succeeded in 4.852631129999963s: None
Jul 18 15:01:58 vault-worker02 python3[617]: [2022-07-18 15:01:58,023: INFO/MainProcess] Connected to amqp://swhconsumer:**@rabbitmq:5672//
Jul 18 15:01:58 vault-worker02 python3[617]: [2022-07-18 15:01:58,293: INFO/MainProcess] vault_cooker@vault-worker02.euwest.azure.internal.softwareheritage.org ready.
Jul 18 15:02:59 vault-worker02 python3[617]: [2022-07-18 15:02:59,734: INFO/MainProcess] Received task: swh.vault.cooking_tasks.SWHCookingTask[e3649dcc-9d53-4d88-8245-2543e97d584a]
Jul 18 15:03:19 vault-worker02 python3[997]: [2022-07-18 15:03:19,915: INFO/ForkPoolWorker-16] Task swh.vault.cooking_tasks.SWHCookingTask[e3649dcc-9d53-4d88-8245-2543e97d584a] succeeded in 20.0749026s: None
```
[2] The lag is too large and would take a long time to subside with only 2 VMs. Instead, since the
new VMs will work on the reset topics and will cover the missing data [3], we can
just scrap those workers now after all.
[3] T4282#88364
[4]
```
11:50:47 softwareheritage-scheduler@belvedere:5432=> select now(), status, count(*) from task where type = 'index-origin-metadata' group by status;
+-------------------------------+------------------------+---------+
| now | status | count |
+-------------------------------+------------------------+---------+
| 2022-07-19 09:50:55.403248+00 | next_run_not_scheduled | 9802941 |
| 2022-07-19 09:50:55.403248+00 | next_run_scheduled | 5263 |
| 2022-07-19 09:50:55.403248+00 | completed | 3225591 |
| 2022-07-19 09:50:55.403248+00 | disabled | 5736 |
+-------------------------------+------------------------+---------+
(4 rows)
Time: 27451.213 ms (00:27.451)
softwareheritage-scheduler=# update task set status='disabled' where type = 'index-origin-metadata' and status in ('next_run_scheduled', 'next_run_not_scheduled');
UPDATE 9808204
12:28:16 softwareheritage-scheduler@belvedere:5432=> select now(), status, count(*) from task where type = 'index-origin-metadata' group by status;
+-------------------------------+-----------+---------+
| now | status | count |
+-------------------------------+-----------+---------+
| 2022-07-19 10:28:26.489037+00 | completed | 3225591 |
| 2022-07-19 10:28:26.489037+00 | disabled | 9813940 |
+-------------------------------+-----------+---------+
(2 rows)
Time: 32793.481 ms (00:32.793)
```
(ongoing ^)