
Migrate azure worker vms to cheaper and more efficient vms
Closed, Migrated; Edits Locked

Description

Reduce Azure cost: switch the workers to 'b2ms' vms (the current 'ds2v2' vms are underused and costly)

Plan:

  • Reasoning: https://hedgedoc.softwareheritage.org/0_eK1R3iSFmMWxwHDQfqOw?edit
  • Provision vault-worker[01-02] as b2ms (terraform)
  • Decommission worker13
  • Check that the vault workers are doing their job [1]
  • Decommission worker[11-12]
  • Adapt the puppet manifest to the fqdn changes ^ and deploy
  • Provision indexer-worker[01-02] as b2ms (terraform)
  • Check everything is fine ^ (a firewall rule must be edited to allow the connection)
  • Decommission ds2v2 worker[07-10]
  • Provision indexer-worker[03-06] as b2ms (terraform)
  • Decommission remaining ds2v2 worker[03-06]
  • Update firewall rule + alias
  • Update inventory with vms and network interfaces according to ^
  • Kept worker[01-02] for now (so they finish their current job consuming old queue messages) [2]
  • Clean up old oneshot tasks related to ^ [4]

Note:

  • This talks about worker*.euwest.azure nodes
  • Decommissioning means deleting the node, then removing references to it from the puppet master, then updating the inventory
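
The plan uses bracket ranges such as worker[07-10] to name groups of nodes. A minimal sketch of how such a range could be expanded into host names, for instance to build the list of nodes to decommission. The helper name expand_hosts is hypothetical, and the domain suffix is an assumption copied from the vault-worker log lines in [1]; the old worker nodes may use a different suffix.

```python
import re
from typing import List

# Assumption: suffix taken from the vault-worker host names logged in [1].
DOMAIN = "euwest.azure.internal.softwareheritage.org"

RANGE_RE = re.compile(r"(?P<prefix>[\w-]+)\[(?P<start>\d+)-(?P<end>\d+)\]")


def expand_hosts(pattern: str, domain: str = DOMAIN) -> List[str]:
    """Expand 'worker[07-10]' into one fully qualified name per node.

    A plain name such as 'worker13' is returned as a single-element list.
    """
    match = RANGE_RE.fullmatch(pattern)
    if match is None:
        return [f"{pattern}.{domain}"]
    # Keep the zero padding used in the pattern ('07' -> width 2).
    width = len(match["start"])
    start, end = int(match["start"]), int(match["end"])
    return [
        f"{match['prefix']}{number:0{width}d}.{domain}"
        for number in range(start, end + 1)
    ]


if __name__ == "__main__":
    for host in expand_hosts("worker[07-10]"):
        print(host)
```

The same helper would cover the new names as well, e.g. expand_hosts("indexer-worker[03-06]").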

[1]

Jul 18 14:45:59 vault-worker01 python3[2648]: [2022-07-18 14:45:59,239: INFO/MainProcess] vault_cooker@vault-worker01.euwest.azure.internal.softwareheritage.org ready.
Jul 18 14:58:49 vault-worker01 python3[2648]: [2022-07-18 14:58:49,852: INFO/MainProcess] Received task: swh.vault.cooking_tasks.SWHCookingTask[a3c95ae7-4256-4231-bca7-d3224a9149ce]
Jul 18 14:58:54 vault-worker01 python3[2670]: [2022-07-18 14:58:54,821: INFO/ForkPoolWorker-16] Task swh.vault.cooking_tasks.SWHCookingTask[a3c95ae7-4256-4231-bca7-d3224a9149ce] succeeded in 4.852631129999963s: None
Jul 18 15:01:58 vault-worker02 python3[617]: [2022-07-18 15:01:58,023: INFO/MainProcess] Connected to amqp://swhconsumer:**@rabbitmq:5672//
Jul 18 15:01:58 vault-worker02 python3[617]: [2022-07-18 15:01:58,293: INFO/MainProcess] vault_cooker@vault-worker02.euwest.azure.internal.softwareheritage.org ready.
Jul 18 15:02:59 vault-worker02 python3[617]: [2022-07-18 15:02:59,734: INFO/MainProcess] Received task: swh.vault.cooking_tasks.SWHCookingTask[e3649dcc-9d53-4d88-8245-2543e97d584a]
Jul 18 15:03:19 vault-worker02 python3[997]: [2022-07-18 15:03:19,915: INFO/ForkPoolWorker-16] Task swh.vault.cooking_tasks.SWHCookingTask[e3649dcc-9d53-4d88-8245-2543e97d584a] succeeded in 20.0749026s: None
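
The check in [1] was done by reading the journal by eye. As an illustration only, the same check could be sketched as a small script that counts "succeeded" lines per host. The regular expression is an assumption fitted to the two success lines above, not a general parser for celery logs, and succeeded_per_host is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: line layout matches the journal excerpt in [1].
SUCCESS_RE = re.compile(
    r"^\w{3} +\d+ [\d:]+ (?P<host>\S+) python3\[\d+\]: .*"
    r"Task (?P<task>[\w.]+)\[(?P<id>[0-9a-f-]+)\] "
    r"succeeded in (?P<seconds>[\d.]+)s"
)

SAMPLE = [
    "Jul 18 14:58:49 vault-worker01 python3[2648]: [2022-07-18 14:58:49,852: INFO/MainProcess] Received task: swh.vault.cooking_tasks.SWHCookingTask[a3c95ae7-4256-4231-bca7-d3224a9149ce]",
    "Jul 18 14:58:54 vault-worker01 python3[2670]: [2022-07-18 14:58:54,821: INFO/ForkPoolWorker-16] Task swh.vault.cooking_tasks.SWHCookingTask[a3c95ae7-4256-4231-bca7-d3224a9149ce] succeeded in 4.852631129999963s: None",
    "Jul 18 15:03:19 vault-worker02 python3[997]: [2022-07-18 15:03:19,915: INFO/ForkPoolWorker-16] Task swh.vault.cooking_tasks.SWHCookingTask[e3649dcc-9d53-4d88-8245-2543e97d584a] succeeded in 20.0749026s: None",
]


def succeeded_per_host(lines):
    """Count successfully completed tasks per host name."""
    counts = Counter()
    for line in lines:
        match = SUCCESS_RE.search(line)
        if match:
            counts[match["host"]] += 1
    return counts


if __name__ == "__main__":
    for host, count in sorted(succeeded_per_host(SAMPLE).items()):
        print(host, count)
```

A host that is missing from the result has not completed any cooking task in the lines that were fed in.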

[2] There is too much lag, which would take a long time to subside with only 2 vms.
Instead, since the new vms will work on the reset topics and will pass on the missing
data [3], we can just scratch those two workers now after all.

[3] T4282#88364

[4]

11:50:47 softwareheritage-scheduler@belvedere:5432=> select now(), status, count(*) from task where type = 'index-origin-metadata' group by status;
+-------------------------------+------------------------+---------+
|              now              |         status         |  count  |
+-------------------------------+------------------------+---------+
| 2022-07-19 09:50:55.403248+00 | next_run_not_scheduled | 9802941 |
| 2022-07-19 09:50:55.403248+00 | next_run_scheduled     |    5263 |
| 2022-07-19 09:50:55.403248+00 | completed              | 3225591 |
| 2022-07-19 09:50:55.403248+00 | disabled               |    5736 |
+-------------------------------+------------------------+---------+
(4 rows)

Time: 27451.213 ms (00:27.451)

softwareheritage-scheduler=# update task set status='disabled' where type = 'index-origin-metadata' and status in ('next_run_scheduled', 'next_run_not_scheduled');
UPDATE 9808204
12:28:16 softwareheritage-scheduler@belvedere:5432=> select now(), status, count(*) from task where type = 'index-origin-metadata' group by status;
+-------------------------------+-----------+---------+
|              now              |  status   |  count  |
+-------------------------------+-----------+---------+
| 2022-07-19 10:28:26.489037+00 | completed | 3225591 |
| 2022-07-19 10:28:26.489037+00 | disabled  | 9813940 |
+-------------------------------+-----------+---------+
(2 rows)

Time: 32793.481 ms (00:32.793)

(ongoing ^)
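
The figures in [4] can be cross-checked: the UPDATE count should equal the sum of the two statuses that were disabled, and the new 'disabled' total should be the old one plus that sum. A small sketch of that arithmetic, using only the counts shown above; apply_disable is a hypothetical helper that mimics the effect of the UPDATE on the per-status counts.

```python
# Per-status counts copied from the first query in [4].
BEFORE = {
    "next_run_not_scheduled": 9802941,
    "next_run_scheduled": 5263,
    "completed": 3225591,
    "disabled": 5736,
}
# Row count reported by the UPDATE statement.
UPDATED_ROWS = 9808204
# Per-status counts copied from the second query in [4].
AFTER = {"completed": 3225591, "disabled": 9813940}


def apply_disable(counts, statuses):
    """Move every task in `statuses` to 'disabled'.

    Returns the number of moved tasks and the resulting counts.
    """
    moved = sum(counts.get(status, 0) for status in statuses)
    result = {s: n for s, n in counts.items() if s not in statuses}
    result["disabled"] = result.get("disabled", 0) + moved
    return moved, result


if __name__ == "__main__":
    moved, result = apply_disable(
        BEFORE, ("next_run_scheduled", "next_run_not_scheduled")
    )
    print(moved == UPDATED_ROWS, result == AFTER)
```

Both comparisons hold for the numbers above, so no task of that type changed status between the two queries other than through the UPDATE.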

Event Timeline

ardumont renamed this task from "Reduce azure cost regarding vms" to "Migrate azure worker vms to cheaper vm". Jul 18 2022, 4:01 PM
ardumont updated the task description.
ardumont renamed this task from "Migrate azure worker vms to cheaper vm" to "Migrate azure worker vms to cheaper and as efficient vms as before". Jul 18 2022, 5:43 PM
ardumont updated the task description.
ardumont changed the task status from Open to Work in Progress. Jul 18 2022, 6:23 PM
ardumont updated the task description.
ardumont moved this task from Weekly backlog to in-progress on the System administration board.
ardumont renamed this task from "Migrate azure worker vms to cheaper and as efficient vms as before" to "Migrate azure worker vms to cheaper and more efficient vms". Jul 18 2022, 6:33 PM
ardumont claimed this task.
ardumont updated the task description.
ardumont moved this task from deployed/landed/monitoring to done on the System administration board.