Reduce Azure cost: change workers to 'b2ms' VMs (the current 'ds2v2' VMs are underused and costly)
Plan:
- [x] Reasoning: https://hedgedoc.softwareheritage.org/0_eK1R3iSFmMWxwHDQfqOw?edit
- [x] Provision vault-worker[01-02] as b2ms (terraform; see the sketch after this list)
- [x] Decommission worker13
- [x] Check vault workers are doing their job [1]
- [x] Decommission worker[11-12]
- [x] Adapt the puppet manifests to the fqdn changes ^ and deploy
- [x] Provision indexer-worker[01-02] as b2ms (terraform)
- [x] Check everything is fine ^ (a firewall rule had to be edited to allow the connection)
- [x] Decommission ds2v2 worker[07-10]
- [x] Provision indexer-worker[03-06] as b2ms (terraform)
- [x] Decommission remaining ds2v2 worker[03-06]
- [x] Update firewall rule + alias
- [x] Update inventory with vms and network interfaces according to ^
- [x] ~~Kept worker[01-02] for now (so they finish their current jobs consuming the old queue messages)~~ [2]
- [x] Clean up old oneshot tasks related to ^ [4]
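For reference, the b2ms provisioning boils down to bumping the VM size in the azure terraform configuration and applying it. A minimal sketch, assuming the standard terraform CLI workflow; the directory, plan file name and `vm_size` variable are assumptions, the actual SWH terraform layout may differ:
```
# Hypothetical workflow: only the terraform commands themselves are standard.
cd azure/
# edit the worker definitions so the relevant VMs use vm_size = "Standard_B2ms"
terraform init
terraform plan -out=workers-b2ms.plan
terraform apply workers-b2ms.plan
```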
Note:
- This talks about `worker*.euwest.azure` nodes
- Decommission means deleting the node, then removing references to it on the puppet master, then updating the inventory (see the sketch below)
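A minimal sketch of those decommission steps, assuming terraform-managed VMs and the stock puppet/puppetserver CLI; the node name is only an example and the inventory step depends on the inventory tooling in use:
```
fqdn=worker13.euwest.azure.internal.softwareheritage.org   # example node

# 1. Delete the node: drop its definition from the terraform configuration, then
terraform plan && terraform apply

# 2. Remove references to it on the puppet master
sudo puppet node deactivate "$fqdn"             # deactivate it in puppetdb
sudo puppet node clean "$fqdn"                  # drop its cached facts/reports
sudo puppetserver ca clean --certname "$fqdn"   # revoke and remove its certificate

# 3. Update the inventory: delete the VM and its network interface entries
```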
[1]
```
Jul 18 14:45:59 vault-worker01 python3[2648]: [2022-07-18 14:45:59,239: INFO/MainProcess] vault_cooker@vault-worker01.euwest.azure.internal.softwareheritage.org ready.
Jul 18 14:58:49 vault-worker01 python3[2648]: [2022-07-18 14:58:49,852: INFO/MainProcess] Received task: swh.vault.cooking_tasks.SWHCookingTask[a3c95ae7-4256-4231-bca7-d3224a9149ce]
Jul 18 14:58:54 vault-worker01 python3[2670]: [2022-07-18 14:58:54,821: INFO/ForkPoolWorker-16] Task swh.vault.cooking_tasks.SWHCookingTask[a3c95ae7-4256-4231-bca7-d3224a9149ce] succeeded in 4.852631129999963s: None
Jul 18 15:01:58 vault-worker02 python3[617]: [2022-07-18 15:01:58,023: INFO/MainProcess] Connected to amqp://swhconsumer:**@rabbitmq:5672//
Jul 18 15:01:58 vault-worker02 python3[617]: [2022-07-18 15:01:58,293: INFO/MainProcess] vault_cooker@vault-worker02.euwest.azure.internal.softwareheritage.org ready.
Jul 18 15:02:59 vault-worker02 python3[617]: [2022-07-18 15:02:59,734: INFO/MainProcess] Received task: swh.vault.cooking_tasks.SWHCookingTask[e3649dcc-9d53-4d88-8245-2543e97d584a]
Jul 18 15:03:19 vault-worker02 python3[997]: [2022-07-18 15:03:19,915: INFO/ForkPoolWorker-16] Task swh.vault.cooking_tasks.SWHCookingTask[e3649dcc-9d53-4d88-8245-2543e97d584a] succeeded in 20.0749026s: None
```
[2] The lag is too large and would take a long time to subside with only 2 VMs. Instead, since the
new VMs will work on the reset topics and will cover the missing data [3], we can
just scrap those workers now after all.
[3] T4282#88364
[4]
```
11:50:47 softwareheritage-scheduler@belvedere:5432=> select now(), status, count(*) from task where type = 'index-origin-metadata' group by status;
+-------------------------------+------------------------+---------+
| now | status | count |
+-------------------------------+------------------------+---------+
| 2022-07-19 09:50:55.403248+00 | next_run_not_scheduled | 9802941 |
| 2022-07-19 09:50:55.403248+00 | next_run_scheduled | 5263 |
| 2022-07-19 09:50:55.403248+00 | completed | 3225591 |
| 2022-07-19 09:50:55.403248+00 | disabled | 5736 |
+-------------------------------+------------------------+---------+
(4 rows)
Time: 27451.213 ms (00:27.451)
softwareheritage-scheduler=# update task set status='disabled' where type = 'index-origin-metadata' and status in ('next_run_scheduled', 'next_run_not_scheduled');
UPDATE 9808204
12:28:16 softwareheritage-scheduler@belvedere:5432=> select now(), status, count(*) from task where type = 'index-origin-metadata' group by status;
+-------------------------------+-----------+---------+
| now | status | count |
+-------------------------------+-----------+---------+
| 2022-07-19 10:28:26.489037+00 | completed | 3225591 |
| 2022-07-19 10:28:26.489037+00 | disabled | 9813940 |
+-------------------------------+-----------+---------+
(2 rows)
Time: 32793.481 ms (00:32.793)
```
(ongoing ^)