
Jan 7 2022

vsellier updated the diff for D6871: Move grafana on a dedicated server behind the admin RP.
  • Fix the database password resolution after the database name update
  • Restore the profile::grafana::objects call to manage the orgs and database declarations. It's not ideal, as it introduces a dependency on the reverse proxy.
Jan 7 2022, 9:47 AM

Jan 6 2022

vsellier planned changes to D6871: Move grafana on a dedicated server behind the admin RP.

Thanks for the validation. I still have some changes in progress and need to reply to olasd's remarks, so I'm changing the status to planned changes.

Jan 6 2022, 5:09 PM
vsellier updated the diff for D6871: Move grafana on a dedicated server behind the admin RP.
  • fix database name (not directly used by the configuration)
  • fix the prometheus snippets configuration
Jan 6 2022, 2:42 PM
vsellier closed T3827: Analyze low performance scheduling as Resolved.

The fix was released in version v0.23.0 and deployed to staging and production.
Everything looks good.

Jan 6 2022, 11:21 AM · System administration
vsellier closed D6876: Allow to specify the visit grab parameters per visit type and policy.
Jan 6 2022, 9:26 AM
vsellier committed rDSCH5c836d64a5fc: Allow to specify the visit grab parameters per visit type and policy (authored by vsellier).
Allow to specify the visit grab parameters per visit type and policy
Jan 6 2022, 9:26 AM

Jan 5 2022

vsellier requested review of D6876: Allow to specify the visit grab parameters per visit type and policy.
Jan 5 2022, 6:31 PM
vsellier updated the diff for D6876: Allow to specify the visit grab parameters per visit type and policy.

Update according to olasd's feedback

Jan 5 2022, 6:21 PM
vsellier requested review of D6876: Allow to specify the visit grab parameters per visit type and policy.
Jan 5 2022, 11:24 AM
vsellier added a revision to T3827: Analyze low performance scheduling: D6876: Allow to specify the visit grab parameters per visit type and policy.
Jan 5 2022, 9:19 AM · System administration
vsellier updated the diff for D6872: create the admin vm grafana0.

Upgrade bullseye template to 11.2

Jan 5 2022, 9:16 AM

Jan 4 2022

vsellier added a comment to T3827: Analyze low performance scheduling.

A quick-and-dirty fix in the code to force table sampling looks effective.
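
The fix itself is not shown in this feed. As a rough illustration only, forcing table sampling in PostgreSQL usually means adding a TABLESAMPLE clause so that the expensive ordering runs on a fraction of the table. The sketch below assumes hypothetical table and column names (`origins`, `url`, `last_update`) and a hypothetical helper `build_query`; it is not the swh-scheduler code.

```python
from typing import Optional


def build_query(tablesample: Optional[float], limit: int = 100) -> str:
    """Build an origin selection query, optionally sampling the table.

    `tablesample` is a percentage of the table (PostgreSQL TABLESAMPLE SYSTEM);
    None means no sampling. Table and column names are hypothetical.
    """
    sample_clause = ""
    if tablesample is not None:
        if not 0 < tablesample <= 100:
            raise ValueError("tablesample must be a percentage in (0, 100]")
        sample_clause = f" TABLESAMPLE SYSTEM ({tablesample:g})"
    return (
        f"SELECT url FROM origins{sample_clause} "
        f"ORDER BY last_update LIMIT {int(limit)}"
    )


print(build_query(None))
print(build_query(0.1))
```

The trade-off of such an approach is that the result is only the best candidates within the sampled pages, not within the whole table.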

Jan 4 2022, 4:59 PM · System administration
vsellier added a comment to T3827: Analyze low performance scheduling.

The initial index was an attempt to improve the query for the origins_without_last_update policy. It is not used for the other policies (never_visited_oldest_update_first / already_visited_order_by_lag), so it is not sufficient.

Jan 4 2022, 1:05 PM · System administration

Jan 3 2022

vsellier added a comment to T3827: Analyze low performance scheduling.

First diagnosis: the problem seems related to the origin selection query taking a long time to respond (usually > 30 min).

Jan 3 2022, 5:54 PM · System administration
vsellier updated the diff for D6872: create the admin vm grafana0.

good catches, thanks

Jan 3 2022, 12:16 PM
vsellier updated the diff for D6871: Move grafana on a dedicated server behind the admin RP.

rebase

Jan 3 2022, 9:40 AM

Dec 23 2021

vsellier added a comment to T3817: Install grafana on its own server.

The diffs are ready to be reviewed.
The migration will be performed at the beginning of January.

Dec 23 2021, 3:40 PM · System administration
vsellier requested review of D6872: create the admin vm grafana0.
Dec 23 2021, 3:13 PM
vsellier added a revision to T3817: Install grafana on its own server: D6872: create the admin vm grafana0.
Dec 23 2021, 3:13 PM · System administration
vsellier updated the diff for D6871: Move grafana on a dedicated server behind the admin RP.

allow the database monitoring

Dec 23 2021, 3:00 PM
vsellier updated the summary of D6871: Move grafana on a dedicated server behind the admin RP.
Dec 23 2021, 2:53 PM
vsellier updated the diff for D6871: Move grafana on a dedicated server behind the admin RP.

install the auto-generated dashboards with puppet

Dec 23 2021, 2:52 PM
vsellier updated the diff for D6871: Move grafana on a dedicated server behind the admin RP.

add the grafana-piechart-panel plugin installation

Dec 23 2021, 12:12 PM
vsellier closed D6870: network: remove explicit declaration of the physical interfaces.
Dec 23 2021, 9:49 AM
vsellier committed rSPSITE902148e0e209: network: remove explicit declaration of the physical interfaces (authored by vsellier).
network: remove explicit declaration of the physical interfaces
Dec 23 2021, 9:49 AM
vsellier updated the diff for D6870: network: remove explicit declaration of the physical interfaces.

rebase

Dec 23 2021, 9:49 AM

Dec 22 2021

vsellier retitled D6871: Move grafana on a dedicated server behind the admin RP from Move grafana on a dedicated server behind the admin RP **WIP** TODO: - add auto generated dashboard - install the grafana-piechart-panel plugin to Move grafana on a dedicated server behind the admin RP.
Dec 22 2021, 7:41 PM
vsellier added a revision to T3817: Install grafana on its own server: D6871: Move grafana on a dedicated server behind the admin RP.
Dec 22 2021, 7:40 PM · System administration
vsellier requested review of D6871: Move grafana on a dedicated server behind the admin RP.
Dec 22 2021, 7:40 PM
vsellier closed T3815: Diagnose swh-environment build failures as Resolved.

Thanks, the build is now green \o/

Dec 22 2021, 7:23 PM
vsellier requested review of D6870: network: remove explicit declaration of the physical interfaces.
Dec 22 2021, 5:29 PM
vsellier added a revision to T3813: Migrate the staging database server to bullseye: D6870: network: remove explicit declaration of the physical interfaces.
Dec 22 2021, 5:29 PM · System administration (Component upgrades)
vsellier accepted D6869: deposit: Strip 'offset_bytes' from date dicts to support swh-model 4.0.0.

Thanks

Dec 22 2021, 5:14 PM
vsellier reopened T3815: Diagnose swh-environment build failures as "Work in Progress".

Thanks for creating the diff and submitting the issue on the frozen dict repo.

Dec 22 2021, 5:07 PM
vsellier added a comment to T3592: POC elastic worker infrastructure.

It seems the rancher network issue is fixed in version 2.6.3, which is quite good news

swhworker@poc-rancher:~$ ./test-network.sh 
=> Start network overlay test
poc-rancher-sw0 can reach poc-rancher-sw0
poc-rancher-sw0 can reach poc-rancher-sw1
poc-rancher-sw1 can reach poc-rancher-sw0
poc-rancher-sw1 can reach poc-rancher-sw1
=> End network overlay test
Dec 22 2021, 3:14 PM · System administration
vsellier added a comment to T3320: Test rancher pros/cons.

It seems the network issue is fixed in version 2.6.3, which is quite good news

Dec 22 2021, 2:31 PM · System administration
vsellier closed T283: investigate libvirt I/O slowdown as Wontfix.

Closing due to inactivity. Feel free to reopen if needed.

Dec 22 2021, 10:45 AM · System administration
vsellier closed T1503: Rename hypervisor3 to a museum name, a subtask of T1392: Add a new hypervisor, as Wontfix.
Dec 22 2021, 10:17 AM · System administration
vsellier closed T1503: Rename hypervisor3 to a museum name as Wontfix.

Closing, I guess we can live with this ;)
Feel free to reopen if you disagree

Dec 22 2021, 10:17 AM · System administration
vsellier placed T3545: Update the journalbeat version package up for grabs.
Dec 22 2021, 9:58 AM · Packagers, System administration
vsellier closed T3545: Update the journalbeat version package as Resolved.

Package upgraded in T3705.

Dec 22 2021, 9:58 AM · Packagers, System administration
vsellier edited projects for T3817: Install grafana on its own server, added: System administration; removed System administration (Component upgrades).
Dec 22 2021, 9:42 AM · System administration
vsellier changed the status of T3817: Install grafana on its own server from Open to Work in Progress.
Dec 22 2021, 9:42 AM · System administration

Dec 21 2021

vsellier closed T3579: Meta-task: upgrade infrastructure to Debian Bullseye as Resolved.
Dec 21 2021, 4:06 PM · System administration (Component upgrades)
vsellier added a comment to T3579: Meta-task: upgrade infrastructure to Debian Bullseye.

Closing this task as all the possible upgrades are done.
The delayed upgrades will be followed in dedicated tasks, as they will be integrated into a more global task related to the elastic infrastructure or to the pergamon splitting.

Dec 21 2021, 4:06 PM · System administration (Component upgrades)
vsellier updated the task description for T3579: Meta-task: upgrade infrastructure to Debian Bullseye.
Dec 21 2021, 4:03 PM · System administration (Component upgrades)
vsellier closed T3807: Migrate kelvingrove (keycloak) to bullseye as Resolved.
root@kelvingrove:~# task=T3807
root@kelvingrove:/etc# puppet agent --disable "$task: dist-upgrade to bullseye"
root@kelvingrove:/etc# sed -i -e 's/buster/bullseye/;s,bullseye/updates,bullseye-security,' /etc/apt/sources.list.d/*
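
To make the effect of the two sed expressions above easier to check, here is a minimal Python sketch that mirrors them on sample lines. The sample source lines are assumptions (typical Debian entries), not the content of kelvingrove's files, and `to_bullseye` is a hypothetical helper name.

```python
def to_bullseye(line: str) -> str:
    """Mirror: sed -e 's/buster/bullseye/;s,bullseye/updates,bullseye-security,'"""
    # sed without the g flag only replaces the first match on each line,
    # hence the count of 1 on both replacements.
    line = line.replace("buster", "bullseye", 1)
    # The second expression runs on the result of the first one, which is
    # why it matches "bullseye/updates" rather than "buster/updates".
    line = line.replace("bullseye/updates", "bullseye-security", 1)
    return line


# Assumed sample entries, for illustration only.
samples = [
    "deb http://deb.debian.org/debian buster main",
    "deb http://security.debian.org/debian-security buster/updates main",
]
for sample in samples:
    print(to_bullseye(sample))
```

The second replacement is needed because the security suite was renamed from `buster/updates` to `bullseye-security` in Debian 11.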
Dec 21 2021, 4:03 PM · System administration (Component upgrades)
vsellier closed T3807: Migrate kelvingrove (keycloak) to bullseye, a subtask of T3579: Meta-task: upgrade infrastructure to Debian Bullseye, as Resolved.
Dec 21 2021, 4:03 PM · System administration (Component upgrades)
vsellier added a comment to T3807: Migrate kelvingrove (keycloak) to bullseye.

The tests with vagrant are not showing any issue with puppet / keycloak after the upgrade, so let's proceed with the upgrade.

Dec 21 2021, 3:38 PM · System administration (Component upgrades)
vsellier closed D6866: Force to use a previous version of frozendict to avoid segfault.
Dec 21 2021, 3:00 PM
vsellier committed rDCIDX9777195fdb1a: Force to use a previous version of frozendict to avoid segfault (authored by vsellier).
Force to use a previous version of frozendict to avoid segfault
Dec 21 2021, 3:00 PM
vsellier changed the status of T3807: Migrate kelvingrove (keycloak) to bullseye, a subtask of T3579: Meta-task: upgrade infrastructure to Debian Bullseye, from Open to Work in Progress.
Dec 21 2021, 2:13 PM · System administration (Component upgrades)
vsellier changed the status of T3807: Migrate kelvingrove (keycloak) to bullseye from Open to Work in Progress.
Dec 21 2021, 2:13 PM · System administration (Component upgrades)
vsellier updated the task description for T3579: Meta-task: upgrade infrastructure to Debian Bullseye.
Dec 21 2021, 12:13 PM · System administration (Component upgrades)
vsellier updated the task description for T3579: Meta-task: upgrade infrastructure to Debian Bullseye.
Dec 21 2021, 12:13 PM · System administration (Component upgrades)
vsellier closed T3801: Migrate production database servers to bullseye as Resolved.
Dec 21 2021, 12:12 PM · System administration (Component upgrades)
vsellier closed T3801: Migrate production database servers to bullseye, a subtask of T3579: Meta-task: upgrade infrastructure to Debian Bullseye, as Resolved.
Dec 21 2021, 12:12 PM · System administration (Component upgrades)
vsellier updated the task description for T3801: Migrate production database servers to bullseye.
Dec 21 2021, 12:09 PM · System administration (Component upgrades)
vsellier updated the task description for T3801: Migrate production database servers to bullseye.
Dec 21 2021, 12:09 PM · System administration (Component upgrades)
vsellier updated the task description for T3801: Migrate production database servers to bullseye.
Dec 21 2021, 11:49 AM · System administration (Component upgrades)
vsellier added a comment to D6866: Force to use a previous version of frozendict to avoid segfault.
In D6866#178393, @olasd wrote:

This deserves an upstream bug on frozendict 2.1.2, if you've managed to track it down...

Dec 21 2021, 11:08 AM
vsellier updated the diff for D6866: Force to use a previous version of frozendict to avoid segfault.

fix a typo

Dec 21 2021, 10:48 AM
vsellier updated the task description for T3801: Migrate production database servers to bullseye.
Dec 21 2021, 10:41 AM · System administration (Component upgrades)
vsellier added a comment to T3801: Migrate production database servers to bullseye.

A memory alert is logged on the idrac

	Correctable memory error logging disabled for a memory device at location DIMM_A9. 	Fri 17 Dec 2021 16:15:39

We will have to monitor this in the future to check whether this memory DIMM has some weaknesses

Dec 21 2021, 10:13 AM · System administration (Component upgrades)
vsellier updated the task description for T3801: Migrate production database servers to bullseye.
Dec 21 2021, 9:59 AM · System administration (Component upgrades)
vsellier updated the task description for T3801: Migrate production database servers to bullseye.
Dec 21 2021, 9:34 AM · System administration (Component upgrades)
vsellier added a comment to T3801: Migrate production database servers to bullseye.

on moma:

  • puppet disabled
root@moma:/etc/softwareheritage/storage# puppet agent --disable 'T3801 upgrade database servers'
  • storage configuration updated to use the belvedere database, and service restarted
Dec 21 2021, 9:24 AM · System administration (Component upgrades)
vsellier updated the task description for T3801: Migrate production database servers to bullseye.
Dec 21 2021, 9:16 AM · System administration (Component upgrades)
vsellier updated the task description for T3801: Migrate production database servers to bullseye.
Dec 21 2021, 9:06 AM · System administration (Component upgrades)
vsellier renamed T3801: Migrate production database servers to bullseye from Migrate database servers to bullseye to Migrate production database servers to bullseye.
Dec 21 2021, 9:04 AM · System administration (Component upgrades)
vsellier changed the status of T3801: Migrate production database servers to bullseye, a subtask of T3579: Meta-task: upgrade infrastructure to Debian Bullseye, from Open to Work in Progress.
Dec 21 2021, 9:04 AM · System administration (Component upgrades)
vsellier changed the status of T3801: Migrate production database servers to bullseye from Open to Work in Progress.
Dec 21 2021, 9:04 AM · System administration (Component upgrades)

Dec 20 2021

vsellier requested review of D6866: Force to use a previous version of frozendict to avoid segfault.
Dec 20 2021, 6:38 PM
vsellier added a revision to T3815: Diagnose swh-environment build failures: D6866: Force to use a previous version of frozendict to avoid segfault.
Dec 20 2021, 6:35 PM
vsellier added a comment to T3815: Diagnose swh-environment build failures.

It seems the problem is related to the new version 2.1.2 of the frozendict library, released on the 18th of December.
Pinning the version to the previous 2.1.1 solved the problem.
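
The exact requirement line used for the pin is not shown in this feed. As a complementary sketch, a build could check at startup whether the installed frozendict is the version reported above as problematic. The helper names below are hypothetical; only `importlib.metadata` from the standard library is used.

```python
from importlib import metadata

# Version reported in this task as triggering the segfault.
AFFECTED_VERSIONS = {"2.1.2"}


def is_affected(version: str) -> bool:
    """Return True if the given frozendict version is a known-bad one."""
    return version in AFFECTED_VERSIONS


def installed_frozendict_is_affected() -> bool:
    """Check the installed frozendict, if any, against the known-bad set."""
    try:
        return is_affected(metadata.version("frozendict"))
    except metadata.PackageNotFoundError:
        # frozendict is not installed, so nothing to flag.
        return False


print(installed_frozendict_is_affected())
```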

Dec 20 2021, 6:28 PM
vsellier added a comment to T3815: Diagnose swh-environment build failures.

For the segfault, I suspect an issue due to the OS difference between the docker container and the host (debian 10 / debian 11)

root@e35f7a024575:/home/jenkins/swh-environment/swh-indexer# gdb python3 core
(gdb) where
#0  raise (sig=11) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  <signal handler called>
#2  0x00007f6548d70d46 in frozendict_new_barebone (type=0x7f6548d800e0 <PyFrozenDict_Type>)
    at /project/frozendict/src/3_7/frozendictobject.c:2214
#3  _frozendict_new (use_empty_frozendict=1, kwds=0x0, args=<optimized out>, type=0x7f6548d800e0 <PyFrozenDict_Type>)
    at /project/frozendict/src/3_7/frozendictobject.c:2255
#4  frozendict_new (type=0x7f6548d800e0 <PyFrozenDict_Type>, args=<optimized out>, kwds=0x0)
    at /project/frozendict/src/3_7/frozendictobject.c:2290
#5  0x00000000005d9bd7 in _PyObject_FastCallKeywords ()
#136 0x000000000065468e in _Py_UnixMain ()
#137 0x00007f654efe109b in __libc_start_main (main=0x4bc560 <main>, argc=9, argv=0x7ffe6f651488, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe6f651478) at ../csu/libc-start.c:308
#138 0x00000000005e0e8a in _start ()
(gdb)

I'm trying to reproduce the problem locally in a vm to check if a workaround can be found.

Dec 20 2021, 5:10 PM
vsellier changed the status of T3815: Diagnose swh-environment build failures from Open to Work in Progress.
Dec 20 2021, 3:23 PM

Dec 17 2021

vsellier closed T2655: Configure cloud-init to avoid fighting with puppet changes as Resolved.
Dec 17 2021, 4:54 PM · System administration
vsellier committed rSPRE58fd7dcbb4f6: terraform: add the cloud-init fact on the server initialization (authored by vsellier).
terraform: add the cloud-init fact on the server initialization
Dec 17 2021, 4:54 PM
vsellier added a comment to T2655: Configure cloud-init to avoid fighting with puppet changes.

fact installed on the staging nodes:

root@pergamon:/etc/clustershell# clush -b -w @staging 'if [ -e /etc/systemd/system/cloud-init.target.wants/cloud-init.service ]; then echo "cloud-init installed"; echo cloudinit_enabled=true > /etc/facter/facts.d/cloud-init.txt; else echo "cloud-init not installed"; fi'
---------------
counters0.internal.staging.swh.network,deposit.internal.staging.swh.network,objstorage0.internal.staging.swh.network,poc-rancher-sw[0-1].internal.staging.swh.network,poc-rancher.internal.staging.swh.network,rp0.internal.staging.swh.network,scheduler0.internal.staging.swh.network,search0.internal.staging.swh.network,vault.internal.staging.swh.network,webapp.internal.staging.swh.network,worker[0-3].internal.staging.swh.network (15)
---------------
cloud-init installed
---------------
db1.internal.staging.swh.network,storage1.internal.staging.swh.network (2)
---------------
cloud-init not installed
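
The per-node logic of the clush one-liner above can be sketched as follows. This is only an illustration in Python, with the two paths passed as parameters so it can be tried outside a real node; `write_cloud_init_fact` is a hypothetical helper, not part of the puppet or terraform code.

```python
import tempfile
from pathlib import Path


def write_cloud_init_fact(unit_link: Path, fact_file: Path) -> bool:
    """Write the external fact only when the cloud-init unit is enabled.

    Mirrors the shell test `[ -e <unit> ]` followed by
    `echo cloudinit_enabled=true > <fact file>`.
    """
    if unit_link.exists():
        fact_file.write_text("cloudinit_enabled=true\n")
        return True
    return False


# Try it in a temporary directory instead of /etc (paths are stand-ins).
with tempfile.TemporaryDirectory() as tmp:
    unit = Path(tmp) / "cloud-init.service"
    fact = Path(tmp) / "cloud-init.txt"
    print(write_cloud_init_fact(unit, fact))  # unit missing: no fact written
    unit.touch()
    print(write_cloud_init_fact(unit, fact))  # unit present: fact written
    print(fact.read_text().strip())
```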
Dec 17 2021, 4:44 PM · System administration
vsellier closed D6861: cloud-init: disable the /etc/hosts upgrade on boot.
Dec 17 2021, 4:28 PM
vsellier committed rSPSITE3991fc44500b: cloud-init: disable the /etc/hosts upgrade on boot (authored by vsellier).
cloud-init: disable the /etc/hosts upgrade on boot
Dec 17 2021, 4:28 PM
vsellier updated the diff for D6861: cloud-init: disable the /etc/hosts upgrade on boot.

rebase

Dec 17 2021, 4:27 PM
vsellier closed T3778: The docker-dev build is often failing as Resolved.

During the week, only one request took more than 1s.
As it looks rare, it seems related to the load on the server during the build, so I'm not sure it's worth investigating further.

Dec 17 2021, 4:25 PM · System administration
vsellier closed T3813: Migrate the staging database server to bullseye as Resolved.

workers:

Before the migration

root@pergamon:~# clush -b -w @staging-workers 'set -e; puppet agent --disable "T3812"; puppet agent --disable T3771; cd /etc/systemd/system/multi-user.target.wants; for unit in swh-worker@*; do systemctl disable $unit; done; systemctl stop --no-block swh-worker@*; sleep 300; systemctl kill swh-worker@* -s 9'
Dec 17 2021, 4:14 PM · System administration (Component upgrades)
vsellier closed T3813: Migrate the staging database server to bullseye, a subtask of T3801: Migrate production database servers to bullseye, as Resolved.
Dec 17 2021, 4:14 PM · System administration (Component upgrades)
vsellier updated the task description for T3813: Migrate the staging database server to bullseye.
Dec 17 2021, 2:25 PM · System administration (Component upgrades)
vsellier retitled D6861: cloud-init: disable the /etc/hosts upgrade on boot from cloud-init: disable the /etc/host upgrade on boot to cloud-init: disable the /etc/hosts upgrade on boot.
Dec 17 2021, 2:22 PM
vsellier updated the diff for D6861: cloud-init: disable the /etc/hosts upgrade on boot.

fix a typo on the commit message

Dec 17 2021, 2:22 PM
vsellier updated the task description for T3801: Migrate production database servers to bullseye.
Dec 17 2021, 2:17 PM · System administration (Component upgrades)
vsellier updated the task description for T3801: Migrate production database servers to bullseye.
Dec 17 2021, 2:17 PM · System administration (Component upgrades)
vsellier changed the status of T3813: Migrate the staging database server to bullseye from Open to Work in Progress.
Dec 17 2021, 2:16 PM · System administration (Component upgrades)
vsellier added a comment to T2655: Configure cloud-init to avoid fighting with puppet changes.

Testing with this config file:

#cloud-config-jsonp
[{ "op": "replace", "path": "/manage_etc_hosts", "value": "False"}]

gives this error:

2021-12-16 22:35:11,471 - __init__.py[DEBUG]: Calling handler CloudConfigPartHandler: [['text/cloud-config', 'text/cloud-config-jsonp']] (text/cloud-config-jsonp, part-001, 3) with frequency always
2021-12-16 22:35:11,472 - cloud_config.py[DEBUG]: Merging by applying json patch [{"op": "replace", "path": "/manage_etc_hosts", "value": "False"}]
2021-12-16 22:35:11,472 - util.py[WARNING]: Failed at merging in cloud config part from part-001
2021-12-16 22:35:11,474 - util.py[DEBUG]: Failed at merging in cloud config part from part-001
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 138, in handle_part
    self._merge_patch(payload)
  File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 113, in _merge_patch
    self.cloud_buf = patch.apply(self.cloud_buf, in_place=False)
  File "/usr/lib/python3/dist-packages/jsonpatch.py", line 312, in apply
    obj = operation.apply(obj)
  File "/usr/lib/python3/dist-packages/jsonpatch.py", line 483, in apply
    raise JsonPatchConflict(msg)
jsonpatch.JsonPatchConflict: can't replace non-existent object 'manage_etc_hosts'
2021-12-16 22:35:11,475 - __init__.py[DEBUG]: Calling handler CloudConfigPartHandler: [['text/cloud-config', 'text/cloud-config-jsonp']] (__end__, None, 3) with frequency always
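
The conflict in the traceback follows from JSON Patch semantics (RFC 6902): a `replace` operation requires the target to already exist in the document being patched, while `add` on an object member creates it or overwrites it. The sketch below reimplements just that rule for top-level keys with the standard library; it is not the `jsonpatch` library, and `PatchConflict` / `apply_op` are hypothetical names. Whether an `add` operation would have the intended effect in cloud-init was not tested here.

```python
class PatchConflict(Exception):
    """Hypothetical stand-in for jsonpatch.JsonPatchConflict."""


def apply_op(doc: dict, op: dict) -> dict:
    """Apply one JSON Patch operation to a top-level key (no nested paths)."""
    key = op["path"].lstrip("/")
    result = dict(doc)  # work on a copy, like in_place=False
    if op["op"] == "replace":
        # RFC 6902: the target location must exist for "replace".
        if key not in result:
            raise PatchConflict(f"can't replace non-existent object '{key}'")
        result[key] = op["value"]
    elif op["op"] == "add":
        # RFC 6902: "add" creates the member or replaces an existing one.
        result[key] = op["value"]
    else:
        raise ValueError(f"unsupported operation: {op['op']}")
    return result


patch_op = {"op": "replace", "path": "/manage_etc_hosts", "value": "False"}
try:
    apply_op({}, patch_op)  # empty buffer: the key is not there yet
except PatchConflict as exc:
    print(exc)

print(apply_op({}, {**patch_op, "op": "add"}))
```

In this sketch the empty dictionary stands for a merge buffer that does not yet contain `manage_etc_hosts` when the patch is applied, which is what the error message suggests.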
Dec 17 2021, 2:02 PM · System administration
vsellier added a revision to T2655: Configure cloud-init to avoid fighting with puppet changes: D6861: cloud-init: disable the /etc/hosts upgrade on boot.
Dec 17 2021, 9:29 AM · System administration
vsellier requested review of D6861: cloud-init: disable the /etc/hosts upgrade on boot.
Dec 17 2021, 9:29 AM

Dec 16 2021

vsellier changed the status of T2655: Configure cloud-init to avoid fighting with puppet changes from Open to Work in Progress.
Dec 16 2021, 10:57 PM · System administration
vsellier added a comment to T2655: Configure cloud-init to avoid fighting with puppet changes.

it seems cloud init does not support overriding a property defined in the user-data configuration:

Dec 16 2021, 10:56 PM · System administration
vsellier committed rSPSITE6513958f823c: logstash: fix the regexp to extract the name of the index to reopen from the… (authored by vsellier).
logstash: fix the regexp to extract the name of the index to reopen from the…
Dec 16 2021, 2:37 PM
vsellier accepted D6852: grafana: proxy /api/live/ through using mod_proxy_websocket.

thanks

Dec 16 2021, 12:00 PM
vsellier accepted D6851: Deploy and activate swh-worker@loader_cvs on staging workers.
Dec 16 2021, 11:37 AM