
May 18 2022

vsellier closed D7846: [swh-graphql] Prepare the docker image generation.
May 18 2022, 1:34 PM
vsellier committed R259:21bc668b25b6: Add missing swh-graphql dependencies (authored by vsellier).
Add missing swh-graphql dependencies
May 18 2022, 1:34 PM
vsellier accepted D7847: scripts: Extract reusable functions to a common module.
May 18 2022, 12:06 PM
vsellier added inline comments to D7846: [swh-graphql] Prepare the docker image generation.
May 18 2022, 11:46 AM
vsellier added inline comments to D7846: [swh-graphql] Prepare the docker image generation.
May 18 2022, 11:39 AM
vsellier requested review of D7846: [swh-graphql] Prepare the docker image generation.
May 18 2022, 11:37 AM
vsellier added a revision to T4134: Package the graphql service: D7846: [swh-graphql] Prepare the docker image generation.
May 18 2022, 11:37 AM · System administration, GraphQL API
vsellier closed D7815: Add swh-graphql application.
May 18 2022, 10:08 AM
vsellier committed R259:0a7b83c31959: Add swh-graphql application (authored by vsellier).
Add swh-graphql application
May 18 2022, 10:08 AM

May 17 2022

vsellier updated the summary of D7764: icinga checks: Activate the prometheus export on e2e tests.
May 17 2022, 8:42 PM
vsellier updated the diff for D7764: icinga checks: Activate the prometheus export on e2e tests.
  • rebase
  • move the parameters after 'icinga_plugins'
  • add an environment parameter
May 17 2022, 8:42 PM
vsellier requested review of D7844: Allow icinga checks to write prometheus metrics files.
May 17 2022, 8:24 PM
vsellier added a revision to T3129: Reliable monitoring of services: for users and for admins : D7844: Allow icinga checks to write prometheus metrics files.
May 17 2022, 8:24 PM · Roadmap 2022, Roadmap 2021, Monitoring, meta-task
vsellier reopened T4251: [swh-search] Investigate long search queries response time as "Work in Progress".
May 17 2022, 5:39 PM · System administration, Archive search
vsellier edited projects for T4251: [swh-search] Investigate long search queries response time, added: System administration; removed System administrators.
May 17 2022, 5:39 PM · System administration, Archive search
vsellier renamed T4251: [swh-search] Investigate long search queries response time from Improve search performance to [swh-search] Investigate long search queries response time.
May 17 2022, 5:38 PM · System administration, Archive search
vsellier added a comment to T4251: [swh-search] Investigate long search queries response time.

Here are the results of the queries.
You can directly paste the json in the search profiler to see the result.
(Be careful, some are quite large.)

May 17 2022, 5:37 PM · System administration, Archive search
vsellier triaged T4251: [swh-search] Investigate long search queries response time as Normal priority.
May 17 2022, 8:54 AM · System administration, Archive search

May 16 2022

vsellier committed rDSNIP6221523e4a2b: Add netbox csv for the ceph objstorage infra (authored by vsellier).
Add netbox csv for the ceph objstorage infra
May 16 2022, 7:24 PM
vsellier renamed T4247: journalbeat failed to start after reboot from journalbeat fails to start after reboot to journalbeat failed to start after reboot.
May 16 2022, 9:24 AM · System administration
vsellier renamed T4247: journalbeat failed to start after reboot from journalbeat fail to start after reboot to journalbeat fails to start after reboot.
May 16 2022, 9:24 AM · System administration
vsellier closed T4247: journalbeat failed to start after reboot as Resolved.
May 16 2022, 9:17 AM · System administration
vsellier added a comment to T4247: journalbeat failed to start after reboot.

the file /var/lib/journalbeat/registry looks corrupted:
on worker10.euwest:

root@worker10:/var/lib/journalbeat# cat registry 
<?xml version="1.0" encoding="utf-8"?>
<GoalState xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="goalstate10.xsd">
  <Version>2012-11-30</Version>
  <Incarnation>1</Incarnation>
  <Machine>
    <ExpectedState>Started</ExpectedState>
    <StopRolesDeadlineHint>3

on worker09.euwest:

root@worker09:/var/lib/journalbeat# cat registry 
update_time: 2022-05-16T07:11:29.680690647Z
journal_entries:
- path: LOCAL_SYSTEM_JOURNAL
  cursor: s=1b5676c17e22450b80579b9caf065703;i=659f65c;b=97b0842367c749299a4a12ec839f1c3b;m=5b66c4ba4c0;t=5df1bbb86b72f;x=8e43c09dfc1a706e
  realtime_timestamp: 1652685086832431
  monotonic_timestamp: 6281059083456
May 16 2022, 9:17 AM · System administration
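
The two registry dumps above differ from their very first line: the healthy one on worker09 starts with `update_time:`, while the corrupted one on worker10 starts with an XML declaration. A minimal sketch of a check built on that observation follows. The function name `registry_looks_valid` is hypothetical, and the assumption that every healthy registry begins with an `update_time:` line is drawn only from the worker09 output, not from journalbeat documentation.

```shell
#!/bin/sh
# Heuristic check: does a journalbeat registry look like the healthy YAML form?
# Assumption: a healthy registry starts with an "update_time:" line, as in the
# worker09 output; anything else (for example XML) is treated as suspect.
registry_looks_valid() {
    # $1: path to the registry file
    [ -r "$1" ] || return 1
    head -n 1 "$1" | grep -q '^update_time:'
}

# Example usage (path taken from the comment above):
# registry_looks_valid /var/lib/journalbeat/registry || echo "registry looks corrupted"
```

The check only flags a suspicious file; deciding what to do with it (for instance moving it aside before restarting the service) is left to the operator.
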
vsellier changed the status of T4247: journalbeat failed to start after reboot from Open to Work in Progress.
May 16 2022, 9:13 AM · System administration

May 13 2022

vsellier accepted D7827: Initialize the success boolean early to avoid unbound exception.
May 13 2022, 10:19 AM
vsellier accepted D7826: Move the prometheus statsd exporter to use the upstream tarball.

LGTM, just 2 non-blocking questions inline

May 13 2022, 10:01 AM

May 12 2022

vsellier added a revision to T4131: Graphql service in staging: D7815: Add swh-graphql application.
May 12 2022, 10:16 AM · System administration, GraphQL API
vsellier requested review of D7815: Add swh-graphql application.
May 12 2022, 10:16 AM

May 11 2022

vsellier closed T4234: Add snyk credentials for the staging r/o object storage as Resolved.

credentials created following https://docs.softwareheritage.org/sysadm/mirror-operations/onboard.html#how-to-create-the-objstorage-credentials

May 11 2022, 3:53 PM · Mirror, System administration
vsellier committed rSPSITEe92105220d46: Allow snyk mirror to access the read-only staging objstorage (authored by vsellier).
Allow snyk mirror to access the read-only staging objstorage
May 11 2022, 3:46 PM
vsellier committed rSPPRIVCd9ca4aa8fc2e: Add snyk-stg-01 credentials (authored by vsellier).
Add snyk-stg-01 credentials
May 11 2022, 3:42 PM
vsellier changed the status of T4234: Add snyk credentials for the staging r/o object storage from Open to Work in Progress.
May 11 2022, 3:40 PM · Mirror, System administration
vsellier changed the status of T4230: [provenance] pip deploy the stack on the ovh server from Open to Work in Progress.
May 11 2022, 10:37 AM · Provenance database, System administration

May 10 2022

vsellier closed D7807: Remove the wrong dependency added in the previous commit.
May 10 2022, 6:11 PM
vsellier committed rDICP11f9eae84707: Remove the wrong dependency added in the previous commit (authored by vsellier).
Remove the wrong dependency added in the previous commit
May 10 2022, 6:11 PM
vsellier added a revision to T3129: Reliable monitoring of services: for users and for admins : D7807: Remove the wrong dependency added in the previous commit.
May 10 2022, 6:08 PM · Roadmap 2022, Roadmap 2021, Monitoring, meta-task
vsellier accepted D7803: Ensure the loader/lister packages are installed on scheduler node.

great, thanks

May 10 2022, 4:01 PM
vsellier committed rDICP13fc2e4cd327: d/changelog: Bump new release (authored by vsellier).
d/changelog: Bump new release
May 10 2022, 2:21 PM
vsellier committed rDICPb0a683a07b41: Declare the prometheus client dependency (authored by vsellier).
Declare the prometheus client dependency
May 10 2022, 2:21 PM
vsellier accepted D7802: Deploy bzr loader in production.
May 10 2022, 2:20 PM
vsellier closed D6926: First iteration of prometheus export of the e2e metrics.
May 10 2022, 9:16 AM
vsellier committed rDICP9812ac8f7b1d: First iteration of prometheus export of the e2e metrics (authored by vsellier).
First iteration of prometheus export of the e2e metrics
May 10 2022, 9:16 AM
vsellier added inline comments to D6926: First iteration of prometheus export of the e2e metrics.
May 10 2022, 8:54 AM
vsellier updated the diff for D6926: First iteration of prometheus export of the e2e metrics.
  • rebase
  • update according to the review feedback
May 10 2022, 8:52 AM

May 9 2022

vsellier closed D7793: logstash: Fix the closed index's error filtering.
May 9 2022, 4:01 PM
vsellier committed rSPSITE2f198d27fd56: logstash: Fix the closed index's error filtering (authored by vsellier).
logstash: Fix the closed index's error filtering
May 9 2022, 4:01 PM
vsellier requested review of D7793: logstash: Fix the closed index's error filtering.
May 9 2022, 3:57 PM

May 6 2022

vsellier retitled D6926: First iteration of prometheus export of the e2e metrics from WIP - First iteration of prometheus export of the e2e metrics to First iteration of prometheus export of the e2e metrics.
May 6 2022, 5:30 PM
vsellier requested review of D7764: icinga checks: Activate the prometheus export on e2e tests.
May 6 2022, 5:29 PM
vsellier closed T4221: Create a kubernetes cluster for the deployment experiment as Resolved.

The cluster is declared and the nodes are provisioned.

May 6 2022, 10:07 AM · System administration
vsellier committed rSPREd62700b98e98: Refresh staging.tfstate after the new kubernetes cluster creation (authored by vsellier).
Refresh staging.tfstate after the new kubernetes cluster creation
May 6 2022, 10:06 AM
vsellier committed rSPRE0ff550d0f889: Declare the rancher cluster for the deployment's internship (authored by vsellier).
Declare the rancher cluster for the deployment's internship
May 6 2022, 10:06 AM
vsellier committed rSPRE529a2d63d236: refresh elastic-workers cluster after accidentally elastic-worker0 removal (authored by vsellier).
refresh elastic-workers cluster after accidentally elastic-worker0 removal
May 6 2022, 10:06 AM
vsellier committed rSPREd43e6bb05d85: Ensure cloud-init is not running before starting puppet (authored by vsellier).
Ensure cloud-init is not running before starting puppet
May 6 2022, 10:06 AM
vsellier closed D7740: Declare the rancher cluster for the deployment's internship.
May 6 2022, 10:06 AM
vsellier committed rSPRE062ec99d2101: Align worker0 and worker1 qemu arguments to match the real vms configuration (authored by vsellier).
Align worker0 and worker1 qemu arguments to match the real vms configuration
May 6 2022, 10:06 AM
vsellier updated the diff for D7740: Declare the rancher cluster for the deployment's internship.
  • fix the cloud-init / puppet concurrency after the vms startup
  • remove the wrong vmid assigned to the new cluster nodes
  • refresh the staging.tfstate file after applying the new configuration
May 6 2022, 10:05 AM

May 5 2022

vsellier committed rSPSITEb8ef7d1655b1: Declare the vagrant ips of the recently created nodes (authored by vsellier).
Declare the vagrant ips of the recently created nodes
May 5 2022, 4:19 PM
vsellier committed rSPSITEf58f988f25cc: Declare a generic rancher-node-.* hostname matching (authored by vsellier).
Declare a generic rancher-node-.* hostname matching
May 5 2022, 4:19 PM
vsellier closed D7739: Declare a generic rancher-node-.* hostname matching.
May 5 2022, 4:19 PM
vsellier committed rSPSITEd4d32f417106: kubernetes: Declare a generic rancher_node profile (authored by vsellier).
kubernetes: Declare a generic rancher_node profile
May 5 2022, 4:19 PM

May 4 2022

vsellier updated the diff for D7740: Declare the rancher cluster for the deployment's internship.

Align worker0 and worker1 qemu arguments to match the real vms configuration

May 4 2022, 5:58 PM
vsellier updated the test plan for D7740: Declare the rancher cluster for the deployment's internship.
May 4 2022, 5:09 PM
vsellier updated the diff for D7740: Declare the rancher cluster for the deployment's internship.
  • fix wrong references to the elastic worker cluster
  • rename nodes from rancher-node-internX to rancher-node-internshipX
May 4 2022, 5:08 PM
vsellier requested review of D7740: Declare the rancher cluster for the deployment's internship.
May 4 2022, 4:58 PM
vsellier added a revision to T4221: Create a kubernetes cluster for the deployment experiment: D7740: Declare the rancher cluster for the deployment's internship.
May 4 2022, 4:58 PM · System administration
vsellier requested review of D7739: Declare a generic rancher-node-.* hostname matching.
May 4 2022, 4:07 PM
vsellier added a revision to T4221: Create a kubernetes cluster for the deployment experiment: D7739: Declare a generic rancher-node-.* hostname matching.
May 4 2022, 4:07 PM · System administration
vsellier committed rSENV38d0155f3f9a: Declare a rancher node to test the site.pp hostname matching (authored by vsellier).
Declare a rancher node to test the site.pp hostname matching
May 4 2022, 3:37 PM
vsellier closed T4222: backup01 disk is not large enough to handle the dali backups as Resolved.

The backup is in sync, everything is back to normal

May 4 2022, 2:27 PM · System administration
vsellier added a revision to T4132: Add the graphql service in the docker environment: D7734: Add GraphQL service to SWH environment.
May 4 2022, 12:43 PM · System administration, GraphQL API
vsellier added a task to D7734: Add GraphQL service to SWH environment: T4132: Add the graphql service in the docker environment.
May 4 2022, 12:43 PM
vsellier accepted D7734: Add GraphQL service to SWH environment.

LGTM thanks

May 4 2022, 12:40 PM
vsellier committed rSPRE1697be7f5e9e: Increase size of the backup01 data disk (authored by vsellier).
Increase size of the backup01 data disk
May 4 2022, 12:28 PM
vsellier added a comment to T4222: backup01 disk is not large enough to handle the dali backups.

Now let's restart the synchronization:

root@backup01:~# zfs destroy -r data/sync/dali/postgresql
root@backup01:~# systemctl reset-failed syncoid-dali-postgresql.service
root@backup01:~# systemctl restart syncoid-dali-postgresql.service

and the same for data/sync/dali/postgresql_wall

May 4 2022, 12:08 PM · System administration
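
Since the same three steps are repeated for a second dataset, they can be sketched as a small helper that takes the dataset and the systemd unit as parameters. This is only an illustration: the function name `resync_commands` is hypothetical, and it prints the commands instead of running them, because `zfs destroy -r` is destructive. The unit name for the second dataset is not given in the comment, so it is left as a placeholder.

```shell
#!/bin/sh
# Print (dry run) the commands that restart a syncoid synchronization from scratch.
resync_commands() {
    # $1: ZFS dataset to destroy on the backup host
    # $2: systemd unit that runs the synchronization
    printf 'zfs destroy -r %s\n' "$1"
    printf 'systemctl reset-failed %s\n' "$2"
    printf 'systemctl restart %s\n' "$2"
}

# Dataset and unit exactly as in the comment above:
resync_commands data/sync/dali/postgresql syncoid-dali-postgresql.service

# Second dataset: the unit name is NOT stated in the comment, fill it in first.
# resync_commands data/sync/dali/postgresql_wall <unit-name>.service
```

After reviewing the printed commands, they can be run by hand on the backup host.
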
vsellier added a comment to T4222: backup01 disk is not large enough to handle the dali backups.

The disk needed to be detached and reattached in order to be resized.
It seems zfs didn't detect the pool after the reboot.
A reimport fixed it:

root@backup01:~# zpool import data
root@backup01:~# zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
data   255G   188G  66.9G        -         -     9%    73%  1.00x    ONLINE  -

(The pool is still correctly detected after a reboot)

May 4 2022, 12:01 PM · System administration
vsellier moved T4222: backup01 disk is not large enough to handle the dali backups from Backlog to in-progress on the System administration board.
May 4 2022, 11:28 AM · System administration
vsellier added a comment to T4222: backup01 disk is not large enough to handle the dali backups.

The disk is sized at 200G. According to the Azure portal, it can be resized to 256G without any additional cost:

Enter the size of the disk you would like to create. You will be charged the same rate for your provisioned disk, regardless of how much of the disk space is being used. For example, a 200 GiB disk is provisioned on a 256 GiB disk, so you would be billed for the 256 GiB provisioned.

May 4 2022, 11:28 AM · System administration
vsellier changed the status of T4222: backup01 disk is not large enough to handle the dali backups from Open to Work in Progress.
May 4 2022, 11:25 AM · System administration
vsellier requested changes to D7734: Add GraphQL service to SWH environment.
May 4 2022, 10:31 AM

May 3 2022

vsellier moved T4221: Create a kubernetes cluster for the deployment experiment from Backlog to in-progress on the System administration board.
May 3 2022, 5:27 PM · System administration
vsellier changed the status of T4221: Create a kubernetes cluster for the deployment experiment from Open to Work in Progress.
May 3 2022, 5:27 PM · System administration
vsellier committed rSENVdf19c80df544: Add new elastics workers facts (authored by vsellier).
Add new elastics workers facts
May 3 2022, 5:22 PM
vsellier changed the status of T4134: Package the graphql service, a subtask of T4131: Graphql service in staging, from Open to Work in Progress.
May 3 2022, 10:41 AM · System administration, GraphQL API
vsellier changed the status of T4134: Package the graphql service from Open to Work in Progress.
May 3 2022, 10:41 AM · System administration, GraphQL API
vsellier updated the diff for D6926: First iteration of prometheus export of the e2e metrics.
  • rebase
  • test the prometheus exporter file creation
May 3 2022, 10:28 AM

May 2 2022

vsellier added a member for Reviewers: amadouth6.
May 2 2022, 10:08 AM
vsellier added a member for Staff: amadouth6.
May 2 2022, 10:08 AM
vsellier added a member for Developers: amadouth6.
May 2 2022, 10:08 AM

Apr 21 2022

vsellier added a comment to T3592: POC elastic worker infrastructure.

Good news, it looks like there are no more issues with the inter-node communication with rancher 2.6.4 and bullseye.

Apr 21 2022, 8:04 PM · System administration
vsellier accepted D7607: Create elastic worker node up to rancher cluster registration.

gg ;)

Apr 21 2022, 7:39 PM
vsellier renamed T4191: Hedgedoc: websocket upgrade queries routed to grafana from Headgedoc: websocket upgrade queries routed to grafana to Hedgedoc: websocket upgrade queries routed to grafana.
Apr 21 2022, 7:24 PM · System administration
vsellier triaged T4191: Hedgedoc: websocket upgrade queries routed to grafana as High priority.
Apr 21 2022, 7:22 PM · System administration
vsellier accepted D7624: proxmox/templates: Upgrade bullseye vm to latest 11.3.
Apr 21 2022, 4:41 PM
vsellier accepted D7625: proxmox/templates: Build extra vm template with zfs tools ready.
Apr 21 2022, 4:17 PM

Apr 20 2022

vsellier updated the diff for D6926: First iteration of prometheus export of the e2e metrics.

Rebase

Apr 20 2022, 7:09 PM
vsellier accepted D7606: rancher: Bootstrap rancher declarations.
Apr 20 2022, 5:59 PM
vsellier accepted D7600: Bootstrap elastic workers role.

Thanks, we will look later at how to get a better zpool initialization

Apr 20 2022, 5:48 PM
vsellier closed T3671: staging - swh-search (metadata indexer) is unable to update a document due to an unparseable date as Resolved.

There are no occurrences of this error in the logs and the consumers don't have any lag, so yes, I guess it is.

Apr 20 2022, 5:19 PM · Intrinsic metadata, Archive search
vsellier closed D7602: opam: avoid the repo management script to alway try to initialize the opam repositories.
Apr 20 2022, 4:20 PM