May 31 2022

vsellier closed T4278: Elastic worker cluster failures to unstuck, a subtask of T4144: Elastic worker infrastructure, as Resolved.
May 31 2022, 3:05 PM · meta-task, System administration, Roadmap 2022
vsellier added a comment to T4278: Elastic worker cluster failures to unstuck.

Unfortunately, after several tries, we were unable to restart the cluster because of a problem with the etcd leader election / data on the nodes (probably caused by a wrong manipulation on our side).
We finally destroyed the cluster (we had to follow [1] because the cluster was in an unstable state and rancher refused to remove it).

May 31 2022, 2:34 PM · System administration, Roadmap 2022

May 30 2022

vsellier accepted D7900: Update manifest to exclude swh/provenance/tools.

I also tried to fix it, without success, so I guess we can go with this diff until we find a better solution.

May 30 2022, 2:33 PM

May 24 2022

vsellier closed T4268: Downgrade internship cluster to 1.21 as Resolved.
May 24 2022, 9:40 AM · System administration

May 23 2022

vsellier closed D7886: rancher: downgrade deployment-internship cluster to 1.21.
May 23 2022, 7:38 PM
vsellier committed rSPRE6e7b13218cbc: rancher: downgrade deployment-internship cluster to 1.21 (authored by vsellier).
rancher: downgrade deployment-internship cluster to 1.21
May 23 2022, 7:38 PM
vsellier requested review of D7886: rancher: downgrade deployment-internship cluster to 1.21.
May 23 2022, 4:51 PM
vsellier added a revision to T4268: Downgrade internship cluster to 1.21: D7886: rancher: downgrade deployment-internship cluster to 1.21.
May 23 2022, 4:51 PM · System administration
vsellier added a comment to T4268: Downgrade internship cluster to 1.21.
May 23 2022, 4:36 PM · System administration
vsellier updated the task description for T4265: Create test topics on staging's kafka for the mirror tests.
May 23 2022, 3:27 PM · System administration
vsellier added a comment to T4265: Create test topics on staging's kafka for the mirror tests.

thanks, I forgot to mention that

May 23 2022, 3:27 PM · System administration
vsellier added a comment to T4268: Downgrade internship cluster to 1.21.

The error when we try to declare a resource in 1.22:

May 23 2022, 2:40 PM · System administration
vsellier changed the status of T4268: Downgrade internship cluster to 1.21 from Open to Work in Progress.
May 23 2022, 2:38 PM · System administration
vsellier triaged T4265: Create test topics on staging's kafka for the mirror tests as Normal priority.
May 23 2022, 1:48 PM · System administration
vsellier closed D7864: icinga: Don't monitor kubernetes local mounts.
May 23 2022, 10:57 AM
vsellier committed rSPSITEfcb8f4f7f3d6: icinga: Don't monitor kubernetes local mounts (authored by vsellier).
icinga: Don't monitor kubernetes local mounts
May 23 2022, 10:57 AM
vsellier closed T4258: [add forge now] email inbound not catched by django as Resolved.
May 23 2022, 9:54 AM · System administration, Add Forge Now

May 19 2022

vsellier requested review of D7864: icinga: Don't monitor kubernetes local mounts.
May 19 2022, 6:46 PM
vsellier accepted D7856: docs: Fix typos and clean up whitespace.
May 19 2022, 4:32 PM
vsellier accepted D7850: Add script to mirror an upstream repository to our forge.
May 19 2022, 4:31 PM
vsellier closed D7764: icinga checks: Activate the prometheus export on e2e tests.
May 19 2022, 2:43 PM
vsellier committed rSPSITEa7c2fc6b65a1: icinga checks: Activate the prometheus export on e2e tests (authored by vsellier).
icinga checks: Activate the prometheus export on e2e tests
May 19 2022, 2:43 PM
vsellier updated the diff for D7764: icinga checks: Activate the prometheus export on e2e tests.

rebase

May 19 2022, 2:18 PM
vsellier closed D7844: Allow icinga checks to write prometheus metrics files.
May 19 2022, 2:09 PM
vsellier committed rSPSITE8c8590ef6fad: Allow icinga checks to write prometheus metrics files (authored by vsellier).
Allow icinga checks to write prometheus metrics files
May 19 2022, 2:09 PM
vsellier updated the diff for D7844: Allow icinga checks to write prometheus metrics files.

rebase

May 19 2022, 2:08 PM

May 18 2022

vsellier added inline comments to D7844: Allow icinga checks to write prometheus metrics files.
May 18 2022, 2:35 PM
vsellier closed D7849: webapp: fix the inbound email command name.
May 18 2022, 2:14 PM
vsellier committed rSPSITEd929d03b485d: webapp: fix the inbound email command name (authored by vsellier).
webapp: fix the inbound email command name
May 18 2022, 2:14 PM
vsellier updated the summary of D7849: webapp: fix the inbound email command name.
May 18 2022, 2:12 PM
vsellier updated the diff for D7849: webapp: fix the inbound email command name.

edit commit message

May 18 2022, 2:12 PM
vsellier requested review of D7849: webapp: fix the inbound email command name.
May 18 2022, 2:11 PM
vsellier added a revision to T4258: [add forge now] email inbound not catched by django: D7849: webapp: fix the inbound email command name.
May 18 2022, 2:11 PM · System administration, Add Forge Now
vsellier added projects to T4258: [add forge now] email inbound not catched by django: Add Forge Now , System administration.
May 18 2022, 2:08 PM · System administration, Add Forge Now
vsellier changed the status of T4258: [add forge now] email inbound not catched by django from Open to Work in Progress.

The command name called by postfix[1] does not seem to match the command name declared in the webapp[2].

May 18 2022, 2:08 PM · System administration, Add Forge Now
vsellier committed R259:0f0cf6c8874d: Add the Dockerfile for the swh-graphql application (authored by vsellier).
Add the Dockerfile for the swh-graphql application
May 18 2022, 1:34 PM
vsellier closed D7846: [swh-graphql] Prepare the docker image generation.
May 18 2022, 1:34 PM
vsellier committed R259:21bc668b25b6: Add missing swh-graphql dependencies (authored by vsellier).
Add missing swh-graphql dependencies
May 18 2022, 1:34 PM
vsellier accepted D7847: scripts: Extract reusable functions to a common module.
May 18 2022, 12:06 PM
vsellier added inline comments to D7846: [swh-graphql] Prepare the docker image generation.
May 18 2022, 11:46 AM
vsellier added inline comments to D7846: [swh-graphql] Prepare the docker image generation.
May 18 2022, 11:39 AM
vsellier requested review of D7846: [swh-graphql] Prepare the docker image generation.
May 18 2022, 11:37 AM
vsellier added a revision to T4134: Package the graphql service: D7846: [swh-graphql] Prepare the docker image generation.
May 18 2022, 11:37 AM · System administration, GraphQL API
vsellier closed D7815: Add swh-graphql application.
May 18 2022, 10:08 AM
vsellier committed R259:0a7b83c31959: Add swh-graphql application (authored by vsellier).
Add swh-graphql application
May 18 2022, 10:08 AM

May 17 2022

vsellier updated the summary of D7764: icinga checks: Activate the prometheus export on e2e tests.
May 17 2022, 8:42 PM
vsellier updated the diff for D7764: icinga checks: Activate the prometheus export on e2e tests.
  • rebase
  • move the parameters after 'icinga_plugins'
  • add an environment parameter
May 17 2022, 8:42 PM
vsellier requested review of D7844: Allow icinga checks to write prometheus metrics files.
May 17 2022, 8:24 PM
vsellier added a revision to T3129: Reliable monitoring of services: for users and for admins : D7844: Allow icinga checks to write prometheus metrics files.
May 17 2022, 8:24 PM · Roadmap 2022, Roadmap 2021, Monitoring, meta-task
vsellier reopened T4251: [swh-search] Investigate long search queries response time as "Work in Progress".
May 17 2022, 5:39 PM · System administration, Archive search
vsellier edited projects for T4251: [swh-search] Investigate long search queries response time, added: System administration; removed System administrators.
May 17 2022, 5:39 PM · System administration, Archive search
vsellier renamed T4251: [swh-search] Investigate long search queries response time from Improve search performance to [swh-search] Investigate long search queries response time.
May 17 2022, 5:38 PM · System administration, Archive search
vsellier added a comment to T4251: [swh-search] Investigate long search queries response time.

Here are the results of the queries.
You can paste the JSON directly into the search profiler to see the result.
(Be careful, some of them are quite large.)

May 17 2022, 5:37 PM · System administration, Archive search
vsellier triaged T4251: [swh-search] Investigate long search queries response time as Normal priority.
May 17 2022, 8:54 AM · System administration, Archive search

May 16 2022

vsellier committed rDSNIP6221523e4a2b: Add netbox csv for the ceph objstorage infra (authored by vsellier).
Add netbox csv for the ceph objstorage infra
May 16 2022, 7:24 PM
vsellier renamed T4247: journalbeat failed to start after reboot from journalbeat fails to start after reboot to journalbeat failed to start after reboot.
May 16 2022, 9:24 AM · System administration
vsellier renamed T4247: journalbeat failed to start after reboot from journalbeat fail to start after reboot to journalbeat fails to start after reboot.
May 16 2022, 9:24 AM · System administration
vsellier closed T4247: journalbeat failed to start after reboot as Resolved.
May 16 2022, 9:17 AM · System administration
vsellier added a comment to T4247: journalbeat failed to start after reboot.

The file /var/lib/journalbeat/registry looks corrupted.
On worker10.euwest:

root@worker10:/var/lib/journalbeat# cat registry 
<?xml version="1.0" encoding="utf-8"?>
<GoalState xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="goalstate10.xsd">
  <Version>2012-11-30</Version>
  <Incarnation>1</Incarnation>
  <Machine>
    <ExpectedState>Started</ExpectedState>
    <StopRolesDeadlineHint>3

On worker09.euwest:

root@worker09:/var/lib/journalbeat# cat registry 
update_time: 2022-05-16T07:11:29.680690647Z
journal_entries:
- path: LOCAL_SYSTEM_JOURNAL
  cursor: s=1b5676c17e22450b80579b9caf065703;i=659f65c;b=97b0842367c749299a4a12ec839f1c3b;m=5b66c4ba4c0;t=5df1bbb86b72f;x=8e43c09dfc1a706e
  realtime_timestamp: 1652685086832431
  monotonic_timestamp: 6281059083456
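
The two dumps above differ in shape: the worker09 file has top-level `update_time` and `journal_entries` keys, while the worker10 file contains an XML fragment. A minimal sketch of a check for that difference follows. It assumes those two keys are what a usable registry needs, which is inferred only from the worker09 dump; the function name and the sample constants are hypothetical and are not part of journalbeat.

```python
# Hypothetical helper: flags a registry file whose content does not have the
# top-level keys seen in the worker09 dump. This is an assumption based on
# that single sample, not journalbeat's own validation logic.
REQUIRED_KEYS = {"update_time", "journal_entries"}


def looks_like_registry(text: str) -> bool:
    """Return True if every key in REQUIRED_KEYS appears as a top-level key."""
    top_level_keys = {
        line.split(":", 1)[0]
        for line in text.splitlines()
        # Skip blank lines, indented lines, and list items: only top-level
        # "key: value" lines are of interest here.
        if line and not line[0].isspace() and not line.startswith("-") and ":" in line
    }
    return REQUIRED_KEYS <= top_level_keys


# Shortened copies of the two dumps, used as sample inputs.
HEALTHY_SAMPLE = (
    "update_time: 2022-05-16T07:11:29.680690647Z\n"
    "journal_entries:\n"
    "- path: LOCAL_SYSTEM_JOURNAL\n"
    "  realtime_timestamp: 1652685086832431\n"
)
CORRUPTED_SAMPLE = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<GoalState xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n'
    "  <Version>2012-11-30</Version>\n"
)

if __name__ == "__main__":
    print(looks_like_registry(HEALTHY_SAMPLE))
    print(looks_like_registry(CORRUPTED_SAMPLE))
```

On a host, the same function could be applied to the content of /var/lib/journalbeat/registry read as text, for example before restarting the service.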
May 16 2022, 9:17 AM · System administration
vsellier changed the status of T4247: journalbeat failed to start after reboot from Open to Work in Progress.
May 16 2022, 9:13 AM · System administration

May 13 2022

vsellier accepted D7827: Initialize the success boolean early to avoid unbound exception.
May 13 2022, 10:19 AM
vsellier accepted D7826: Move the prometheus statsd exporter to use the upstream tarball.

LGTM, just 2 non-blocking questions inline

May 13 2022, 10:01 AM

May 12 2022

vsellier added a revision to T4131: Graphql service in staging: D7815: Add swh-graphql application.
May 12 2022, 10:16 AM · System administration, GraphQL API
vsellier requested review of D7815: Add swh-graphql application.
May 12 2022, 10:16 AM

May 11 2022

vsellier closed T4234: Add snyk credentials for the staging r/o object storage as Resolved.

credentials created following https://docs.softwareheritage.org/sysadm/mirror-operations/onboard.html#how-to-create-the-objstorage-credentials

May 11 2022, 3:53 PM · Mirror, System administration
vsellier committed rSPSITEe92105220d46: Allow snyk mirror to access the read-only staging objstorage (authored by vsellier).
Allow snyk mirror to access the read-only staging objstorage
May 11 2022, 3:46 PM
vsellier committed rSPPRIVCd9ca4aa8fc2e: Add snyk-stg-01 credentials (authored by vsellier).
Add snyk-stg-01 credentials
May 11 2022, 3:42 PM
vsellier changed the status of T4234: Add snyk credentials for the staging r/o object storage from Open to Work in Progress.
May 11 2022, 3:40 PM · Mirror, System administration
vsellier changed the status of T4230: [provenance] pip deploy the stack on the ovh server from Open to Work in Progress.
May 11 2022, 10:37 AM · Provenance database, System administration

May 10 2022

vsellier closed D7807: Remove the wrong dependency added in the previous commit.
May 10 2022, 6:11 PM
vsellier committed rDICP11f9eae84707: Remove the wrong dependency added in the previous commit (authored by vsellier).
Remove the wrong dependency added in the previous commit
May 10 2022, 6:11 PM
vsellier added a revision to T3129: Reliable monitoring of services: for users and for admins : D7807: Remove the wrong dependency added in the previous commit.
May 10 2022, 6:08 PM · Roadmap 2022, Roadmap 2021, Monitoring, meta-task
vsellier accepted D7803: Ensure the loader/lister packages are installed on scheduler node.

great, thanks

May 10 2022, 4:01 PM
vsellier committed rDICP13fc2e4cd327: d/changelog: Bump new release (authored by vsellier).
d/changelog: Bump new release
May 10 2022, 2:21 PM
vsellier committed rDICPb0a683a07b41: Declare the prometheus client dependency (authored by vsellier).
Declare the prometheus client dependency
May 10 2022, 2:21 PM
vsellier accepted D7802: Deploy bzr loader in production.
May 10 2022, 2:20 PM
vsellier closed D6926: First iteration of prometheus export of the e2e metrics.
May 10 2022, 9:16 AM
vsellier committed rDICP9812ac8f7b1d: First iteration of prometheus export of the e2e metrics (authored by vsellier).
First iteration of prometheus export of the e2e metrics
May 10 2022, 9:16 AM
vsellier added inline comments to D6926: First iteration of prometheus export of the e2e metrics.
May 10 2022, 8:54 AM
vsellier updated the diff for D6926: First iteration of prometheus export of the e2e metrics.
  • rebase
  • update according to the review feedback
May 10 2022, 8:52 AM

May 9 2022

vsellier closed D7793: logstash: Fix the closed index's error filtering.
May 9 2022, 4:01 PM
vsellier committed rSPSITE2f198d27fd56: logstash: Fix the closed index's error filtering (authored by vsellier).
logstash: Fix the closed index's error filtering
May 9 2022, 4:01 PM
vsellier requested review of D7793: logstash: Fix the closed index's error filtering.
May 9 2022, 3:57 PM

May 6 2022

vsellier retitled D6926: First iteration of prometheus export of the e2e metrics from WIP - First iteration of prometheus export of the e2e metrics to First iteration of prometheus export of the e2e metrics.
May 6 2022, 5:30 PM
vsellier requested review of D7764: icinga checks: Activate the prometheus export on e2e tests.
May 6 2022, 5:29 PM
vsellier closed T4221: Create a kubernetes cluster for the deployment experiment as Resolved.

The cluster is declared and the nodes are provisioned.

May 6 2022, 10:07 AM · System administration
vsellier committed rSPREd62700b98e98: Refresh staging.tfstate after the new kubernetes cluster creation (authored by vsellier).
Refresh staging.tfstate after the new kubernetes cluster creation
May 6 2022, 10:06 AM
vsellier committed rSPRE0ff550d0f889: Declare the rancher cluster for the deployment's internship (authored by vsellier).
Declare the rancher cluster for the deployment's internship
May 6 2022, 10:06 AM
vsellier committed rSPRE529a2d63d236: refresh elastic-workers cluster after accidentally elastic-worker0 removal (authored by vsellier).
refresh elastic-workers cluster after accidentally elastic-worker0 removal
May 6 2022, 10:06 AM
vsellier committed rSPREd43e6bb05d85: Ensure cloud-init is not running before starting puppet (authored by vsellier).
Ensure cloud-init is not running before starting puppet
May 6 2022, 10:06 AM
vsellier closed D7740: Declare the rancher cluster for the deployment's internship.
May 6 2022, 10:06 AM
vsellier committed rSPRE062ec99d2101: Align worker0 and worker1 qemu arguments to match the real vms configuration (authored by vsellier).
Align worker0 and worker1 qemu arguments to match the real vms configuration
May 6 2022, 10:06 AM
vsellier updated the diff for D7740: Declare the rancher cluster for the deployment's internship.
  • fix the cloud-init / puppet concurrency after the vms startup
  • remove the wrong vmid assigned to the new cluster nodes
  • refresh the staging.tfstate file after applying the new configuration
May 6 2022, 10:05 AM

May 5 2022

vsellier committed rSPSITEb8ef7d1655b1: Declare the vagrant ips of the recently created nodes (authored by vsellier).
Declare the vagrant ips of the recently created nodes
May 5 2022, 4:19 PM
vsellier committed rSPSITEf58f988f25cc: Declare a generic rancher-node-.* hostname matching (authored by vsellier).
Declare a generic rancher-node-.* hostname matching
May 5 2022, 4:19 PM
vsellier closed D7739: Declare a generic rancher-node-.* hostname matching.
May 5 2022, 4:19 PM
vsellier committed rSPSITEd4d32f417106: kubernetes: Declare a generic rancher_node profile (authored by vsellier).
kubernetes: Declare a generic rancher_node profile
May 5 2022, 4:19 PM

May 4 2022

vsellier updated the diff for D7740: Declare the rancher cluster for the deployment's internship.

Align worker0 and worker1 qemu arguments to match the real vms configuration

May 4 2022, 5:58 PM
vsellier updated the test plan for D7740: Declare the rancher cluster for the deployment's internship.
May 4 2022, 5:09 PM
vsellier updated the diff for D7740: Declare the rancher cluster for the deployment's internship.
  • fix wrong references to the elastic worker cluster
  • rename nodes from rancher-node-internX to rancher-node-internshipX
May 4 2022, 5:08 PM