
May 10 2021

vsellier closed T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested as Resolved.

The check is now active.
An alert will be raised by icinga if:

  • logstash is not responding to the API call
  • at least one error is detected when the logs are sent to elasticsearch (ES is responding, but an error is detected when the log is stored in the index).
May 10 2021, 3:00 PM · System administration
vsellier closed T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested, a subtask of T3219: No logs are ingested on elasticsearch since 2021-03-26, as Resolved.
May 10 2021, 3:00 PM · System administrators
vsellier committed rSPSITE12982f3dcbf9: Declare the icinga check command also on the master (authored by vsellier).
Declare the icinga check command also on the master
May 10 2021, 2:53 PM
vsellier closed D5718: Concatenate global and agent plugins list.
May 10 2021, 2:11 PM
vsellier committed rSPSITE39dd72bd4820: Concatenate global and agent plugins list (authored by vsellier).
Concatenate global and agent plugins list
May 10 2021, 2:11 PM
vsellier updated the summary of D5718: Concatenate global and agent plugins list.
May 10 2021, 12:07 PM
vsellier updated the diff for D5718: Concatenate global and agent plugins list.

update the commit message

May 10 2021, 12:07 PM
vsellier requested review of D5718: Concatenate global and agent plugins list.
May 10 2021, 12:06 PM
vsellier added a revision to T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested: D5718: Concatenate global and agent plugins list.
May 10 2021, 12:06 PM · System administration
vsellier closed D5716: monitoring: activate the logstash probe via a filter on the plugins.
May 10 2021, 11:27 AM
vsellier committed rSPSITE4dd46189457a: monitoring: activate the logstash probe via a filter on the plugins (authored by vsellier).
monitoring: activate the logstash probe via a filter on the plugins
May 10 2021, 11:27 AM
vsellier updated the diff for D5716: monitoring: activate the logstash probe via a filter on the plugins.

fix indentation

May 10 2021, 11:26 AM

May 8 2021

vsellier requested review of D5716: monitoring: activate the logstash probe via a filter on the plugins.
May 8 2021, 3:42 PM
vsellier added a revision to T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested: D5716: monitoring: activate the logstash probe via a filter on the plugins.
May 8 2021, 3:42 PM · System administration

May 7 2021

vsellier added a comment to T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested.

According to the API (TIL the catalog can be requested like that), journal0 doesn't have the new plugins declared. So the check should be disabled, as the filter uses this field.

May 7 2021, 5:20 PM · System administration
vsellier accepted D5713: Deploy new sourceforge lister task.
May 7 2021, 3:48 PM
vsellier added a comment to T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested.

The new probe is deployed but nothing is displayed in icinga. Let's start a configuration debug session.

May 7 2021, 3:20 PM · System administration
vsellier closed D5709: Add a monitoring alert when logstash is failing to send logs to ES.
May 7 2021, 2:54 PM
vsellier committed rSPSITE6657bf88897d: Add a monitoring alert when logstash is failing to send logs to ES (authored by vsellier).
Add a monitoring alert when logstash is failing to send logs to ES
May 7 2021, 2:54 PM
vsellier closed T3203: docs: Document the firewall installation and procedures, a subtask of T3194: Upgrade opnsense firewalls from 20.7.4 to 21.1.4, as Resolved.
May 7 2021, 2:53 PM · System administration
vsellier closed T3203: docs: Document the firewall installation and procedures as Resolved.
May 7 2021, 2:52 PM · Documentation, System administration
vsellier closed D5692: document the firewalls upgrade procedure.
May 7 2021, 2:50 PM
vsellier committed rDDOC7147b9c0cac2: document the firewalls upgrade procedure (authored by vsellier).
document the firewalls upgrade procedure
May 7 2021, 2:50 PM
vsellier updated the diff for D5709: Add a monitoring alert when logstash is failing to send logs to ES.
  • fix inconsistency in check command naming
  • remove the unnecessary set option on the check script
May 7 2021, 2:45 PM
vsellier added inline comments to D5709: Add a monitoring alert when logstash is failing to send logs to ES.
May 7 2021, 2:40 PM
vsellier updated the diff for D5692: document the firewalls upgrade procedure.

network schema: reactivate the description of the firewalls' group

May 7 2021, 2:19 PM
vsellier updated the diff for D5692: document the firewalls upgrade procedure.
  • rebase
  • network schema:
    • change the vlan order to be able to use only one gateway
    • Adapt several labels
  • rst: adapt according to the feedback
May 7 2021, 2:15 PM
vsellier added inline comments to D5692: document the firewalls upgrade procedure.
May 7 2021, 2:04 PM
vsellier added inline comments to D5692: document the firewalls upgrade procedure.
May 7 2021, 2:02 PM
vsellier requested review of D5709: Add a monitoring alert when logstash is failing to send logs to ES.
May 7 2021, 12:09 PM
vsellier added a revision to T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested: D5709: Add a monitoring alert when logstash is failing to send logs to ES.
May 7 2021, 12:09 PM · System administration
vsellier added a project to T3313: Web API: per-user accounting: System administration.
May 7 2021, 9:54 AM · System administration, Web app

May 6 2021

vsellier added a comment to T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested.

After searching for how it can be integrated with the icinga checks, the simplest way I have found is to create a script that periodically queries logstash to get the statistics and returns a status as follows:

  • GREEN: none of the non_retryable_failures, with_errors or failures fields is found in the JSON
  • WARNING: the failures field is found
  • CRITICAL: the non_retryable_failures or with_errors field is found
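
The status rules above could be sketched in Python roughly as follows. This is only an illustration, not the check script that was actually deployed: the function names are hypothetical, the field locations (documents and bulk_requests) are taken from the sample output in the previous comment, and the URL is the one used in the curl call there. The exit codes follow the usual icinga plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
import json
import urllib.request

# Usual icinga/nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Assumption: same endpoint as the curl call in the previous comment.
STATS_URL = "http://localhost:9600/_node/stats/pipelines"


def fetch_outputs(url=STATS_URL, pipeline="main"):
    """Return the output plugin statistics of one logstash pipeline."""
    with urllib.request.urlopen(url, timeout=10) as response:
        stats = json.load(response)
    return stats["pipelines"][pipeline]["plugins"]["outputs"]


def classify_output(output):
    """Map one output plugin entry to a status, based on field presence."""
    documents = output.get("documents", {})
    bulk_requests = output.get("bulk_requests", {})
    if "non_retryable_failures" in documents or "with_errors" in bulk_requests:
        return CRITICAL
    if "failures" in bulk_requests:
        return WARNING
    return OK


def worst_status(outputs):
    """Return the most severe status over all output plugins (OK if none)."""
    return max((classify_output(output) for output in outputs), default=OK)
```

A check script would typically end with something like sys.exit(worst_status(fetch_outputs())). Since the rules only test for the presence of the fields, a sketch like this presumably stays in alert until logstash is restarted and the statistics are reset.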
May 6 2021, 4:09 PM · System administration
vsellier added a comment to T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested.

I have simulated different situations locally on the vagrant environment:

root@logstash0:~# curl -s http://localhost:9600/_node/stats/pipelines | jq '.pipelines.main.plugins.outputs'
[
  {
    "id": "c49a6902391a456022af4c89f0972781900d01d70cd5f312b292cb20c0d345eb",
    "documents": {
      "non_retryable_failures": 112,
      "successes": 103692
    },
    "events": {
      "out": 103804,
      "in": 103804,
      "duration_in_millis": 3529049
    },
    "name": "elasticsearch",
    "bulk_requests": {
      "responses": {
        "200": 2028
      },
      "failures": 3,
      "with_errors": 110,
      "successes": 1918
    }
  }
]
May 6 2021, 3:30 PM · System administration
vsellier updated the diff for D5692: document the firewalls upgrade procedure.
  • split procedure per firewall
  • clarify the initial status before each section
  • fix minor typos
May 6 2021, 12:17 PM
vsellier closed T3306: Upgrade the firewalls to version 21.5.1 as Resolved.

upgrade done without any problem:

  • CARP maintenance activated on pushkin -> glyptotek elected as primary
  • pushkin upgrade done
  • CARP maintenance deactivated on pushkin -> pushkin re-elected as primary
  • nothing wrong detected after a safety period of 1 hour
  • CARP maintenance mode activated on glyptotek to avoid an unexpected rebalance during the upgrade
  • glyptotek upgrade done
  • CARP maintenance mode deactivated on glyptotek
May 6 2021, 11:31 AM · System administration
vsellier planned changes to D5692: document the firewalls upgrade procedure.
May 6 2021, 11:15 AM
vsellier requested review of D5692: document the firewalls upgrade procedure.
May 6 2021, 11:12 AM
vsellier added a revision to T3203: docs: Document the firewall installation and procedures: D5692: document the firewalls upgrade procedure.
May 6 2021, 11:12 AM · Documentation, System administration
vsellier closed T3243: Replace /dev/sdb and /dev/sdc on storage1.staging as Resolved.

Actions performed:

  • wwn-0x5000c500d5de652a(sdb) : new -> spare
  • wwn-0x5000c500a22eed6f(sdh) : spare -> mirror
  • wwn-0x5000c500d5dda886(sdc) : new -> mirror
May 6 2021, 9:54 AM · System administration, Staging environment
vsellier closed T3243: Replace /dev/sdb and /dev/sdc on storage1.staging, a subtask of T3236: staging: Disk error on storage1, as Resolved.
May 6 2021, 9:54 AM · System administration, Staging environment
vsellier added a comment to T3243: Replace /dev/sdb and /dev/sdc on storage1.staging.

The checks ran without detecting any bad blocks on the disks.
They can be added back to the zfs pool.

May 6 2021, 9:33 AM · System administration, Staging environment

May 5 2021

vsellier added a comment to T3306: Upgrade the firewalls to version 21.5.1.

changelog for the 21.1.5 version

Good day everyone,
May 5 2021, 6:09 PM · System administration
vsellier closed T3307: apidoc generation is not performed on swh modules during dev build as Resolved.
May 5 2021, 2:52 PM · Continuous Integration, Documentation
vsellier closed D5681: Add support to not install the dev modules in editable mode.
May 5 2021, 2:32 PM
vsellier committed rDDOC5e6f5d854803: Don't install the modules in editable mode (authored by vsellier).
Don't install the modules in editable mode
May 5 2021, 2:32 PM
vsellier updated the diff for D5681: Add support to not install the dev modules in editable mode.

rebase

May 5 2021, 2:32 PM
vsellier added a comment to D5681: Add support to not install the dev modules in editable mode.

And I think that I found how to fix the last remaining warning cited above.

good news ;)
thanks

May 5 2021, 2:30 PM
vsellier updated the diff for D5681: Add support to not install the dev modules in editable mode.

completely remove the -e option

May 5 2021, 2:27 PM
vsellier added inline comments to D5681: Add support to not install the dev modules in editable mode.
May 5 2021, 2:21 PM
vsellier added a comment to T3307: apidoc generation is not performed on swh modules during dev build.

@anlambert I didn't manage to make it work with links to the modules, but I found another way: a standard installation (i.e. without the -e flag), which we can force with a flag on the CI (cf. D5681).
I'm really not sure whether it's the right approach.

May 5 2021, 1:09 PM · Continuous Integration, Documentation
vsellier planned changes to D5681: Add support to not install the dev modules in editable mode.
May 5 2021, 1:07 PM
vsellier requested review of D5681: Add support to not install the dev modules in editable mode.
May 5 2021, 1:06 PM
vsellier added a revision to T3307: apidoc generation is not performed on swh modules during dev build: D5681: Add support to not install the dev modules in editable mode.
May 5 2021, 1:06 PM · Continuous Integration, Documentation
vsellier renamed T3307: apidoc generation is not performed on swh modules during dev build from swh dependencies not installed during dev build to apidoc generation is not performed on swh modules during dev build.
May 5 2021, 11:36 AM · Continuous Integration, Documentation
vsellier added a comment to T3243: Replace /dev/sdb and /dev/sdc on storage1.staging.

A full badblocks test is launched on both disks:

root@storage1:~# badblocks -v -w -B -s -b 4096 /dev/sdb
root@storage1:~# badblocks -v -w -B -s -b 4096 /dev/sdc
May 5 2021, 11:33 AM · System administration, Staging environment
vsellier added a comment to T3307: apidoc generation is not performed on swh modules during dev build.

thanks a lot @anlambert, I will look in that direction

May 5 2021, 11:27 AM · Continuous Integration, Documentation
vsellier added a comment to T3307: apidoc generation is not performed on swh modules during dev build.

Apparently, the modules are correctly installed, but sphinx-apidoc is not detecting them due to the local installation:

jenkins@4e5220b923d8:~/workspace/swh-environment/swh-docs$ .tox/sphinx-dev/bin/python3 -m pip list | grep swh
swh.auth                      0.5.4                           /home/jenkins/workspace/swh-environment/swh-auth
swh.core                      0.13.2.dev1+g7d42035            /home/jenkins/workspace/swh-environment/swh-core
swh.counters                  0.7.1.dev1+g6a44a84             /home/jenkins/workspace/swh-environment/swh-counters
swh.deposit                   0.13.6                          /home/jenkins/workspace/swh-environment/swh-deposit
swh.docs                      0.0.1.dev334+g044cb9b.d20210505
swh.fuse                      1.0.3                           /home/jenkins/workspace/swh-environment/swh-fuse
swh.graph                     0.3.2.dev3+g62c2fd3             /home/jenkins/workspace/swh-environment/swh-graph
swh.icinga-plugins            0.3.1.dev1+g8878925             /home/jenkins/workspace/swh-environment/swh-icinga-plugins
swh.indexer                   0.7.1.dev4+g8f1fb0f             /home/jenkins/workspace/swh-environment/swh-indexer
swh.journal                   0.7.2.dev8+g2972c7a             /home/jenkins/workspace/swh-environment/swh-journal
swh.lister                    1.1.0                           /home/jenkins/workspace/swh-environment/swh-lister
swh.loader.core               0.22.1.dev2+g0e4bb4b            /home/jenkins/workspace/swh-environment/swh-loader-core
swh.loader.git                0.9.2.dev1+g15e12fa             /home/jenkins/workspace/swh-environment/swh-loader-git
swh.loader.mercurial          0.5.1.dev4+g8884714             /home/jenkins/workspace/swh-environment/swh-loader-mercurial
swh.loader.svn                0.7.1                           /home/jenkins/workspace/swh-environment/swh-loader-svn
swh.model                     2.4.2.dev1+gdf036ef             /home/jenkins/workspace/swh-environment/swh-model
swh.objstorage                0.2.3                           /home/jenkins/workspace/swh-environment/swh-objstorage
swh.objstorage.replayer       0.2.2                           /home/jenkins/workspace/swh-environment/swh-objstorage-replayer
swh.scanner                   0.4.2.dev1+g30b40cc             /home/jenkins/workspace/swh-environment/swh-scanner
swh.scheduler                 0.13.1.dev5+gbab557e            /home/jenkins/workspace/swh-environment/swh-scheduler
swh.search                    0.8.1                           /home/jenkins/workspace/swh-environment/swh-search
swh.storage                   0.27.5.dev4+g051b7715           /home/jenkins/workspace/swh-environment/swh-storage
swh.vault                     0.5.2.dev2+gf87dd54             /home/jenkins/workspace/swh-environment/swh-vault
swh.web                       0.0.307.dev3+gf8c750b6          /home/jenkins/workspace/swh-environment/swh-web
swh.web.client                0.3.1.dev1+g4b610ad             /home/jenkins/workspace/swh-environment/swh-web-client
sphinx-apidoc \
	--implicit-namespaces \
	--templatedir=../swh/docs/templates/ \
	--maxdepth=3 \
	--ext-viewcode --separate \
	-o apidoc \
	/home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh \
	/home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/*/tests /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/*/tests/* /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/*/*/tests/* /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/*/*/*/tests/* /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/*/migrations /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/*/migrations/* /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/*/*/migrations/* /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/*/*/*/migrations/* /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/*/wsgi.py /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/*/*/wsgi.py /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/*/*/*/wsgi.py /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/*/*/*/wsgi.py /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/deposit/settings/* /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/web/settings/* /home/jenkins/workspace/swh-environment/swh-docs/.tox/sphinx-dev/lib/python3.7/site-packages/swh/dataset/*
Creating file apidoc/swh.rst.
Creating file apidoc/swh.docs.rst.
Creating file apidoc/swh.docs.django_settings.rst.
Creating file apidoc/swh.docs.sphinx.rst.
Creating file apidoc/swh.docs.sphinx.conf.rst.
Creating file apidoc/swh.docs.sphinx.view_in_phabricator.rst.
Creating file apidoc/modules.rst.
May 5 2021, 11:14 AM · Continuous Integration, Documentation
vsellier added a comment to T3243: Replace /dev/sdb and /dev/sdc on storage1.staging.

The disks were replaced by Christophe.
Apparently, the LED of one of the disks is still on, so the LEDs need to be switched off:

root@storage1:~# ls /dev/sd* | grep -e "[a-z]$" | xargs -n1 -t -i{} ledctl normal={} 
ledctl normal=/dev/sda 
ledctl normal=/dev/sdb 
ledctl normal=/dev/sdc 
ledctl normal=/dev/sdd 
ledctl normal=/dev/sde 
ledctl normal=/dev/sdf 
ledctl normal=/dev/sdg 
ledctl normal=/dev/sdh 
ledctl normal=/dev/sdi 
ledctl normal=/dev/sdj 
ledctl normal=/dev/sdk 
ledctl normal=/dev/sdl 
ledctl normal=/dev/sdm 
ledctl normal=/dev/sdn
May 5 2021, 11:02 AM · System administration, Staging environment
vsellier changed the status of T3307: apidoc generation is not performed on swh modules during dev build from Open to Work in Progress.
May 5 2021, 10:25 AM · Continuous Integration, Documentation
vsellier closed D5674: sphinx: Fix the creation of the pre-commit file during the mr update.
May 5 2021, 9:37 AM · Continuous Integration, Documentation
vsellier committed rCDFJ6082a2d0a727: sphinx: Fix the creation of the pre-commit file during the mr update (authored by vsellier).
sphinx: Fix the creation of the pre-commit file during the mr update
May 5 2021, 9:37 AM
vsellier added a reviewer for D5674: sphinx: Fix the creation of the pre-commit file during the mr update: Reviewers.
May 5 2021, 8:49 AM · Continuous Integration, Documentation

May 4 2021

vsellier requested review of D5674: sphinx: Fix the creation of the pre-commit file during the mr update.
May 4 2021, 7:00 PM · Continuous Integration, Documentation
vsellier committed rDSNIPfdd1f2956958: add the diagram of the blog article on the counters (authored by vsellier).
add the diagram of the blog article on the counters
May 4 2021, 2:34 PM
vsellier updated the task description for T3306: Upgrade the firewalls to version 21.5.1.
May 4 2021, 1:06 PM · System administration
vsellier moved T3306: Upgrade the firewalls to version 21.5.1 from Backlog to in-progress on the System administration board.
May 4 2021, 1:06 PM · System administration
vsellier changed the status of T3306: Upgrade the firewalls to version 21.5.1 from Open to Work in Progress.
May 4 2021, 1:05 PM · System administration
vsellier changed the status of T3203: docs: Document the firewall installation and procedures, a subtask of T3194: Upgrade opnsense firewalls from 20.7.4 to 21.1.4, from Open to Work in Progress.
May 4 2021, 12:57 PM · System administration
vsellier changed the status of T3203: docs: Document the firewall installation and procedures from Open to Work in Progress.
May 4 2021, 12:57 PM · Documentation, System administration
vsellier moved T3300: Make the permissions of the swh services' configuration file uniform from in-progress to done on the System administration board.
May 4 2021, 12:56 PM · System administration
vsellier closed T3300: Make the permissions of the swh services' configuration file uniform as Resolved.

new permissions updated by puppet

May 4 2021, 12:56 PM · System administration

May 3 2021

vsellier committed rSPSITE45c68cc1bbad: fix indentation (authored by vsellier).
fix indentation
May 3 2021, 3:40 PM
vsellier closed D5662: Make the permissions of the swh configuration files consistent.
May 3 2021, 3:40 PM
vsellier committed rSPSITEd164e62259ab: Make the permissions of the swh configuration files consistent (authored by vsellier).
Make the permissions of the swh configuration files consistent
May 3 2021, 3:40 PM
vsellier updated the diff for D5662: Make the permissions of the swh configuration files consistent.

fix a typo on the commit message

May 3 2021, 3:37 PM
vsellier added a revision to T3300: Make the permissions of the swh services' configuration file uniform: D5662: Make the permissions of the swh configuration files consistent.
May 3 2021, 3:36 PM · System administration
vsellier updated the summary of D5662: Make the permissions of the swh configuration files consistent.
May 3 2021, 3:36 PM
vsellier requested review of D5662: Make the permissions of the swh configuration files consistent.
May 3 2021, 3:32 PM
vsellier moved T3300: Make the permissions of the swh services' configuration file uniform from Backlog to in-progress on the System administration board.
May 3 2021, 3:02 PM · System administration
vsellier changed the status of T3300: Make the permissions of the swh services' configuration file uniform from Open to Work in Progress.
May 3 2021, 2:54 PM · System administration
vsellier added a comment to T3243: Replace /dev/sdb and /dev/sdc on storage1.staging.

The replacement disks were delivered at Rocquencourt:

May 3 2021, 8:29 AM · System administration, Staging environment
vsellier added a comment to T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.

The order seems to have been delivered. I will check with the DSI how we can proceed with the installation.

May 3 2021, 8:23 AM · System administration, Archive search

Apr 23 2021

vsellier added a comment to T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested.

logstash now exposes an api server[1] which seems to return some interesting metrics on the plugin behaviors.
For example, there is a section for the elasticsearch output plugin:

  "outputs": [
    {
      "id": "62d11c4234b8981da77a97955da92ac9de92b9a6dcd4582f407face31fd5c664",
      "events": {
        "duration_in_millis": 160089636,
        "in": 72818126,
        "out": 72818046
      },
      "bulk_requests": {
        "responses": {
          "200": 3860888
        },
        "successes": 3860888
      },
      "documents": {
        "successes": 72818046
      },
      "name": "elasticsearch"
    }
  ]
},

As a first step, I'll try to implement a small Python script checking whether there are response codes other than 200, to identify the behavior.
It may also be interesting to check other properties such as the queue size:

"queue": {
  "type": "memory",
  "events_count": 0,
  "queue_size_in_bytes": 0,
  "max_queue_size_in_bytes": 0
},
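
A first version of such a script might look like the sketch below. It is an assumption-laden illustration rather than the script that was eventually written: the helper names are hypothetical, and the field names are the ones visible in the two JSON excerpts above.

```python
def unexpected_responses(output):
    """Return the bulk response codes other than "200" and their counts."""
    responses = output.get("bulk_requests", {}).get("responses", {})
    return {
        code: count
        for code, count in responses.items()
        if code != "200" and count > 0
    }


def in_flight_events(output):
    """Return the number of events received but not (yet) sent."""
    events = output.get("events", {})
    return events.get("in", 0) - events.get("out", 0)


def queue_is_empty(queue):
    """Tell whether the pipeline queue currently holds no events."""
    return queue.get("events_count", 0) == 0
```

On a running instance a small positive gap between "in" and "out" is expected, because some events are always being processed, so a real check would probably need a threshold rather than a strict equality.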
Apr 23 2021, 5:16 PM · System administration
vsellier added a comment to T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested.

I checked the icinga_logstash plugin[1] to see if it can be helpful, but it's more oriented toward logstash instances used to ingest data from log files. There are no options to check the number of events received/sent, for example.

Apr 23 2021, 4:53 PM · System administration
vsellier committed rSENV035022b779a8: Replace clearly-defined vm by the mirror-test one (authored by vsellier).
Replace clearly-defined vm by the mirror-test one
Apr 23 2021, 4:46 PM
vsellier requested review of D5588: Activate swh-counters on all the webapps.
Apr 23 2021, 4:26 PM
vsellier added a revision to T2912: Next generation archive counters: D5588: Activate swh-counters on all the webapps.
Apr 23 2021, 4:26 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier changed the status of T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested, a subtask of T3219: No logs are ingested on elasticsearch since 2021-03-26, from Open to Work in Progress.
Apr 23 2021, 4:10 PM · System administrators
vsellier changed the status of T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested from Open to Work in Progress.
Apr 23 2021, 4:10 PM · System administration
vsellier edited projects for T3222: Monitor daily indexes are present on the log cluster and logs are correctly ingested, added: System administration; removed System administrators.
Apr 23 2021, 4:09 PM · System administration
vsellier added a comment to T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.

According to the tracking page, the order left the factory on Apr 22, 2021. The ETA is May 28, 2021*.

Apr 23 2021, 4:00 PM · System administration, Archive search
vsellier claimed T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.
Apr 23 2021, 3:57 PM · System administration, Archive search
vsellier claimed T3129: Reliable monitoring of services: for users and for admins .
Apr 23 2021, 3:13 PM · Roadmap 2022, Roadmap 2021, Monitoring, meta-task
vsellier closed D5542: Remove tenma's access.

closed by rSPSITEe749fd9a244c669b108def9f008009b2f5563811

Apr 23 2021, 2:59 PM
vsellier closed T3251: Count authors from revisions and releases, a subtask of T2912: Next generation archive counters, as Resolved.
Apr 23 2021, 1:03 PM · Roadmap 2021, System administration, Monitoring, Web app
vsellier closed T3251: Count authors from revisions and releases as Resolved.

and the authors are now displayed on staging and production (webapp1)

Apr 23 2021, 1:03 PM · Monitoring, Web app
vsellier added a comment to T3251: Count authors from revisions and releases.

The lag for the production can be followed here: https://grafana.softwareheritage.org/goto/Di2H3z9Gk
(staging has already recovered)

Apr 23 2021, 12:57 PM · Monitoring, Web app
vsellier added a comment to T3251: Count authors from revisions and releases.

the swh-counters is deployed in production too:

  • upgrade swh-counters package and restart swh-counters backend and journal
root@counters1:~# apt dist-upgrade
...
Setting up python3-swh.counters (0.7.0-1~swh1~bpo10+1) ...
root@counters1:~# systemctl stop swh-counters-journal-client.service 
root@counters1:~# systemctl restart gunicorn-swh-counters.service 
root@counters1:~# systemctl start swh-counters-journal-client.service 
root@counters1:~# redis-cli pfcount person
(integer) 7

The person count has already started.

  • stopping the journal-client to be able to reset the releases and revisions offsets
root@counters1:~# systemctl stop swh-counters-journal-client.service
  • reset the offsets
vsellier@kafka1 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --all-topics --to-current --dry-run  --export --group swh.counters.journal_client 2>&1 > ~/counters_journal_client_offsets.csv
# revision reset
vsellier@kafka1 ~ % /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets  --group swh.counters.journal_client --to-earliest --execute --topic swh.journal.objects.revision
# release reset
vsellier@kafka1 ~ %  /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets  --group swh.counters.journal_client --to-earliest --execute --topic swh.journal.objects.release 
# checks
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server $SERVER --reset-offsets --all-topics --to-current --dry-run  --export --group swh.counters.journal_client 2>&1 > ~/counters_journal_client_offsets-backfill.csv 
vsellier@kafka1 ~ % diff ~/counters_journal_client_offsets.csv ~/counters_journal_client_offsets-backfill.csv | less 
1c1
< "swh.journal.objects.revision",25,8275180
---
> "swh.journal.objects.revision",25,0
8c8
< "swh.journal.objects.release",128,78484
---
> "swh.journal.objects.release",128,0
16c16
...
  • journal client restarted
root@counters1:~# systemctl start swh-counters-journal-client.service
  • the person counter is growing fast
root@counters1:~# date;redis-cli pfcount person
Fri 23 Apr 2021 10:55:54 AM UTC
(integer) 72358
root@counters1:~# date;redis-cli pfcount person
Fri 23 Apr 2021 10:55:57 AM UTC
(integer) 80618
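
The before/after comparison of the exported offsets, done with diff in the steps above, could also be sketched in Python. This is only an illustration: the function names are hypothetical, and it assumes the export contains rows of the form topic,partition,offset as shown in the diff output (any other row, such as a header, is skipped).

```python
import csv
import io


def load_offsets(text):
    """Parse an exported offsets CSV into {(topic, partition): offset}."""
    offsets = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 3:
            continue
        topic, partition, offset = row
        try:
            offsets[(topic, int(partition))] = int(offset)
        except ValueError:
            # Not a data row (for example a header line): ignore it.
            continue
    return offsets


def changed_offsets(before, after):
    """Return {(topic, partition): (old, new)} for offsets that differ."""
    return {
        key: (old, after.get(key))
        for key, old in before.items()
        if after.get(key) != old
    }
```

After the reset, only partitions of the revision and release topics should appear in the result, all with a new offset of 0.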
Apr 23 2021, 12:56 PM · Monitoring, Web app
vsellier closed D5586: Activate the person's counter on the home page with swh-counters.
Apr 23 2021, 12:30 PM