
Jun 10 2021

vsellier committed rDSNIP306b44b4f019: Add swh-storage installation on ansible scripts (authored by vsellier).
Add swh-storage installation on ansible scripts
Jun 10 2021, 3:42 PM
vsellier committed rDSNIPda71dd2666e8: swh-storage (authored by vsellier).
swh-storage
Jun 10 2021, 3:42 PM
vsellier added a comment to T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.
  • old nodes removed from proxmox, which freed up some space on ceph:

Jun 10 2021, 11:27 AM · System administration, Archive search
vsellier committed rSPRE309b302651eb: Removing search-esnode[1-3] nodes replaced by bare metal servers (authored by vsellier).
Removing search-esnode[1-3] nodes replaced by bare metal servers
Jun 10 2021, 11:11 AM
vsellier added a comment to T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.
  • configuration of the swh-search and journal client services deployed
  • Old node decommissioning on the cluster:
export ES_NODE=192.168.100.86:9200
curl -H "Content-Type: application/json" -XPUT http://${ES_NODE}/_cluster/settings\?pretty -d '{ 
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "192.168.100.81,192.168.100.82,192.168.100.83"
    }
}'
{
  "acknowledged" : true,
  "persistent" : { },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "exclude" : {
            "_ip" : "192.168.100.81,192.168.100.82,192.168.100.83"
          }
        }
      }
    }
  }
}
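
The request body sent by the curl call above can also be generated rather than typed by hand. This is a minimal sketch using only the Python standard library; `build_exclude_settings` is a hypothetical helper name (not part of any Elasticsearch client), and it only builds the JSON body, without sending any request:

```python
import json


def build_exclude_settings(ips):
    """Build the transient cluster-settings body that excludes nodes by IP.

    Hypothetical helper: it only mirrors the JSON body of the curl call above.
    """
    return {
        "transient": {
            # Elasticsearch expects a comma-separated list of IPs here.
            "cluster.routing.allocation.exclude._ip": ",".join(ips),
        }
    }


# The three old nodes listed in the comment above.
old_nodes = ["192.168.100.81", "192.168.100.82", "192.168.100.83"]
body = json.dumps(build_exclude_settings(old_nodes), indent=4)
print(body)
```

The printed text can be passed to `curl -d` as in the command above.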

The shards are gradually being moved off the old servers:

curl -s http://search-esnode4:9200/_cat/allocation\?s\=host\&v
shards disk.indices disk.used disk.avail disk.total disk.percent host           ip             node
    27       38.7gb    38.8gb    153.7gb    192.6gb           20 192.168.100.81 192.168.100.81 search-esnode1
    27       37.7gb    37.8gb    154.8gb    192.6gb           19 192.168.100.82 192.168.100.82 search-esnode2
    22       30.5gb    30.6gb      162gb    192.6gb           15 192.168.100.83 192.168.100.83 search-esnode3
    35         50gb    50.1gb      6.6tb      6.7tb            0 192.168.100.86 192.168.100.86 search-esnode4
    35         50gb    50.2gb      6.6tb      6.7tb            0 192.168.100.87 192.168.100.87 search-esnode5
    34       49.4gb    49.5gb      6.6tb      6.7tb            0 192.168.100.88 192.168.100.88 search-esnode6

Once there are no shards left on the old servers, we will be able to stop them and remove them from the proxmox server.
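
The "no shards left on the old servers" condition can be checked from the `_cat/allocation` output instead of by eye. The sketch below is illustrative only: `shards_per_node` and `drained` are hypothetical helper names, the sample text is copied from the output above, and the parser assumes node names contain no spaces.

```python
SAMPLE = """\
shards disk.indices disk.used disk.avail disk.total disk.percent host           ip             node
    27       38.7gb    38.8gb    153.7gb    192.6gb           20 192.168.100.81 192.168.100.81 search-esnode1
    27       37.7gb    37.8gb    154.8gb    192.6gb           19 192.168.100.82 192.168.100.82 search-esnode2
    22       30.5gb    30.6gb      162gb    192.6gb           15 192.168.100.83 192.168.100.83 search-esnode3
    35         50gb    50.1gb      6.6tb      6.7tb            0 192.168.100.86 192.168.100.86 search-esnode4
    35         50gb    50.2gb      6.6tb      6.7tb            0 192.168.100.87 192.168.100.87 search-esnode5
    34       49.4gb    49.5gb      6.6tb      6.7tb            0 192.168.100.88 192.168.100.88 search-esnode6
"""


def shards_per_node(cat_allocation_output):
    """Map node name -> shard count from `_cat/allocation?v` text output."""
    lines = [line for line in cat_allocation_output.splitlines() if line.strip()]
    counts = {}
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        # The first column is the shard count, the last one is the node name.
        counts[fields[-1]] = int(fields[0])
    return counts


def drained(counts, old_nodes):
    """True when none of the old nodes still holds a shard.

    A node missing from the output is treated as holding zero shards.
    """
    return all(counts.get(node, 0) == 0 for node in old_nodes)


counts = shards_per_node(SAMPLE)
old = ["search-esnode1", "search-esnode2", "search-esnode3"]
print(counts)
print("safe to stop old nodes:", drained(counts, old))
```

With the sample above the old nodes still hold shards, so the check reports that they are not yet safe to stop.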

Jun 10 2021, 10:24 AM · System administration, Archive search
vsellier committed rSPSITEef470755943c: vagrant: declare new search-esnode servers (authored by vsellier).
vagrant: declare new search-esnode servers
Jun 10 2021, 9:59 AM
vsellier committed rSPSITE18d3746d0f92: swh-search: change elasticsearch nodes (authored by vsellier).
swh-search: change elasticsearch nodes
Jun 10 2021, 9:59 AM

Jun 9 2021

vsellier added a comment to T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.

And all the new nodes are now in the production cluster:

curl -s http://search-esnode4:9200/_cat/allocation\?s\=host\&v
shards disk.indices disk.used disk.avail disk.total disk.percent host           ip             node
    30       42.7gb    42.9gb    149.7gb    192.6gb           22 192.168.100.81 192.168.100.81 search-esnode1
    30       41.4gb    41.6gb      151gb    192.6gb           21 192.168.100.82 192.168.100.82 search-esnode2
    30       41.7gb    41.8gb    150.8gb    192.6gb           21 192.168.100.83 192.168.100.83 search-esnode3
    30       41.9gb      42gb      6.6tb      6.7tb            0 192.168.100.86 192.168.100.86 search-esnode4
    30       41.8gb    41.9gb      6.6tb      6.7tb            0 192.168.100.87 192.168.100.87 search-esnode5
    30       41.2gb    41.3gb      6.6tb      6.7tb            0 192.168.100.88 192.168.100.88 search-esnode6

The next step will be to switch the swh-search configurations to use the new nodes and progressively remove the old nodes from the cluster.

Jun 9 2021, 6:49 PM · System administration, Archive search
vsellier added a comment to T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.
  • zfs installation:
root@search-esnode4:~# apt update && apt install linux-image-amd64 linux-headers-amd64
root@search-esnode4:~# shutdown -r now  # to apply the kernel
root@search-esnode4:~# apt install libnvpair1linux libuutil1linux libzfs2linux libzpool2linux zfs-dkms zfsutils-linux zfs-zed
  • refresh with the latest packages installed from backports
root@search-esnode4:~# apt dist-upgrade # trigger a udev upgrade which leads to a network interface renaming
root@search-esnode4:~# sed -i 's/ens1/enp2s0/g' /etc/network/interfaces
  • pre-zfs configuration actions:
root@search-esnode4:~# puppet agent --disable
root@search-esnode4:~# systemctl disable elasticsearch
root@search-esnode4:~# systemctl stop elasticsearch
root@search-esnode4:~# rm -rf /srv/elasticsearch/nodes
Jun 9 2021, 6:10 PM · System administration, Archive search
vsellier committed rSPSITEb3a73bccc1ee: swh-search: Declare new bare metal nodes (authored by vsellier).
swh-search: Declare new bare metal nodes
Jun 9 2021, 12:17 PM
vsellier added a comment to T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.
  • To manage the disks via zfs, the raid card needed to be configured in enhanced HBA mode in the idrac
  • after a reboot, the disks are correctly detected by the system:
root@search-esnode4:~# ls -al /dev/sd*
brw-rw---- 1 root disk 8,  0 Jun  9 04:54 /dev/sda
brw-rw---- 1 root disk 8, 16 Jun  9 04:54 /dev/sdb
brw-rw---- 1 root disk 8, 32 Jun  9 04:54 /dev/sdc
brw-rw---- 1 root disk 8, 48 Jun  9 04:54 /dev/sdd
brw-rw---- 1 root disk 8, 64 Jun  9 04:54 /dev/sde
brw-rw---- 1 root disk 8, 80 Jun  9 04:54 /dev/sdf
brw-rw---- 1 root disk 8, 96 Jun  9 04:54 /dev/sdg
brw-rw---- 1 root disk 8, 97 Jun  9 04:54 /dev/sdg1
brw-rw---- 1 root disk 8, 98 Jun  9 04:54 /dev/sdg2
brw-rw---- 1 root disk 8, 99 Jun  9 04:54 /dev/sdg3
root@search-esnode4:~# smartctl -a /dev/sda
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-16-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
Jun 9 2021, 11:58 AM · System administration, Archive search

Jun 8 2021

vsellier committed rDSNIPe2b98bc6aba1: grid5000/cassandra: management scripts (authored by vsellier).
grid5000/cassandra: management scripts
Jun 8 2021, 9:13 PM
vsellier committed rDSNIP75fb371aeda0: grid5000/cassandra: install and configure zfs on cassandra nodes (authored by vsellier).
grid5000/cassandra: install and configure zfs on cassandra nodes
Jun 8 2021, 9:13 PM
vsellier committed rDSNIPfb7444db1d62: grid5000/cassandra: Not functional terrarform/ansible poc (authored by vsellier).
grid5000/cassandra: Not functional terrarform/ansible poc
Jun 8 2021, 9:13 PM

Jun 4 2021

vsellier committed rDSNIP183077d59de6: grid5000/cassandra: configure a cassandra cluster with ansible (authored by vsellier).
grid5000/cassandra: configure a cassandra cluster with ansible
Jun 4 2021, 6:17 PM

Jun 3 2021

vsellier updated subscribers of T3357: Perform some tests of the cassandra storage on Grid5000.

I experimented with grid5000 to see how the jobs work and how to initialize the reserved nodes.

Jun 3 2021, 7:30 PM · System administration, Storage manager
vsellier committed rDSNIPe39b40412e79: grid5000: test of terraform provisionning (authored by vsellier).
grid5000: test of terraform provisionning
Jun 3 2021, 7:25 PM
vsellier committed rDSNIPd0d0e73961ef: add VPN migration phases (authored by vsellier).
add VPN migration phases
Jun 3 2021, 3:54 PM
vsellier accepted D5814: Dedicate a loader_oneshot service for temporary use.

lgtm

Jun 3 2021, 10:20 AM

Jun 2 2021

vsellier changed the status of T3357: Perform some tests of the cassandra storage on Grid5000 from Open to Work in Progress.
Jun 2 2021, 6:25 PM · System administration, Storage manager
vsellier added a comment to T1526: Install a new VPN endpoint at Rocquencourt.

Currently, we have the old openvpn and ipsec running in parallel with the new OPNsense VPNs:

Jun 2 2021, 5:24 PM · System administration
vsellier closed T3355: Running save code now request are never detected as completed by the webapp as Resolved.
  • The fix was deployed on webapp1 and moma
  • The refresh script was manually launched:
root@webapp1:~# /usr/local/bin/refresh-savecodenow-statuses
Successfully updated 140 save request(s).

The previous requests were correctly refreshed and are now displaying the right status.

Jun 2 2021, 3:14 PM · Save Code Now, Web app
vsellier added a comment to T3355: Running save code now request are never detected as completed by the webapp .

Will be deployed with version v0.0.310 of the webapp (build in progress)

Jun 2 2021, 2:22 PM · Save Code Now, Web app
vsellier closed D5810: Update running save origin request status.
Jun 2 2021, 2:17 PM
vsellier committed rDWAPPS267e8365f0d6: Update running save origin request status (authored by vsellier).
Update running save origin request status
Jun 2 2021, 2:17 PM
vsellier updated the diff for D5810: Update running save origin request status.

fix typo in commit message

Jun 2 2021, 12:21 PM
vsellier updated the summary of D5810: Update running save origin request status.
Jun 2 2021, 12:20 PM
vsellier requested review of D5810: Update running save origin request status.
Jun 2 2021, 12:18 PM
vsellier added a revision to T3355: Running save code now request are never detected as completed by the webapp : D5810: Update running save origin request status.
Jun 2 2021, 12:07 PM · Save Code Now, Web app
vsellier renamed T3355: Running save code now request are never detected as completed by the webapp from Running save code now request are never finalized to Running save code now request are never detected as completed by the webapp .
Jun 2 2021, 11:58 AM · Save Code Now, Web app
vsellier changed the status of T3355: Running save code now request are never detected as completed by the webapp from Open to Work in Progress.
Jun 2 2021, 11:57 AM · Save Code Now, Web app

Jun 1 2021

vsellier committed rSPSITEc1f48c4fa734: Increase the limit to write pack files on disk (authored by vsellier).
Increase the limit to write pack files on disk
Jun 1 2021, 4:27 PM

May 28 2021

vsellier closed D5800: network: Declare the new opnsense vpn network range.
May 28 2021, 5:12 PM
vsellier committed rSPSITE991600f4f8df: network: Declare the new opnsense vpn network range (authored by vsellier).
network: Declare the new opnsense vpn network range
May 28 2021, 5:12 PM
vsellier requested review of D5800: network: Declare the new opnsense vpn network range.
May 28 2021, 3:23 PM
vsellier added a revision to T1526: Install a new VPN endpoint at Rocquencourt: D5800: network: Declare the new opnsense vpn network range.
May 28 2021, 3:23 PM · System administration
vsellier added a comment to T1526: Install a new VPN endpoint at Rocquencourt.

The OPNsense firewall configuration was finalized, based on the initial configuration olasd had previously done on the OPNsense firewalls.

May 28 2021, 12:00 PM · System administration

May 27 2021

vsellier changed the status of T1526: Install a new VPN endpoint at Rocquencourt from Open to Work in Progress.
May 27 2021, 11:00 AM · System administration
vsellier added a comment to T3129: Reliable monitoring of services: for users and for admins .

The save code now queue statistics are now displayed on the status.io page[1] as an example. The data is refreshed every 5 minutes.

May 27 2021, 10:59 AM · Roadmap 2022, Roadmap 2021, Monitoring, meta-task
vsellier added a comment to T3320: Test rancher pros/cons.
May 27 2021, 10:58 AM · System administration
vsellier committed rSPSITE705f4d26a234: status.io: fix api credentials (authored by vsellier).
status.io: fix api credentials
May 27 2021, 10:46 AM
vsellier closed D5787: status.io: push save code now statistics.
May 27 2021, 9:10 AM · System administration
vsellier committed rSPSITE9c01d2124948: status.io: push save code now statistics (authored by vsellier).
status.io: push save code now statistics
May 27 2021, 9:10 AM

May 26 2021

vsellier updated the diff for D5787: status.io: push save code now statistics.

update python script:

  • remove some prints
  • add missing types
  • use dict access instead of get
May 26 2021, 5:24 PM · System administration
vsellier added a project to D5787: status.io: push save code now statistics: System administration.
May 26 2021, 5:09 PM · System administration
vsellier updated subscribers of D5787: status.io: push save code now statistics.
May 26 2021, 5:09 PM · System administration
vsellier updated subscribers of D5787: status.io: push save code now statistics.
May 26 2021, 5:09 PM · System administration
vsellier requested review of D5787: status.io: push save code now statistics.
May 26 2021, 5:07 PM · System administration
vsellier added a revision to T3129: Reliable monitoring of services: for users and for admins : D5787: status.io: push save code now statistics.
May 26 2021, 5:07 PM · Roadmap 2022, Roadmap 2021, Monitoring, meta-task
vsellier committed rSPPRIVCc5692f8dbafc: Add censored status.io api credentials (authored by vsellier).
Add censored status.io api credentials
May 26 2021, 5:00 PM
vsellier created P1053 (An Untitled Masterwork).
May 26 2021, 3:16 PM
vsellier committed rPSTATIO584726209f06: Force stable rebuild (authored by vsellier).
Force stable rebuild
May 26 2021, 2:22 PM
vsellier committed rCJSWHbe7d718f636b: jobs/dependency-packages: change python3-statusio display name (authored by vsellier).
jobs/dependency-packages: change python3-statusio display name
May 26 2021, 1:50 PM
vsellier committed rCJSWHd241c051ac0d: jobs/dependency-packages: Add statusio-python package (authored by vsellier).
jobs/dependency-packages: Add statusio-python package
May 26 2021, 1:48 PM
vsellier committed rDSNIP8a5d814541aa: status.io: configure the script via parameters (authored by vsellier).
status.io: configure the script via parameters
May 26 2021, 10:03 AM

May 25 2021

vsellier added a comment to T3041: [production] Provision enough space for the search ES cluster to ingest all intrinsic metadata.

The servers should be installed in the rack on May 26th. The network configuration will follow the same day or the next day.
They will be installed as is by the "DSI", so we will have to install the system via the iDRAC once they are reachable.

May 25 2021, 3:07 PM · System administration, Archive search
vsellier added a comment to T3320: Test rancher pros/cons.

With a master declared in the dns, everything seems to work well.
When the docker command is launched on a node, its status is correctly detected and the node is correctly configured after a couple of minutes.
The cluster explorer is also working now.

May 25 2021, 2:59 PM · System administration
vsellier committed rSPSITEed1df1bc2d17: poc-rancher: add internal in the domain name (authored by vsellier).
poc-rancher: add internal in the domain name
May 25 2021, 12:05 PM
vsellier closed D5775: declare a temporary dns entry for the rancher master.
May 25 2021, 12:01 PM
vsellier committed rSPSITE3b49be29b0ce: declare a temporary dns entry for the rancher master (authored by vsellier).
declare a temporary dns entry for the rancher master
May 25 2021, 12:01 PM
vsellier requested review of D5775: declare a temporary dns entry for the rancher master.
May 25 2021, 12:00 PM
vsellier added a revision to T3320: Test rancher pros/cons: D5775: declare a temporary dns entry for the rancher master.
May 25 2021, 12:00 PM · System administration
vsellier added a comment to T3129: Reliable monitoring of services: for users and for admins .

Metrics can easily be pushed to the status page.
A simple POC for the save code now requests is available here: https://forge.softwareheritage.org/source/snippets/browse/master/sysadmin/status.io/update_metrics.py

May 25 2021, 9:17 AM · Roadmap 2022, Roadmap 2021, Monitoring, meta-task
vsellier committed rDSNIP8af56d780f4c: status.io remove useless comments (authored by vsellier).
status.io remove useless comments
May 25 2021, 9:15 AM
vsellier committed rDSNIPa212bc6fa643: POC status.io's metrics (authored by vsellier).
POC status.io's metrics
May 25 2021, 9:13 AM

May 20 2021

vsellier added a comment to T3320: Test rancher pros/cons.

The basic installation with helm is simple for a single-server installation: https://rancher.com/docs/rancher/v2.5/en/installation/install-rancher-on-k8s/#install-the-rancher-helm-chart

May 20 2021, 6:37 PM · System administration
vsellier added a comment to T3129: Reliable monitoring of services: for users and for admins .

From the status.swh.org point of view, status.io provides some API endpoints to push metrics. It should be possible to add some metrics (up to 10 with our plan) to expose the behavior of the platform (daily, weekly and monthly statistics).
As a first step, we could expose the number of pending save code now requests and the number of origin visits to have some live data. An example of a status page with metrics: https://status.docker.com/
I'm working on a code snippet to test the integration feasibility/complexity.
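
To illustrate the daily/weekly/monthly statistics mentioned above, here is a rough sketch of the aggregation step only. It is not the snippet referred to in this comment and it does not call the status.io API; the window names, the 30-day month and the `window_averages` helper are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed window lengths; a "month" is approximated as 30 days.
WINDOWS = {
    "day": timedelta(days=1),
    "week": timedelta(days=7),
    "month": timedelta(days=30),
}


def window_averages(samples, now):
    """Average the sampled values over each trailing window.

    samples: iterable of (timestamp, value) pairs, e.g. the number of
    pending save code now requests at each measurement.
    Returns {window_name: average, or None when the window has no sample}.
    """
    samples = list(samples)
    result = {}
    for name, span in WINDOWS.items():
        values = [value for ts, value in samples if now - span <= ts <= now]
        result[name] = sum(values) / len(values) if values else None
    return result


now = datetime(2021, 5, 20, 18, 0, tzinfo=timezone.utc)
samples = [
    (now - timedelta(hours=1), 10),
    (now - timedelta(days=3), 20),
    (now - timedelta(days=20), 30),
    (now - timedelta(days=40), 100),  # older than every window, ignored
]
print(window_averages(samples, now))
```

The resulting values would still have to be mapped to whatever payload the status.io metric endpoint expects.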

May 20 2021, 6:07 PM · Roadmap 2022, Roadmap 2021, Monitoring, meta-task
vsellier accepted D5759: Vagrantfile: Factorize duplication.

Great simplification! Thanks

May 20 2021, 3:20 PM
vsellier changed the status of T3129: Reliable monitoring of services: for users and for admins from Open to Work in Progress.
May 20 2021, 12:01 PM · Roadmap 2022, Roadmap 2021, Monitoring, meta-task
vsellier accepted D5758: README: Update documentation to mention the standard puppet use works.

LGTM

May 20 2021, 11:50 AM
vsellier committed rSENVfa731b948cd5: vagrant: Fix wrong staging/production facts (authored by vsellier).
vagrant: Fix wrong staging/production facts
May 20 2021, 11:43 AM
vsellier accepted D5757: subnets/vagrant: Adapt pergamon manifests.

LGTM \o/

May 20 2021, 11:25 AM

May 19 2021

vsellier committed rSENVbb6f55e5ba6d: vagrant: Ensure facts are configured on the vms (authored by vsellier).
vagrant: Ensure facts are configured on the vms
May 19 2021, 8:13 PM
vsellier added a comment to T3325: Vagrantify puppet master.

After a hard time with vagrant internals and the pergamon configuration, we finally have a working puppet master.
The collected resources are correctly detected and applied, for example here with logstash0's icinga resources:

Notice: /Stage[main]/Profile::Icinga2::Master/Icinga2::Object::Host[pergamon.softwareheritage.org]/Icinga2::Object[icinga2::object::Host::pergamon.softwareheritage.org]/Concat[/etc/icinga2/zones.d/master/pergamon.softwareheritage.org.conf]/File[/etc/icinga2/zones.d/master/pergamon.softwareheritage.org.conf]/ensure: defined content as '{md5}e98c7cafc5300df8101f591d1c7a708b'
Info: Concat[/etc/icinga2/zones.d/master/pergamon.softwareheritage.org.conf]: Scheduling refresh of Class[Icinga2::Service]
Notice: /Stage[main]/Profile::Grafana::Vhost/Icinga2::Object::Service[grafana http redirect on pergamon.softwareheritage.org]/Icinga2::Object[icinga2::object::Service::grafana http redirect on pergamon.softwareheritage.org]/Concat[/etc/icinga2/zones.d/master/exported-checks.conf]/File[/etc/icinga2/zones.d/master/exported-checks.conf]/content:
May 19 2021, 8:06 PM · System administration
vsellier committed rSENVa2fc2ee2d6fe: Remove etckeeper lag when a new package is installed (authored by vsellier).
Remove etckeeper lag when a new package is installed
May 19 2021, 6:37 PM
vsellier committed rSENVa0ebcc01a3e8: Use default puppet configuration to work when executed though passenger (authored by vsellier).
Use default puppet configuration to work when executed though passenger
May 19 2021, 4:47 PM
vsellier committed rSENVb98b1303d307: vagrant: Fix production/staging environment mismatch (authored by vsellier).
vagrant: Fix production/staging environment mismatch
May 19 2021, 12:31 PM
vsellier closed T3332: Create a dedicated icinga load profile for proxmox hypervisors as Resolved.
May 19 2021, 10:01 AM · System administration
vsellier closed D5753: Create a dedicated hypervisor load profile.
May 19 2021, 9:51 AM
vsellier committed rSPSITEa2811bb5a690: Create a dedicated hypervisor load profile (authored by vsellier).
Create a dedicated hypervisor load profile
May 19 2021, 9:51 AM
vsellier requested review of D5753: Create a dedicated hypervisor load profile.
May 19 2021, 9:41 AM
vsellier added a revision to T3332: Create a dedicated icinga load profile for proxmox hypervisors: D5753: Create a dedicated hypervisor load profile.
May 19 2021, 9:41 AM · System administration
vsellier changed the status of T3332: Create a dedicated icinga load profile for proxmox hypervisors from Open to Work in Progress.
May 19 2021, 9:26 AM · System administration

May 18 2021

vsellier added a comment to T3326: docker tests on Jenkins: error while removing network.

and the build is green ;)

May 18 2021, 7:07 PM · Continuous Integration, Docker environment
vsellier added a comment to T3326: docker tests on Jenkins: error while removing network.

thanks for investigating that

May 18 2021, 6:49 PM · Continuous Integration, Docker environment
vsellier committed rSENVed2aa9bbace6: Upgrade vagrant configuration to configure puppet master (authored by vsellier).
Upgrade vagrant configuration to configure puppet master
May 18 2021, 5:09 PM
vsellier committed rSENVb0eec52e1706: Refresh the vagrant template to debian 10.9 (authored by vsellier).
Refresh the vagrant template to debian 10.9
May 18 2021, 5:09 PM

May 17 2021

vsellier accepted D5741: Vagrantfile: Adapt pergamon configuration.
May 17 2021, 2:41 PM

May 12 2021

vsellier committed rSENV073f997deb65: Install the puppet code where the master is expecting it (authored by vsellier).
Install the puppet code where the master is expecting it
May 12 2021, 12:30 PM
vsellier changed the status of T3325: Vagrantify puppet master from Open to Work in Progress.
May 12 2021, 12:24 PM · System administration
vsellier created P1042 count kafka lag.
May 12 2021, 9:50 AM

May 11 2021

vsellier committed rSENV62e61ce88401: Add missing certificate for sites hosted by pergamon (authored by vsellier).
Add missing certificate for sites hosted by pergamon
May 11 2021, 5:54 PM
vsellier moved T3320: Test rancher pros/cons from Backlog to in-progress on the System administration board.
May 11 2021, 9:31 AM · System administration
vsellier changed the status of T3320: Test rancher pros/cons from Open to Work in Progress.
May 11 2021, 9:30 AM · System administration

May 10 2021

vsellier closed D5727: infrastructure: fix sphinx warnings.
May 10 2021, 5:43 PM
vsellier committed rDDOC6335d12261ac: infrastructure: fix sphinx warnings (authored by vsellier).
infrastructure: fix sphinx warnings
May 10 2021, 5:43 PM
vsellier requested review of D5727: infrastructure: fix sphinx warnings.
May 10 2021, 5:41 PM
vsellier added a revision to T3203: docs: Document the firewall installation and procedures: D5727: infrastructure: fix sphinx warnings.
May 10 2021, 5:41 PM · Documentation, System administration
vsellier closed T3223: Elasticsearch: Monitor the max opened shards on a cluster as Resolved.

These errors will be caught by the alert created in T3222

May 10 2021, 3:03 PM · System administrators
vsellier closed T3223: Elasticsearch: Monitor the max opened shards on a cluster, a subtask of T3219: No logs are ingested on elasticsearch since 2021-03-26, as Resolved.
May 10 2021, 3:03 PM · System administrators