Staging environment
Active, Public

Members

  • This project does not have any members.

Watchers

  • This project does not have any watchers.

Details

Description

Production-like environment for testing changes in SWH's infrastructure before applying them to production

Recent Activity

Yesterday

vsellier closed T3243: Replace /dev/sdb and /dev/sdc on storage1.staging as Resolved.

Actions performed:

  • wwn-0x5000c500d5de652a(sdb) : new -> spare
  • wwn-0x5000c500a22eed6f(sdh) : spare -> mirror
  • wwn-0x5000c500d5dda886(sdc) : new -> mirror
Thu, May 6, 9:54 AM · System administration, Staging environment
vsellier closed T3243: Replace /dev/sdb and /dev/sdc on storage1.staging, a subtask of T3236: staging: Disk error on storage1, as Resolved.
Thu, May 6, 9:54 AM · System administration, Staging environment
vsellier added a comment to T3243: Replace /dev/sdb and /dev/sdc on storage1.staging.

The checks completed without detecting any bad blocks on the disks.
They can be added to the ZFS pool again.

Thu, May 6, 9:33 AM · System administration, Staging environment

Wed, May 5

vsellier added a comment to T3243: Replace /dev/sdb and /dev/sdc on storage1.staging.

A full badblocks test has been launched on both disks:

root@storage1:~# badblocks -v -w -B -s -b 4096 /dev/sdb
root@storage1:~# badblocks -v -w -B -s -b 4096 /dev/sdc
Wed, May 5, 11:33 AM · System administration, Staging environment
vsellier added a comment to T3243: Replace /dev/sdb and /dev/sdc on storage1.staging.

The disks were replaced by Christophe.
Apparently, the LED of one of the disks is still on, so the LEDs need to be switched off:

root@storage1:~# ls /dev/sd* | grep -e "[a-z]$" | xargs -n1 -t -i{} ledctl normal={} 
ledctl normal=/dev/sda 
ledctl normal=/dev/sdb 
ledctl normal=/dev/sdc 
ledctl normal=/dev/sdd 
ledctl normal=/dev/sde 
ledctl normal=/dev/sdf 
ledctl normal=/dev/sdg 
ledctl normal=/dev/sdh 
ledctl normal=/dev/sdi 
ledctl normal=/dev/sdj 
ledctl normal=/dev/sdk 
ledctl normal=/dev/sdl 
ledctl normal=/dev/sdm 
ledctl normal=/dev/sdn
Wed, May 5, 11:02 AM · System administration, Staging environment
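
The LED reset above selects whole-disk devices by keeping only names that end in a letter, which drops partitions such as /dev/sdc1. A minimal sketch of that filter as a reusable shell function follows; the name `whole_disks` is hypothetical, and the function only prints device names rather than calling ledctl.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original transcript):
# read device paths on stdin, one per line, and keep only those
# ending in a lowercase letter, i.e. whole disks rather than partitions.
whole_disks() {
    grep -e '[a-z]$'
}

# Example: list the disks a reset would touch, without changing any LED.
# ls /dev/sd* | whole_disks
```

Previewing the list this way makes it easy to confirm which devices a reset would touch. Separately, GNU xargs documents the `-i{}` form used in the transcript as deprecated in favor of `-I{}`.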

Mon, May 3

vsellier added a comment to T3243: Replace /dev/sdb and /dev/sdc on storage1.staging.

The replacement disks were delivered at Rocquencourt.

Mon, May 3, 8:29 AM · System administration, Staging environment

Wed, Apr 21

ardumont moved T2770: Fix all icinga checks on staging webapp from deployed/landed to done on the System administration board.
Wed, Apr 21, 6:57 PM · Monitoring, System administration, Staging environment

Tue, Apr 20

vsellier added a comment to T3243: Replace /dev/sdb and /dev/sdc on storage1.staging.

The 2 disks were removed from the server and packaged to be sent to Seagate.

Tue, Apr 20, 5:32 PM · System administration, Staging environment

Fri, Apr 16

vsellier moved T3243: Replace /dev/sdb and /dev/sdc on storage1.staging from Backlog to in-progress on the System administration board.
Fri, Apr 16, 10:12 AM · System administration, Staging environment

Thu, Apr 15

vsellier added a comment to T3243: Replace /dev/sdb and /dev/sdc on storage1.staging.

Email sent to the DSI to launch the replacement.

Thu, Apr 15, 3:03 PM · System administration, Staging environment
vsellier added a comment to T3243: Replace /dev/sdb and /dev/sdc on storage1.staging.

In preparation for the disk replacement, the disks' LEDs must be activated so that their slots can be identified:

  • Ensure all the LEDs are off
root@storage1:~# ls /dev/sd* | grep -e "[a-z]$" | xargs -n1 -t -i{} ledctl normal={} 
ledctl normal=/dev/sda 
ledctl normal=/dev/sdb 
ledctl normal=/dev/sdc 
ledctl normal=/dev/sdd 
ledctl normal=/dev/sde 
ledctl normal=/dev/sdf 
ledctl normal=/dev/sdg 
ledctl normal=/dev/sdh 
ledctl normal=/dev/sdi 
ledctl normal=/dev/sdj 
ledctl normal=/dev/sdk 
ledctl normal=/dev/sdl 
ledctl normal=/dev/sdm 
ledctl normal=/dev/sdn
  • Switch on the locate LED of the two disks to replace
root@storage1:~# ledctl locate=/dev/sdb
root@storage1:~# ledctl locate=/dev/sdc
Thu, Apr 15, 2:31 PM · System administration, Staging environment

Mon, Apr 12

vsellier changed the status of T3243: Replace /dev/sdb and /dev/sdc on storage1.staging, a subtask of T3236: staging: Disk error on storage1, from Open to Work in Progress.
Mon, Apr 12, 7:31 PM · System administration, Staging environment
vsellier changed the status of T3243: Replace /dev/sdb and /dev/sdc on storage1.staging from Open to Work in Progress.

The disks have been removed from the ZFS pool. The replacement can be done.

Mon, Apr 12, 7:31 PM · System administration, Staging environment
vsellier closed T3236: staging: Disk error on storage1 as Resolved.
Mon, Apr 12, 7:30 PM · System administration, Staging environment
vsellier added a comment to T3236: staging: Disk error on storage1.

The mirror has been removed from the pool:

root@storage1:~# zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
data  21.8T  2.50T  19.3T        -         -    20%    11%  1.00x    ONLINE  -
Mon, Apr 12, 7:30 PM · System administration, Staging environment
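
To check the pool state after such a removal without reading the table by eye, the HEALTH column can be extracted from `zpool list` output like the one above. The sketch below is only an illustration: the function name `pool_health` is hypothetical, and it parses text from stdin instead of calling zpool itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original transcript):
# read `zpool list` output on stdin and print the HEALTH value
# of the pool named in $1. The column is located by its header name.
pool_health() {
    awk -v pool="$1" '
        NR == 1 {
            # Find the index of the HEALTH column in the header row.
            for (i = 1; i <= NF; i++) if ($i == "HEALTH") col = i
            next
        }
        $1 == pool && col { print $col }
    '
}

# Example (on the server): zpool list | pool_health data
```

On a live system, `zpool list -H -o health data` should print the same value directly; the parsing variant is mainly useful for saved output such as the transcript above.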
vsellier added a comment to T3243: Replace /dev/sdb and /dev/sdc on storage1.staging.

A ticket was opened on the Seagate site for the replacement of these 2 disks; the information will be forwarded to the DSI for the packaging.

Mon, Apr 12, 4:14 PM · System administration, Staging environment
vsellier triaged T3243: Replace /dev/sdb and /dev/sdc on storage1.staging as High priority.
Mon, Apr 12, 2:26 PM · System administration, Staging environment
vsellier added a comment to T3236: staging: Disk error on storage1.

The mirror-1 removal is in progress:

root@storage1:~# zpool remove data mirror-1
Mon, Apr 12, 2:19 PM · System administration, Staging environment
vsellier added a comment to T3236: staging: Disk error on storage1.

There are 2 disks with errors that should now be replaced:

  • /dev/sdb (wwn-0x5000c500a23e3868): an old one
  • /dev/sdc (wwn-0x5000c500a22f48c9): the disk just removed from the pool
Mon, Apr 12, 12:56 PM · System administration, Staging environment
vsellier added a comment to T3236: staging: Disk error on storage1.

The failing disk was removed from the pool:

root@storage1:~# zpool detach data wwn-0x5000c500a22f48c9
Mon, Apr 12, 12:49 PM · System administration, Staging environment
vsellier added a comment to T3236: staging: Disk error on storage1.

The newly failing drive is /dev/sdc:

root@storage1:~# ls -al /dev/disk/by-id/ | grep wwn-0x5000c500a22f48c9
lrwxrwxrwx 1 root root    9 Apr 11 03:42 wwn-0x5000c500a22f48c9 -> ../../sdc
lrwxrwxrwx 1 root root   10 Mar 11 17:08 wwn-0x5000c500a22f48c9-part1 -> ../../sdc1
lrwxrwxrwx 1 root root   10 Mar 11 17:08 wwn-0x5000c500a22f48c9-part9 -> ../../sdc9
Mon, Apr 12, 12:46 PM · System administration, Staging environment
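
The lookup above maps a stable WWN identifier to the kernel device name by reading the symlinks under /dev/disk/by-id. A minimal sketch of the same lookup as a function is given below; the name `wwn_to_dev` and the optional directory argument are hypothetical, added here only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original transcript):
# resolve a /dev/disk/by-id entry to its kernel device name.
#   $1: by-id entry name, e.g. wwn-0x5000c500a22f48c9
#   $2: directory holding the symlinks (defaults to /dev/disk/by-id)
wwn_to_dev() {
    link="${2:-/dev/disk/by-id}/$1"
    # Fail if the entry is missing or is not a symlink.
    [ -L "$link" ] || return 1
    # Follow the symlink and keep only the final path component.
    basename "$(readlink -f "$link")"
}

# Example (on the server): wwn_to_dev wwn-0x5000c500a22f48c9
```

Because sdX names can change across reboots or disk swaps, resolving the WWN just before acting on a disk is a reasonable precaution.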
vsellier changed the status of T3236: staging: Disk error on storage1 from Open to Work in Progress.
Mon, Apr 12, 12:09 PM · System administration, Staging environment
vsellier triaged T3236: staging: Disk error on storage1 as High priority.
Mon, Apr 12, 9:36 AM · System administration, Staging environment

Mar 22 2021

vsellier closed T3159: Deploy swh-counters:v0.1.0 in staging as Resolved.

A new VM, counters0.internal.staging.swh.network, is deployed and hosts redis, swh-counters and its journal-client.
The lag in staging will be caught up within a couple of hours.

Mar 22 2021, 5:34 PM · Staging environment, System administration, Monitoring
vsellier added a revision to T3159: Deploy swh-counters:v0.1.0 in staging: D5297: staging: Add counters0 vm.
Mar 22 2021, 3:40 PM · Staging environment, System administration, Monitoring
vsellier added a revision to T3159: Deploy swh-counters:v0.1.0 in staging: D5296: Add swh-counters deployment configuration.
Mar 22 2021, 8:32 AM · Staging environment, System administration, Monitoring

Mar 19 2021

vsellier moved T3159: Deploy swh-counters:v0.1.0 in staging from Backlog to in-progress on the System administration board.
Mar 19 2021, 12:39 PM · Staging environment, System administration, Monitoring
vsellier changed the status of T3159: Deploy swh-counters:v0.1.0 in staging from Open to Work in Progress.
Mar 19 2021, 12:39 PM · Staging environment, System administration, Monitoring

Feb 5 2021

vsellier added a comment to T2231: Continuous deployment.

I have started to jot down some ideas in this document: https://hedgedoc.softwareheritage.org/Fi2pq7zkSw6aVAJwk9Xhqw

Feb 5 2021, 5:48 PM · Staging environment, Roadmap 2020

Jan 29 2021

ardumont added a comment to T2920: Document staging infrastructure.

awesome, thanks.

Jan 29 2021, 12:24 PM · Documentation, System administration, Staging environment
vsellier moved T2920: Document staging infrastructure from in-progress to done on the System administration board.
Jan 29 2021, 12:21 PM · Documentation, System administration, Staging environment
vsellier closed T2920: Document staging infrastructure as Resolved.
  • Inventory updated to ensure all the components are associated to the staging environment
  • Staging page on the intranet updated [1]
  • Staging section on the network page [2] on the intranet updated
Jan 29 2021, 12:20 PM · Documentation, System administration, Staging environment

Jan 27 2021

vsellier added a comment to T2920: Document staging infrastructure.

This is an attempt at generating a global diagram of the staging environment (P929).

Jan 27 2021, 6:09 PM · Documentation, System administration, Staging environment

Jan 25 2021

vsellier changed the status of T2920: Document staging infrastructure from Open to Work in Progress.
Jan 25 2021, 3:32 PM · Documentation, System administration, Staging environment

Jan 20 2021

moranegg added a project to T2920: Document staging infrastructure: Documentation.
Jan 20 2021, 10:33 AM · Documentation, System administration, Staging environment

Jan 18 2021

vsellier moved T2920: Document staging infrastructure from Backlog to Weekly backlog on the System administration board.
Jan 18 2021, 7:13 PM · Documentation, System administration, Staging environment
vsellier added a project to T2920: Document staging infrastructure: System administration.
Jan 18 2021, 7:13 PM · Documentation, System administration, Staging environment

Jan 6 2021

ardumont added a comment to T2770: Fix all icinga checks on staging webapp.

The last check no longer appears in icinga.

Jan 6 2021, 4:36 PM · Monitoring, System administration, Staging environment
ardumont closed T2770: Fix all icinga checks on staging webapp as Resolved.
Jan 6 2021, 4:36 PM · Monitoring, System administration, Staging environment
ardumont changed the status of T2770: Fix all icinga checks on staging webapp from Open to Work in Progress.
Jan 6 2021, 4:36 PM · Monitoring, System administration, Staging environment
ardumont moved T2877: Investigate spurious deposit logs from Backlog to deployed/landed on the System administration board.
Jan 6 2021, 3:45 PM · System administration, Staging environment, SWORD deposit

Jan 4 2021

vsellier closed T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage) as Resolved.

Closing this task as all the direct work is done.
The documentation will be addressed in T2920

Jan 4 2021, 12:33 PM · Staging environment, System administration
vsellier triaged T2920: Document staging infrastructure as Normal priority.
Jan 4 2021, 12:32 PM · Documentation, System administration, Staging environment

Dec 22 2020

vsellier added a comment to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).

Everything looks good; let's add some documentation before closing the issue.

Dec 22 2020, 9:56 AM · Staging environment, System administration
vsellier updated the task description for T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).
Dec 22 2020, 9:54 AM · Staging environment, System administration

Dec 21 2020

vsellier added a comment to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).
  • A new VM, objstorage0.internal.staging.swh.network, is configured with a read-only object storage service
  • It is exposed to the internet via the reverse proxy at https://objstorage.staging.swh.network (this differs from the usual objstorage:5003 URL, but it allows exposing the service without any new network configuration)
  • DNS entry added on Gandi
  • Inventory updated
Dec 21 2020, 7:32 PM · Staging environment, System administration
vsellier added a revision to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage): D4776: [staging] Configure and expose to internet a read-only objstorage.
Dec 21 2020, 6:01 PM · Staging environment, System administration
vsellier added a revision to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage): D4775: Add objstorage0.staging.swh.network node to expose a r/o objstorage node.
Dec 21 2020, 4:48 PM · Staging environment, System administration
vsellier updated the task description for T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).
Dec 21 2020, 12:58 PM · Staging environment, System administration
vsellier added a comment to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).

A user was correctly configured and a read test was performed.

Dec 21 2020, 12:57 PM · Staging environment, System administration