ftigeot (François Tigeot)
User

User Details

User Since
Sep 6 2017, 1:06 PM (76 w, 2 d)

Recent Activity

Fri, Feb 15

ftigeot committed rSPSITE21e8c7f6ce74: facter: Ignore bpf and cgroup2 mount points (authored by ftigeot).
Fri, Feb 15, 5:00 PM
ftigeot added a comment to T1467: Slow network transfers from beaubourg.

Network limitation removed via a hotfix (manual route deletion).
Some network downtime will be required in the future to ensure the new /etc network configuration works as expected.

Fri, Feb 15, 11:04 AM · System administration

Wed, Feb 13

ftigeot added a comment to T1467: Slow network transfers from beaubourg.

Current content of the vmbr0 interface configuration in beaubourg:/etc/network/interfaces:

auto vmbr0
iface vmbr0 inet static
        bridge_ports vlan440
        address 192.168.100.32
        netmask 255.255.255.0
        up ip route add 192.168.101.0/24 via 192.168.100.1
        up ip route add 192.168.200.0/21 via 192.168.100.1
        up ip rule add from 192.168.100.32 table private
        up ip route add default via 192.168.100.1 dev vmbr0 table private
        up ip route flush cache
        down ip route del default via 192.168.100.1 dev vmbr0 table private
        down ip rule del from 192.168.100.32 table private
        down ip route del 192.168.200.0/21 via 192.168.100.1
        down ip route del 192.168.101.0/24 via 192.168.100.1
        down ip route flush cache
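
One possible reading of this configuration, sketched in Python: the `ip rule` line sends packets sourced from 192.168.100.32 to the `private` table, and that table only receives a default route via 192.168.100.1. The model below is a simplified longest-prefix lookup. It assumes the `private` table holds nothing but that default route, that the rule is consulted before the main table, and that the main table has the usual connected route for 192.168.100.0/24. The `lookup` and `next_hop` helpers are hypothetical and the table contents were not dumped from the live system.

```python
import ipaddress

# Hypothetical model of the routing tables implied by the configuration.
# Each entry is (destination network, next hop); None means "directly connected".
MAIN = [
    (ipaddress.ip_network("192.168.100.0/24"), None),  # assumed connected route
    (ipaddress.ip_network("192.168.101.0/24"), "192.168.100.1"),
    (ipaddress.ip_network("192.168.200.0/21"), "192.168.100.1"),
]
PRIVATE = [
    # Assumption: the default route is the only entry in this table.
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.100.1"),
]


def lookup(table, dst):
    """Return the next hop of the longest matching prefix in `table`."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


def next_hop(src, dst):
    """Model 'ip rule add from 192.168.100.32 table private', else use main."""
    if src == "192.168.100.32":
        return lookup(PRIVATE, dst)
    return lookup(MAIN, dst)
```

Under these assumptions, `next_hop("192.168.100.32", "192.168.100.50")` returns the gateway 192.168.100.1 instead of a direct delivery, whereas `lookup(MAIN, "192.168.100.50")` returns `None` (on-link). That would be consistent with local traffic taking a detour through a router, but it is only a model, not a capture of the real routing state.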
Wed, Feb 13, 4:49 PM · System administration
ftigeot added a comment to T1467: Slow network transfers from beaubourg.

Outgoing network traffic from beaubourg to the local private network 192.168.100.0/24 is routed through louvre.
Louvre then forwards the packets to the destination host.

Wed, Feb 13, 1:28 PM · System administration

Tue, Feb 12

ftigeot added a comment to T1526: Install a new VPN endpoint at Rocquencourt.

Louvre has gone down more than once before. Some of these events are documented in T1173.

Tue, Feb 12, 1:20 PM · System administration
ftigeot triaged T1526: Install a new VPN endpoint at Rocquencourt as Normal priority.
Tue, Feb 12, 1:12 PM · System administration

Thu, Feb 7

ftigeot added a subtask for T1520: Numerous dm device failures on louvre: T1486: I/O error on worker06.internal.
Thu, Feb 7, 4:55 PM · System administration
ftigeot added a parent task for T1486: I/O error on worker06.internal: T1520: Numerous dm device failures on louvre.
Thu, Feb 7, 4:55 PM · System administration
ftigeot removed a parent task for T1520: Numerous dm device failures on louvre: T1486: I/O error on worker06.internal.
Thu, Feb 7, 4:55 PM · System administration
ftigeot removed a subtask for T1486: I/O error on worker06.internal: T1520: Numerous dm device failures on louvre.
Thu, Feb 7, 4:55 PM · System administration
ftigeot added a comment to T1520: Numerous dm device failures on louvre.

After the reboot, existing dm volumes on top of /dev/md3 still reported I/O errors:

[ 5200.552667] Buffer I/O error on dev dm-33, logical block 6999130, async page read
[ 5506.537868] Buffer I/O error on dev dm-35, logical block 2251864, async page read
Thu, Feb 7, 4:04 PM · System administration
ftigeot added a comment to T1520: Numerous dm device failures on louvre.

The kind of error that was suddenly reported in large numbers when louvre stopped operating properly:

Buffer I/O error on device dm-41, logical block 10474329
Thu, Feb 7, 3:51 PM · System administration
ftigeot updated the task description for T1520: Numerous dm device failures on louvre.
Thu, Feb 7, 3:48 PM · System administration
ftigeot added a comment to T1520: Numerous dm device failures on louvre.

Related-to: T1486, T1518

Thu, Feb 7, 3:46 PM · System administration
ftigeot changed the status of T1520: Numerous dm device failures on louvre from Open to Work in Progress.
Thu, Feb 7, 3:45 PM · System administration
ftigeot added a comment to T1486: I/O error on worker06.internal.

A brand new virtual disk was created, skipping bad data blocks:

Thu, Feb 7, 3:35 PM · System administration

Wed, Feb 6

ftigeot closed T1518: I/O error on louvre:/dev/md3 as Resolved.

The RAID volume was successfully rebuilt. Closing, even though the root cause of the initial error was not found.

Wed, Feb 6, 10:43 AM · System administration
ftigeot closed T1518: I/O error on louvre:/dev/md3, a subtask of T1486: I/O error on worker06.internal, as Resolved.
Wed, Feb 6, 10:43 AM · System administration
ftigeot added a comment to T1518: I/O error on louvre:/dev/md3.

dm-11 is a device present on top of dm-10, itself backed by /dev/sda:

Wed, Feb 6, 10:42 AM · System administration
ftigeot added a comment to T1518: I/O error on louvre:/dev/md3.

More complete list of I/O errors as reported by dmesg(1):

[Tue Feb  5 09:38:53 2019] sd 0:0:2:0: [sda] tag#9 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[Tue Feb  5 09:38:53 2019] sd 0:0:2:0: [sda] tag#9 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[Tue Feb  5 09:38:53 2019] print_req_error: 140 callbacks suppressed
[Tue Feb  5 09:38:53 2019] print_req_error: I/O error, dev sda, sector 864969704
[Tue Feb  5 09:38:53 2019] device-mapper: multipath: Failing path 8:0.
[Tue Feb  5 09:38:53 2019] print_req_error: I/O error, dev dm-10, sector 864969704
[Tue Feb  5 09:38:53 2019] print_req_error: I/O error, dev dm-10, sector 865388544
[Tue Feb  5 09:38:53 2019] print_req_error: I/O error, dev dm-10, sector 149394832
[Tue Feb  5 09:38:53 2019] print_req_error: I/O error, dev dm-10, sector 180742368
[Tue Feb  5 09:38:53 2019] print_req_error: I/O error, dev dm-10, sector 180742224
[Tue Feb  5 09:38:53 2019] print_req_error: I/O error, dev dm-10, sector 212422416
[Tue Feb  5 09:38:53 2019] print_req_error: I/O error, dev dm-10, sector 212422368
[Tue Feb  5 09:38:53 2019] print_req_error: I/O error, dev dm-10, sector 212422032
[Tue Feb  5 09:38:53 2019] print_req_error: I/O error, dev dm-10, sector 149394864
[Tue Feb  5 09:38:53 2019] Buffer I/O error on dev dm-10, logical block 937684544, async page read
[Tue Feb  5 09:38:53 2019] md/raid10:md3: dm-11: rescheduling sector 1387516384
[Tue Feb  5 09:38:53 2019] md/raid10:md3: dm-11: rescheduling sector 1387516416
[Tue Feb  5 09:38:53 2019] md/raid10:md3: dm-11: rescheduling sector 1387516472
[Tue Feb  5 09:38:53 2019] md/raid10:md3: dm-11: rescheduling sector 1387516488
[Tue Feb  5 09:38:53 2019] md/raid10:md3: dm-11: rescheduling sector 1387516512
[Tue Feb  5 09:38:53 2019] md/raid10:md3: dm-11: rescheduling sector 1387516544
[Tue Feb  5 09:38:53 2019] md/raid10:md3: dm-11: rescheduling sector 1387516584
[Tue Feb  5 09:38:53 2019] md/raid10:md3: dm-11: rescheduling sector 1387516600
[Tue Feb  5 09:38:53 2019] md/raid10:md3: dm-11: rescheduling sector 4518605200
[Tue Feb  5 09:38:54 2019] md/raid10:md3: dm-11: rescheduling sector 2565880800
...
[Tue Feb  5 09:39:51 2019] md: super_written gets error=10
[Tue Feb  5 09:39:51 2019] md/raid10:md3: Disk failure on dm-11, disabling device.
                           md/raid10:md3: Operation continuing on 1 devices.
[Tue Feb  5 09:39:51 2019] md/raid10:md3: dm-13: redirecting sector 1387516384 to another mirror
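
A rough way to summarize such a log is to count the `I/O error, dev ...` lines per device. The sketch below does that with a regular expression. The pattern is an assumption based only on the line format visible above, and `count_io_errors` is a hypothetical helper, not an existing tool.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   print_req_error: I/O error, dev dm-10, sector 864969704
IO_ERROR = re.compile(r"I/O error, dev (?P<dev>[^,\s]+), sector (?P<sector>\d+)")


def count_io_errors(lines):
    """Count 'I/O error, dev X, sector N' lines per device name."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# A few lines copied from the dmesg output above.
sample = [
    "[Tue Feb  5 09:38:53 2019] print_req_error: I/O error, dev sda, sector 864969704",
    "[Tue Feb  5 09:38:53 2019] print_req_error: I/O error, dev dm-10, sector 864969704",
    "[Tue Feb  5 09:38:53 2019] print_req_error: I/O error, dev dm-10, sector 865388544",
    "[Tue Feb  5 09:38:53 2019] md/raid10:md3: dm-11: rescheduling sector 1387516384",
]
counts = count_io_errors(sample)
```

The `Buffer I/O error on dev ...` and `rescheduling sector` lines are deliberately not counted. Since the kernel also reports suppressed callbacks, these counts are a lower bound at best.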
Wed, Feb 6, 10:36 AM · System administration

Tue, Feb 5

ftigeot renamed T1467: Slow network transfers from beaubourg from Network timeout issues in the Proxmox cluster to Slow network transfers from beaubourg.
Tue, Feb 5, 5:18 PM · System administration
ftigeot added a comment to T1467: Slow network transfers from beaubourg.

None of the previous timeout issues are visible anymore on the Proxmox web interface.
They were possibly related to bad network quality on the web browser side (INRIA guest wifi).

Tue, Feb 5, 5:18 PM · System administration
ftigeot changed the status of T1518: I/O error on louvre:/dev/md3 from Open to Work in Progress.

Forcing a rebuild by removing and re-adding the faulty device:

mdadm --manage /dev/md3 -r /dev/dm-11
mdadm --manage /dev/md3 -a /dev/dm-11
Tue, Feb 5, 4:18 PM · System administration
ftigeot changed the status of T1518: I/O error on louvre:/dev/md3, a subtask of T1486: I/O error on worker06.internal, from Open to Work in Progress.
Tue, Feb 5, 4:18 PM · System administration
ftigeot added a comment to T1518: I/O error on louvre:/dev/md3.

As in T1486, we have a dm device reporting I/O errors but no visible errors on the underlying physical device.

Tue, Feb 5, 4:08 PM · System administration
ftigeot added a comment to T1518: I/O error on louvre:/dev/md3.

A full read check of /dev/sda did not return any error:

# dd if=/dev/sda of=/dev/null bs=1M
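
For reference, the same kind of sequential read check can be sketched in Python. This is not what was run on louvre; `full_read_check` is a hypothetical helper that reads a path in 1 MiB blocks and lets any `OSError` (for instance EIO) propagate to the caller.

```python
def full_read_check(path, block_size=1024 * 1024):
    """Read `path` from start to end and return the number of bytes read.

    A read failure reported by the kernel surfaces as OSError.
    """
    total = 0
    # buffering=0 gives unbuffered reads of at most block_size bytes each.
    with open(path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(block_size)
            if not chunk:  # empty read means end of device or file
                break
            total += len(chunk)
    return total
```

Like the dd invocation, a clean run only shows that every block could be read at that moment. It says nothing about errors on the write path or in the layers stacked on top of the device.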
Tue, Feb 5, 3:53 PM · System administration
ftigeot added a comment to T1518: I/O error on louvre:/dev/md3.

As shown by lsblk, dm-11 and dm-13 are partitions on multipath devices.
These devices are backed by the physical /dev/sda and /dev/sdb SSDs, respectively:

Tue, Feb 5, 3:50 PM · System administration
ftigeot triaged T1518: I/O error on louvre:/dev/md3 as High priority.
Tue, Feb 5, 3:36 PM · System administration

Fri, Feb 1

ftigeot renamed T1501: Phantom device mapper volume usage in Proxmox: logical volume is used by another device from Phantom device mapper volume usage in Proxmox to Phantom device mapper volume usage in Proxmox: logical volume is used by another device.
Fri, Feb 1, 11:20 AM · System administration
ftigeot closed T1509: Phantom device mapper volume usage in Proxmox: local storage is not available on target node as Resolved.

Removing the previously used volume allowed VM migration to complete.

Fri, Feb 1, 11:06 AM · System administration
ftigeot closed T1509: Phantom device mapper volume usage in Proxmox: local storage is not available on target node, a subtask of T1501: Phantom device mapper volume usage in Proxmox: logical volume is used by another device, as Resolved.
Fri, Feb 1, 11:06 AM · System administration
ftigeot added a comment to T1509: Phantom device mapper volume usage in Proxmox: local storage is not available on target node.

Since the previously used drive is not used anymore, I decided to remove it:

# lvchange -a y ssd/vm-102-disk-0
# lvremove /dev/ssd/vm-102-disk-0
Fri, Feb 1, 11:01 AM · System administration
ftigeot changed the status of T1509: Phantom device mapper volume usage in Proxmox: local storage is not available on target node, a subtask of T1501: Phantom device mapper volume usage in Proxmox: logical volume is used by another device, from Open to Work in Progress.
Fri, Feb 1, 10:55 AM · System administration
ftigeot changed the status of T1509: Phantom device mapper volume usage in Proxmox: local storage is not available on target node from Open to Work in Progress.

The previous drive is neither active nor opened:

Fri, Feb 1, 10:55 AM · System administration
ftigeot added a comment to T1509: Phantom device mapper volume usage in Proxmox: local storage is not available on target node.

The main VM disk is stored on a "vm-102-disk-1" volume (on Ceph).
There is an inactive LVM volume on "beaubourg-ssd" formerly associated with this VM; it was used as the virtual disk backend before the virtual disk device was migrated to Ceph.

Fri, Feb 1, 10:53 AM · System administration
ftigeot added a comment to T1509: Phantom device mapper volume usage in Proxmox: local storage is not available on target node.

No mention of "beaubourg-ssd" is visible in the Proxmox virtual machine management interface.
All virtual disk backends are stored on Ceph.

Fri, Feb 1, 10:46 AM · System administration
ftigeot triaged T1509: Phantom device mapper volume usage in Proxmox: local storage is not available on target node as High priority.
Fri, Feb 1, 10:40 AM · System administration

Thu, Jan 31

ftigeot added a comment to T1501: Phantom device mapper volume usage in Proxmox: logical volume is used by another device.

Trying to manually deactivate the logical volume in question fails with the same error message:

lvchange -a n /dev/ssd/vm-107-disk-0
Logical volume ssd/vm-107-disk-0 is used by another device.
Thu, Jan 31, 5:31 PM · System administration
ftigeot updated the task description for T1501: Phantom device mapper volume usage in Proxmox: logical volume is used by another device.
Thu, Jan 31, 2:55 PM · System administration

Wed, Jan 30

ftigeot closed T1502: Too many postgresql logs on dbreplica0.euwest.azure.internal.softwareheritage.org as Resolved.

Only keep 24 hours of logs, and keep rotating over the same file names:

Wed, Jan 30, 3:09 PM · System administration
ftigeot added a comment to T1502: Too many postgresql logs on dbreplica0.euwest.azure.internal.softwareheritage.org.

There is no need to log all production queries on this server.
Reducing logging to queries that take more than one millisecond to execute:

Wed, Jan 30, 2:23 PM · System administration
ftigeot triaged T1503: Rename hypervisor3 to a museum name as Normal priority.
Wed, Jan 30, 11:58 AM · System administration
ftigeot added a comment to T1467: Slow network transfers from beaubourg.

It turns out hypervisor3 is not the culprit we thought it was.
Removing T1392 from parent task list.

Wed, Jan 30, 11:54 AM · System administration
ftigeot removed a parent task for T1467: Slow network transfers from beaubourg: T1392: Add a new hypervisor.
Wed, Jan 30, 11:53 AM · System administration
ftigeot removed a subtask for T1392: Add a new hypervisor: T1467: Slow network transfers from beaubourg.
Wed, Jan 30, 11:53 AM · System administration
ftigeot changed the status of T1502: Too many postgresql logs on dbreplica0.euwest.azure.internal.softwareheritage.org from Open to Work in Progress.
Wed, Jan 30, 10:35 AM · System administration

Tue, Jan 29

ftigeot changed the status of T1501: Phantom device mapper volume usage in Proxmox: logical volume is used by another device from Open to Work in Progress.
Tue, Jan 29, 2:08 PM · System administration

Fri, Jan 25

ftigeot added a comment to T1467: Slow network transfers from beaubourg.

After running some additional TCP iperf tests, it is clear that beaubourg is the outlier.
Measured bandwidth:

  • from any 10G machine to any 10G machine (except beaubourg): > 9 Gb/s
  • from beaubourg to ceph-osd1, ceph-osd2 and hypervisor3: 600-800 Mb/s
  • from beaubourg to ceph-mon1: 230 Kb/s
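
To put these figures on one scale, the short calculation below converts them to bits per second and compares them with the 9 Gb/s seen between the other 10G machines. It assumes decimal units (1 Kb/s = 1000 b/s) and uses 9 Gb/s as the baseline even though the measurement was "> 9 Gb/s", so the ratios are lower bounds.

```python
# Decimal network units, expressed in bits per second (assumption).
UNITS = {"Kb/s": 10**3, "Mb/s": 10**6, "Gb/s": 10**9}


def to_bps(value, unit):
    """Convert a bandwidth figure to bits per second."""
    return value * UNITS[unit]


baseline = to_bps(9, "Gb/s")  # other 10G machines, lower bound

measurements = {
    "beaubourg -> ceph-osd1/ceph-osd2/hypervisor3 (low)": to_bps(600, "Mb/s"),
    "beaubourg -> ceph-osd1/ceph-osd2/hypervisor3 (high)": to_bps(800, "Mb/s"),
    "beaubourg -> ceph-mon1": to_bps(230, "Kb/s"),
}

# How many times slower each path is compared with the baseline.
slowdown = {name: baseline / bps for name, bps in measurements.items()}
```

With these inputs the 600-800 Mb/s paths come out between 11.25 and 15 times slower than the baseline, and the ceph-mon1 path roughly 39,000 times slower.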
Fri, Jan 25, 3:56 PM · System administration

Jan 22 2019

ftigeot added a comment to T1467: Slow network transfers from beaubourg.

Since all these machines are connected to the same pair of switches, and these switches are managed by INRIA DSI-SESI, I have asked for their assistance in this ticket:
https://support.inria.fr/Ticket/Display.html?id=127011

Jan 22 2019, 3:18 PM · System administration
ftigeot added a comment to T1486: I/O error on worker06.internal.

The /dev/md3 check completed successfully and did not report any error.

Jan 22 2019, 8:41 AM · System administration
ftigeot claimed T1486: I/O error on worker06.internal.
Jan 22 2019, 8:41 AM · System administration

Jan 21 2019

ftigeot added a comment to T1486: I/O error on worker06.internal.

worker06.internal.softwareheritage.org is a VM running on louvre. Its virtual disk is backed by /dev/dm-36 on the host.

Jan 21 2019, 2:42 PM · System administration
ftigeot changed the status of T1486: I/O error on worker06.internal from Open to Work in Progress.
Jan 21 2019, 2:38 PM · System administration

Jan 16 2019

ftigeot added a comment to T1467: Slow network transfers from beaubourg.

For the previous iperf TCP test and without tuning, we also have:

  • an average transfer speed of 9,388 Mb/s between hypervisor3 and one of the 10G Ceph nodes, ceph-osd1.
  • an average transfer speed of 8,364 Mb/s between beaubourg and ceph-osd1.
Jan 16 2019, 11:43 AM · System administration

Jan 15 2019

ftigeot added a comment to T1467: Slow network transfers from beaubourg.

Both beaubourg and hypervisor3 network interfaces have a 10Gb/s link layer connection.
Aggregated traffic from multiple iperf streams nevertheless never exceeds about 90% of a 1Gb/s transfer speed.

Jan 15 2019, 5:05 PM · System administration
ftigeot added a comment to T1467: Slow network transfers from beaubourg.

Another thing worth noting is that the vmbr0 interface, which carries the primary IP address, has an MTU of only 1500 bytes.
The network interfaces it is built on have a 9000-byte MTU.
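
As a back-of-the-envelope check, the calculation below estimates how much of the wire rate is left for TCP payload at each MTU. It assumes IPv4 and TCP headers without options (40 bytes in total) and 38 bytes of per-frame Ethernet overhead (header, FCS, preamble and inter-frame gap); VLAN tags and TCP options are ignored.

```python
IP_TCP_HEADERS = 40     # IPv4 (20) + TCP (20), no options (assumption)
ETHERNET_OVERHEAD = 38  # header 14 + FCS 4 + preamble 8 + inter-frame gap 12


def payload_efficiency(mtu):
    """Fraction of the wire rate available to TCP payload at this MTU."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


def frames_per_second(mtu, link_bps=10 * 10**9):
    """Full-size frames per second needed to fill the link."""
    return link_bps / ((mtu + ETHERNET_OVERHEAD) * 8)


eff_1500 = payload_efficiency(1500)
eff_9000 = payload_efficiency(9000)
```

Under these assumptions the efficiency is about 94.9% at 1500 bytes and about 99.1% at 9000 bytes, so header overhead alone accounts for a few percent at most. The smaller MTU does mean roughly six times more frames per second to fill a 10Gb/s link, which may or may not matter here.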

Jan 15 2019, 4:28 PM · System administration
ftigeot added a comment to T1467: Slow network transfers from beaubourg.

iperf tests show:

  • network speed never reaches 1Gb/s, even between hosts which have 10Gb/s network interfaces and are connected to the same switches
  • 19% of UDP packets get lost at 1Gb/s (less than 0.5% at 100Mb/s)
Jan 15 2019, 2:59 PM · System administration

Jan 14 2019

ftigeot added a comment to T1467: Slow network transfers from beaubourg.

Corosync warnings also routinely appear in the logs:

Jan 14 11:56:13 hypervisor3 corosync[5622]: notice  [TOTEM ] Retransmit List: 282eb9
Jan 14 11:56:13 hypervisor3 corosync[5622]:  [TOTEM ] Retransmit List: 282eb9
Jan 14 11:56:13 hypervisor3 corosync[5622]:  [TOTEM ] Retransmit List: 282eba
Jan 14 2019, 1:26 PM · System administration
ftigeot changed the status of T1467: Slow network transfers from beaubourg, a subtask of T1392: Add a new hypervisor, from Open to Work in Progress.
Jan 14 2019, 1:23 PM · System administration
ftigeot changed the status of T1467: Slow network transfers from beaubourg from Open to Work in Progress.

The network interface hardware on hypervisor3 is relatively new:

i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.1.14-k
Jan 14 2019, 1:23 PM · System administration

Jan 11 2019

ftigeot triaged T1467: Slow network transfers from beaubourg as Normal priority.
Jan 11 2019, 4:24 PM · System administration

Dec 21 2018

ftigeot closed T1325: Add SSDs to banco as Resolved.

Two 4TB SSDs added to banco yesterday, exported to Linux as JBODs.

Dec 21 2018, 4:10 PM · System administration
ftigeot added a comment to T1392: Add a new hypervisor.

Proxmox now installed on the machine, hypervisor3.softwareheritage.org.

Dec 21 2018, 4:08 PM · System administration
ftigeot committed rSPSITE59ee68802a76: manifests/site: add a new hypervisor, hypervisor3 (authored by ftigeot).
Dec 21 2018, 2:14 PM

Dec 13 2018

ftigeot moved T1442: Replace Munin graphs with Grafana/Prometheus dashboards from Backlog to in progress on the Sprint 2018 12 board.
Dec 13 2018, 4:22 PM · Sprint 2018 12, System administration
ftigeot changed the status of T1442: Replace Munin graphs with Grafana/Prometheus dashboards, a subtask of T1356: Kill munin, from Open to Work in Progress.
Dec 13 2018, 4:21 PM · Sprint 2018 12, System administration
ftigeot changed the status of T1442: Replace Munin graphs with Grafana/Prometheus dashboards from Open to Work in Progress.
Dec 13 2018, 4:21 PM · Sprint 2018 12, System administration
ftigeot triaged T1442: Replace Munin graphs with Grafana/Prometheus dashboards as High priority.
Dec 13 2018, 4:19 PM · Sprint 2018 12, System administration
ftigeot added a parent task for T1428: Create an inventory of useful Munin metrics: T1356: Kill munin.
Dec 13 2018, 4:14 PM · Metrics/monitoring, Sprint 2018 12
ftigeot added a subtask for T1356: Kill munin: T1428: Create an inventory of useful Munin metrics.
Dec 13 2018, 4:14 PM · Sprint 2018 12, System administration

Dec 11 2018

ftigeot changed the status of T1338: Change BBUs on orsay from Open to Work in Progress.

Another PERC H700 battery replacement product: http://www.hardware-attitude.com/fiche-1114-batterie-raid-pour-perc5-i-perc6-i---nu209.html
IMHO we should buy this one as soon as possible.

Dec 11 2018, 4:55 PM · System administration

Dec 7 2018

ftigeot added a comment to T1372: Compare Rsnapshot / BorgBackup / Backuppc.

Borgbackup is unable to pull data from remote hosts to a central location.

I do not understand this assertion.

Dec 7 2018, 10:50 AM · System administration

Dec 4 2018

ftigeot changed the status of T1428: Create an inventory of useful Munin metrics from Open to Work in Progress.

Disk

  • I/Os per device
  • Disk usage in percent
  • Utilization per device: is this real? It could be useful to see whether a storage subsystem is overloaded
  • Disk usage in absolute, human-readable values: percentages are meaningless if we resize filesystems
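
As an illustration of the last two items, a metric can carry both forms at once. The sketch below uses Python's `shutil.disk_usage` to report used space as an absolute, human-readable value and as a percentage. The `human` and `usage` helpers and the binary units are choices made for this example; they are not part of Munin, Grafana or Prometheus.

```python
import shutil


def human(n):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:.1f} {unit}"
        n /= 1024


def usage(path="/"):
    """Return absolute and relative usage for the filesystem holding `path`."""
    u = shutil.disk_usage(path)
    percent = round(100 * u.used / u.total, 1) if u.total else 0.0
    return {"used": human(u.used), "total": human(u.total), "percent": percent}
```

If a filesystem is doubled in size, the percentage is halved while the absolute `used` value stays the same, which is the reason for wanting both.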
Dec 4 2018, 4:11 PM · Metrics/monitoring, Sprint 2018 12
ftigeot changed the status of T1428: Create an inventory of useful Munin metrics, a subtask of T1408: More/better Metrics, from Open to Work in Progress.
Dec 4 2018, 4:11 PM · Metrics/monitoring, Sprint 2018 12
ftigeot updated subscribers of T1428: Create an inventory of useful Munin metrics.
Dec 4 2018, 2:46 PM · Metrics/monitoring, Sprint 2018 12
ftigeot triaged T1428: Create an inventory of useful Munin metrics as Normal priority.
Dec 4 2018, 2:45 PM · Metrics/monitoring, Sprint 2018 12
ftigeot changed the status of T1372: Compare Rsnapshot / BorgBackup / Backuppc, a subtask of T1282: Revisit backups, from Open to Work in Progress.
Dec 4 2018, 2:41 PM · System administration
ftigeot changed the status of T1372: Compare Rsnapshot / BorgBackup / Backuppc from Open to Work in Progress.

There is a huge difference between Borgbackup and Rsnapshot + Backuppc: Borgbackup is unable to pull data from remote hosts to a central location.
Its working model is based on Borgbackup running locally and storing data to a local filesystem.

Dec 4 2018, 2:41 PM · System administration
ftigeot added a comment to T1392: Add a new hypervisor.

New hypervisor hardware has been racked in our bay at Rocquencourt.
The machine's iDRAC management interface is accessible on the management network, under the name swh7-adm.inria.fr (details on the wiki).

Dec 4 2018, 11:56 AM · System administration
ftigeot closed T1404: Resolve disk full issue on somerset:/srv/softwareheritage/postgres as Resolved.

Service postgresql@10-indexer.service has been restarted on somerset and database replication is once again operating normally.
Postgres wal files are being removed as expected on the master, slowly freeing disk space.

Dec 4 2018, 11:31 AM · System administration

Dec 3 2018

ftigeot added a comment to T1404: Resolve disk full issue on somerset:/srv/softwareheritage/postgres.

Some dump files that were no longer useful were removed by seirl@, freeing some space on somerset:/srv/softwareheritage/postgres.

Dec 3 2018, 3:19 PM · System administration
ftigeot added a comment to T1404: Resolve disk full issue on somerset:/srv/softwareheritage/postgres.

somerset:softwareheritage-indexer is the master database for dbreplica1:softwareheritage-indexer.

Dec 3 2018, 3:17 PM · System administration
ftigeot added a parent task for T1395: Enlarge disk on dbreplica1: T1404: Resolve disk full issue on somerset:/srv/softwareheritage/postgres.
Dec 3 2018, 3:11 PM · System administration
ftigeot added a subtask for T1404: Resolve disk full issue on somerset:/srv/softwareheritage/postgres: T1395: Enlarge disk on dbreplica1.
Dec 3 2018, 3:11 PM · System administration
ftigeot changed the status of T1404: Resolve disk full issue on somerset:/srv/softwareheritage/postgres from Open to Work in Progress.
Dec 3 2018, 3:10 PM · System administration
ftigeot closed T1395: Enlarge disk on dbreplica1 as Resolved.

The pvmove command was done this morning.

Dec 3 2018, 3:07 PM · System administration

Nov 27 2018

ftigeot added a parent task for T1372: Compare Rsnapshot / BorgBackup / Backuppc: T1282: Revisit backups.
Nov 27 2018, 4:45 PM · System administration
ftigeot added a subtask for T1282: Revisit backups: T1372: Compare Rsnapshot / BorgBackup / Backuppc.
Nov 27 2018, 4:45 PM · System administration
ftigeot changed the status of T1392: Add a new hypervisor from Open to Work in Progress.
Nov 27 2018, 4:42 PM · System administration

Nov 23 2018

ftigeot added a comment to T1338: Change BBUs on orsay.

At least some of the batteries for PERC H800 adapters use part number KR174 and/or M164C.
Some information leads me to believe they could also be used with PERC H700 adapters.

Nov 23 2018, 3:20 PM · System administration
ftigeot lowered the priority of T979: Migrate TLS certificates away from the *.softwareheritage.org wildcards from High to Wishlist.

I did some experiments with Let's Encrypt, but other things were more urgent during the September-October 2018 period, and in the end a wildcard DigiCert certificate was used again.

Nov 23 2018, 3:04 PM · System administration
ftigeot committed rSPSITE33fdc25ae44e: Rsnapshot master role: Exclude file patterns from backups (authored by ftigeot).
Nov 23 2018, 2:06 PM

Nov 22 2018

ftigeot committed rSPSITE57ad56cde817: data/default: Export root@banco's public ssh key (authored by ftigeot).
Nov 22 2018, 3:03 PM

Nov 20 2018

ftigeot triaged T1372: Compare Rsnapshot / BorgBackup / Backuppc as Normal priority.
Nov 20 2018, 4:36 PM · System administration
ftigeot committed rSPSITEf5e70254d953: Rsnapshot master role: Do not run rsnapshot hourly every minute (authored by ftigeot).
Nov 20 2018, 4:09 PM

Nov 16 2018

ftigeot added a comment to T1338: Change BBUs on orsay.

Batteries for PERC H700 adapters have the part number U8735 and/or NU209.

Nov 16 2018, 3:55 PM · System administration
ftigeot committed rSPSITEe5b5d5b49b94: Rsnapshot master role: last minute fixes (authored by ftigeot).
Nov 16 2018, 2:54 PM
ftigeot committed rSPSITEe740b250680e: Add a new rsnapshot::master role (authored by ftigeot).
Nov 16 2018, 2:22 PM

Nov 15 2018

ftigeot added a comment to T1338: Change BBUs on orsay.

Orsay contains two LSI SAS 2108-based RAID adapters:

05:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)
22:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)
Nov 15 2018, 12:27 PM · System administration
ftigeot added a comment to T1325: Add SSDs to banco.

Since the SSDs we have are 2.5", we need a special adapter disk tray, which Dell refuses to sell us.

Nov 15 2018, 12:01 PM · System administration