After the failure of 2 disks in the raid0 array of esnode1, the whole raid needs to be rebuilt.
As kafka has been removed from the ES nodes, we now have an additional ~2TB partition available on each server that can be allocated to the ES storage.
Some issues identified during the T2888 incident:
- the OS is installed on a single disk without replication; if that disk fails, the system will be lost
- it seems adding the former kafka partition as a datadir is not an ideal solution (T2888#55004)
- the partitioning of the disks usable for the raid is not homogeneous (one partition from the disk holding the system + 3 complete disks); a raid0 built from these 4 volumes will not be optimal (see the inspection commands after this list)
- the ipmi console is not available for these servers, making a complete re-installation complicated
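For reference, a minimal way to check the current disk layout on a node; the device names are assumptions and must be adapted to the actual hardware:

```
# List block devices with their partitions and mount points
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT

# Print the GPT layout of each disk (device names assumed)
for disk in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    sgdisk -p "$disk"
done
```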
It was finally decided to apply the same partition layout to all the disks, with one partition allocated to the system and the remaining partition managed in a zfs pool.
This gives the same partition size for all the raid volumes and paves the way for having the system on a software raid1 replicated across the 4 disks (to be configured later). A sketch of the target layout follows.
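A minimal sketch of the target layout, assuming 4 disks named /dev/sda through /dev/sdd, a 100G system partition, and a pool named esdata (device names, sizes, and pool name are all placeholders, not the final values):

```
# WARNING: illustrative only; this wipes the disks. Run from a rescue
# environment, not from the running system.
for disk in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    sgdisk --zap-all "$disk"              # drop the old partition table
    sgdisk -n1:0:+100G -t1:fd00 "$disk"   # system partition (future software raid1)
    sgdisk -n2:0:0     -t2:bf01 "$disk"   # rest of the disk for the zfs pool
done

# Striped pool (raid0 equivalent) over the 4 identical data partitions
zpool create -o ashift=12 esdata /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
```

With identical partitions on every disk, the future raid1 for the system can simply mirror the first partition of each disk with mdadm.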
The change will first be applied on esnode1, so the performance impact can be monitored.
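To monitor the impact, a few standard commands can be used (the pool name esdata and the default Elasticsearch port are assumptions):

```
# Per-vdev I/O statistics for the pool, refreshed every 5 seconds
zpool iostat -v esdata 5

# Elasticsearch cluster health on the node (default port assumed)
curl -s http://localhost:9200/_cluster/health?pretty
```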