Since hypervisor3 was added to the existing cluster, network timeout errors have been showing up in the Proxmox web interface.
Error messages include:
- "communication failure(0)"
- "Connection error"
- "Loading"
The network interface hardware on hypervisor3 is relatively new:
i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.1.14-k
There are various open bug reports related to the i40e NIC and/or its driver, such as this one: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1779756 , which could mean the i40e NIC and/or its driver is subtly broken.
hypervisor3 is running Linux 4.15.18-9-pve.
The Linux git changelog in drivers/net/ethernet/intel/i40e between Linux 4.15 and 5.0-rc1 contains more than 120 occurrences of the word "fix". Almost none of these commits have been cherry-picked to v4.15.18.
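A count along these lines can be reproduced from a mainline kernel git checkout (the exact query is an assumption; it counts commit subjects containing "fix"):

linux.git$ git log --oneline v4.15..v5.0-rc1 -- drivers/net/ethernet/intel/i40e | grep -ci fix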
It is hard to point to a particular bug we could be hitting, though, and we would need to run a newer i40e driver version to test this hypothesis.
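Should we try that, the currently loaded driver and its version can be checked with standard tools first (the interface name below is hypothetical):

hypervisor3:/root# ethtool -i ens1f0 | grep -E '^(driver|version)'
hypervisor3:/root# modinfo i40e | grep '^version'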
Corosync warnings also routinely appear in the logs:
Jan 14 11:56:13 hypervisor3 corosync[5622]: notice [TOTEM ] Retransmit List: 282eb9
Jan 14 11:56:13 hypervisor3 corosync[5622]: [TOTEM ] Retransmit List: 282eb9
Jan 14 11:56:13 hypervisor3 corosync[5622]: [TOTEM ] Retransmit List: 282eba
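To gauge how frequent these retransmits are, they can be counted in the journal (the time window is an example):

hypervisor3:/root# journalctl -u corosync --since "1 hour ago" | grep -c 'Retransmit List'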
iperf tests show the following results:
[raw results, TCP test]
hypervisor3:/root# iperf -s -p 8042
------------------------------------------------------------
Server listening on TCP port 8042
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.100.34 port 8042 connected with 192.168.100.1 port 36406
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-15.0 sec  1.58 GBytes   901 Mbits/sec
[  5] local 192.168.100.34 port 8042 connected with 192.168.100.32 port 34552
[  5]  0.0-15.0 sec  1.55 GBytes   885 Mbits/sec
[  4] local 192.168.100.34 port 8042 connected with 192.168.100.9 port 36298
[  4]  0.0-15.0 sec  1.64 GBytes   939 Mbits/sec
[  5] local 192.168.100.34 port 8042 connected with 192.168.100.32 port 34792
[  5]  0.0-15.0 sec  1.58 GBytes   900 Mbits/sec
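For reference, the client side of these runs was presumably invoked along these lines (client host and the 15 s duration are inferred from the output above):

192.168.100.32:/root# iperf -c 192.168.100.34 -p 8042 -t 15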
[raw results, UDP test]
hypervisor3:/root# iperf -u -s -p 8042
------------------------------------------------------------
Server listening on UDP port 8042
Receiving 1470 byte datagrams
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[  3] local 192.168.100.34 port 8042 connected with 192.168.100.32 port 43152
[ ID] Interval       Transfer     Bandwidth        Jitter   Lost/Total Datagrams
[  3]  0.0-10.0 sec  11.9 MBytes  10.0 Mbits/sec   0.123 ms      0/ 8505 (0%)
[  4] local 192.168.100.34 port 8042 connected with 192.168.100.32 port 42732
[  4]  0.0-10.0 sec   119 MBytes   100 Mbits/sec   0.095 ms     35/85035 (0.041%)
[  3] local 192.168.100.34 port 8042 connected with 192.168.100.32 port 44095
[  3]  0.0-10.0 sec   964 MBytes   808 Mbits/sec   0.018 ms 162574/850341 (19%)
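The three UDP runs above correspond to increasing client-side target bandwidths; the client invocations were presumably of this form (exact flags are an assumption):

192.168.100.32:/root# iperf -u -c 192.168.100.34 -p 8042 -b 10M
192.168.100.32:/root# iperf -u -c 192.168.100.34 -p 8042 -b 100M
192.168.100.32:/root# iperf -u -c 192.168.100.34 -p 8042 -b 1000M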
Another thing worth noting: the vmbr0 interface, on which the primary IP address is located, has an MTU of only 1500 bytes, while the network interfaces it is built on have a 9000-byte MTU.
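The mismatch can be confirmed with ip link, and the effective path MTU probed with non-fragmenting pings (payload size = MTU minus 28 bytes of IP/ICMP headers):

hypervisor3:/root# ip link show vmbr0 | grep -o 'mtu [0-9]*'
# 1472 + 28 = 1500: should pass on vmbr0
hypervisor3:/root# ping -M do -s 1472 -c 1 192.168.100.1
# 8972 + 28 = 9000: only passes if jumbo frames work end to end
hypervisor3:/root# ping -M do -s 8972 -c 1 192.168.100.1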
Both beaubourg and hypervisor3 network interfaces have a 10 Gb/s link-layer connection. Nevertheless, aggregated traffic from multiple iperf streams never exceeds roughly 90% of 1 Gb/s.
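The negotiated link speed itself can be double-checked with ethtool (the interface name is hypothetical):

hypervisor3:/root# ethtool ens1f0 | grep -i speed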
For the previous iperf TCP test, without tuning, we also have:
Since all these machines are connected to the same pair of switches, and these switches are managed by INRIA DSI-SESI, I have asked for their assistance in this ticket:
https://support.inria.fr/Ticket/Display.html?id=127011
Note that iperf tests are half-duplex, so you need to execute them both ways to get the full picture. Ideally you should run three test series (see the command sketch after the table):
machine1 | traffic | machine2
-------- | ------- | -----------
iperf -s |   <--   | iperf -c
iperf -c |   -->   | iperf -s
iperf -s |  <==>   | iperf -d -c
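A sketch of the corresponding invocations, reusing port 8042 from the tests above (hostnames are placeholders):

# series 1: traffic from machine2 to machine1
machine1$ iperf -s -p 8042
machine2$ iperf -c machine1 -p 8042
# series 2: traffic from machine1 to machine2
machine2$ iperf -s -p 8042
machine1$ iperf -c machine2 -p 8042
# series 3: both directions at once (-d runs the reverse test simultaneously)
machine1$ iperf -s -p 8042
machine2$ iperf -d -c machine1 -p 8042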
Note also that for a given test execution, the results are not the same on both machines, which gives a clue about the buffering/problem occurring in the network path between them (the switch here). So for a single iperf test execution, you need to look at the performance results reported by both ends of the link.
After running some additional TCP iperf tests, it is obvious that beaubourg is the outlier.
Measured bandwidth:
It turns out hypervisor3 is not the culprit we thought it was.
Removing T1392 from parent task list.
None of the previous timeout issues are visible anymore on the Proxmox web interface.
They were possibly related to bad network quality on the web browser side (INRIA guest wifi).
Outgoing network traffic from beaubourg to the local private network 192.168.100.0/24 transits via louvre.
Louvre re-emits network packets and sends them to the destination host.
Given that louvre only has 1 Gb/s connectivity, this is obviously the source of the <1 Gb/s bandwidth limitation.
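The detour via louvre can be confirmed with a source-scoped route lookup or a traceroute (a sketch; 192.168.100.34 is hypervisor3, per the iperf output above):

beaubourg:/root# ip route get 192.168.100.34 from 192.168.100.32
beaubourg:/root# traceroute -n 192.168.100.34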
Current content of the vmbr0 interface configuration in beaubourg:/etc/network/interfaces:
auto vmbr0
iface vmbr0 inet static
    bridge_ports vlan440
    address 192.168.100.32
    netmask 255.255.255.0
    up ip route add 192.168.101.0/24 via 192.168.100.1
    up ip route add 192.168.200.0/21 via 192.168.100.1
    up ip rule add from 192.168.100.32 table private
    up ip route add default via 192.168.100.1 dev vmbr0 table private
    up ip route flush cache
    down ip route del default via 192.168.100.1 dev vmbr0 table private
    down ip rule del from 192.168.100.32 table private
    down ip route del 192.168.200.0/21 via 192.168.100.1
    down ip route del 192.168.101.0/24 via 192.168.100.1
    down ip route flush cache
We can see a private routing table and a special route via louvre (192.168.100.1) for vmbr0 traffic.
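The rule and the private table can also be inspected at runtime:

beaubourg:/root# ip rule show
beaubourg:/root# ip route show table private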
Network limitation removed via a hotfix (manual route deletion).
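The exact hotfix commands are not recorded here, but they presumably mirror the down rules of the stanza above, along the lines of:

beaubourg:/root# ip route del default via 192.168.100.1 dev vmbr0 table private
beaubourg:/root# ip rule del from 192.168.100.32 table private
beaubourg:/root# ip route flush cache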
Some network downtime will be required in the future to ensure the new /etc network configuration works as expected.