
Slow network transfers from beaubourg
Started, Work in Progress, Normal, Public

Description

Since hypervisor3 was added to the existing cluster, network timeout errors manifest themselves in the Proxmox web interface.
Error messages include:

  • "communication failure(0)"
  • "Connection error"
  • "Loading"

Related Objects

Event Timeline

ftigeot triaged this task as Normal priority. Jan 11 2019, 4:24 PM
ftigeot created this task.
ftigeot changed the task status from Open to Work in Progress. Jan 14 2019, 1:23 PM

The network interface hardware on hypervisor3 is relatively new:

i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.1.14-k

There are various open bug reports related to the i40e NIC and/or its driver, such as this one: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1779756 , which could mean the i40e NIC and/or its driver is subtly broken.

hypervisor3 is running Linux 4.15.18-9-pve.
The Linux git changelog in drivers/net/ethernet/intel/i40e between Linux 4.15 and 5.0-rc1 contains more than 120 occurrences of the word "fix". Almost none of these commits have been cherry-picked to v4.15.18.

It is hard to point to a particular bug we could be hitting though and we would need to run a new i40e driver version to test this hypothesis.
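The fix count mentioned above can be reproduced against a kernel git checkout; a sketch (the checkout path `~/linux` is hypothetical):

```shell
# Count commits mentioning "fix" that touch the i40e driver between
# v4.15 and v5.0-rc1 (assumes a Linux kernel git checkout in ~/linux):
git -C ~/linux log --oneline v4.15..v5.0-rc1 -- drivers/net/ethernet/intel/i40e \
    | grep -ci fix
```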

Corosync warnings also routinely appear in the logs:

Jan 14 11:56:13 hypervisor3 corosync[5622]: notice  [TOTEM ] Retransmit List: 282eb9
Jan 14 11:56:13 hypervisor3 corosync[5622]:  [TOTEM ] Retransmit List: 282eb9
Jan 14 11:56:13 hypervisor3 corosync[5622]:  [TOTEM ] Retransmit List: 282eba

iperf tests show that:

  • network speed never reaches 1 Gb/s, even between hosts which have 10 Gb/s network interfaces and are connected to the same switches
  • 19% of UDP packets are lost at 1 Gb/s (versus less than 0.5% at 100 Mb/s)

[raw results, TCP test]

hypervisor3:/root#iperf -s -p  8042
------------------------------------------------------------
Server listening on TCP port 8042
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.100.34 port 8042 connected with 192.168.100.1 port 36406
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-15.0 sec  1.58 GBytes   901 Mbits/sec
[  5] local 192.168.100.34 port 8042 connected with 192.168.100.32 port 34552
[  5]  0.0-15.0 sec  1.55 GBytes   885 Mbits/sec
[  4] local 192.168.100.34 port 8042 connected with 192.168.100.9 port 36298
[  4]  0.0-15.0 sec  1.64 GBytes   939 Mbits/sec
[  5] local 192.168.100.34 port 8042 connected with 192.168.100.32 port 34792
[  5]  0.0-15.0 sec  1.58 GBytes   900 Mbits/sec

[raw results, UDP test]

hypervisor3:/root#iperf -u -s -p  8042
------------------------------------------------------------
Server listening on UDP port 8042
Receiving 1470 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 192.168.100.34 port 8042 connected with 192.168.100.32 port 43152                                                      
[ ID] Interval       Transfer     Bandwidth        Jitter   Lost/Total Datagrams                                                   
[  3]  0.0-10.0 sec  11.9 MBytes  10.0 Mbits/sec   0.123 ms    0/ 8505 (0%)                                                        
[  4] local 192.168.100.34 port 8042 connected with 192.168.100.32 port 42732                                                      
[  4]  0.0-10.0 sec   119 MBytes   100 Mbits/sec   0.095 ms   35/85035 (0.041%)                                                    
[  3] local 192.168.100.34 port 8042 connected with 192.168.100.32 port 44095                                                      
[  3]  0.0-10.0 sec   964 MBytes   808 Mbits/sec   0.018 ms 162574/850341 (19%)
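For reference, the client-side invocations matching these server reports would look like the following (host and options are reconstructed, hence hypothetical); the 19% figure is simply lost over total datagrams from the last run:

```shell
# Hypothetical client runs against the iperf server above:
#   client:/root# iperf -c 192.168.100.34 -u -p 8042 -b 10M   -t 10
#   client:/root# iperf -c 192.168.100.34 -u -p 8042 -b 100M  -t 10
#   client:/root# iperf -c 192.168.100.34 -u -p 8042 -b 1000M -t 10

# Loss percentage of the 1 Gb/s run, from the "Lost/Total" column:
awk 'BEGIN { printf "%.0f%%\n", 162574 / 850341 * 100 }'   # prints 19%
```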

Another thing worth noting: the vmbr0 interface, on which the primary IP address is located, has an MTU of only 1500 bytes, while the network interfaces it is built on have a 9000-byte MTU.
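A quick way to check for such an MTU mismatch is to compare interface MTUs and probe the path with a don't-fragment ping; a sketch (interface and target names are examples):

```shell
# Inspect the MTU of the bridge and of its underlying interface:
#   beaubourg:/root# ip link show vmbr0   | grep -o 'mtu [0-9]*'    # mtu 1500
#   beaubourg:/root# ip link show vlan440 | grep -o 'mtu [0-9]*'    # mtu 9000

# Probe a 9000-byte path: ICMP payload = MTU - 20 (IP) - 8 (ICMP) bytes
echo $((9000 - 20 - 8))   # 8972
#   beaubourg:/root# ping -M do -s 8972 -c 3 192.168.100.1
```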

Both beaubourg and hypervisor3 network interfaces have a 10 Gb/s link-layer connection.
Aggregated traffic from multiple iperf streams nevertheless never exceeds roughly 90% of a 1 Gb/s transfer speed.

For the previous iperf TCP test and without tuning, we also measured:

  • an average transfer speed of 9,388 Mb/s between hypervisor3 and one of the 10G Ceph nodes, ceph-osd1.
  • an average transfer speed of 8,364 Mb/s between beaubourg and ceph-osd1.

Since all these machines are connected to the same pair of switches and these switches are managed by INRIA DSI-SESI, I have asked for their assistance in this ticket:
https://support.inria.fr/Ticket/Display.html?id=127011

Note that iperf tests are half-duplex, so you need to execute them both ways to have the full picture. Ideally you should run 3 test series:

machine1    traffic    machine2
iperf -s    <--        iperf -c
iperf -c    -->        iperf -s
iperf -s    <==>       iperf -d -c
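In concrete terms, the three series could be run as follows (host names and port are placeholders):

```shell
# Series 1: machine2 -> machine1
#   machine1$ iperf -s -p 8042
#   machine2$ iperf -c machine1 -p 8042 -t 15

# Series 2: machine1 -> machine2 (roles reversed)
#   machine2$ iperf -s -p 8042
#   machine1$ iperf -c machine2 -p 8042 -t 15

# Series 3: both directions simultaneously (-d runs a bidirectional test)
#   machine1$ iperf -s -p 8042
#   machine2$ iperf -d -c machine1 -p 8042 -t 15
```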

Note also that for a given test execution, the results are not the same on both machines, which gives a clue about the buffering problem occurring in the network path between them (the switch here). So for a single iperf test execution, you need to look at the performance results reported by both ends of the link.

ftigeot added a comment (edited). Jan 25 2019, 3:56 PM

After running some additional TCP iperf tests, it is obvious that beaubourg is the outlier.
Measured bandwidth:

  • from any 10G machine to any 10G machine (except beaubourg): > 9 Gb/s
  • from any 10G machine to beaubourg: > 9 Gb/s
  • from beaubourg to ceph-osd1, ceph-osd2 and hypervisor3: 600-800 Mb/s
  • from beaubourg to ceph-mon1: 230 Kb/s

It turns out hypervisor3 is not the culprit we thought it was.
Removing T1392 from parent task list.

None of the previous timeout issues are visible anymore on the Proxmox web interface.
They were possibly related to bad network quality on the web browser side (INRIA guest wifi).

ftigeot renamed this task from "Network timeout issues in the Proxmox cluster" to "Slow network transfers from beaubourg". Feb 5 2019, 5:18 PM

Outgoing network traffic from beaubourg to the local private network 192.168.100.0/24 transits via louvre.
Louvre re-emits network packets and sends them to the destination host.

Given that louvre only has 1 Gb/s connectivity, this is obviously the source of the sub-1 Gb/s bandwidth limitation.

Actual content of the vmbr0 interface configuration in beaubourg:/etc/network/interfaces:

auto vmbr0
iface vmbr0 inet static
        bridge_ports vlan440
        address 192.168.100.32
        netmask 255.255.255.0
        up ip route add 192.168.101.0/24 via 192.168.100.1
        up ip route add 192.168.200.0/21 via 192.168.100.1
        up ip rule add from 192.168.100.32 table private
        up ip route add default via 192.168.100.1 dev vmbr0 table private
        up ip route flush cache
        down ip route del default via 192.168.100.1 dev vmbr0 table private
        down ip rule del from 192.168.100.32 table private
        down ip route del 192.168.200.0/21 via 192.168.100.1
        down ip route del 192.168.101.0/24 via 192.168.100.1
        down ip route flush cache

We can see a private routing table and a special default route via louvre -- 192.168.100.1 -- for traffic sourced from the vmbr0 address.
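The policy routing in effect can be inspected at runtime; a sketch (`ip route get` shows the path a locally sourced packet would take):

```shell
#   beaubourg:/root# ip rule show                 # lists the "from 192.168.100.32" rule
#   beaubourg:/root# ip route show table private  # default via 192.168.100.1 dev vmbr0
#   beaubourg:/root# ip route get 192.168.100.34 from 192.168.100.32
```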

Network limitation removed via a hotfix (manual route deletion).
Some network downtime will be required in the future to ensure the new /etc/network/interfaces configuration works as expected.
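One plausible form of that hotfix, mirroring the `down` lines of the interface stanza above (this is an assumption; the exact command used is not recorded in the ticket):

```shell
# Drop the default route via louvre from the "private" table, so that
# traffic sourced from 192.168.100.32 falls through to the main table
# and reaches 192.168.100.0/24 directly:
#   beaubourg:/root# ip route del default via 192.168.100.1 dev vmbr0 table private
#   beaubourg:/root# ip route flush cache
```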