Software Heritage

Slow network transfers from beaubourg
Closed, Resolved · Public


Since hypervisor3 was added to the existing cluster, network timeout errors manifest themselves in the Proxmox web interface.
Error messages include:

  • "communication failure(0)"
  • "Connection error"
  • "Loading"

Event Timeline

ftigeot triaged this task as Normal priority.Jan 11 2019, 4:24 PM
ftigeot created this task.
ftigeot changed the task status from Open to Work in Progress.Jan 14 2019, 1:23 PM

The network interface hardware on hypervisor3 is relatively new:

i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.1.14-k

There are various open bug reports related to the i40e NIC and/or its driver, such as this one: , which could mean the NIC and/or its driver is subtly broken.

hypervisor3 is using Linux 4.15.18-9-pve .
The Linux git changelog in drivers/net/ethernet/intel/i40e between Linux 4.15 and 5.0-rc1 contains more than 120 occurrences of the word "fix". Almost none of these commits have been cherry-picked to v4.15.18.
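A count like the one above can be reproduced with git; a sketch, assuming a local clone of the mainline kernel tree (the repository path and the exact grep criteria are assumptions, so the resulting number may differ slightly):

```shell
# Run inside a checkout of the mainline Linux tree (assumption).
# Count commits between v4.15 and v5.0-rc1 touching the i40e driver
# whose subject mentions "fix" (case-insensitive):
git log --oneline -i --grep=fix v4.15..v5.0-rc1 -- drivers/net/ethernet/intel/i40e | wc -l
```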

It is hard to point to a particular bug we could be hitting, though; we would need to run a newer i40e driver version to test this hypothesis.

Corosync warnings also routinely appear in the logs:

Jan 14 11:56:13 hypervisor3 corosync[5622]: notice  [TOTEM ] Retransmit List: 282eb9
Jan 14 11:56:13 hypervisor3 corosync[5622]:  [TOTEM ] Retransmit List: 282eb9
Jan 14 11:56:13 hypervisor3 corosync[5622]:  [TOTEM ] Retransmit List: 282eba

iperf tests show:

  • network speed never reaches 1Gbps, even between hosts which have 10Gb/s network interfaces and are connected to the same switches
  • 19% of UDP packets get lost at 1Gb/s (less than 0.5% at 100Mb/s)

[raw results, TCP test]

hypervisor3:/root#iperf -s -p  8042
Server listening on TCP port 8042
TCP window size: 85.3 KByte (default)
[  4] local port 8042 connected with port 36406
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-15.0 sec  1.58 GBytes   901 Mbits/sec
[  5] local port 8042 connected with port 34552
[  5]  0.0-15.0 sec  1.55 GBytes   885 Mbits/sec
[  4] local port 8042 connected with port 36298
[  4]  0.0-15.0 sec  1.64 GBytes   939 Mbits/sec
[  5] local port 8042 connected with port 34792
[  5]  0.0-15.0 sec  1.58 GBytes   900 Mbits/sec
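As a rough consistency check on the figures above (iperf rounds the transfer column, so expect a few Mbit/s of slack against the reported 901): 1.58 GBytes over 15 seconds works out to roughly 905 Mbit/s:

```shell
# 1.58 GiB transferred in 15 s, expressed in Mbit/s (1 Mbit = 10^6 bits):
awk 'BEGIN { printf "%.0f\n", 1.58 * 1024^3 * 8 / 15 / 1e6 }'
# prints 905
```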

[raw results, UDP test]

hypervisor3:/root#iperf -u -s -p  8042
Server listening on UDP port 8042
Receiving 1470 byte datagrams
UDP buffer size:  208 KByte (default)
[  3] local port 8042 connected with port 43152
[ ID] Interval       Transfer     Bandwidth        Jitter   Lost/Total Datagrams
[  3]  0.0-10.0 sec  11.9 MBytes  10.0 Mbits/sec   0.123 ms    0/ 8505 (0%)
[  4] local port 8042 connected with port 42732
[  4]  0.0-10.0 sec   119 MBytes   100 Mbits/sec   0.095 ms   35/85035 (0.041%)
[  3] local port 8042 connected with port 44095
[  3]  0.0-10.0 sec   964 MBytes   808 Mbits/sec   0.018 ms 162574/850341 (19%)
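The 19% loss figure follows directly from the lost/total counters on the last line:

```shell
# Reported loss on the ~1 Gb/s UDP run: 162574 lost out of 850341 datagrams.
awk 'BEGIN { printf "%.1f%%\n", 100 * 162574 / 850341 }'
# prints 19.1%
```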

Another thing worth noting: the vmbr0 interface, on which the primary IP address is located, has an MTU of only 1500 bytes.
The network interfaces it is built on have a 9000-byte MTU.
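One way to check whether a 9000-byte MTU actually survives the whole path is a non-fragmenting ping with a jumbo-sized payload; a sketch (the peer host name is a placeholder, not taken from this ticket):

```shell
# 8972 = 9000 - 20 (IPv4 header) - 8 (ICMP header); -M do forbids fragmentation.
# "peer-host" is a placeholder for the remote machine (assumption).
ping -c 3 -M do -s 8972 peer-host
# On a 1500-byte path this fails with "Message too long" instead of replying.
```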

Both beaubourg and hypervisor3 network interfaces have a 10Gb/s link layer connection.
Aggregated traffic from multiple iperf streams nevertheless never exceeds ~90% of a 1Gb/s transfer speed.

For the previous iperf TCP test and without tuning, we also have:

  • an average transfer speed of 9,388 Mb/s between hypervisor3 and one of the 10G Ceph nodes, ceph-osd1.
  • an average transfer speed of 8,364 Mb/s between beaubourg and ceph-osd1.

Since all these machines are connected to the same pair of switches, and these switches are managed by INRIA DSI-SESI, I have asked for their assistance in this ticket:

Note that iperf tests are half-duplex, so you need to execute them both ways to have the full picture. Ideally you should run 3 test series:

iperf -s  <--  iperf -c
iperf -c  -->  iperf -s
iperf -s <==> iperf -d -c

Note also that for a given test execution, the results are not the same on both machines, which gives a clue about the buffering problem occurring in the network path between them (the switch here). So for a single iperf test execution, you need to look at the performance results reported by both ends of the link.
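Concretely, the three series sketched above map to invocations along these lines (host names taken from this ticket; the port and duration are arbitrary choices):

```shell
# Series 1: beaubourg sends, hypervisor3 receives.
#   on hypervisor3:  iperf -s -p 8042
#   on beaubourg:    iperf -c hypervisor3 -p 8042 -t 15
# Series 2: same, with the roles swapped.
#   on beaubourg:    iperf -s -p 8042
#   on hypervisor3:  iperf -c beaubourg -p 8042 -t 15
# Series 3: bidirectional; -d measures both directions simultaneously.
#   on hypervisor3:  iperf -s -p 8042
#   on beaubourg:    iperf -d -c hypervisor3 -p 8042 -t 15
```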

ftigeot added a comment.EditedJan 25 2019, 3:56 PM

After running some additional tcp iperf tests, it is obvious beaubourg is the outlier.
Measured bandwidth:

  • from any 10G machine to any 10G machine (except beaubourg): > 9 Gb/s
  • from any 10G machine to beaubourg: > 9 Gb/s
  • from beaubourg to ceph-osd1, ceph-osd2 and hypervisor3: 600-800 Mb/s
  • from beaubourg to ceph-mon1: 230 Kb/s

It turns out hypervisor3 is not the culprit we thought it was.
Removing T1392 from parent task list.

None of the previous timeout issues are visible anymore on the Proxmox web interface.
They were possibly related to bad network quality on the web browser side (INRIA guest wifi).

ftigeot renamed this task from Network timeout issues in the Proxmox cluster to Slow network transfers from beaubourg.Feb 5 2019, 5:18 PM

Outgoing network traffic from beaubourg to the local private network transits via louvre, which re-emits the packets and sends them to the destination host.

Given that louvre only has 1Gb/s connectivity, this is obviously the source of the sub-1Gb/s bandwidth limitation.

Current content of the vmbr0 interface configuration in beaubourg:/etc/network/interfaces:

auto vmbr0
iface vmbr0 inet static
        bridge_ports vlan440
        up ip route add via
        up ip route add via
        up ip rule add from table private
        up ip route add default via dev vmbr0 table private
        up ip route flush cache
        down ip route del default via dev vmbr0 table private
        down ip rule del from table private
        down ip route del via
        down ip route del via
        down ip route flush cache

We can see a private routing table and a special route to louvre -- -- for vmbr0 traffic.

Network limitation removed via a hotfix (manual route deletion).
Some network downtime will be required in the future to ensure the new /etc network configuration works as expected.
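The hotfix presumably undid the private-table default route shown in the interfaces file above; a sketch with a placeholder gateway address (the real louvre address is redacted in this ticket):

```shell
# 192.0.2.1 stands in for louvre's address, which is redacted above (assumption).
GW=192.0.2.1
# Remove the detour: the default route in table "private" forced all vmbr0
# traffic through louvre's 1 Gb/s link.
ip route del default via "$GW" dev vmbr0 table private
ip route flush cache
```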

olasd closed this task as Resolved.Oct 14 2019, 7:06 PM