
I am trying to ping between two PCs running Ubuntu 20.04 connected via a Layer-2 switch. My goal is to get "nearly" stable ping latency. Like other people, I have found that ping latency (round-trip time) under a low CPU load is worse than under a higher CPU load.

https://superuser.com/questions/543503/ping-vs-cpu-usage

https://superuser.com/questions/1189531/kvm-how-is-cpu-usage-related-to-ping

For instance, let's say I am pinging from PC-A to PC-B. When PC-B was not running any programs other than the operating system's default ones, the ping latency measured on PC-A was around 0.5 - 0.6 ms. However, when I used the stress tool to increase the CPU load (e.g., stressing one core to 80% load), the ping latency measured on PC-A dropped to around 0.2 - 0.3 ms.

I am sure the problem is not caused by the switch, because I also tried a direct cable between PC-A and PC-B and still got the same behavior. I am fairly sure this behavior is caused by the powersave CPU frequency governor, which is the default. But what surprises me is that when I changed the governor from powersave to performance using cpufrequtils, the ping latency measured on PC-A was still around 0.5 - 0.6 ms (with no load on PC-B), and the same behavior (ping latency around 0.2 - 0.3 ms) occurred when I increased the CPU load. So, in addition to changing the governor, I guess I still need to do something else.
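For reference, here is roughly what I ran to switch the governor with cpufrequtils (the core count and IDs below are just an example; adjust them for your CPU):

    # set the performance governor on every core (assuming 4 cores, IDs 0-3)
    for c in 0 1 2 3; do sudo cpufreq-set -c $c -g performance; done

    # confirm the active policy
    cpufreq-info -p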


2 Answers


Under low load, if you have Energy-Efficient Ethernet (EEE) enabled, the network hardware will be put to sleep. This can cause delays. The kernel dynamically guesses when to nap based on network usage.

You can disable this feature by following this answer.
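A quick sketch of what that typically involves with ethtool, assuming the interface is named eth0 and its driver exposes EEE control:

    # show the current EEE status
    sudo ethtool --show-eee eth0

    # disable EEE on the interface (not persistent across reboots)
    sudo ethtool --set-eee eth0 eee off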


For instance, let's say I am pinging from PC-A to PC-B. When PC-B was not running any programs other than the operating system's default ones, the ping latency measured on PC-A was around 0.5 - 0.6 ms.

It's likely you are not so much measuring the time to receive and reply to a packet as the time your CPU takes to wake from idle (for me, this is usually PLL start+lock time) and process an IRQ. Try booting with cpuidle.off=1 while conducting your measurements.
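On Ubuntu, one straightforward way to add that parameter, assuming the stock GRUB setup (the existing contents of the line will differ on your machine):

    # in /etc/default/grub, append the parameter to the kernel command line, e.g.:
    #   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash cpuidle.off=1"

    # then regenerate the GRUB config and reboot
    sudo update-grub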

My goal is to get "nearly" stable ping latency.

If this is the only goal and not an XY problem, here are some things you might play with to see how they impact latency...

  1. Disable idling as discussed above by booting with cpuidle.off=1. This way, we never have to wait for a core to come out of any light sleep mode.
  2. Isolate a CPU at boot time by booting with isolcpus=$idOfCoreToIsolate. Once done, very little will run on the isolated core by default (perhaps timer IRQs). This should not be the boot core (often core 0 or the last core).
  3. Identify your NIC's IRQ number by looking through /proc/interrupts.
  4. If using irqbalance: edit /etc/default/irqbalance to add the isolated core's corresponding bit to IRQBALANCE_BANNED_CPUS, add --banirq=$IRQNUM to IRQBALANCE_ARGS, and restart it with sudo /etc/init.d/irqbalance restart.
  5. Steer your NIC's IRQ to the isolated core by running echo $ISOLATEDCORENUM | sudo tee /proc/irq/$IRQNUM/smp_affinity_list. Alternatively, if your NIC has flow steering, this may require a different configuration.
  6. Disable RX interrupt coalescing: sudo ethtool -C <nicNameHere> rx-usecs 0. This way, when the ICMP packet is received it triggers an interrupt immediately (which would be inefficient if we had a lot of packets and weren't optimizing the system only for this task). A consolidated sketch of steps 2 through 6 follows this list.
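Putting steps 2 through 6 together, a rough sketch; the interface name enp3s0, isolated core 3, and IRQ number 35 below are placeholders, so substitute the values found on your own system:

    # step 2: in /etc/default/grub, add the isolcpus parameter, then run update-grub and reboot
    #   GRUB_CMDLINE_LINUX_DEFAULT="... isolcpus=3"

    # step 3: find the NIC's IRQ number (assumed to be 35 below)
    grep enp3s0 /proc/interrupts

    # step 4: in /etc/default/irqbalance, ban core 3 and IRQ 35, then restart irqbalance
    #   IRQBALANCE_BANNED_CPUS=00000008        # hex mask with bit 3 set
    #   IRQBALANCE_ARGS="--banirq=35"
    sudo /etc/init.d/irqbalance restart

    # step 5: steer the NIC's IRQ to the isolated core
    echo 3 | sudo tee /proc/irq/35/smp_affinity_list

    # step 6: disable RX interrupt coalescing
    sudo ethtool -C enp3s0 rx-usecs 0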

In this way, the NIC's packet-receive IRQ runs as soon as the NIC hears the packet: there is no interrupt coalescing, few other IRQs run on the dedicated core, and the isolated core never sleeps.
