Distribution of network card interrupts across processor cores

Here is an example of distributing network interface interrupts across processor cores.
As an example, I will take a server running accel-ppp with 6 Gb/s of traffic, 500K+ pps, and NAT with IPoE and 6000 DHCP clients.
Also, be sure to disable Hyper-Threading in the BIOS, since balancing the load across virtual cores can significantly increase the load on some physical cores.

Let’s see how the interrupts are distributed at the moment (the commands need to be executed from the root user):

cat /proc/interrupts
grep ens2f0 /proc/interrupts
grep ens2f1 /proc/interrupts
watch -n 1 cat /proc/interrupts
watch -n 1 grep ens1f0 /proc/interrupts
watch -n 1 cat /proc/softirqs
ethtool -S eth0

Also, let’s watch in real time how the processor cores are loaded by running the “top” command and pressing “1”, and see what is consuming CPU by running:

perf top

Let’s check whether the interrupts are distributed evenly across the traffic queues; the counter values should be approximately equal (example for ixgbe):

ethtool -S ens1f0 | grep .x_queue_._packets
ethtool -S ens1f1 | grep .x_queue_._packets

For i40e like this:

ethtool -S ens1f0 | grep x_packets:
ethtool -S ens1f1 | grep x_packets:

Let’s see how many interrupt queues are supported and currently active on the network interfaces:

ethtool -l ens1f0
ethtool -l ens1f1

For example, I had a server with two E5-2680 processors, 8 cores each; the dual-port HP 562SFP+ network card sat in NUMA node 0 of the first processor, and irqbalance distributed the 16 interrupts of the two network interfaces across 7 cores of the first processor, leaving core 0 out.

Therefore, I first reduced the number of queues from 16 to 8 per interface:

ethtool -L ens1f0 combined 8
ethtool -L ens1f1 combined 8

Let’s set up RSS. You can bind a network interface interrupt to a specific processor core as follows (where “X” is the interrupt number from the first column of /proc/interrupts and “N” is the processor mask):

echo N > /proc/irq/X/smp_affinity
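The mask is simply a hex bitmap with one bit per core. A minimal sketch (the `core_mask` helper name is my own, not a standard tool) that prints the mask for a given core number:

```shell
#!/bin/sh
# core_mask: print the smp_affinity hex mask for a single core number.
# (Hypothetical helper for illustration; not a standard utility.)
core_mask() {
    printf '%x\n' $((1 << $1))
}

core_mask 0   # core 0 -> 1
core_mask 3   # core 3 -> 8
core_mask 5   # core 5 -> 20
```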

I looked at the table of interrupt numbers of network interfaces:

cat /proc/interrupts

I looked at how irqbalance distributed the load for the first port of the network card (ens1f0):

cat /proc/irq/47/smp_affinity
cat /proc/irq/48/smp_affinity
cat /proc/irq/49/smp_affinity
cat /proc/irq/50/smp_affinity
cat /proc/irq/51/smp_affinity
cat /proc/irq/52/smp_affinity
cat /proc/irq/53/smp_affinity
cat /proc/irq/54/smp_affinity

I looked at how irqbalance distributed the load for the second port of the network card (ens1f1):

cat /proc/irq/75/smp_affinity
cat /proc/irq/76/smp_affinity
cat /proc/irq/77/smp_affinity
cat /proc/irq/78/smp_affinity
cat /proc/irq/79/smp_affinity
cat /proc/irq/80/smp_affinity
cat /proc/irq/81/smp_affinity
cat /proc/irq/82/smp_affinity

You can also list the interrupt numbers by the device’s PCI address (0000:04:00.0 in my case):

ls -1 /sys/devices/*/*/0000:04:00.0/msi_irqs

As the RSS queue counters show, irqbalance in my case distributed the load unevenly: although the load looked approximately equal on the Zabbix graphs, core 0 was idle.

The processor mask can be calculated with bc using the formula (where $cpuN is the core number):

apt install bc
echo "obase=16; $[2 ** $cpuN]" | bc

For example, let’s calculate the masks for sixteen cores:

echo "obase=16; $[2** 0]" | bc
echo "obase=16; $[2** 1]" | bc
echo "obase=16; $[2** 2]" | bc
echo "obase=16; $[2** 3]" | bc
echo "obase=16; $[2** 4]" | bc
echo "obase=16; $[2** 5]" | bc
echo "obase=16; $[2** 6]" | bc
echo "obase=16; $[2** 7]" | bc
echo "obase=16; $[2** 8]" | bc
echo "obase=16; $[2** 9]" | bc
echo "obase=16; $[2** 10]" | bc
echo "obase=16; $[2** 11]" | bc
echo "obase=16; $[2** 12]" | bc
echo "obase=16; $[2** 13]" | bc
echo "obase=16; $[2** 14]" | bc
echo "obase=16; $[2** 15]" | bc
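The sixteen invocations above can be generated in one loop; printf can also produce hex directly, without bc:

```shell
#!/bin/sh
# Print the smp_affinity mask for each of 16 cores in hex.
for i in $(seq 0 15); do
    printf 'core %2d -> mask %x\n' "$i" $((1 << i))
done
```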

I got the result:

1
2
4
8
10
20
40
80
100
200
400
800
1000
2000
4000
8000

Before making changes, be sure to stop irqbalance and remove it from autostart, since otherwise it will keep overwriting the affinity values:

systemctl is-enabled irqbalance
systemctl disable irqbalance
service irqbalance status
service irqbalance stop

Accordingly, I executed the commands below to bind the 8 interrupts of each network interface to the 8 processor cores:

echo 1 > /proc/irq/47/smp_affinity
echo 2 > /proc/irq/48/smp_affinity
echo 4 > /proc/irq/49/smp_affinity
echo 8 > /proc/irq/50/smp_affinity
echo 10 > /proc/irq/51/smp_affinity
echo 20 > /proc/irq/52/smp_affinity
echo 40 > /proc/irq/53/smp_affinity
echo 80 > /proc/irq/54/smp_affinity

echo 1 > /proc/irq/75/smp_affinity
echo 2 > /proc/irq/76/smp_affinity
echo 4 > /proc/irq/77/smp_affinity
echo 8 > /proc/irq/78/smp_affinity
echo 10 > /proc/irq/79/smp_affinity
echo 20 > /proc/irq/80/smp_affinity
echo 40 > /proc/irq/81/smp_affinity
echo 80 > /proc/irq/82/smp_affinity
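Since the IRQ numbers of each port are consecutive, the sixteen echo commands above can be generated with a small sketch (the `gen_bindings` helper and the dry-run approach are my own; the base IRQ numbers 47 and 75 match my /proc/interrupts and will differ on other machines):

```shell
#!/bin/sh
# gen_bindings: print the binding commands for 8 consecutive IRQs
# starting at a base IRQ number (dry run; pipe the output to sh to apply).
gen_bindings() {
    base=$1
    for i in $(seq 0 7); do
        printf 'echo %x > /proc/irq/%d/smp_affinity\n' $((1 << i)) $((base + i))
    done
}

gen_bindings 47   # ens1f0
gen_bindings 75   # ens1f1
```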

The load on the cores became somewhat more even than it was with irqbalance.

Let’s see how RPS (Receive Packet Steering, a software analogue of hardware RSS) is configured; in my case the default value 00000000,00000000 was displayed:

cat /sys/class/net/ens1f0/queues/*/rps_cpus
cat /sys/class/net/ens1f1/queues/*/rps_cpus

I did not enable RPS, but it is very useful when the network interface has fewer interrupts (queues) than the processor has cores. For example, to allow a queue to be processed on any of the 8 cores, specify ff:

echo "ff" > /sys/class/net/ens1f0/queues/rx-0/rps_cpus
echo "ff" > /sys/class/net/ens1f0/queues/rx-1/rps_cpus
echo "ff" > /sys/class/net/ens1f0/queues/rx-2/rps_cpus
echo "ff" > /sys/class/net/ens1f0/queues/rx-3/rps_cpus
echo "ff" > /sys/class/net/ens1f0/queues/rx-4/rps_cpus
echo "ff" > /sys/class/net/ens1f0/queues/rx-5/rps_cpus
echo "ff" > /sys/class/net/ens1f0/queues/rx-6/rps_cpus
echo "ff" > /sys/class/net/ens1f0/queues/rx-7/rps_cpus

echo "ff" > /sys/class/net/ens1f1/queues/rx-0/rps_cpus
echo "ff" > /sys/class/net/ens1f1/queues/rx-1/rps_cpus
echo "ff" > /sys/class/net/ens1f1/queues/rx-2/rps_cpus
echo "ff" > /sys/class/net/ens1f1/queues/rx-3/rps_cpus
echo "ff" > /sys/class/net/ens1f1/queues/rx-4/rps_cpus
echo "ff" > /sys/class/net/ens1f1/queues/rx-5/rps_cpus
echo "ff" > /sys/class/net/ens1f1/queues/rx-6/rps_cpus
echo "ff" > /sys/class/net/ens1f1/queues/rx-7/rps_cpus
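The sixteen writes above can be wrapped in a loop; here is a sketch (the `set_rps` helper and the SYSFS override are my own additions, added so the loop can be dry-run against a scratch directory):

```shell
#!/bin/sh
# set_rps: write a CPU mask into every rx queue's rps_cpus for a device.
# SYSFS can be pointed at a scratch directory for a dry run; defaults to /sys.
set_rps() {
    dev=$1 mask=$2 root=${SYSFS:-/sys}
    for q in "$root"/class/net/"$dev"/queues/rx-*/rps_cpus; do
        [ -w "$q" ] && echo "$mask" > "$q"
    done
}

set_rps ens1f0 ff
set_rps ens1f1 ff
```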

To restrict processing to cores 0-3, specify 0f (for cores 4-7, f0), for example:

echo "0f" | tee /sys/class/net/eth0/queues/rx-[0-3]/rps_cpus
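More generally, the mask for a contiguous range of cores is ((1 << count) - 1) shifted left by the first core number; a small sketch (the `range_mask` name is my own):

```shell
#!/bin/sh
# range_mask: print the hex mask covering cores FIRST..LAST inclusive.
range_mask() {
    printf '%x\n' $(( ((1 << ($2 - $1 + 1)) - 1) << $1 ))
}

range_mask 0 3   # cores 0-3 -> f
range_mask 4 7   # cores 4-7 -> f0
range_mask 0 7   # cores 0-7 -> ff
```

Note that printf prints f rather than 0f; the kernel accepts the mask with or without leading zeros.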

As the graph shows, after tuning RSS the interrupts were distributed even more evenly: the softirq load on the Zabbix graphs produced a nearly identical line for each core.

To prevent changes from being reset after a system restart, add the commands to /etc/rc.local:

/sbin/ethtool -L ens1f0 combined 8
/sbin/ethtool -L ens1f1 combined 8

echo 1 > /proc/irq/47/smp_affinity
echo 2 > /proc/irq/48/smp_affinity
echo 4 > /proc/irq/49/smp_affinity
echo 8 > /proc/irq/50/smp_affinity
echo 10 > /proc/irq/51/smp_affinity
echo 20 > /proc/irq/52/smp_affinity
echo 40 > /proc/irq/53/smp_affinity
echo 80 > /proc/irq/54/smp_affinity

echo 1 > /proc/irq/75/smp_affinity
echo 2 > /proc/irq/76/smp_affinity
echo 4 > /proc/irq/77/smp_affinity
echo 8 > /proc/irq/78/smp_affinity
echo 10 > /proc/irq/79/smp_affinity
echo 20 > /proc/irq/80/smp_affinity
echo 40 > /proc/irq/81/smp_affinity
echo 80 > /proc/irq/82/smp_affinity

See also my articles:
Configuring the Network in Linux
How to find out on which NUMA node network interfaces
How to distinguish physical processor cores from virtual
Monitoring CPU usage in Zabbix
Monitoring PPS (Packets Per Second) in Zabbix
