Script to bind Intel NIC interrupts to processor cores

Below is Intel's script for binding the interrupts of Intel network adapters to processor cores, along with examples of how to use it:

Script content:

#!/bin/bash
#
# Copyright (c) 2015, Intel Corporation
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
#     * Redistributions of source code must retain the above copyright notice,
#       this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above copyright
#       notice, this list of conditions and the following disclaimer in the
#       documentation and/or other materials provided with the distribution.
#     * Neither the name of Intel Corporation nor the names of its contributors
#       may be used to endorse or promote products derived from this software
#       without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Affinitize interrupts to cores
#
# typical usage is (as root):
# set_irq_affinity -x local eth1 <eth2> <eth3>
#
# to get help:
# set_irq_affinity

usage()
{
	echo
	echo "Usage: $0 [-x|-X] {all|local|remote|one|custom} [ethX] <[ethY]>"
	echo "	options: -x		Configure XPS as well as smp_affinity"
	echo "	options: -X		Disable XPS but set smp_affinity"
	echo "	options: {remote|one} can be followed by a specific node number"
	echo "	Ex: $0 local eth0"
	echo "	Ex: $0 remote 1 eth0"
	echo "	Ex: $0 custom eth0 eth1"
	echo "	Ex: $0 0-7,16-23 eth0"
	echo
	exit 1
}

usageX()
{
	echo "options -x and -X cannot both be specified, pick one"
	exit 1
}

if [ "$1" == "-x" ]; then
	XPS_ENA=1
	shift
fi

if [ "$1" == "-X" ]; then
	if [ -n "$XPS_ENA" ]; then
		usageX
	fi
	XPS_DIS=2
	shift
fi

if [ "$1" == -x ]; then
	usageX
fi

if [ -n "$XPS_ENA" ] && [ -n "$XPS_DIS" ]; then
	usageX
fi

if [ -z "$XPS_ENA" ]; then
	XPS_ENA=$XPS_DIS
fi

num='^[0-9]+$'
# Vars
AFF=$1
shift

case "$AFF" in
    remote)	[[ $1 =~ $num ]] && rnode=$1 && shift ;;
    one)	[[ $1 =~ $num ]] && cnt=$1 && shift ;;
    all)	;;
    local)	;;
    custom)	;;
    [0-9]*)	;;
    -h|--help)	usage ;;
    "")		usage ;;
    *)		IFACES=$AFF && AFF=all ;;	# Backwards compat mode
esac

# append the interfaces listed to the string with spaces
while [ "$#" -ne "0" ] ; do
	IFACES+=" $1"
	shift
done

# for now the user must specify interfaces
if [ -z "$IFACES" ]; then
	usage
	exit 1
fi

# support functions

set_affinity()
{
	VEC=$core
	if [ $VEC -ge 32 ]
	then
		MASK_FILL=""
		MASK_ZERO="00000000"
		let "IDX = $VEC / 32"
		for ((i=1; i<=$IDX;i++))
		do
			MASK_FILL="${MASK_FILL},${MASK_ZERO}"
		done

		let "VEC -= 32 * $IDX"
		MASK_TMP=$((1<<$VEC))
		MASK=$(printf "%X%s" $MASK_TMP $MASK_FILL)
	else
		MASK_TMP=$((1<<$VEC))
		MASK=$(printf "%X" $MASK_TMP)
	fi

	printf "%s" $MASK > /proc/irq/$IRQ/smp_affinity
	printf "%s %d %s -> /proc/irq/$IRQ/smp_affinity\n" $IFACE $core $MASK
	case "$XPS_ENA" in
	1)
		printf "%s %d %s -> /sys/class/net/%s/queues/tx-%d/xps_cpus\n" $IFACE $core $MASK $IFACE $((n-1))
		printf "%s" $MASK > /sys/class/net/$IFACE/queues/tx-$((n-1))/xps_cpus
	;;
	2)
		MASK=0
		printf "%s %d %s -> /sys/class/net/%s/queues/tx-%d/xps_cpus\n" $IFACE $core $MASK $IFACE $((n-1))
		printf "%s" $MASK > /sys/class/net/$IFACE/queues/tx-$((n-1))/xps_cpus
	;;
	*)
	esac
}

# Allow usage of , or -
#
parse_range () {
        RANGE=${@//,/ }
        RANGE=${RANGE//-/..}
        LIST=""
        for r in $RANGE; do
		# eval lets us use vars in {#..#} range
                [[ $r =~ '..' ]] && r="$(eval echo {$r})"
		LIST+=" $r"
        done
	echo $LIST
}

# Affinitize interrupts
#
setaff()
{
	CORES=$(parse_range $CORES)
	ncores=$(echo $CORES | wc -w)
	n=1

	# this script only supports interrupt vectors in pairs,
	# modification would be required to support a single Tx or Rx queue
	# per interrupt vector

	queues="${IFACE}-.*TxRx"

	irqs=$(grep "$queues" /proc/interrupts | cut -f1 -d:)
	[ -z "$irqs" ] && irqs=$(grep $IFACE /proc/interrupts | cut -f1 -d:)
	[ -z "$irqs" ] && irqs=$(for i in `ls -Ux /sys/class/net/$IFACE/device/msi_irqs` ;\
	                         do grep "$i:.*TxRx" /proc/interrupts | grep -v fdir | cut -f 1 -d : ;\
	                         done)
	[ -z "$irqs" ] && echo "Error: Could not find interrupts for $IFACE"

	echo "IFACE CORE MASK -> FILE"
	echo "======================="
	for IRQ in $irqs; do
		[ "$n" -gt "$ncores" ] && n=1
		j=1
		# much faster than calling cut for each
		for i in $CORES; do
			[ $((j++)) -ge $n ] && break
		done
		core=$i
		set_affinity
		((n++))
	done
}

# now the actual useful bits of code

# these next 2 lines would allow script to auto-determine interfaces
#[ -z "$IFACES" ] && IFACES=$(ls /sys/class/net)
#[ -z "$IFACES" ] && echo "Error: No interfaces up" && exit 1

# echo IFACES is $IFACES

CORES=$(</sys/devices/system/cpu/online)
[ "$CORES" ] || CORES=$(grep ^proc /proc/cpuinfo | cut -f2 -d:)

# Core list for each node from sysfs
node_dir=/sys/devices/system/node
for i in $(ls -d $node_dir/node*); do
	i=${i/*node/}
	corelist[$i]=$(<$node_dir/node${i}/cpulist)
done

for IFACE in $IFACES; do
	# echo $IFACE being modified

	dev_dir=/sys/class/net/$IFACE/device
	[ -e $dev_dir/numa_node ] && node=$(<$dev_dir/numa_node)
	[ "$node" ] && [ "$node" -gt 0 ] || node=0

	case "$AFF" in
	local)
		CORES=${corelist[$node]}
	;;
	remote)
		[ "$rnode" ] || { [ $node -eq 0 ] && rnode=1 || rnode=0; }
		CORES=${corelist[$rnode]}
	;;
	one)
		[ -n "$cnt" ] || cnt=0
		CORES=$cnt
	;;
	all)
		CORES=$CORES
	;;
	custom)
		echo -n "Input cores for $IFACE (ex. 0-7,15-23): "
		read CORES
	;;
	[0-9]*)
		CORES=$AFF
	;;
	*)
		usage
		exit 1
	;;
	esac

	# call the worker function
	setaff
done

# check for irqbalance running
IRQBALANCE_ON=`ps ax | grep -v grep | grep -q irqbalance; echo $?`
if [ "$IRQBALANCE_ON" == "0" ] ; then
	echo " WARNING: irqbalance is running and will"
	echo "          likely override this script's affinitization."
	echo "          Please stop the irqbalance service and/or execute"
	echo "          'killall irqbalance'"
fi

Before executing the script, you should disable the virtual processor cores (Hyper-Threading) in the BIOS.
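You can check whether Hyper-Threading is actually off before proceeding; lscpu should report one thread per core, and on recent kernels the smt/active flag should be 0:

lscpu | grep 'Thread(s) per core'
cat /sys/devices/system/cpu/smt/active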

For example, I configured routers on HPE servers with two network adapters connected to different NUMA nodes through different riser cards. I bound the interrupts of the first network adapter to the first CPU (in numa0), and the interrupts of the second adapter to the second CPU, which sat in numa1 together with that adapter. That is, the first network adapter served as the uplink (WAN) and the second served the users; I built similar servers for NAT, accel-ppp, and so on.
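To see which NUMA node a network adapter sits on (the script reads the same sysfs file for its "local" and "remote" modes), you can check, for example:

cat /sys/class/net/ens1f1/device/numa_node
lscpu | grep NUMA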

Since the NIC driver by default creates as many combined queues (and thus interrupts) as there are cores in the system, I reduced their number to the core count of a single CPU (in this example the server has two Intel Xeon Gold 6230R processors with 26 physical cores each; the virtual cores are disabled):

ethtool -L ens1f1 combined 26
ethtool -L ens3f1 combined 26
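You can verify the resulting queue count with ethtool's lowercase -l option, which prints the maximum and current channel counts:

ethtool -l ens1f1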

Now, using the script, let's bind all interrupts of the ens1f1 interface on the first network adapter (there are 26 of them after the commands above) to the first 26 cores, that is, to the first CPU:

./set_irq_affinity.sh 0-25 ens1f1

Result of script execution:

./set_irq_affinity.sh 0-25 ens1f1
IFACE CORE MASK -> FILE
=======================
ens1f1 0 1 -> /proc/irq/148/smp_affinity
ens1f1 1 2 -> /proc/irq/149/smp_affinity
ens1f1 2 4 -> /proc/irq/150/smp_affinity
ens1f1 3 8 -> /proc/irq/151/smp_affinity
ens1f1 4 10 -> /proc/irq/152/smp_affinity
ens1f1 5 20 -> /proc/irq/153/smp_affinity
ens1f1 6 40 -> /proc/irq/154/smp_affinity
ens1f1 7 80 -> /proc/irq/155/smp_affinity
ens1f1 8 100 -> /proc/irq/156/smp_affinity
ens1f1 9 200 -> /proc/irq/157/smp_affinity
ens1f1 10 400 -> /proc/irq/158/smp_affinity
ens1f1 11 800 -> /proc/irq/159/smp_affinity
ens1f1 12 1000 -> /proc/irq/160/smp_affinity
ens1f1 13 2000 -> /proc/irq/161/smp_affinity
ens1f1 14 4000 -> /proc/irq/162/smp_affinity
ens1f1 15 8000 -> /proc/irq/163/smp_affinity
ens1f1 16 10000 -> /proc/irq/164/smp_affinity
ens1f1 17 20000 -> /proc/irq/165/smp_affinity
ens1f1 18 40000 -> /proc/irq/166/smp_affinity
ens1f1 19 80000 -> /proc/irq/167/smp_affinity
ens1f1 20 100000 -> /proc/irq/168/smp_affinity
ens1f1 21 200000 -> /proc/irq/169/smp_affinity
ens1f1 22 400000 -> /proc/irq/170/smp_affinity
ens1f1 23 800000 -> /proc/irq/171/smp_affinity
ens1f1 24 1000000 -> /proc/irq/172/smp_affinity
ens1f1 25 2000000 -> /proc/irq/173/smp_affinity
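The MASK column is the hexadecimal CPU mask the script writes: bit N set means core N, so core 0 -> 1, core 1 -> 2, core 4 -> 10, and so on. You can read a binding back to verify it, for example for the first queue (the IRQ numbers will differ on your system):

cat /proc/irq/148/smp_affinity
cat /proc/irq/148/smp_affinity_list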

Now let's bind the interrupts of the second network adapter's interface to the 26 cores of the second CPU:

./set_irq_affinity.sh 26-51 ens3f1
IFACE CORE MASK -> FILE
=======================
ens3f1 26 4000000 -> /proc/irq/272/smp_affinity
ens3f1 27 8000000 -> /proc/irq/273/smp_affinity
ens3f1 28 10000000 -> /proc/irq/274/smp_affinity
ens3f1 29 20000000 -> /proc/irq/275/smp_affinity
ens3f1 30 40000000 -> /proc/irq/276/smp_affinity
ens3f1 31 80000000 -> /proc/irq/277/smp_affinity
ens3f1 32 1,00000000 -> /proc/irq/278/smp_affinity
ens3f1 33 2,00000000 -> /proc/irq/279/smp_affinity
ens3f1 34 4,00000000 -> /proc/irq/280/smp_affinity
ens3f1 35 8,00000000 -> /proc/irq/281/smp_affinity
ens3f1 36 10,00000000 -> /proc/irq/282/smp_affinity
ens3f1 37 20,00000000 -> /proc/irq/283/smp_affinity
ens3f1 38 40,00000000 -> /proc/irq/284/smp_affinity
ens3f1 39 80,00000000 -> /proc/irq/285/smp_affinity
ens3f1 40 100,00000000 -> /proc/irq/286/smp_affinity
ens3f1 41 200,00000000 -> /proc/irq/287/smp_affinity
ens3f1 42 400,00000000 -> /proc/irq/288/smp_affinity
ens3f1 43 800,00000000 -> /proc/irq/289/smp_affinity
ens3f1 44 1000,00000000 -> /proc/irq/290/smp_affinity
ens3f1 45 2000,00000000 -> /proc/irq/291/smp_affinity
ens3f1 46 4000,00000000 -> /proc/irq/292/smp_affinity
ens3f1 47 8000,00000000 -> /proc/irq/293/smp_affinity
ens3f1 48 10000,00000000 -> /proc/irq/294/smp_affinity
ens3f1 49 20000,00000000 -> /proc/irq/295/smp_affinity
ens3f1 50 40000,00000000 -> /proc/irq/296/smp_affinity
ens3f1 51 80000,00000000 -> /proc/irq/297/smp_affinity
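Note that starting with core 32 the masks are printed as comma-separated groups of eight hex digits, i.e. 32-bit words, which is the cpumask format that /proc/irq/*/smp_affinity expects. A minimal sketch of the same calculation the script performs, for a single core in the 32-63 range:

core=33
printf "%X,%08X\n" $((1 << (core - 32))) 0    # prints 2,00000000 - the mask for core 33, matching the output above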

If irqbalance is running (or gets started after you have distributed the interrupts with the script), it will redistribute them in its own way: usually it binds each interrupt to all cores of all CPUs at once, so any core may service any interrupt and several interrupts can end up on one core, which is not ideal.
I often do not disable irqbalance's autostart, but instead stop it at operating system startup via rc.local, since at boot it helps distribute the other interrupts, for example from a RAID controller. If you disable irqbalance entirely, you need to go through the whole interrupt list and distribute it manually; see my article:
Distribution of network card interrupts across processor cores
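To review the whole interrupt list, and how often each interrupt fires on each core, you can look at:

cat /proc/interrupts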

An example of disabling irqbalance autostart in Ubuntu:

systemctl is-enabled irqbalance
systemctl disable irqbalance

I add the script execution commands to rc.local; see also my article:
Solution: No /etc/rc.local file on Ubuntu 18

For example, I add the following commands:

/sbin/ethtool -G ens1f1 rx 4096 tx 4096
/sbin/ethtool -G ens3f1 rx 4096 tx 4096
/sbin/ethtool -K ens1f1 tso off gro off gso off
/sbin/ethtool -K ens3f1 tso off gro off gso off
service irqbalance stop
/dir/ixnfo.com/set_irq_affinity.sh 0-25 ens1f1
/dir/ixnfo.com/set_irq_affinity.sh 26-51 ens3f1
/sbin/ip link set ens1f1 txqueuelen 10000
/sbin/ip link set ens3f1 txqueuelen 10000
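For reference: the -G lines enlarge the RX/TX ring buffers, the -K lines disable the TSO/GRO/GSO offloads, and txqueuelen enlarges the transmit queue. The applied settings can be read back like this (same interface names assumed):

ethtool -g ens1f1
ethtool -k ens1f1 | grep -E 'segmentation|generic-receive'
ip link show ens1f1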

See also my articles:
Taskset – bind process to CPU cores
Preparing a Linux server before installing Accel-ppp
