Monitoring CPU usage in Zabbix

Here is an example of monitoring the load on each processor core with Zabbix.

Suppose a heavily loaded NAT server has one processor with 8 cores, most of its load comes from softirq, and a Zabbix agent is installed on it.
To see whether the network adapter interrupts are evenly distributed across the processor cores, create items on the Zabbix server with the following settings:
Type: Zabbix Agent
Type of information: Numeric (floating point)
Unit of measurement: %
And also the key:

system.cpu.util[0,softirq,avg5]

Here 0 is the processor core number, softirq is the type of load, and avg5 is the average over 5 minutes. Create items for the other processor cores in the same way, using the keys below, and add them all to one graph:

system.cpu.util[1,softirq,avg5]
system.cpu.util[2,softirq,avg5]
system.cpu.util[3,softirq,avg5]
...

Instead of softirq, you can specify idle, nice, user (default for Linux), system (default for Windows), iowait, interrupt, steal, guest or guest_nice.
Instead of avg5, you can specify avg1 (one-minute average, the default) or avg15 (15-minute average).
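
The key format above can be generated for every core rather than typed by hand. The following is a minimal Python sketch, not part of Zabbix itself: the function names cpu_util_key and cpu_util_keys are hypothetical, and the defaults simply mirror the Linux defaults mentioned above.

```python
# Load types and averaging modes listed in the article for system.cpu.util.
VALID_STATES = {"idle", "nice", "user", "system", "iowait",
                "interrupt", "softirq", "steal", "guest", "guest_nice"}
VALID_MODES = {"avg1", "avg5", "avg15"}


def cpu_util_key(cpu, state="user", mode="avg1"):
    """Build one system.cpu.util item key (hypothetical helper)."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown state: {state}")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    return f"system.cpu.util[{cpu},{state},{mode}]"


def cpu_util_keys(core_count, state="softirq", mode="avg5"):
    """Build keys for cores 0..core_count-1 (hypothetical helper)."""
    return [cpu_util_key(n, state, mode) for n in range(core_count)]


if __name__ == "__main__":
    # Prints the eight keys for the 8-core NAT server from the example.
    for key in cpu_util_keys(8):
        print(key)
```

Running the script prints system.cpu.util[0,softirq,avg5] through system.cpu.util[7,softirq,avg5], one key per line, ready to paste into the item forms.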

To avoid specifying the processor cores manually, you can create a discovery rule with the key:

system.cpu.discovery

Then add an item prototype to it, for example:

system.cpu.util[{#CPU.NUMBER},softirq,avg5]
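
To illustrate what the discovery rule does with this prototype, here is a rough Python sketch that substitutes the discovered macros into the key. The sample JSON is an assumption made for illustration (a two-core host), the exact layout returned by the agent depends on the Zabbix version, and expand_prototype is a hypothetical helper, not a Zabbix function.

```python
import json

# Assumed sample of discovery output for a two-core host (illustration only).
SAMPLE_DISCOVERY = json.dumps([
    {"{#CPU.NUMBER}": 0, "{#CPU.STATUS}": "online"},
    {"{#CPU.NUMBER}": 1, "{#CPU.STATUS}": "online"},
])

PROTOTYPE = "system.cpu.util[{#CPU.NUMBER},softirq,avg5]"


def expand_prototype(prototype, discovery_json):
    """Replace discovery macros in an item prototype key (hypothetical helper)."""
    rows = json.loads(discovery_json)
    # Some versions wrap the list in {"data": [...]}; accept both layouts.
    if isinstance(rows, dict):
        rows = rows.get("data", [])
    keys = []
    for row in rows:
        key = prototype
        for macro, value in row.items():
            key = key.replace(macro, str(value))
        keys.append(key)
    return keys


if __name__ == "__main__":
    for key in expand_prototype(PROTOTYPE, SAMPLE_DISCOVERY):
        print(key)
```

With the assumed sample, this yields one key per discovered core, which is the same set of items that would otherwise be created by hand.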

You can also create a trigger that fires when the value exceeds 90%:

({ixnfo.com cpu template:system.cpu.util[0,softirq,avg5].last(0)}>90)
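
If a separate trigger is wanted for each core, the expressions can be generated in the same way as the item keys. This is only a sketch: trigger_expression is a hypothetical helper, the template name is taken from the example above, and the output follows the expression format used in this article, which newer Zabbix releases have replaced with a different syntax.

```python
def trigger_expression(template, cpu, threshold=90,
                       state="softirq", mode="avg5"):
    """Build a trigger expression in the article's format (hypothetical helper)."""
    key = "system.cpu.util[{},{},{}]".format(cpu, state, mode)
    # Plain concatenation keeps the literal braces easy to read.
    return "({" + template + ":" + key + ".last(0)}>" + str(threshold) + ")"


if __name__ == "__main__":
    # One expression per core of the 8-core server from the example.
    for cpu in range(8):
        print(trigger_expression("ixnfo.com cpu template", cpu))
```

For core 0 this reproduces the expression shown above.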

Below are examples of items that report various CPU information. These items are included by default in the “Template OS Linux” template.

Processor load (1 min average per core):

system.cpu.load[percpu,avg1]

Processor load (5 min average per core):

system.cpu.load[percpu,avg5]

Processor load (15 min average per core):

system.cpu.load[percpu,avg15]
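
In percpu mode the system load average is divided by the number of online CPUs, so values stay comparable between hosts with different core counts. A small Python sketch of that arithmetic follows; the function name per_core_load and the sample numbers are made up for illustration.

```python
def per_core_load(total_load, cpu_count):
    """Divide a system load average by the CPU count (hypothetical helper)."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return total_load / cpu_count


if __name__ == "__main__":
    # A load average of 4.0 on the 8-core server gives 0.5 per core.
    print(per_core_load(4.0, 8))
```

A per-core value near or above 1 suggests that, on average, every core has work queued.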

Interrupts per second:

system.cpu.intr

Context switches per second:

system.cpu.switches

CPU idle time:

system.cpu.util[,idle]

CPU interrupt time:

system.cpu.util[,interrupt]

CPU iowait time:

system.cpu.util[,iowait]

CPU nice time:

system.cpu.util[,nice]

CPU softirq time:

system.cpu.util[,softirq]

CPU steal time:

system.cpu.util[,steal]

CPU system time:

system.cpu.util[,system]

CPU user time:

system.cpu.util[,user]

See my other articles in the Zabbix category.
