Solving the "Zabbix housekeeper processes more than 75% busy" warning

I once noticed the following warning in the Zabbix panel:
Zabbix housekeeper processes more than 75% busy

Housekeeper processes clean up obsolete data. For example, each item specifies a history storage period and a trend storage period; anything older than these periods is deleted by the housekeeper processes so that the database does not grow to a gigantic size.

The first thing to optimize is the update interval: the less often items are polled, the less data is written to the database and the less has to be cleaned up later. I created all my templates myself; if you use existing templates, you can make a full clone of them and change the clone. For example, for client access switches I set an update interval of 15m, a history storage period of 1d and a trend storage period of 60d. On servers and core network equipment, I naturally made the update interval more frequent as needed and the trend storage period longer.
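To get a feel for how much the update interval matters, here is a rough sketch in Python. It assumes that every item uses the same interval and writes exactly one value per poll, which is a simplification; the function name values_per_day and the figure of 1000 items are made up for illustration and are not taken from my server.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def values_per_day(item_count: int, interval_seconds: int) -> int:
    """Approximate number of history values written per day.

    Assumes all items share one update interval and each poll stores
    one value. Integer division rounds down for intervals that do not
    divide a day evenly.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return item_count * (SECONDS_PER_DAY // interval_seconds)


if __name__ == "__main__":
    # Hypothetical example: 1000 items polled every 1m versus every 15m.
    for label, interval in (("1m", 60), ("15m", 900)):
        print(label, values_per_day(1000, interval))
```

With these assumptions, 1000 items at 1m produce 1440000 values per day, while the same items at 15m produce 96000, so the housekeeper later has about fifteen times fewer rows to remove.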

The Zabbix server configuration file has parameters for tuning the housekeeper. Open it in a text editor:

nano /etc/zabbix/zabbix_server.conf

The HousekeepingFrequency parameter sets how often the housekeeper processes run, in hours. The default is 1 (every hour), and the allowed range is 0-24.

The MaxHousekeeperDelete parameter limits how many rows are deleted from the database tables in one run. The default is 5000, which is very low for a server with a large number of hosts and items; the allowed range is 0-1000000. A value of 0 removes the limit, but then you need to take into account the performance of the disk system and the amount of data to be deleted, otherwise the housekeeper processes may start every hour and run for more than an hour each time. By default, the “Zabbix housekeeper processes more than 75% busy” trigger fires if cleaning lasts more than 30 minutes; you can monitor the execution time and the load on the disk system on the graphs.

At the time of writing, this Zabbix server monitored 754 hosts with 25609 items, and I set the following parameters in the configuration:

HousekeepingFrequency=1
MaxHousekeeperDelete=550000
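Before restarting the server, the two values can be sanity-checked against the ranges mentioned above. The following Python sketch is only an illustration: check_housekeeper_params is a hypothetical helper, not part of Zabbix, and it assumes simple Key=Value lines with # comments and the defaults and ranges quoted in this article.

```python
# Allowed ranges and defaults as quoted in the article: (min, max, default).
LIMITS = {
    "HousekeepingFrequency": (0, 24, 1),
    "MaxHousekeeperDelete": (0, 1000000, 5000),
}


def check_housekeeper_params(conf_text: str) -> dict:
    """Return the effective housekeeper values from config text.

    Parameters that are not set fall back to the defaults above.
    Raises ValueError if a value is not an integer or is out of range.
    """
    values = {name: default for name, (_, _, default) in LIMITS.items()}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and anything that is not Key=Value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in LIMITS:
            values[key] = int(value.strip())
    for name, (low, high, _) in LIMITS.items():
        if not low <= values[name] <= high:
            raise ValueError(f"{name}={values[name]} is outside {low}-{high}")
    return values


if __name__ == "__main__":
    sample = "HousekeepingFrequency=1\nMaxHousekeeperDelete=550000\n"
    print(check_housekeeper_params(sample))
```

To check the real file, read /etc/zabbix/zabbix_server.conf and pass its contents to the function.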

As the graphs show, cleaning now takes no longer than 15 minutes.
Usually, if the cleanup takes more than 30 minutes, the cause is that the housekeeper processes run too rarely (for example, not every hour), or that disk performance is insufficient and the database server is not optimized.
If a lot of outdated data has accumulated, you can stop the Zabbix server and delete the data with SQL queries. See the examples in my other article, which also has SQL queries for changing the update intervals of items in bulk.

SQL queries for Zabbix
See my other articles about Zabbix
