Solution for "BUG: Bad page state in process kworker"

I once assembled a new access server with Accel-ppp and Intel XL710 network adapters. After putting it into operation, I noticed that all processor cores were loaded evenly except core 10, which was loaded at almost 100%. I also noticed these messages in the logs:

BUG: Bad page state: 11 messages suppressed
BUG: Bad page state in process kworker/10:2  pfn:7ffdef
page:ffffe7789fff7bc0 count:-2 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x57ffffc0000000()
raw: 0057ffffc0000000 0000000000000000 0000000000000000 fffffffeffffffff
raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
page dumped because: nonzero _refcount
Modules linked in: ...
CPU: 10 PID: 24193 Comm: kworker/10:2 Tainted: G    B      OE    4.15.0-134-generic #138-Ubuntu
Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 04/08/2020
Workqueue: mm_percpu_wq vmstat_update
Call Trace:
? process_one_work+0x420/0x420
? kthread_create_worker_on_cpu+0x70/0x70
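
To see which worker the reports come from, the kernel log can be summarized per process. This is a minimal sketch, assuming the messages have the same format as above; the function name bad_page_by_worker is made up for illustration.

```shell
# Count "Bad page state in process ..." reports per process name.
# Reads kernel log text on standard input.
# bad_page_by_worker is a hypothetical helper name, not an existing tool.
bad_page_by_worker() {
    grep -o 'Bad page state in process [^ ]*' \
        | awk '{print $NF}' \
        | sort | uniq -c | sort -rn
}

# Usage:
#   dmesg | bad_page_by_worker
```

The number after "kworker/" is the CPU the worker is bound to, so a single dominant entry points at one core.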

I checked the driver version of the network adapters:

modinfo i40e | grep ver
filename:       /lib/modules/4.15.0-134-generic/updates/drivers/net/ethernet/intel/i40e/i40e.ko
version:        2.13.10
description:    Intel(R) 40-10 Gigabit Ethernet Connection Network Driver
srcversion:     597EBD96218776AAA546464
vermagic:       4.15.0-134-generic SMP mod_unload

The firmware version on the network adapters was 8.15.
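
The firmware version can be read with ethtool. A minimal sketch, assuming ethtool is installed; eth0 is a placeholder interface name and nic_versions is a helper name made up for illustration.

```shell
# Keep only the driver name, driver version, and firmware version
# lines from `ethtool -i` output read on standard input.
# nic_versions is a hypothetical helper name.
nic_versions() {
    grep -E '^(driver|version|firmware-version):'
}

# Usage (replace eth0 with the real interface name):
#   ethtool -i eth0 | nic_versions
```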

After I updated the driver and the network adapter firmware to version 9.0, the bug disappeared.
Note that after updating the driver and firmware, the network may stop working, either because the network interfaces are renamed or because the driver version is incompatible with the firmware version. Perform the update while connected via iLO or with physical access to the server.
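
To spot renamed interfaces quickly, it may help to save a list of MAC addresses and interface names before the update and compare it afterwards. This is a sketch, not part of my original procedure; the function name iface_macs and the file path are assumptions.

```shell
# Print "MAC name" pairs for Ethernet interfaces, sorted by MAC.
# Reads `ip -o link show` output on standard input.
# iface_macs is a hypothetical helper name.
iface_macs() {
    awk '{
        name = $2
        sub(/:$/, "", name)              # strip the trailing colon from the name
        for (i = 1; i < NF; i++)
            if ($i == "link/ether") print $(i + 1), name
    }' | sort
}

# Usage (the file path is only an example):
#   ip -o link show | iface_macs > /root/ifaces.before
#   ... update driver and firmware, reboot ...
#   ip -o link show | iface_macs | diff /root/ifaces.before -
```

Any line in the diff means that a MAC address now belongs to a different interface name.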
