I once built a new access server with Accel-ppp and Intel XL710 network adapters. After putting it into operation, I noticed that while the load was spread evenly across all processor cores, core 10 was running at almost 100%. I also noticed these messages in the logs:
BUG: Bad page state: 11 messages suppressed
BUG: Bad page state in process kworker/10:2 pfn:7ffdef
page:ffffe7789fff7bc0 count:-2 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x57ffffc0000000()
raw: 0057ffffc0000000 0000000000000000 0000000000000000 fffffffeffffffff
raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
page dumped because: nonzero _refcount
Modules linked in: ...
CPU: 10 PID: 24193 Comm: kworker/10:2 Tainted: G B OE 4.15.0-134-generic #138-Ubuntu
Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 04/08/2020
Workqueue: mm_percpu_wq vmstat_update
Call Trace:
dump_stack+0x6d/0x8e
bad_page+0xcb/0x120
free_pages_check_bad+0x5f/0x70
free_pcppages_bulk+0x454/0x4f0
drain_zone_pages+0x3d/0x60
refresh_cpu_vm_stats+0x1df/0x2a0
vmstat_update+0x13/0x50
process_one_work+0x1de/0x420
worker_thread+0x32/0x410
kthread+0x121/0x140
? process_one_work+0x420/0x420
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x35/0x40
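The report above comes from kworker/10:2 on CPU 10, the same core that was overloaded. To check whether the errors on your own server concentrate on one core, you could count them per CPU. Here is a minimal Python sketch. It assumes plain kernel log text (for example, saved from dmesg) and relies on per-CPU kworker threads being named kworker/<cpu>:<id>. The function name bad_page_stats is hypothetical.

```python
import re
from collections import Counter
from typing import Tuple

# Per-CPU kworkers are named kworker/<cpu>:<id>; unbound ones (kworker/u<pool>:<id>)
# do not match this pattern and are skipped, since they are not pinned to one core.
BAD_PAGE_RE = re.compile(r"BUG: Bad page state in process kworker/(\d+):\d+")
SUPPRESSED_RE = re.compile(r"BUG: Bad page state: (\d+) messages suppressed")


def bad_page_stats(log_text: str) -> Tuple[Counter, int]:
    """Count 'Bad page state' reports per CPU and sum the suppressed ones.

    Hypothetical helper: expects plain kernel log text, e.g. saved from dmesg.
    Timestamps in front of each line are fine because the patterns use search().
    """
    per_cpu: Counter = Counter()
    suppressed = 0
    for line in log_text.splitlines():
        m = BAD_PAGE_RE.search(line)
        if m:
            per_cpu[int(m.group(1))] += 1
            continue
        m = SUPPRESSED_RE.search(line)
        if m:
            # The kernel rate-limits these reports, so add the hidden ones too.
            suppressed += int(m.group(1))
    return per_cpu, suppressed
```

If nearly all reports land on one CPU, that points in the same direction as the load imbalance described above.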
I checked the network adapter driver version:
modinfo i40e | grep ver
filename: /lib/modules/4.15.0-134-generic/updates/drivers/net/ethernet/intel/i40e/i40e.ko
version: 2.13.10
description: Intel(R) 40-10 Gigabit Ethernet Connection Network Driver
srcversion: 597EBD96218776AAA546464
vermagic: 4.15.0-134-generic SMP mod_unload
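The filename above sits under updates/, which suggests an out-of-tree i40e build installed over the in-kernel module. To check this across several servers, you could parse the modinfo output with a small sketch like the one below. The helper names parse_modinfo and is_out_of_tree are hypothetical, and the updates/ convention is an assumption about how the driver was installed.

```python
from typing import Dict


def parse_modinfo(text: str) -> Dict[str, str]:
    """Turn `modinfo` output ("key:   value" per line) into a dict.

    Repeated keys (e.g. alias, parm) keep only the last value in this sketch.
    """
    info: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def is_out_of_tree(info: Dict[str, str]) -> bool:
    # Assumption: modules installed outside the stock kernel tree land under
    # /lib/modules/<release>/updates/, as the i40e path above does.
    return "/updates/" in info.get("filename", "")
```

For example, parse_modinfo(subprocess.run(["modinfo", "i40e"], capture_output=True, text=True).stdout)["version"] would return the same 2.13.10 string shown above.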
The network adapters were running firmware version 8.15.
After I updated the driver and upgraded the network adapter firmware to version 9.0, the bug disappeared.
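The article does not say how the firmware version was read. One common way is ethtool -i <interface>. For i40e adapters, its firmware-version field begins with the NVM version (8.15 before the update, 9.0 after). The minimal sketch below assumes that output format and extracts the major and minor numbers so they can be compared with 9.0. The name firmware_version is hypothetical.

```python
from typing import Optional, Tuple


def firmware_version(ethtool_i_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the firmware-version line of `ethtool -i`.

    Assumption: for i40e the field starts with the NVM version, e.g. "8.15 0x... ...".
    Returns None if the line is missing or does not start with "<int>.<int>".
    """
    for line in ethtool_i_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip() != "firmware-version":
            continue
        fields = value.split()
        first = fields[0] if fields else ""
        major, dot, minor = first.partition(".")
        if dot and major.isdigit() and minor.isdigit():
            return int(major), int(minor)
    return None
```

Because the result is a tuple, firmware_version(output) < (9, 0) flags adapters that still run pre-9.0 firmware.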
Note that after you update the driver and firmware, the network may stop working. This can happen because the network interfaces get renamed or because the driver version is incompatible with the firmware version. Perform the update while connected via iLO or with physical access to the server.
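Interface renaming is one way the network can break after the update, so it may help to record the interface names before updating and compare them afterwards. Below is one way to do that. It assumes a Linux system with sysfs at /sys and matches interfaces by MAC address. The function names are hypothetical. You would take one snapshot before the update and another after it (for example, over iLO).

```python
import os
from collections import Counter
from typing import Dict


def snapshot_interfaces(sys_net: str = "/sys/class/net") -> Dict[str, str]:
    """Map interface name -> MAC address, read from sysfs."""
    snapshot: Dict[str, str] = {}
    for name in sorted(os.listdir(sys_net)):
        try:
            with open(os.path.join(sys_net, name, "address")) as f:
                snapshot[name] = f.read().strip()
        except OSError:
            # Entries such as bonding_masters are plain files, not interfaces.
            continue
    return snapshot


def renamed_interfaces(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, str]:
    """Return {old_name: new_name} for MACs that now appear under a different name."""
    # VLAN, bond and bridge interfaces can share a MAC with their parent,
    # so only MACs that are unique in both snapshots are compared.
    before_counts = Counter(before.values())
    after_counts = Counter(after.values())
    by_mac_after = {mac: name for name, mac in after.items()}
    renames: Dict[str, str] = {}
    for old_name, mac in before.items():
        if before_counts[mac] != 1 or after_counts[mac] != 1:
            continue
        new_name = by_mac_after[mac]
        if new_name != old_name:
            renames[old_name] = new_name
    return renames
```

Any entry in the result is an interface whose configuration (netplan, accel-ppp.conf and so on) probably needs the new name.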
See my articles for more details:
- Intel i40e driver update
- Intel 700 Series Network Adapters Firmware Update
- Solution of the error NMI watchdog: BUG: soft lockup – CPU#0 stuck for 23s!