I once assembled a new access server with Accel-ppp and Intel XL710 network adapters. After putting it into operation, I noticed that the load was spread evenly across all processor cores except the tenth, which was at almost 100%. I also noticed these messages in the logs:
BUG: Bad page state: 11 messages suppressed
BUG: Bad page state in process kworker/10:2 pfn:7ffdef
page:ffffe7789fff7bc0 count:-2 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x57ffffc0000000()
raw: 0057ffffc0000000 0000000000000000 0000000000000000 fffffffeffffffff
raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
page dumped because: nonzero _refcount
Modules linked in: ...
CPU: 10 PID: 24193 Comm: kworker/10:2 Tainted: G B OE 4.15.0-134-generic #138-Ubuntu
Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 04/08/2020
Workqueue: mm_percpu_wq vmstat_update
Call Trace:
dump_stack+0x6d/0x8e
bad_page+0xcb/0x120
free_pages_check_bad+0x5f/0x70
free_pcppages_bulk+0x454/0x4f0
drain_zone_pages+0x3d/0x60
refresh_cpu_vm_stats+0x1df/0x2a0
vmstat_update+0x13/0x50
process_one_work+0x1de/0x420
worker_thread+0x32/0x410
kthread+0x121/0x140
? process_one_work+0x420/0x420
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x35/0x40
I checked the driver version for the network adapters:
modinfo i40e | grep ver
filename: /lib/modules/4.15.0-134-generic/updates/drivers/net/ethernet/intel/i40e/i40e.ko
version: 2.13.10
description: Intel(R) 40-10 Gigabit Ethernet Connection Network Driver
srcversion: 597EBD96218776AAA546464
vermagic: 4.15.0-134-generic SMP mod_unload
The firmware version on the network adapters was 8.15.
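The driver and firmware versions can also be read together with `ethtool -i`. The following is a minimal sketch, assuming `ethtool` is installed; the helper name `nic_versions` and the interface name `eth0` in the usage comment are placeholders, not names taken from this server.

```shell
# Keep only the driver, driver version and firmware version lines
# from "ethtool -i"-style output read on stdin.
nic_versions() {
    grep -E '^(driver|version|firmware-version):'
}

# Usage (eth0 is a placeholder interface name):
#   ethtool -i eth0 | nic_versions
```

Reading both values from one command makes it easier to compare the driver and firmware versions before and after an update.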
After I updated the driver and network adapter firmware to version 9.0, this bug disappeared.
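To confirm that the messages have stopped, the kernel log can be checked for new reports after the update. This is only a sketch: the helper name `count_bad_page_reports` is made up for illustration, and it assumes the kernel log is readable through `dmesg`.

```shell
# Count "Bad page state" reports in kernel log text read on stdin.
# Only lines that name the process are counted, so the
# "N messages suppressed" summary lines are left out.
# "|| true" keeps the exit status at 0 when nothing matches.
count_bad_page_reports() {
    grep -c 'BUG: Bad page state in process' || true
}

# Usage:
#   dmesg | count_bad_page_reports
```

A count of 0 some time after the reboot suggests the reports are no longer being logged.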
Note that after updating the driver and firmware, the network may stop working, either because the network interfaces have been renamed or because the driver version is incompatible with the firmware version. For that reason, perform the update while connected via iLO or with physical access to the server.
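Because interfaces may be renamed, it can help to save the mapping of interface names to MAC addresses before the update, so that the ports can be matched afterwards. Below is a minimal sketch that parses the one-line output of `ip -o link show`; the helper name `iface_macs` and the file name `ifaces-before.txt` are placeholders chosen for this example.

```shell
# Read "ip -o link show" output on stdin and print "name MAC"
# for every interface that has an Ethernet address.
iface_macs() {
    awk '{
        name = $2
        sub(/:$/, "", name)            # strip the trailing colon from the name
        for (i = 3; i < NF; i++)
            if ($i == "link/ether")    # the MAC address follows this keyword
                print name, $(i + 1)
    }'
}

# Usage before the update (file name is a placeholder):
#   ip -o link show | iface_macs > ifaces-before.txt
```

Running the same command after the update and comparing the two lists by MAC address shows which names have changed.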
See my articles for more details:
- Intel i40e driver update
- Intel 700 Series Network Adapters Firmware Update
- Solution of the error NMI watchdog: BUG: soft lockup – CPU#0 stuck for 23s!