
I am on Ubuntu 22.04 LTS, and I have a Western Digital WD Black 500 GB M.2 NVMe SSD. The laptop is a Dell E5495.

I installed a fresh copy of Ubuntu (the machine previously ran Windows 11), but I continuously get the following errors in the system log:

[  426.038056] pcieport 0000:00:01.5: AER: Correctable error message received from 0000:04:00.0
[  426.038083] nvme 0000:04:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
[  426.038092] nvme 0000:04:00.0:   device [15b7:5017] error status/mask=00000001/0000e000
[  426.038101] nvme 0000:04:00.0:    [ 0] RxErr                  (First)
[  426.575193] pcieport 0000:00:01.5: AER: Multiple Correctable error message received from 0000:04:00.0
[  426.575220] pcieport 0000:00:01.5: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
[  426.575227] pcieport 0000:00:01.5:   device [1022:15d3] error status/mask=00001000/00006000
[  426.575236] pcieport 0000:00:01.5:    [12] Timeout               
[  426.575248] nvme 0000:04:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
[  426.575255] nvme 0000:04:00.0:   device [15b7:5017] error status/mask=00000081/0000e000
[  426.575263] nvme 0000:04:00.0:    [ 0] RxErr                  (First)
[  426.575270] nvme 0000:04:00.0:    [ 7] BadDLLP               
[  426.575276] nvme 0000:04:00.0: AER:   Error of this Agent is reported first

Despite this, the laptop works well.

Could anybody tell me why this happens?

Thank you!

SOLVED: after upgrading the laptop's BIOS with fwupd, the issue is gone!
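For reference, a BIOS update through fwupd usually comes down to three fwupdmgr commands. A minimal sketch, assuming the fwupd package is installed and that Dell publishes the update on LVFS; the run and update_firmware helpers and the DRY_RUN switch are hypothetical additions so the commands can be reviewed before anything is flashed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
    if [[ -n ${DRY_RUN:-} ]]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the usual fwupd update sequence.
update_firmware() {
    run fwupdmgr refresh        # download the latest firmware metadata from LVFS
    run fwupdmgr get-updates    # list devices that have newer firmware available
    run fwupdmgr update         # install the updates (a reboot is usually needed)
}
```

Running DRY_RUN=1 update_firmware only prints the three commands; update_firmware on its own runs them, and fwupdmgr typically asks for confirmation before flashing.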

$ journalctl -b 0 | grep -i aer

apr 13 00:08:53 Laptop kernel: acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME AER PCIeCapability]
apr 13 00:08:53 Laptop kernel: pcieport 0000:00:01.2: AER: enabled with IRQ 25
apr 13 00:08:53 Laptop kernel: pcieport 0000:00:01.3: AER: enabled with IRQ 26
apr 13 00:08:53 Laptop kernel: pcieport 0000:00:01.4: AER: enabled with IRQ 27
apr 13 00:08:53 Laptop kernel: pcieport 0000:00:01.5: AER: enabled with IRQ 28
apr 13 00:08:53 Laptop kernel: pcieport 0000:00:08.2: AER: enabled with IRQ 30
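To confirm that the fix holds, rather than only seeing that AER was initialised, the per-device AER counters in sysfs can be read after some uptime. A minimal sketch, assuming a kernel recent enough to expose aer_dev_correctable under /sys/bus/pci/devices/<address>/ (the address 0000:04:00.0 comes from the log above; aer_total_correctable and its optional sysfs-root argument are hypothetical, the latter only so it can be pointed at a test directory):

```shell
#!/usr/bin/env bash
# Print the TOTAL_ERR_COR value for one PCI device.
# Assumption: the kernel exposes AER statistics at
# /sys/bus/pci/devices/<addr>/aer_dev_correctable (older kernels do not).
aer_total_correctable() {
    local addr=$1
    local root=${2:-/sys/bus/pci/devices}   # hypothetical override for testing
    local file="$root/$addr/aer_dev_correctable"
    if [[ ! -r $file ]]; then
        echo "no AER statistics for $addr" >&2
        return 1
    fi
    # Each line is "<error name> <count>"; the TOTAL_ERR_COR line counts
    # the correctable-error messages received since boot.
    awk '$1 == "TOTAL_ERR_COR" { print $2 }' "$file"
}
```

For example, aer_total_correctable 0000:04:00.0 staying at 0 after a while suggests no new correctable errors have been reported since boot; the root port 0000:00:01.5 can be checked the same way.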

1 Answer


PCIe error correction works as intended: correctable errors are automatically corrected so they won't cause incorrect data to be propagated (and so won't require the system to crash, which would otherwise be the only way to prevent the processing of possibly-corrupt data).

You could try adding the kernel boot option pcie_aspm=off, but since it disables PCIe Active State Power Management system-wide, it is a less than ideal solution on a laptop. Still, if it makes the errors stop, you'll know that they are related to PCIe power management and can search for a more targeted solution. You might also want to report the issue to the Linux NVMe driver developers; they may be able to suggest a more specific solution and add it to the driver, so that future kernel versions handle this case automatically.
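On Ubuntu, a kernel boot option such as pcie_aspm=off is normally added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub and a reboot. A minimal sketch of that edit, assuming the stock single-line, double-quoted form of the variable; add_kernel_option is a hypothetical helper name, and it leaves a copy of the previous file as /etc/default/grub.bak:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# unless it is already there.
# Assumption: the variable sits on one line in double quotes, e.g.
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
add_kernel_option() {
    local opt=$1
    local file=${2:-/etc/default/grub}
    # Already present as a whole word: nothing to do.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${opt}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Insert the option just before the closing quote; keep a .bak copy.
    sed -i.bak -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 ${opt}\"/" "$file"
}
```

As root: add_kernel_option pcie_aspm=off, then update-grub and reboot. To undo, remove the option again (or restore the .bak copy) and rerun update-grub.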

The error messages already indicate that the errors are happening in the communication between the WD Black NVMe SSD and the system chipset.
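Each "device [vendor:device]" line in the log names one end of the link: 15b7 is the PCI vendor ID of SanDisk (the WD SSD controller) and 1022 is AMD, here the root port 0000:00:01.5. A small sketch that pulls those address/ID pairs out of the kernel log so each can be looked up with lspci -nn -s <address>; aer_devices is a hypothetical name, and the parsing assumes the message format shown in the question:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel log text on stdin and print each
# "PCI-address vendor:device" pair that appears in an AER report, once.
aer_devices() {
    awk '
        # Only the "device [vvvv:dddd] error status/mask=..." lines carry the ID.
        /device \[[0-9a-f]+:[0-9a-f]+\] error status/ {
            # PCI address in domain:bus:device.function form, e.g. 0000:04:00.0
            if (!match($0, /[0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7]/))
                next
            addr = substr($0, RSTART, RLENGTH)
            # Text between "device [" (8 characters) and the closing bracket
            match($0, /device \[[0-9a-f]+:[0-9a-f]+\]/)
            id = substr($0, RSTART + 8, RLENGTH - 9)
            key = addr " " id
            if (!(key in seen)) { seen[key] = 1; print key }
        }
    '
}
```

For example, journalctl -k -b 0 | aer_devices; for the log in the question it prints 0000:04:00.0 15b7:5017 followed by 0000:00:01.5 1022:15d3.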

If you can temporarily go back to Windows, you could then install the Western Digital Dashboard utility, and use it to check if your SSD needs a firmware update. That Windows-only utility seems to currently be the only way to install firmware updates to your SSD model.

If your SSD has a power management or other issue, updating its firmware might fix it.
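Whether a firmware update actually took effect can be checked on Linux by reading the controller's firmware revision before and after; nvme-cli's nvme id-ctrl /dev/nvme0 reports it as well (the fr field). A sketch that reads it from sysfs instead, assuming the usual model and firmware_rev attributes under /sys/class/nvme/nvmeN; nvme_firmware and its optional root argument are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the model and firmware revision of each NVMe
# controller. Assumption: controllers appear as /sys/class/nvme/nvme*, each
# with "model" and "firmware_rev" attribute files.
nvme_firmware() {
    local root=${1:-/sys/class/nvme}   # override is for testing only
    local ctrl model fw
    for ctrl in "$root"/nvme*; do
        [[ -r $ctrl/firmware_rev ]] || continue
        # sysfs pads these strings with trailing spaces; strip them.
        model=$(sed 's/[[:space:]]*$//' "$ctrl/model")
        fw=$(sed 's/[[:space:]]*$//' "$ctrl/firmware_rev")
        printf '%s: %s (firmware %s)\n' "${ctrl##*/}" "$model" "$fw"
    done
}
```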

Until recently, AER was mostly a server-grade feature, so it is possible that your laptop is designed to rely on error correction for normal operation, given that Windows normally won't report correctable errors. If so, you could use the pci=noaer kernel boot option instead, which disables the error reporting but lets the hardware keep correcting any correctable errors as usual. The errors will still keep happening in the background and will keep causing some performance degradation (so it is a less than optimal solution), but the option will stop the logging.
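After rebooting, the running kernel's command line shows whether pci=noaer (or pcie_aspm=off) was actually passed, which is worth checking because a misspelled value will not disable AER. A minimal sketch; has_kernel_option and its optional file argument are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a kernel parameter appears as a whole word
# on the running kernel's command line (default source: /proc/cmdline).
has_kernel_option() {
    local opt=$1
    local cmdline_file=${2:-/proc/cmdline}   # override is for testing only
    local -a words
    local word
    # /proc/cmdline is a single space-separated line.
    read -r -a words < "$cmdline_file"
    for word in "${words[@]}"; do
        [[ $word == "$opt" ]] && return 0
    done
    return 1
}
```

For example: has_kernel_option pci=noaer && echo "AER reporting is disabled for this boot".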

The Owner's Manual for your laptop seems to include instructions for replacing the SSD; however, since they involve opening the bottom of the laptop and removing the internal battery, I would hesitate to recommend it as a do-it-yourself job unless you are already familiar with laptop hardware maintenance.

But there is a chance that the errors are caused by oxidation in the M.2 connector; if so, just removing and re-seating the SSD might help.

  • I checked the connector; it is in very good condition, so that did not solve the issue. I cannot switch back to Windows, but following an Arch Linux guide I checked the firmware list, and yes, there is an update, though I cannot find a changelog. Despite this, I don't see any issues or performance degradation. I checked the disk with fsck, and the file system is fine. There are guides and scripts for updating the firmware manually on Linux, but I need to ask: could the upgrade procedure break the disk (which holds a working OS installation with all my stuff)? Thank you again! (I added the "pci=noaer" switch to GRUB.) Commented Apr 5 at 7:48
