View Issue Details

ID: 0006646
Project: fsimx_Linux
Category: NAND-FMD
View Status: public
Last Update: 2025-03-07 17:04
Reporter: Keller
Assigned To:
Priority: low
Severity: minor
Reproducibility: always
Status: new
Resolution: open
Product Version: fsimx6sx-V2.1
Target Version: fsimx-next
Summary: 0006646: UBIFS complains "Corrupt empty space" and only mounts volume read-only
Description: When booting the system, UBIFS gives the following error:

UBIFS error (ubi1:0 pid 191): ubifs_scan: corrupt empty space at LEB 945:118777
UBIFS error (ubi1:0 pid 191): ubifs_scanned_corruption: corruption at LEB 945:118777
UBIFS error (ubi1:0 pid 191): ubifs_scanned_corruption: first 8192 bytes from LEB 945:118777
UBIFS error (ubi1:0 pid 191): ubifs_scan: LEB 945 scanning failed
UBIFS error (ubi1:0 pid 191): do_commit: commit failed, error -117
UBIFS warning (ubi1:0 pid 191): ubifs_ro_mode: switched to read-only mode, error -117
Steps To Reproduce: The problem is difficult to reproduce. It took many months to get two boards from a customer that showed the error.

UBIFS seems to expect empty data (all 0xff) in a NAND page that it uses for administration purposes, but finds non-0xff data. As this should never happen, it refuses to continue and mounts the volume in read-only mode to avoid further damage or even data loss when writing to this inconsistent filesystem. This state requires manual repair by the user, for example by running some filesystem check.
Additional Information:

The problem is in the driver gpmi-nand-fus.c. When data is read from a NAND flash page, error correction is done by the ECC engine of the SoC, called BCH. When it has finished, the BCH engine returns a status code. If there were any bit errors, they are corrected and the status is the number of corrected errors, or zero if there were none. The resulting payload data is always correct, unless there were so many errors that the ECC could no longer handle them. In this case the status is "uncorrectable". A third status, "erased", is returned if the payload data and ECC consist of 1-bits only. This is the case after a page has been erased and is completely empty.

In rare cases, it may happen that an empty page also has a few flipped bits, i.e. 0-bits. Such a page would be read as "uncorrectable", even though it is typically no problem to write data to this page nonetheless. For that, the BCH engine has the option to allow for a small number of 0-bits in a page and still return status "erased".
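The tolerance described above can be illustrated in software. The following is a minimal sketch, not the BCH hardware logic and not code from gpmi-nand-fus.c: the helper names `count_zero_bits` and `page_counts_as_erased` and the parameter `max_zero_bits` are hypothetical and only show the idea of accepting a page as "erased" while it contains at most a small number of 0-bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the 0-bits in a buffer.
 * A freshly erased NAND page reads back as all 1-bits (0xff bytes). */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
    unsigned int zeros = 0;

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        uint8_t inv = (uint8_t)~buf[i];   /* 0-bits become 1-bits */

        while (inv) {                     /* clear lowest set bit each round */
            inv &= (uint8_t)(inv - 1);
            zeros++;
        }
    }
    return zeros;
}

/* Hypothetical helper: treat the page as erased if it holds no more than
 * max_zero_bits flipped bits (an assumed, configurable threshold). */
static bool page_counts_as_erased(const uint8_t *buf, size_t len,
                                  unsigned int max_zero_bits)
{
    return count_zero_bits(buf, len) <= max_zero_bits;
}
```

With a threshold of 1, a buffer holding a single 0-bit would still be classified as erased, while a threshold of 0 would reject it.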

The gpmi-nand-fus.c driver makes use of this option. However, it assumed that the payload data is also corrected by the BCH engine in this case, i.e. that all bytes read are 0xff if the status is "erased", even if there were 0-bits in the page itself. Recent analysis has shown that this assumption was wrong. The resulting payload data still contains the 0-bits if there are any; no correction is done. This is why UBIFS can actually see those 0-bits and fails to handle them correctly.
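One possible way to handle this in a driver is to overwrite the payload buffer with 0xff whenever the status is "erased", so that upper layers such as UBIFS never see the stray 0-bits. The sketch below only illustrates that idea and is not the actual fix in gpmi-nand-fus.c. The function `fixup_erased_payload` and the `HYP_STATUS_*` constants are hypothetical names, and their numeric values are placeholders chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status codes; the values are illustrative assumptions. */
enum {
    HYP_STATUS_GOOD          = 0x00,
    HYP_STATUS_UNCORRECTABLE = 0xfe,
    HYP_STATUS_ERASED        = 0xff
};

/* Hypothetical fix-up: if the ECC engine reported "erased", force the
 * payload to all 0xff, because the engine does not correct the 0-bits
 * itself. Returns the number of 0-bits that were repaired, which a
 * caller could report as corrected bit flips. */
static unsigned int fixup_erased_payload(uint8_t status, uint8_t *payload,
                                         size_t len)
{
    unsigned int flipped = 0;

    assert(payload != NULL);

    if (status != HYP_STATUS_ERASED)
        return 0;                         /* leave real data untouched */

    for (size_t i = 0; i < len; i++) {
        uint8_t inv = (uint8_t)~payload[i];

        while (inv) {                     /* count 0-bits in this byte */
            inv &= (uint8_t)(inv - 1);
            flipped++;
        }
    }

    memset(payload, 0xff, len);           /* present a clean empty page */
    return flipped;
}
```

Returning the count rather than silently discarding it keeps the information available, for example so that a caller could decide to schedule the block for scrubbing.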
Forum Link

Relationships

related to 0006647 new UBoot mxs_nand_fus.c: 0-bits in empty pages are not handled correctly 

Activities

There are no notes attached to this issue.