View Issue Details

ID: 0003216
Project/Category: NBootiMX6NBoot
View Status: public
Last Update: 2017-04-28 11:10
Reporter: Keller
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: new
Resolution: open
Product Version: V32
Target Version: V32
Summary: 0003216: NBoot erases bootloader in case of uncorrectable NAND errors

Description:
If NBoot encounters an uncorrectable error when reading the pages of a block, for example a U-Boot block, it assumes that it was interrupted earlier while refreshing a block with a high ECC error count. It therefore reads the swap block and, without any further check of the block number, replaces the data of the uncorrectable block with the data of the swap block.

If this assumption is wrong and the block is really not readable, then the swap block is empty. By doing the replacement, NBoot actually erases the uncorrectable data, making any rescue of the data impossible.
Steps To Reproduce:
Use flash with a 224-byte spare area. In this case NBoot and U-Boot unfortunately compute the ECC position in the spare area differently, so data written by U-Boot cannot be read by NBoot. This triggers the scenario described above.

Start U-Boot. Load any U-Boot image and write it to the U-Boot area:

  tftp uboot.nb0
  nand erase.part UBoot
  nand write $loadaddr UBoot $filesize

Now reboot the board. NBoot tries to load U-Boot, but cannot read the data because of the wrong ECC position. From NBoot's point of view, all U-Boot blocks have uncorrectable errors.

Following the strategy outlined above, NBoot now replaces the content of each of these blocks with the (empty) content of the swap block, which effectively erases a more or less functional U-Boot image.

Block 10 recovered from swap block 9
Block 11 recovered from swap block 9
Block 12 recovered from swap block 9
Block 13 recovered from swap block 9
Additional Information:
When a block refresh is in progress, the block number of the original block is (or rather, should be) stored in the swap block. NBoot should only replace the uncorrectable block if a valid block number is stored in the swap block and this block number equals the number of the block in question.

In fact, the strategy in NBoot should work the other way around. When starting, it should immediately read the swap block. If the swap block holds a valid block number, then a block refresh was interrupted earlier. However, this could also have been EBoot or U-Boot doing a block refresh while reading the kernel image or similar.

In this case, we can do one of the following:

1. Copy back the swap block only if the original block number stored in the swap block lies in the NBoot/EBoot/U-Boot area, then erase the swap block. For any other block number, EBoot and U-Boot have to take care of the situation themselves, which means they have to do a similar check of the swap block before doing any other work. As a result, the swap block is read very often (at least twice: once in NBoot, once in EBoot/U-Boot, and probably again in Windows/Linux), with the risk of bit flips caused by read disturb.

2. Copy back the swap block in any case, then erase the swap block. This also finishes a block refresh by U-Boot or EBoot (or even Linux/Windows) that was interrupted earlier, so none of the other programs have to handle this situation or read the swap block anymore. This is more efficient, but it only works if the data format of NBoot and all the other programs is exactly the same, which is not the case at the moment (NBoot differs from U-Boot/Linux).

After this step, the swap block either contains data that NBoot is not responsible for (case 1) or is empty (case 2). Therefore, whatever block NBoot reads later, it is not a block that has been saved in the swap block. From then on, every uncorrectable error in NBoot is exactly that: a genuine uncorrectable error.
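
The startup decision for both options can be sketched as a small pure function. All names, the policy enum and the boot area bounds are hypothetical and only illustrate the proposal; the sketch deliberately leaves out the actual NAND read, copy and erase operations.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed marker for "no block number stored" (erased NAND reads 0xFF). */
#define SWAP_NO_BLOCK 0xFFFFFFFFu

enum swap_policy {
    SWAP_COPY_BOOT_AREA_ONLY,   /* option 1 */
    SWAP_COPY_ALWAYS            /* option 2 */
};

enum swap_action {
    SWAP_NOTHING,               /* no valid block number, nothing to finish */
    SWAP_LEAVE_FOR_OTHERS,      /* block is outside the boot area, keep swap block */
    SWAP_COPY_AND_ERASE         /* copy swap block back, then erase it */
};

/*
 * Decide at startup what to do with the swap block. orig_block is the block
 * number read from the swap block; boot_first..boot_last is the assumed
 * NBoot/EBoot/U-Boot area (inclusive).
 */
enum swap_action swap_startup_action(uint32_t orig_block,
                                     uint32_t boot_first, uint32_t boot_last,
                                     uint32_t total_blocks,
                                     enum swap_policy policy)
{
    assert(boot_first <= boot_last);

    if (orig_block == SWAP_NO_BLOCK || orig_block >= total_blocks)
        return SWAP_NOTHING;

    if (policy == SWAP_COPY_ALWAYS)
        return SWAP_COPY_AND_ERASE;

    if (orig_block >= boot_first && orig_block <= boot_last)
        return SWAP_COPY_AND_ERASE;

    return SWAP_LEAVE_FOR_OTHERS;
}
```

In either case, once this decision has been carried out, later uncorrectable errors no longer need to consult the swap block at all.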
