AGL Development / SPEC-1386

M3+Kingfisher: kernel crash when writing to NVMe device

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Halibut 8.0.1
    • Flounder
    • Kernel/ OS
    • None
    • M3ULCB
      Kingfisher M04
      SSD WDC WDS512G1X0C-00ENX0
      AGL/master, R-Car BSP 3.4.0

      Writing to a partition on the NVMe device leads to a severe, unrecoverable kernel crash.

      To reproduce:

      • install the M.2 NVMe SSD on the Kingfisher board
      • boot the AGL/master image
      • log in as root
      • run the following commands:
      # partition the SSD device and create 1 big partition 
      fdisk /dev/nvme0n1
      # format the partition
      mkfs.ext4 /dev/nvme0n1p1
      # mount the partition
      mount /dev/nvme0n1p1 /mnt
      # do some writes (command may have to be repeated)
      while true; do date; dd if=/dev/zero of=/mnt/bar bs=4M count=500; sync; done
      ...
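
The write loop above runs forever and may need several passes before the crash appears. A bounded variant can be sketched as follows. The function name `write_stress` and its parameters are hypothetical and not part of the original report; the defaults mirror the `bs=4M count=500` values used above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): repeat the dd/sync
# write pass a bounded number of times instead of looping forever.
# Usage: write_stress TARGET_DIR [ITERATIONS] [BLOCK_COUNT] [BLOCK_SIZE]
write_stress() {
    target_dir=$1
    iterations=${2:-10}     # assumed default; the report loops indefinitely
    block_count=${3:-500}   # the report uses count=500
    block_size=${4:-4M}     # the report uses bs=4M
    i=1
    while [ "$i" -le "$iterations" ]; do
        date
        # Stop early if dd itself fails (for example, when the disk is full).
        dd if=/dev/zero of="$target_dir/bar" bs="$block_size" count="$block_count" || return 1
        sync
        i=$((i + 1))
    done
    echo "completed $iterations iterations"
}
```

With the partition mounted as described, `write_stress /mnt` would run ten passes. If the kernel crashes, the final "completed" line never appears.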
      

      Running these commands sometimes leads to a kernel crash:

      [  193.429021] Unable to handle kernel paging request at virtual address dead000000000108
      [  193.436940] Mem abort info:
      [  193.439729]   Exception class = DABT (current EL), IL = 32 bits
      [  193.445642]   SET = 0, FnV = 0
      [  193.448689]   EA = 0, S1PTW = 0
      [  193.451822] Data abort info:
      [  193.454696]   ISV = 0, ISS = 0x00000044
      [  193.458524]   CM = 0, WnR = 1
      [  193.461484] [dead000000000108] address between user and kernel address ranges
      [  193.468614] Internal error: Oops: 96000044 [#1] PREEMPT SMP
      [  193.474181] Modules linked in: rfcomm bnep crc32_ce crct10dif_ce nvme nvme_core btusb btrtl btbcm btintel pvrsrvkm(O) rcar_can can_dev btwilink bluetooth st_drv ecdh_generic rfkill vspm_if(O) vsp2(O) vspm(O) uvcs_drv(O) mmngrbuf(O) mmngr(O)
      [  193.495515] CPU: 1 PID: 4255 Comm: dd Tainted: G    B      O    4.14.0-yocto-standard #1
      [  193.503598] Hardware name: Renesas M3ULCB Kingfisher board based on r8a7796 (DT)
      [  193.510987] task: ffff8005e5568e00 task.stack: ffff000022978000
      [  193.516910] PC is at __rmqueue+0x3cc/0x4c8
      [  193.521001] LR is at get_page_from_freelist+0x5e8/0xa20
      [  193.526221] pc : [<ffff00000818e9ac>] lr : [<ffff00000818fef0>] pstate: a00001c5
      [  193.533608] sp : ffff00002297b850
      [  193.536916] x29: ffff00002297b850 x28: ffff8005fff7df90 
      [  193.542224] x27: 0000000000000001 x26: 00007ffa00082040 
      [  193.547531] x25: fffffffffffffef0 x24: ffff8005fff7dfc0 
      [  193.552837] x23: ffff8005fff7df80 x22: 0000000000000001 
      [  193.558144] x21: ffff8005fff7de80 x20: 0000000000000000 
      [  193.563450] x19: 0000000000000010 x18: 0000000000000002 
      [  193.568756] x17: 0000ffff96eab7a8 x16: ffff0000082108d8 
      [  193.574062] x15: 0000000000000000 x14: 0000000000000000 
      [  193.579368] x13: 0000000000000000 x12: 0000000000000000 
      [  193.584674] x11: 0000000000000000 x10: 00000000ffffff80 
      [  193.589981] x9 : ffff7e0000710020 x8 : dead000000000100 
      [  193.595288] x7 : ffff8005fff7e3a0 x6 : dead000000000100 
      [  193.600594] x5 : ffff8005fff7e3d0 x4 : 0000000000000410 
      [  193.605900] x3 : 000000000000000a x2 : 000000000000000a 
      [  193.611206] x1 : ffff8005fff7e3d0 x0 : ffff7e0000710000 
      [  193.616514] Process dd (pid: 4255, stack limit = 0xffff000022978000)
      [  193.622860] Call trace:
      [  193.625301] Exception stack(0xffff00002297b710 to 0xffff00002297b850)
      [  193.631735] b700:                                   ffff7e0000710000 ffff8005fff7e3d0
      [  193.639559] b720: 000000000000000a 000000000000000a 0000000000000410 ffff8005fff7e3d0
      [  193.647381] b740: dead000000000100 ffff8005fff7e3a0 dead000000000100 ffff7e0000710020
      [  193.655203] b760: 00000000ffffff80 0000000000000000 0000000000000000 0000000000000000
      [  193.663026] b780: 0000000000000000 0000000000000000 ffff0000082108d8 0000ffff96eab7a8
      [  193.670848] b7a0: 0000000000000002 0000000000000010 0000000000000000 ffff8005fff7de80
      [  193.678670] b7c0: 0000000000000001 ffff8005fff7df80 ffff8005fff7dfc0 fffffffffffffef0
      [  193.686492] b7e0: 00007ffa00082040 0000000000000001 ffff8005fff7df90 ffff00002297b850
      [  193.694314] b800: ffff00000818fef0 ffff00002297b850 ffff00000818e9ac 00000000a00001c5
      [  193.702136] b820: ffff8005fd850d80 ffff7e00002e5f00 0001000000000000 0000000000000028
      [  193.709957] b840: ffff00002297b850 ffff00000818e9ac
      [  193.714832] [<ffff00000818e9ac>] __rmqueue+0x3cc/0x4c8
      [  193.719965] [<ffff00000818fef0>] get_page_from_freelist+0x5e8/0xa20
      [  193.726227] [<ffff0000081908e8>] __alloc_pages_nodemask+0xd8/0xbf0
      [  193.732403] [<ffff0000081e3c7c>] alloc_pages_current+0x7c/0xe8
      [  193.738230] [<ffff000008186f58>] __page_cache_alloc+0x98/0xb8
      [  193.743970] [<ffff000008187020>] pagecache_get_page+0xa8/0x280
      [  193.749796] [<ffff00000818721c>] grab_cache_page_write_begin+0x24/0x40
      [  193.756318] [<ffff0000082b2ec0>] ext4_da_write_begin+0xb8/0x3b0
      [  193.762230] [<ffff000008186d48>] generic_perform_write+0x90/0x178
      [  193.768317] [<ffff000008189b20>] __generic_file_write_iter+0x100/0x1c8
      [  193.774839] [<ffff0000082a07e4>] ext4_file_write_iter+0x10c/0x408
      [  193.780928] [<ffff000008210444>] __vfs_write+0xac/0x118
      [  193.786146] [<ffff000008210688>] vfs_write+0xa0/0x190
      [  193.791191] [<ffff000008210920>] SyS_write+0x48/0xb0
      [  193.796148] Exception stack(0xffff00002297bec0 to 0xffff00002297c000)
      [  193.802582] bec0: 0000000000000001 0000ffff969ea000 0000000000400000 0000ffff96f3c000
      [  193.810405] bee0: 0000000000400000 0000000000000000 0000ffff969ea000 0000aaaad769fb00
      [  193.818226] bf00: 0000000000000040 0000ffff96f7b260 0000000000010080 0000000000000000
      [  193.826048] bf20: 0000000000000001 000000000000270f 0000000000002010 0000000000000000
      [  193.833870] bf40: 0000aaaad76bae38 0000ffff96eab7a8 0000000000000002 0000000000000001
      [  193.841692] bf60: 0000000000400000 0000aaaad76bb130 0000000000000000 0000ffff969ea000
      [  193.849513] bf80: 0000aaaad76bb000 0000000000000001 0000ffff96f7b280 00000000000000e5
      [  193.857335] bfa0: 0000000000400000 0000ffffe2c71b70 0000aaaad769ff94 0000ffffe2c71b70
      [  193.865157] bfc0: 0000ffff96eab7d0 0000000060000000 0000000000000001 0000000000000040
      [  193.872979] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [  193.880804] [<ffff000008083808>] __sys_trace_return+0x0/0x4
      [  193.886371] Code: f1008120 54000740 a9401526 2a0303e2 (f90004c5) 
      [  193.892462] ---[ end trace 01f5d11d2793b9dd ]---
      [  193.897155] note: dd[4255] exited with preempt_count 1
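
As a reading aid for the trace above, the faulting instruction word in the `Code:` line (the value in parentheses, `f90004c5`) can be decoded by hand. The sketch below only handles the one AArch64 encoding relevant here, STR (immediate, unsigned offset) with a 64-bit register, and is not a general disassembler.

```shell
#!/bin/sh
# Decode the faulting AArch64 instruction word taken from the "Code:" line.
insn=$((0xf90004c5))

# Bits 31..22 equal to 0x3e4 identify STR (immediate, unsigned offset)
# with a 64-bit source register.
if [ $(( (insn >> 22) & 0x3ff )) -eq $((0x3e4)) ]; then
    rt=$(( insn & 0x1f ))                      # register being stored
    rn=$(( (insn >> 5) & 0x1f ))               # base address register
    offset=$(( ((insn >> 10) & 0xfff) * 8 ))   # imm12 is scaled by 8 bytes
    decoded="str x$rt, [x$rn, #$offset]"
else
    decoded="not a 64-bit STR (unsigned offset)"
fi
echo "$decoded"
```

This prints `str x5, [x6, #8]`. With `x6 = dead000000000100` in the register dump, the store targets `dead000000000108`, which matches the faulting address and the `WnR = 1` (write) flag. On arm64 kernels of this generation, `0xdead000000000100` is the `LIST_POISON1` value that `list_del()` writes into a removed entry's `next` pointer, so the trace is consistent with `__rmqueue` following a free-list entry that had already been unlinked or overwritten. The report itself does not state a root cause.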
      

      At other times, the NVMe controller goes down and the driver reports that a limit is exceeded:

      [ 1122.421243] nvme nvme0: async event result 00010300
      ...
      [ 1152.989240] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
      [ 1153.254930] nvme nvme0: Shutdown timeout set to 60 seconds
      [ 1153.260451] nvme nvme0: NPSS is invalid; not using APST
      [ 1153.265721] nvme nvme0: min host memory (2105376 MiB) above limit (128 MiB).
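
The "min host memory" figure in the last line is worth a quick arithmetic check. The sketch below assumes the driver derives it from the Identify Controller HMMIN field, which the NVMe specification expresses in 4 KiB units. The raw value `0x20202020` is an assumption chosen to match the printed number, not something read from the device.

```shell
#!/bin/sh
# Value printed by the driver, in MiB.
reported_mib=2105376
printf 'reported MiB in hex: 0x%x\n' "$reported_mib"

# Assumption: the raw HMMIN field was 0x20202020 (four ASCII space bytes),
# expressed in 4 KiB units. Convert it to MiB the same way: bytes >> 20.
hmmin=$((0x20202020))
derived_mib=$(( (hmmin * 4096) >> 20 ))
echo "derived MiB: $derived_mib"
```

The reported value is `0x202020` in hex, and a raw field of `0x20202020` yields exactly 2105376 MiB. Every byte would then be an ASCII space, which suggests (without proving) that the identify data read back after the reset did not contain a valid numeric field.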
      

            harunobu.kurokawa Harunobu Kurokawa
            sdesneux Stephane Desneux