Description
Describe the bug
My root filesystem is on a bcache device: an SSD partition (/dev/nvme0n1p4) is the cache device, and md RAID 1 over two spinning disks (/dev/md127) is the backing device. ext4 runs on the bcache device (/dev/bcache0), which is my root fs.
Booting into 5.15 breaks the btree on the cache device, such that bcache then removes the cache device. On Sunday (14 Nov), I had the bcache device set in writeback mode. The loss of the cache device caused substantial data loss and I was very lucky to only have to reinstall the whole machine, and not lose any personal data. I now run the bcache device in writethrough mode. Nevertheless, I've just tried again, and booting into 5.15.2 caused the cache device to corrupt and then be ejected from the bcache device.
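For anyone checking which cache mode their device is in before trying 5.15, here is a minimal sketch. It assumes the standard bcache sysfs layout, where `cache_mode` lists all modes and brackets the active one; the helper name `bcache_active_mode` is made up for illustration, and the default path assumes the device is `bcache0` as in this report.

```shell
#!/bin/sh
# Print the active bcache cache mode from a cache_mode sysfs file.
# The kernel lists every mode and puts the active one in brackets, e.g.:
#   writethrough [writeback] writearound none
# bcache_active_mode is a hypothetical helper name, not part of bcache-tools.
bcache_active_mode() {
    # $1: path to the cache_mode file (default assumes /dev/bcache0)
    mode_file="${1:-/sys/block/bcache0/bcache/cache_mode}"
    # Keep only the word inside the square brackets.
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$mode_file"
}

# Usage (writing the mode needs root):
#   bcache_active_mode
#   echo writethrough > /sys/block/bcache0/bcache/cache_mode
```

In writethrough mode a write is not acknowledged until it has reached the backing device, so losing the cache device should not lose data the way it did for me in writeback mode.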
This shows it all going terribly wrong in 5.15.2:
Nov 17 11:23:46 rocket kernel: BUG: unable to handle page fault for address: ffff9a27e1d537f8
Nov 17 11:23:46 rocket kernel: #PF: supervisor write access in kernel mode
Nov 17 11:23:46 rocket kernel: #PF: error_code(0x0003) - permissions violation
Nov 17 11:23:46 rocket kernel: PGD 6fbc01067 P4D 6fbc01067 PUD 100f85063 PMD 12341b063 PTE 8000000121d53161
Nov 17 11:23:46 rocket kernel: Oops: 0003 [#1] SMP NOPTI
Nov 17 11:23:46 rocket kernel: CPU: 3 PID: 1655 Comm: kworker/3:36 Tainted: G O 5.15.2 #1-NixOS
Nov 17 11:23:46 rocket kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B550M Steel Legend, BIOS P2.20 08/02/2021
Nov 17 11:23:46 rocket kernel: Workqueue: bch_btree_io btree_node_write_work [bcache]
Nov 17 11:23:46 rocket kernel: RIP: 0010:__bch_btree_node_write+0x316/0x550 [bcache]
Nov 17 11:23:46 rocket kernel: Code: e2 bc c1 48 8b 30 48 c1 f9 06 48 c1 e1 0c 48 03 0d 8f e2 bc c1 48 01 f9 48 89 31 48 8d 79 08 48 8b b0 f8 0f 00 00 48 83 e7 f8 <48> 89 b1 f8 0f 00 00 48 29 f9 48 89 c6 48 05 00 10 00 00 48 29 ce
Nov 17 11:23:46 rocket kernel: RSP: 0018:ffffb0db837d3e00 EFLAGS: 00010282
Nov 17 11:23:46 rocket kernel: RAX: ffff9a27ccaaf000 RBX: ffff9a27c459dc00 RCX: ffff9a27e1d52800
Nov 17 11:23:46 rocket kernel: RDX: 0000000000001000 RSI: 4f791ab543391ab1 RDI: ffff9a27e1d52808
Nov 17 11:23:46 rocket kernel: RBP: ffff9a27c459dd98 R08: 0000000000000000 R09: 0000000000000001
Nov 17 11:23:46 rocket kernel: R10: 0000000000000800 R11: ffff9a2ebe2ecff0 R12: ffff9a27ccaaf800
Nov 17 11:23:46 rocket kernel: R13: fffff94944875480 R14: ffffb0db837d3e00 R15: ffff9a27ccaaf800
Nov 17 11:23:46 rocket kernel: FS: 0000000000000000(0000) GS:ffff9a2ebe2c0000(0000) knlGS:0000000000000000
Nov 17 11:23:46 rocket kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 17 11:23:46 rocket kernel: CR2: ffff9a27e1d537f8 CR3: 00000006fb410000 CR4: 0000000000750ee0
Nov 17 11:23:46 rocket kernel: PKRU: 55555554
Nov 17 11:23:46 rocket kernel: Call Trace:
Nov 17 11:23:46 rocket kernel: ? __switch_to_asm+0x42/0x70
Nov 17 11:23:46 rocket kernel: ? finish_task_switch.isra.0+0xb0/0x280
Nov 17 11:23:46 rocket kernel: btree_node_write_work+0x43/0x50 [bcache]
Nov 17 11:23:46 rocket kernel: process_one_work+0x1e4/0x380
Nov 17 11:23:46 rocket kernel: worker_thread+0x50/0x410
Nov 17 11:23:46 rocket kernel: ? process_one_work+0x380/0x380
Nov 17 11:23:46 rocket kernel: kthread+0x127/0x150
Nov 17 11:23:46 rocket kernel: ? set_kthread_struct+0x40/0x40
Nov 17 11:23:46 rocket kernel: ret_from_fork+0x22/0x30
Nov 17 11:23:46 rocket kernel: Modules linked in: af_packet cfg80211 rfkill 8021q ip6table_nat iptable_nat nf_nat amdgpu xt_conntrack intel_rapl_msr nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip6t_rpfilter ipt_rpfilter wmi_bmof >
Nov 17 11:23:46 rocket kernel: macvlan deflate evdev mac_hid rapl efi_pstore bridge tpm_crb wmi video tpm_tis stp tpm_tis_core llc tpm v4l2loopback(O) tiny_power_button gpio_amdpt rng_core gpio_generic pinctrl_amd videodev acpi_cpufreq b>
Nov 17 11:23:46 rocket kernel: CR2: ffff9a27e1d537f8
Nov 17 11:23:46 rocket kernel: ---[ end trace 8ce2f32661623edb ]---
Nov 17 11:23:46 rocket kernel: RIP: 0010:__bch_btree_node_write+0x316/0x550 [bcache]
Nov 17 11:23:46 rocket kernel: Code: e2 bc c1 48 8b 30 48 c1 f9 06 48 c1 e1 0c 48 03 0d 8f e2 bc c1 48 01 f9 48 89 31 48 8d 79 08 48 8b b0 f8 0f 00 00 48 83 e7 f8 <48> 89 b1 f8 0f 00 00 48 29 f9 48 89 c6 48 05 00 10 00 00 48 29 ce
Nov 17 11:23:46 rocket kernel: RSP: 0018:ffffb0db837d3e00 EFLAGS: 00010282
Nov 17 11:23:46 rocket kernel: RAX: ffff9a27ccaaf000 RBX: ffff9a27c459dc00 RCX: ffff9a27e1d52800
Nov 17 11:23:46 rocket kernel: RDX: 0000000000001000 RSI: 4f791ab543391ab1 RDI: ffff9a27e1d52808
Nov 17 11:23:46 rocket kernel: RBP: ffff9a27c459dd98 R08: 0000000000000000 R09: 0000000000000001
Nov 17 11:23:46 rocket kernel: R10: 0000000000000800 R11: ffff9a2ebe2ecff0 R12: ffff9a27ccaaf800
Nov 17 11:23:46 rocket kernel: R13: fffff94944875480 R14: ffffb0db837d3e00 R15: ffff9a27ccaaf800
Nov 17 11:23:46 rocket kernel: FS: 0000000000000000(0000) GS:ffff9a2ebe2c0000(0000) knlGS:0000000000000000
Nov 17 11:23:46 rocket kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 17 11:23:46 rocket kernel: CR2: ffff9a27e1d537f8 CR3: 0000000106600000 CR4: 0000000000750ee0
Nov 17 11:23:46 rocket kernel: PKRU: 55555554
I have switched back to 5.14.18 and everything appears stable.
I have searched around and can't find reports of anything similar from other distributions, though Google is pretty useless these days for finding such things. It seems unlikely to me that this is NixOS-specific, but it is difficult to be sure.
Steps To Reproduce
Steps to reproduce the behavior:
- Set up a bcache device (probably on a kernel older than 5.15). It is unknown whether the backing device must be RAID 1 or the cache device must be an NVMe SSD.
- Format it, use it.
- Reboot into 5.15 and attempt to use the device. It looks like a panic occurs, and at that point the btree on the cache device is corrupted and reboots will refuse to attach the cache device to the bcache. The cache device is now unusable and has to be rebuilt, causing loss of all data on it.
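The first two steps could look roughly like the sketch below. The device paths are the ones from this report; everything else is an assumption, since I have not recorded the exact commands I used originally. `make-bcache` is from bcache-tools, and the `run` and `bcache_repro_setup` helpers are hypothetical names. By default the script only prints the commands, because `make-bcache` and `mkfs.ext4` destroy whatever is on the devices.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction setup (an assumption, not the exact
# commands originally used). Commands are only printed unless RUN=1 is set.
BACKING="${BACKING:-/dev/md127}"      # md RAID 1 backing device
CACHE="${CACHE:-/dev/nvme0n1p4}"      # SSD cache partition

# Execute the command if RUN=1, otherwise just print it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

bcache_repro_setup() {
    # Format backing and cache devices in one call so they attach automatically.
    run make-bcache -B "$BACKING" -C "$CACHE"
    # Put ext4 on the resulting bcache device.
    run mkfs.ext4 /dev/bcache0
}

# Usage:
#   bcache_repro_setup          # print the commands
#   RUN=1 bcache_repro_setup    # actually run them (destroys data, needs root)
```

Do this on a kernel older than 5.15, use the filesystem for a while, then reboot into 5.15 for the third step.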
Expected behavior
The kernel should not panic!
Notify maintainers
Metadata
# nix-shell -p nix-info --run "nix-info -m"
- system: `"x86_64-linux"`
- host os: `Linux 5.14.18, NixOS, 21.05.4116.46251a79f75 (Okapi)`
- multi-user?: `yes`
- sandbox: `yes`
- version: `nix-env (Nix) 2.3.16`
- channels(root): `"home-manager-21.05, nixos-21.05.4116.46251a79f75, nixos-unstable-21.11pre331460.931ab058daa"`
- channels(matthew): `"home-manager-21.05, nixos-unstable-21.11pre331460.931ab058daa"`
- nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
Maintainer information:
attribute: linuxPackages_5_15