I’m upgrading my main Proxmox home server from a single SSD with ZFS to two SSDs mirrored in ZFS. Some discussion here. The simple option is to reinstall Proxmox from scratch, but I’m going to try to do it in-place first. Either way requires careful backups.
The docs on changing a failed boot device are the in-place process I’m following. More or less. It boils down to “set up the new disk in ZFS, then run proxmox-boot-tool to make it bootable”. The blessing process is complicated. Booting with ZFS is always tricky. And Proxmox is rumored to have a special thing where it installs boot loaders on all mirror devices for redundancy. I want that with my new drive! This post is me learning about how it all works.
Summary: this worked very well and is simpler than imagined. Also Proxmox is remarkably well documented. Hurray!
How is Proxmox booting now?
A year ago Proxmox installed something that made it boot from ZFS. Works great. What’s it doing? The challenge is Proxmox has three different ways of booting and I’m not sure which mine is. These docs on using the boot tool are helpful. Some other useful docs here. I should just plug in a monitor and look at the boot process but no, I’m stubborn. The docs say proxmox-boot-tool status will show you the current boot method but the output is not illuminating.
System currently booted with uefi
4B96-7CDC is configured with: uefi (versions: 6.5.13-6-pve, 6.8.12-1-pve, 6.8.12-2-pve)
Yes, it’s UEFI, I knew that. But is that GRUB? Something else? Looking around for advice in docs and from Phind I got some other ideas to try, looking at the EFI boot options:
# efibootmgr -v
BootCurrent: 0002
Timeout: 1 seconds
BootOrder: 0002,0003,0001,0004,0005
Boot0001* UEFI:CD/DVD Drive BBS(129,,0x0)
Boot0002* Linux Boot Manager HD(2,GPT,55f108de-abfa-4dad-b478-641a6d640390,0x800,0x200000)/File(\EFI\SYSTEMD\SYSTEMD-BOOTX64.EFI)
Boot0003* UEFI OS HD(2,GPT,55f108de-abfa-4dad-b478-641a6d640390,0x800,0x200000)/File(\EFI\BOOT\BOOTX64.EFI)..BO
Boot0004* UEFI:Removable Device BBS(130,,0x0)
Boot0005* UEFI:Network Device BBS(131,,0x0)
So that says it’s “Linux Boot Manager”, which is systemd-boot, going by the SYSTEMD-BOOTX64.EFI path (why doesn’t the label say so?). Other docs I read suggest ZFS systems boot from a kernel and initrd kept on the ESP, loaded by systemd-boot. In any case it probably does not involve GRUB. I finally hooked up a monitor and rebooted and saw a menu, but it didn’t quite look like GRUB. I did see a message go by about EFI Boot Stub.
Bottom line: I think I’m booting with Linux Boot Manager which involves systemd and a Linux kernel on the ESP partition. Not GRUB.
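To avoid eyeballing the efibootmgr listing, here is a minimal sketch that reads `efibootmgr -v` output on stdin and names the loader behind the BootCurrent entry. The function name `current_loader` is made up for this post, and the GRUB check is an assumption: it only looks for "grub" in the file path, which I have not verified on a GRUB-booted Proxmox box.

```shell
#!/usr/bin/env bash
# Classify the boot loader used by the entry that BootCurrent points at.
# Input: the text printed by `efibootmgr -v`, on stdin.
# current_loader is a hypothetical helper name, not part of any tool.
current_loader() {
    awk '
        /^BootCurrent:/ { cur = "Boot" $2 }   # e.g. "Boot0002"
        { lines[NR] = $0 }                    # keep every line for the END pass
        END {
            if (cur == "") { print "unknown"; exit }
            for (i = 1; i <= NR; i++) {
                # Only the entry line itself starts with "Boot0002".
                if (index(lines[i], cur) != 1) continue
                path = toupper(lines[i])
                if (path ~ /SYSTEMD-BOOT/) { print "systemd-boot"; exit }
                # Assumption: a GRUB entry has "grub" somewhere in its path.
                if (path ~ /GRUB/) { print "grub"; exit }
                print "unknown"; exit
            }
            print "unknown"
        }
    '
}
```

Usage on a live system would be `efibootmgr -v | current_loader`. It is only a text match on the loader path, so treat the answer as a hint rather than proof.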
It’s possible to inspect the boot stuff. It’s on partition 2, the ESP partition in a VFAT filesystem. It has stuff in it like EFI/BOOT/BOOTX64.EFI, EFI/proxmox, and loader/entries.
Identifying the disks
/dev/nvme1n1 is my existing Proxmox boot drive, a zpool with just the one disk in it. I want to add /dev/nvme0n1 to the pool and make it a mirror. But there are many ways to name a device on Linux and you want a stable name. (Fun fact: the drives on 0 and 1 have swapped between reboots without me changing the hardware.) What’s best?
zpool status -v rpool tells me the status of my existing one disk pool.
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
nvme-eui.002538b22140b542-part3 ONLINE 0 0 0
What is that EUI string? It’s the drive’s “Extended Unique Identifier”, a 64-bit ID, and nvme-eui.002538b22140b542-part3 is another name for /dev/nvme1n1p3. It’s different from the UUIDs, serial numbers, and other identifiers I’ve used before.
I first found this same name with this query: udevadm info --query=all --name=/dev/nvme1n1. But I don’t really know what that tool is. Simpler is ls -l /dev/disk/by-id/ | grep nvme, which shows that nvme-eui.002538b22140b542-part3 is (currently) a symlink to nvme1n1p3. That I understand. So I looked for what linked to nvme0n1 and found that nvme-eui.002538bc3140272b-part3 is the name for the new SSD’s partition 3.
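The symlink lookup can be turned around: given a device node, list every by-id name that currently points at it. This is a rough sketch; `by_id_names_for` is a name I made up, and the optional second argument exists only so the function can be tried against a scratch directory instead of the real /dev/disk/by-id.

```shell
#!/usr/bin/env bash
# Print every name in a by-id directory whose symlink resolves to the device.
# $1: device node, e.g. /dev/nvme0n1p3
# $2: directory of symlinks (defaults to /dev/disk/by-id)
# by_id_names_for is a hypothetical helper name.
by_id_names_for() {
    local target dir link
    target=$(readlink -f -- "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue            # skip anything that is not a symlink
        if [ "$(readlink -f -- "$link")" = "$target" ]; then
            basename -- "$link"
        fi
    done
}
```

Something like `by_id_names_for /dev/nvme0n1p3 | grep eui` should then pick out the EUI-based name, whichever way the kernel numbered the drives on this boot.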
Adding the ZFS mirror disk
OK, time to do surgery on my machine. I am more or less following this guide.
Step 1: clone the partition table to the new disk. sgdisk /dev/nvme1n1 -R /dev/nvme0n1. After doing this the new disk has the same layout as the old one. blkid will show the partitions. (Although confusingly blkid doesn’t show filesystem types: fdisk showed they were labelled as VFAT etc, just like the original disk.) However, the UUIDs of each partition are the same too, which is not good. We’ll change that in the next step.
Step 2: give partitions unique UUIDs. sgdisk -G /dev/nvme0n1. Partitions are the same but now they have new IDs.
Step 3: attach the new SSD to the pool. zpool attach rpool nvme-eui.002538b22140b542-part3 nvme-eui.002538bc3140272b-part3. Why attach and not add? I would have guessed zpool add was the command to use, and at first I only used attach because that’s what Phind told me to do. (Edit: attach mirrors an existing device. add puts a new top-level vdev in the pool, which expands storage; it’s the more general purpose command for other configurations with vdevs and maybe RAID-Z.) The attach command pauses a couple of seconds and then returns. zpool status -v rpool shows what’s going on.
pool: rpool
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Oct 27 19:49:41 2024
114G / 114G scanned, 2.24G / 114G issued at 2.24G/s
2.31G resilvered, 1.97% done, 00:00:49 to go
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
nvme-eui.002538b22140b542-part3 ONLINE 0 0 0
nvme-eui.002538bc3140272b-part3 ONLINE 0 0 0 (resilvering)
Just a bit later it reports resilvered 116G in 00:01:02 with 0 errors on Sun Oct 27 19:50:43 2024. Nice! These SSDs are fast.
At this point the ZFS stuff is all done. The system reboots fine. I did a zpool scrub for good measure and it reported no errors. I’m now getting most of the benefits of ZFS mirroring for data integrity. However there’s still only one boot device: if that fails, I’ll have to recover with some other boot media.
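The three steps above fit in a few lines. Since getting the sgdisk argument order backwards would clone the empty disk’s table over the good one, this sketch only prints the commands for review rather than running them. `mirror_plan` and its argument order are my own invention; the commands it prints are the ones used in this post.

```shell
#!/usr/bin/env bash
# Print (do not run) the commands that turn a single-disk pool into a mirror.
# $1: existing disk     $2: new disk      $3: pool name
# $4: existing pool member (by-id name)   $5: new member (by-id name)
# mirror_plan is a hypothetical helper name.
mirror_plan() {
    local old_disk=$1 new_disk=$2 pool=$3 old_part=$4 new_part=$5
    # Step 1: copy the partition table from the old disk onto the new one.
    # The source is the positional argument; the target follows -R.
    echo "sgdisk $old_disk -R $new_disk"
    # Step 2: give the copied partitions fresh, unique GUIDs.
    echo "sgdisk -G $new_disk"
    # Step 3: attach the new partition as a mirror of the existing one.
    echo "zpool attach $pool $old_part $new_part"
    # Then watch the resilver.
    echo "zpool status -v $pool"
}
```

For my machine the call would be `mirror_plan /dev/nvme1n1 /dev/nvme0n1 rpool nvme-eui.002538b22140b542-part3 nvme-eui.002538bc3140272b-part3`. Check which disk is which right before running anything, given how the device numbers swap between boots.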
Installing boot stuff on the new disk
Proxmox has extra stuff for making boot more durable. proxmox-boot-tool manages it. See docs here.
Step 1: I verify that the ESP partition on the new disk doesn’t have a usable filesystem yet. Device names have changed on me after I rebooted, so /dev/nvme0n1p2 is the original bootable partition. /dev/nvme1n1p2 is the new one.
# old=/dev/nvme0n1p2
# new=/dev/nvme1n1p2
# mount $old m
# ls m
EFI loader
# umount m
# mount $new m
mount: /tmp/m: wrong fs type, bad option, bad superblock on /dev/nvme1n1p2, missing codepage or helper program, or other error.
dmesg(1) may have more information after failed mount system call.
Step 2: format the ESP on the new disk. proxmox-boot-tool format $new. After this there’s a brand new, empty VFAT filesystem on the partition.
Step 3: get Proxmox to install its boot process on the new disk as well as the old. proxmox-boot-tool init $new. This copies all the kernels, etc to the new disk and enrolls it so updates should be written to it. After running this both disks have nearly identical boot partitions.
Step 4 (optional): refresh. proxmox-boot-tool refresh. Just a sort of verification thing, shouldn’t be necessary, but it writes the kernels again to all enrolled disks.
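One way to confirm the new ESP really got enrolled is to list the IDs that proxmox-boot-tool status reports as configured. This is a sketch under an assumption: I only have the single status line shown earlier to go on, so the "is configured with:" match is inferred from that sample. `configured_esps` is a made-up name.

```shell
#!/usr/bin/env bash
# Print the ID of every ESP that the status output reports as configured.
# Input: the text printed by `proxmox-boot-tool status`, on stdin.
# Assumption: each enrolled ESP shows up as "<ID> is configured with: ...".
# configured_esps is a hypothetical helper name.
configured_esps() {
    awk '/ is configured with:/ { print $1 }'
}
```

After the init step, `proxmox-boot-tool status | configured_esps` should print two IDs, one per disk, if the assumption about the output format holds.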
Rebooting
It works! On reboot it just loads like nothing’s changed.
In the BIOS I now see more EFI boot options. Linux Boot Manager is now on both disks, and so is the UEFI OS fallback entry. The EFI boot menu has been reordered and has two extra entries.
# efibootmgr -v
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0000,0002,0003,0004,0001,0005,0006
Boot0000* Linux Boot Manager HD(2,GPT,63a5e06a-3206-4a55-957c-0878a2275a8c,0x800,0x200000)/File(\EFI\SYSTEMD\SYSTEMD-BOOTX64.EFI)
Boot0001* UEFI:CD/DVD Drive BBS(129,,0x0)
Boot0002* Linux Boot Manager HD(2,GPT,55f108de-abfa-4dad-b478-641a6d640390,0x800,0x200000)/File(\EFI\SYSTEMD\SYSTEMD-BOOTX64.EFI)
Boot0003* UEFI OS HD(2,GPT,55f108de-abfa-4dad-b478-641a6d640390,0x800,0x200000)/File(\EFI\BOOT\BOOTX64.EFI)..BO
Boot0004* UEFI OS HD(2,GPT,63a5e06a-3206-4a55-957c-0878a2275a8c,0x800,0x200000)/File(\EFI\BOOT\BOOTX64.EFI)..BO
Boot0005* UEFI:Removable Device BBS(130,,0x0)
Boot0006* UEFI:Network Device BBS(131,,0x0)
The real question is what happens if the boot disk fails. Will the system still boot? In theory yes, if you have ZFS mirrors and Proxmox’s magic boot setup. I don’t want to pull the M.2 drive in my machine to test it, so I don’t know if it works. There are two genuine-looking boot partitions, both listed in the EFI boot menus, so I think the odds are good.
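Short of pulling a drive, a cheap sanity check is to confirm that the systemd-boot entries in the EFI menu point at two different partitions. This sketch pulls the partition GUIDs out of `efibootmgr -v` output; `systemd_boot_esps` is a made-up name, and the pattern assumes entries shaped like the HD(2,GPT,...) lines in my listing.

```shell
#!/usr/bin/env bash
# Print the distinct partition GUIDs of EFI entries that load systemd-boot.
# Input: the text printed by `efibootmgr -v`, on stdin.
# systemd_boot_esps is a hypothetical helper name.
systemd_boot_esps() {
    grep -i 'SYSTEMD-BOOTX64\.EFI' \
        | sed -n 's/.*HD([0-9]*,GPT,\([0-9a-fA-F-]*\),.*/\1/p' \
        | sort -u
}
```

On my machine `efibootmgr -v | systemd_boot_esps` should list two GUIDs, one per SSD. That shows both disks are registered with the firmware; it does not prove the firmware will actually fall back to the second one when the first dies.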