• Setting up Gotify with Proxmox

    Continuing my search for some sort of notifications infrastructure, I set up Gotify to use with Proxmox and other services like BackRest so I get messages in a central place from my servers when things go wrong. I don’t love Gotify but it’s what works with Proxmox so I’ll start there.

    The goal here is to get https://example.com/gotify/ working both for the Web GUI and as an endpoint for Gotify clients including a message destination and the mobile app. I used to use Tailscale but it’s nice to have the Gotify server on a publicly accessible address. It looks designed to be secure for that.

    Step 1: install Gotify as an LXC container in Proxmox with the tteck script. This ships with no config file but the defaults are reasonable (you configure passwords and stuff in the GUI). Note that means no SSL support. Gotify actually has reasonable SSL including Let’s Encrypt integration but I didn’t use it.

    Step 2: use Caddy as a reverse proxy to forward to Gotify. Not necessary but useful for my setup where I have a public Caddy server brokering access to other stuff. The Caddy config is a bit fiddly but is well documented here. Caddy has to strip the URL prefix so Gotify doesn’t see it.

    Step 3: configure a new application in Gotify and make note of the token.

    Step 4: in Proxmox go to Datacenter > Notifications and add a new target. Paste in your token. The URL will look something like https://example.com/gotify. Note the lack of trailing slash: if you leave it in, both Proxmox and Gotify are dumb about it and you get a confusing error. Press “Test” to verify the messages get through.
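
    To check the application token independently of Proxmox, you can push a message by hand. This is a minimal sketch assuming Gotify's standard /message endpoint behind the /gotify prefix; APP_TOKEN is a placeholder for the token from Step 3 and the helper names are made up for illustration.

    ```shell
    # Build the message endpoint from a base URL, dropping one trailing slash
    # so "https://example.com/gotify/" and "https://example.com/gotify" agree.
    gotify_message_url() {
        base="${1%/}"
        printf '%s/message?token=%s\n' "$base" "$2"
    }

    # Send a message with curl: base URL, token, title, body.
    gotify_send() {
        curl -fsS "$(gotify_message_url "$1" "$2")" \
            -F "title=$3" -F "message=$4" -F "priority=5"
    }

    # Usage (placeholder token):
    # gotify_send "https://example.com/gotify" "APP_TOKEN" "Test" "Hello from the shell"
    ```

    Stripping the slash in the helper sidesteps the trailing-slash confusion for anything you script yourself.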

    While I’m here… looking for a good solution to get notifications in Gotify from emails. There’s a zillion SMTP-to-Gotify gateways out there that obviate the need to have actual working mail delivery. This one looked promising on last review, simple and updated recently. I haven’t tried it. It doesn’t seem to have any sort of authentication so a bit worried it’ll collect spam if you run it on a public address.

    Update: a quick test shows smtp-gotify works fine for email. I built it as a Go program and then just ran the binary, configured via a few environment variables. The Dockerfile in the project helped me figure it out. The simple way to use it is to set it as your SMTP host to send mail to. I’ve forgotten how to do that simply on random Linux machines and need to read up on it.

  • Persistent connections failing: tn3270 and TradeStation

    I have a weird networking problem with persistent sessions that I think may be related to Starlink but I’m not sure. Informed troubleshooting advice welcome.

    My partner uses a Windows machine running a couple of particular network apps. One is the TradeStation app, a commercial program that gets constant updates of stock prices, many times a second. Another is Host on Demand, a clunky Java app that provides tn3270 terminal sessions. It’s a little like ssh or telnet but its own thing.

    The problem is both applications mostly work but their connections break many times a day. They reconnect immediately but it’s manual and disruptive. The TradeStation connection should be long lived and with frequent traffic. Host on Demand sessions are mostly idle although we have configured keep-alives.

    FWIW my own computers don’t see any problems, but I don’t use that software. I run long-lived ssh connections all the time and they’re fine. I just tested with netcat and can keep a TCP connection open for over twenty hours to a server sending a short message every 15 seconds. So I don’t have a general problem with persistent connections.
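
    A test like that can be sketched as below. The exact commands aren't recorded here, so the heartbeat helper, port 12345, and the hostname server.example.net are all placeholders, and netcat's listen flags vary between builds.

    ```shell
    # Print a timestamped line COUNT times, sleeping INTERVAL seconds after each.
    heartbeat() {
        count="$1"; interval="$2"; i=0
        while [ "$i" -lt "$count" ]; do
            i=$((i + 1))
            printf '%s heartbeat %d\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$i"
            sleep "$interval"
        done
    }

    # Server side, one line every 15 seconds for about 20 hours:
    # heartbeat 5000 15 | nc -l 12345      # some netcat builds want: nc -l -p 12345
    # Client side, leave running and watch for the stream stopping:
    # nc server.example.net 12345
    ```

    The timestamps make it easy to see exactly when a connection died if the client output stops.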

    Here’s the kicker: the disconnects only happen in the house with Starlink. He also uses the same software in a place with a fiber ISP that has no problems. Both places have a Ubiquiti router I configured, nothing unusual. Sometimes he uses a VPN with Starlink and still sees disconnects but fewer of them. This makes me think it’s something special about Starlink that’s causing the issue.

    My first thought was our IP address changing on Starlink but that only happens every couple of days, not many times a day. A second thought is maybe whatever session state Starlink’s CGNAT is keeping is expiring? But my ssh sessions are fine. Maybe Starlink has something special to protect ssh but not other long lived sessions? Seems unlikely.

    Another possible theory is he’s connected via WiFi in the problem house. The WiFi is quite reliable, Unifi characterizes it as 100% excellent and it’s less than 1 packet lost in 10,000 when pinging the LAN. But I can’t fully rule it out.

    Welcome any informed ideas on how to figure out what’s going wrong. I’m not certain it’s actually Starlink to blame here. I wish I knew more about the apps that are failing and why they may be different from my ssh or netcat. I’ve tried using a packet sniffer but didn’t see anything clearly wrong.

    Reddit discussion

    I posted this on Reddit and got almost no engagement. One reply suggested some TN3270 alternatives and had one good idea, which is to run the netcat client on the Windows machine and see if it works.

  • backrest GUI for restic

    I’ve been using Restic for backups for most of a year now. It’s terrific. But Restic is a bit of a low level tool, using it requires some fiddly config files and setting up cron jobs, etc. There are a bunch of frontends to Restic that turn it more into a user product. I’ve been using resticprofile but don’t love it, it’s confusing and awkward.

    Backrest is new bae. Turns out I want a web GUI for this kind of thing, who knew? It has a very nice and simple structure for configuring Restic. You create repositories, then you create plans. There’s good GUIs for configuring things like retention policies, cron schedules, etc. It runs persistently to provide the web GUI and takes care of launching restic as needed and making things happen.

    Remarkable number of grace notes too. I love that every configuration screen has an option in the GUI to show it as a JSON file, making it easy to keep track of what’s actually happening. Also it has remarkably good live web updates. Not just showing you a job is running but specific status output updated once a second to see, say, how many files of the backup have been completed so far.

    It’s just a really nicely developed product.

  • Using a simple USB disk for Proxmox backups

    Trying a delicate thing here. I have a USB disk I want to write backups to. Mostly I want to back up precious data from a Proxmox guest system (an OpenMediaVault VM with user files, backups written to it with rsync and restic). I also want Proxmox to write backups of guest systems to it. There’s a bunch of ways to do this.

    I decided to complicate it by having a constraint of simplicity. It’s important to me that the USB disk with the backups be as simple and ordinary as is realistic. That means no ZFS, no QEMU virtual disks, no weird partitioning. I want the USB disk to be as simple as possible so someone could plug it into a Linux box and get the files off it.

    The solution I landed on was to use GPT partitions and ext4 filesystems. There’s a 1TB partition for Proxmox, which attaches it as storage of type “directory” and writes backups to it. And there’s a 10TB partition for the OMV VM to write backups to with software running in the VM. It seems to work!

    Preparing the disk partitions

    This is all generic Linux stuff. I used gdisk on the PVE host to partition the disk. I just used fixed partitions: I didn’t use LVM or something to make the partitions more flexible. I also didn’t use any Proxmox tools to set up the storage because they all seemed complicated or unsuitable.

    1TB Proxmox backup partition

    This part’s also pretty generic Linux. I made a filesystem with mkfs.ext4 /dev/sdd1. Then I edited /etc/fstab on the PVE host to mount it:

    UUID=6ad88c2a-3a59-4d8f-a887-344e69c6711d /mnt/12tb-backup-usb ext4 defaults,nofail,x-systemd.device-timeout=15s 0 2

    I don’t love this part, I’d rather not customize Proxmox config files. But this seems unavoidable, Proxmox doesn’t have a managed GUI for “just mount this partition for me”. The fstab works fine, I’ve done something similar on another server for most of a year.

    Once I mounted the new filesystem I used the Proxmox GUI to add a new Storage named “usbbackup” of type “Directory” and pointed it to /mnt/12tb-backup-usb. Now Proxmox’ backup systems will write backups to it. Yay!
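
    The host-side steps can be sketched as follows. The fstab_line helper is made up for illustration; the storage was added through the GUI here, and the pvesm line is what I believe the CLI equivalent to be, so treat it as an assumption and check it against the Proxmox docs first.

    ```shell
    # Format an fstab line for an ext4 filesystem with the mount options used above.
    fstab_line() {
        printf 'UUID=%s %s ext4 defaults,nofail,x-systemd.device-timeout=15s 0 2\n' "$1" "$2"
    }

    # On the PVE host, as root (/dev/sdd1 is the partition from the text):
    # uuid=$(blkid -s UUID -o value /dev/sdd1)
    # mkdir -p /mnt/12tb-backup-usb
    # fstab_line "$uuid" /mnt/12tb-backup-usb >> /etc/fstab
    # mount /mnt/12tb-backup-usb
    # Assumed CLI equivalent of adding the Directory storage in the GUI:
    # pvesm add dir usbbackup --path /mnt/12tb-backup-usb --content backup
    ```

    The nofail option keeps the host booting when the USB disk is unplugged, and the device timeout limits how long boot waits for it.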

    10TB OMV partition

    This part is trickier. Conceptually it’s not hard: I just pass through the disk partition to the VM for it to use. But it’s a little weird.

    First thing is finding the partition. I don’t want to name it /dev/sdd2 on the Proxmox host because that could change. I finally figured out it has a more stable name in /dev/disk/by-partuuid/. Not /by-uuid/ which you normally use, that’s for filesystems and we don’t even have one yet, we want the partition.

    There’s no way in the Proxmox GUI to pass through a partition to a VM guest. It’s got a great GUI for creating virtual disks or passing hardware devices. Fortunately there’s docs on how to do it with the command line. The command I used is:

    qm set 101 -scsi2 /dev/disk/by-partuuid/14db8691-9229-407f-bb55-a693c1a92f70,iothread=1,backup=0

    Note I added options for iothread=1,backup=0. That matches the other virtual disks that Proxmox created for me, and I think those are good choices. (More efficient I/O and don’t include the disk in Proxmox backups.) Here’s what my virtual drives look like in the GUI afterwards. The other two disks are QEMU images Proxmox manages for me.
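
    Going back to the passthrough command, the PARTUUID lookup and the qm call can be sketched like this. The passthrough_spec helper is made up for illustration; VM ID 101, slot scsi2, and /dev/sdd2 are the values from the text and will differ on other systems.

    ```shell
    # Build the disk spec for "qm set" from a partition's PARTUUID.
    passthrough_spec() {
        printf '/dev/disk/by-partuuid/%s,iothread=1,backup=0\n' "$1"
    }

    # On the PVE host, as root:
    # partuuid=$(blkid -s PARTUUID -o value /dev/sdd2)
    # ls -l "/dev/disk/by-partuuid/$partuuid"      # should point at ../../sdd2
    # qm set 101 -scsi2 "$(passthrough_spec "$partuuid")"
    ```

    Looking the PARTUUID up once and storing it in the VM config means the passthrough keeps working if the kernel renames the disk.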

    With this done the 10TB partition /dev/sdd2 that Proxmox has now looks like /dev/sdc in the OMV VM. Note there’s no partition number, as far as the VM knows that device is a whole disk and not just a single partition. But I don’t want to make a partition table in that /dev/sdc. I think you can nest GPT partitions but that complicates things.

    So instead I just ran mkfs.ext4 /dev/sdc inside the VM to turn the whole (virtual) disk into a filesystem. Then it shows up in /dev/disk/by-uuid and OMV can also mount it and use it via the GUI. I probably could have used OMV to create the filesystem, for that matter.

    As an extra test I also shut down the OMV VM and then mounted the partition temporarily on the Proxmox host. Just to verify I could read the files. A real test would be to take the drive to a different computer entirely but that would require me driving somewhere to touch it.

    I love how flexible all these systems are. Also appreciate being able to mix my old school sysadmin knowledge with some modern virtual systems stuff. It’s all quite productive.

  • Moving $HOME from one WSL to another with local rsync

    I’m installing a new WSL system, essentially a Linux VM. I want to move my 13GB of home directory files over. WSL doesn’t provide any easy way to do this. You can export and import a whole virtual disk. You can tar up your home directory and copy it just like you would any other Linux box. And you can use WSL’s various strange mounts (including 9P) but those will be slow and screw up permissions.

    The way I settled on was running an rsync daemon from the command line.

    The client send command in the source WSL instance is the easy part. The only strange thing here is “localhost” as the destination. WSL’s networking magic means that two separate Linux instances apparently share “localhost”. They probably share it with the Windows host, too. (The ::xfer in the destination looks magic too; see below for the server config file that defines it.)

    rsync -az --info=progress2 --port=12345 ~ nelson@localhost::xfer

    The server side to receive things is trickier. Launching the server is easy enough. I ran this as root because I was running into problems changing group ownership as a normal user:

    rsync --daemon --config=rsyncd.conf --no-detach --address=localhost --port=12345

    All the real work here is in rsyncd.conf. Here it is:

    pid file = /tmp/rsyncd.pid
    lock file = /tmp/rsync.lock
    log file = /tmp/rsync.log
    
    [xfer]
    path = /home/nelson/xfer
    comment = xfer
    read only = false
    list = true
    uid = nelson
    gid = nelson
    auth users = nelson
    secrets file = /home/nelson/secrets.txt

    I didn’t check this too closely, an AI generated it for me. It seems to work. It’s possible all my files got chowned and chgrped to nelson but in this case (my home directory) that’s ok.

    The one gotcha is secrets.txt. The contents are simple, I just made it nelson:p for the insecure plaintext password “p” (you have to have some password). But that file has to be only readable by the user running rsync. Otherwise auth won’t work but the error message won’t tell you why.
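
    Creating the secrets file with tight permissions from the start avoids that silent failure. A minimal sketch, with a made-up helper name; the user and password are the ones from the text.

    ```shell
    # Write an rsync secrets file containing "user:password", readable only by its owner.
    make_secrets() {
        file="$1"
        # The umask applies to a newly created file; chmod covers an existing one.
        ( umask 077; printf '%s:%s\n' "$2" "$3" > "$file" )
        chmod 600 "$file"
    }

    # On the receiving instance:
    # make_secrets /home/nelson/secrets.txt nelson p
    # On the sending instance, RSYNC_PASSWORD skips the interactive prompt:
    # RSYNC_PASSWORD=p rsync -az --info=progress2 --port=12345 ~ nelson@localhost::xfer
    ```

    If auth still fails, the log file named in rsyncd.conf (/tmp/rsync.log here) is the place to look, since the client only sees a generic auth error.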

  • telegraf ping in unprivileged LXC containers

    Quick hint: to make native-mode pings work from telegraf in an LXC container, run:

    setcap cap_net_raw+ep /usr/bin/telegraf

    I use telegraf to run pings for monitoring networks. You can run it in a tiny Alpine LXC container in Proxmox. But there’s a permissions issue with unprivileged containers that results in status code 2 being logged to Influx and errors like these for native pings and exec pings:

    Aug 18 22:32:00 influxdb telegraf[865]: 2024-08-18T22:32:00Z E! [inputs.ping] ping failed: permission changes required, enable CAP_NET_RAW capabilities (refer to the ping plugin's README.md for more info)
    
    Aug 18 22:38:00 influxdb telegraf[896]: 2024-08-18T22:38:00Z E! [inputs.ping] Error in plugin: host "192.168.68.1": exit status 2 - /usr/bin/ping: socktype: SOCK_RAW
    Aug 18 22:38:00 influxdb telegraf[896]: /usr/bin/ping: socket: Operation not permitted
    Aug 18 22:38:00 influxdb telegraf[896]: /usr/bin/ping: => missing cap_net_raw+p capability or setuid?
    

    What great error messages, whoever wrote that deserves a peer bonus. It’s almost enough to solve the problem! Turns out LXC containers are locked down and by default can’t manipulate raw packets. But ping needs to be able to do that. The solution is to give the binary doing the pinging the raw network capability:

    setcap cap_net_raw=ep /usr/bin/telegraf

    You can verify it took with getcap /usr/bin/telegraf. If you need exec pings you can make it work with setcap cap_net_raw=ep /usr/bin/ping

    Here’s the weird thing. Root inside the container isn’t trusted. So why can root inside the container elevate the privileges of an executable in the container? Why does the hypervisor allow that?

    The other weird thing: /usr/bin/ping in the container does not have this capability set in my Debian installs. But ping works anyway, you can ping from the container in a shell as root. However if telegraf tries to ping using that same binary, it doesn’t work.

    I suspect the issue is telegraf is not running as root inside the container, it has its own user. Indeed, sudo -u telegraf ping 8.8.8.8 gives an error (unless you set the capability). I still don’t understand why root inside the container is trusted though. I must not understand what “untrusted” really means in LXC.
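
    File capabilities are stored on the binary itself, so a package upgrade that replaces /usr/bin/telegraf will drop them. A small check like the one below can be re-run after upgrades; the has_net_raw helper is made up for illustration.

    ```shell
    # Succeeds if getcap output on stdin mentions cap_net_raw.
    has_net_raw() {
        grep -q 'cap_net_raw'
    }

    # Inside the container, as root: re-apply the capability only if it is missing.
    # getcap /usr/bin/telegraf | has_net_raw || setcap cap_net_raw+ep /usr/bin/telegraf
    # Then confirm as the telegraf user (exec pings need the capability on ping too):
    # sudo -u telegraf ping -c 1 8.8.8.8
    ```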

  • Another USB failure (OWC enclosure)

    Ugh, right after writing a big blog post about how I’d made a reliable USB storage array, it failed. I was making a backup with Restic from the RAID array to another USB disk and both disks in the RAID failed. The simple kernel reset didn’t work either. So ZFS suspended the pool and everything broke. A reboot took 10 minutes but it’s all back and I don’t think I lost any data. But ugh, now I don’t trust this hardware.

    The problems start when a read command from UAS fails on drive sdc. 30 seconds later the kernel resets the device, at the same time there’s a read error from the other drive sdb. It’s at this point everything breaks. The reset doesn’t work and so the devices are both taken offline. That causes I/O errors that ZFS notices and it suspends the pool immediately.

    Just after the pool is suspended both disks come back online with new device names (sdd and sde). But ZFS doesn’t just magically assume these are OK. 122 seconds later the kernel notices some tasks have been blocked and complains.

    I don’t see any errors in journalctl in the VM guest that’s using this ZFS pool. Unfortunately the kernel logs didn’t persist after a reboot so I can’t check there. It looked like the backup process was still gamely trying to run, it wasn’t even in Device Wait, but I’m not sure.

    I think I might have been able to recover from the command line with zpool clear or the like. But I figured rebooting Proxmox would be simpler. That took 10 minutes after I pressed the button on the Web GUI. There’s about 6 minutes of shutdown in the logs as it’s timing out on various things that are unhappy their disk went away. Then there’s an unexplained 4 minute gap from the last log line written during shutdown to a new log line for the kernel coming up again. Afterwards the reboot went smoothly, stuff was working again in 15 seconds or so.

    Very frustrated. Particularly that in this case the kernel reset didn’t work: with previous USB errors things mostly recovered. It’s like the whole USB enclosure just failed. Which is actually quite a plausible explanation.

    Update Sep 7: three weeks later I haven’t seen another error from these disks. Mostly just light use but I’ve taken two full backups, one with rsync and one with restic (the thing that triggered the error before). I’m running restic with a single CPU thread and read concurrency 1, so it doesn’t hit the disk as hard.

    I don’t know what conclusion to draw from this. We’re going to keep using the disks in the USB setup for now. But if I ever set up another serious ZFS array I won’t use USB for it. Common wisdom is that it’s a bad idea and I’ve now had enough problems that I respect it.

  • USB drives for Linux servers

    A summary of what I’ve learned using USB drives to make a little NAS server with Linux.

    The serious guides will tell you “don’t use USB for storage”. They’re probably right: USB adds an extra layer, the SATA/USB interface. And those chipsets seem pretty flaky, the last thing you want in a storage system. With most of the hardware I tested the Linux kernel would throw errors every 30 minutes or so and reset the device.

    OTOH it mostly seems to work and not corrupt data. I finally found an enclosure that hasn’t thrown an error (yet).

    Update: lol, not an hour after publishing this I got a failure from the enclosure I thought was good. A Restic backup (which is read intensive) triggered the failure. It didn’t even reset correctly, so the disks went offline and ZFS suspended the pool. ARGH!

    If you have a SATA or Thunderbolt interface by all means use that instead. Related: Host Bus Adapters. But if USB is all you got, here’s some notes.

    Drive types

    I’m focusing here on spinning 3.5″ drives, Western Digital Red Plus. These drives are the best option for storage that’s cheap and big. Sequential throughput is pretty good on these, at least 100MBytes/s. SSDs are great but are 3-4x the price for the size. 2.5″ spinning drives are also nice but max out at 5TB. Also they use SMR media which has weird performance and reliability issues. 3.5″ drives still come in CMR.

    Enclosures

    The enclosure takes a 3.5″ SATA disk and gives you an interface: USB, also maybe eSATA or Thunderbolt. Small enclosures are simple and take a single disk. Bigger enclosures allow you to plug in many drives with a single USB cable. They often include a hardware RAID chipset you probably won’t use (JBOD is preferred for ZFS). If you put it in JBOD mode Linux will see multiple separate hard drives on a single USB interface.

    Enclosures provide power via a separate transformer (do not rely on bus power for 3.5″ drives). I’ve heard that the quality of the power supply varies significantly and matters a lot: a power blip screws things up. I don’t know more about it. One nice thing about 2.5″ drives is they can be bus powered, simplifying things. If you don’t need big fast storage consider 2.5″.

    Many enclosures don’t provide power again if there’s a power outage. They have a stupid pushbutton you have to physically press to turn it back on. Look instead for enclosures with rocker switches that come on automatically.

    Enclosures also provide cooling. Cheap ones are just aluminum boxes with passive radiating. That seems OK for one disk but not many, I saw temperatures up to 55°C with two disks in a Yottamaster. The OWC enclosure I’m using has a small fan that makes a huge difference, max temperature like 36°C.

    The big problem is finding a reliable enclosure. I spent a lot of time stress testing enclosures and found the older USB 3.0 ones from Yottamaster and SABRENT had kernel errors. A newer 3.1 enclosure from OWC seems more reliable, it is my preference now.

    USB versions

    USB 3.0 provides 5Gbps, which is 625MByte/s raw and about 500MByte/s after 8b/10b line encoding. Either way it is much faster than a spinning disk, so any 3.0 interface is fine for a single disk in terms of throughput.

    USB 3.1 and 3.2 can provide 10Gbps or 20Gbps. That shouldn’t matter for a single disk or really even two in an enclosure. OTOH I saw some evidence that two separate 5Gbps enclosures were twice as fast as a single 5Gbps enclosure with two disks, not sure what that means. I didn’t test with 10Gbps or 20Gbps links.

    However, there’s a theory that newer chipsets are more reliable, that some of the bugs in older systems have been worked out. So I’m going to avoid USB 3.0 enclosures and favor those that offer USB 3.2, or at least 3.1. Even better is eSATA, Thunderbolt, or USB4. If you have an interface on your computer for those you should use them instead of USB.

    Cables

    USB cables matter! It’s very easy to use a crummy cable and only get 480Mbits/s instead of 5000. The quick advice is “look for a blue connector”. Use lsusb -t to verify the speed you’re getting. Be particularly wary of C-to-A cables, a lot of them are really for charging and are slow for data.
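
    To spot a slow link quickly, you can filter the lsusb -t output for storage devices that negotiated 480M or less. A minimal sketch with a made-up helper name; it assumes each device line ends with the negotiated speed (for example 5000M), which is how lsusb -t prints it on the systems I know of.

    ```shell
    # Read "lsusb -t" output on stdin and print storage devices running at USB 2.0 speeds or slower.
    slow_storage() {
        grep -E 'Driver=(uas|usb-storage)' | grep -E ', (480|12|1\.5)M *$'
    }

    # Usage: no output means every storage device negotiated more than 480M.
    # lsusb -t | slow_storage
    ```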

    UAS vs usb-storage

    usb-storage is the old Linux kernel module for USB disks. My older 2.5″ drives are using it. It seems fine.

    uas is the new Linux kernel module for newer USB disks. UASP is basically SCSI and should perform better. Linux will use this module automatically if the drive supports it.

    usbfs turns up on Proxmox if you pass through a USB device to a guest. I have no idea what it does, nothing seems mounted with it.

    SMART

    USB devices are supposed to support SMART but it’s kind of tricky. I need to do more research on this.

    My old usb-storage drives work out of the box with SMART. With some of my newer UAS enclosures smartctl complains /dev/sdc: Unknown USB bridge. The quick hack is to run smartctl -d sat, which will probably work. There’s a more elaborate thing where you teach the SMART software about your drive type but I haven’t tried it.

    Performance

    Long story short how is USB performance? Pretty good, I think for sequential reads and writes I can get close to the spinning disk speeds. 100-150Mbytes/s, at least. Random access is slower of course and I saw some confusing results that depended on buffering. The final RAID mirror setup I have can do about 20MBytes/s read, 14MBytes/s write with an aggressively random access pattern. I don’t know whether USB is adding significant overhead or not.

  • FIO stress test of NAS options

    This blog post is a very long record of my testing of a disk array. I’m building a cheap NAS with 2 USB disks (8TB, 5400RPM) in a ZFS mirror configuration plugged in to an N100 MiniPC with 5Gbit USB links. I’m trying several enclosures: a Yottamaster dual bay, two SABRENT single bays, and an OWC dual bay. My main goal here is to find a reliable enclosure. (Spoiler: it’s the OWC.)

    I’m running everything on Proxmox. Most of my tests are with Proxmox managing a ZFS pool and creating a virtual disk it passes in to an OpenMediaVault VM (aka OMV). Then I CIFS/SMB mount the disk back on the Proxmox host for further testing. I also did a bit of testing with TrueNAS in a VM where Proxmox just passes the USB disks in as devices for TrueNAS to manage a pool on.

    Conclusions

    • Two of three USB enclosures generate kernel errors. Under heavy load there will be a command failure followed by a USB reset about every 30 minutes. In all cases the system recovers and it looks like no data is lost.
    • The OWC enclosure generates no kernel errors. It also runs at the lowest temperatures.
    • Throughput is wildly variable, sometimes transfers stall. I assume it’s buffering.
    • Network filesystems buffer even if you set various modes telling them not to.
    • TrueNAS seems to have better throughput than OMV. I didn’t test carefully.
    • TrueNAS is unusable with the Yottamaster enclosure, too many errors. It sort of worked with the SABRENT. Didn’t try the OWC.
    • The Yottamaster enclosure is unsuitable. Doesn’t turn itself on after a power outage, also not adequate cooling.
    • The SABRENT enclosures are OK. I don’t love having two power supplies. There were a few more errors in my testing but maybe that was just bad luck.

    Decisions

    • I’m going with the OWC enclosure.
    • I’m going to stick with Proxmox ZFS and OMV for the NAS. There’s evidence TrueNAS performs better but I’m nervous about its complexity. Also feeling burned after the bad first experience with the Yottamaster enclosure although that’s probably the USB device’s fault.

    The test job

    I’ve been enjoying learning fio, the disk throughput tool, mostly because it has very good docs. Which means not only can I read them but the AIs can too. Phind gets it wrong sometimes though so I’m mostly hand-crafting my own fio setups now.

    I cooked up a job file (see below) which I’m using as a small stress test as I try different USB hardware and ZFS / virtualization / network file system configurations. My main goal here is to run it for a long time to see if the disks fail in some way, also to measure temperature. I’m less worried about the actual throughput although it’s interesting.

    The job file defines a job against a 1GB file. Random reads and writes of blocks from 4k to 1M. Up to 16 in parallel (iodepth), direct I/O enabled. (Note direct=1 means O_DIRECT which doesn’t actually work over a network file system. Oops.) I’m running two jobs in parallel for 15 minutes.
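
    As a rough command-line approximation of that description (not the actual job file), the parameters map onto fio options like this. The helper name and the test file path are placeholders, ioengine=libaio is an assumption since a queue depth above 1 needs an asynchronous engine, and the verify step is left out.

    ```shell
    # Print fio arguments for a 1GB mixed random read/write test, one per line.
    fio_stress_args() {
        printf '%s\n' \
            "--name=stress" "--filename=$1" "--size=1g" \
            "--rw=randrw" "--bsrange=4k-1m" \
            "--ioengine=libaio" "--iodepth=16" "--direct=1" "--end_fsync=1" \
            "--numjobs=2" "--time_based" "--runtime=900" "--group_reporting"
    }

    # Usage (placeholder path without spaces, on the filesystem under test):
    # fio $(fio_stress_args /tank/fio-test.dat)
    ```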

    In general I’m seeing throughput of about 28MB/s both read and write, maybe 130 IOPS. That’s whether native on the ZFS machine, through VM disk, or through SMB to the VM disk. (Presumably seek time is dominating.) Not very fast but this is a random access test on a spinning disk. (Sequential gets up to 250 MB/s).

    Weirdly I’m seeing more like 100MB/s with an SMB network mount. I’ve chalked this up to me not understanding caching in network file systems. Details below.

    Here’s my results from several hours of testing. fio was run on Proxmox natively (which owns the ZFS pool), then on a OpenMediaVault VM (OMV) which gets a virtual disk from Proxmox, then from the Proxmox host itself that has SMB mounted the share from OMV. I also did a few tests with a different configuration: TrueNAS managing the two USB disks directly as USB devices and exporting over SMB.

    Measurements

    Tests with various enclosures, then some extra scenarios with the OWC enclosure.

    OWC results

    The hardware is a Mercury Elite Pro Dual, USB 3.1 at 5GBit, UAS. This device can do RAID but I have it in JBOD mode, “IND”. PITA to get the disks installed, the mechanical design is not great, but it seems sturdy and has a little fan. It also advertises USB 3.1 (5Gbit) and there’s a theory that newer USB revisions imply newer, better chipsets.

    Smartctl requires -d sat to work.

    Max temperature: 34°C, the fan really helps. The disks are pretty noisy, the enclosure does nothing to dampen the sound.

    No errors from the kernel! Haven’t seen a single UAS error after several hours of testing, unlike the other enclosures.

    Throughput

    1. Proxmox ZFS native run 1: 21.0MiB/s read, 13.0 write, 97 IOPS.
    2. Proxmox ZFS native run 2: 21.4 read, 13.6 write.
    3. OMV: read 24.9 write 13.6, 109 IOPS
    4. SMB client of OMV run 1: read 122 MiB/s, write 66.5 MiB/s, 552 IOPS.
    5. SMB client of OMV run 2: read 133 MiB/s, write 72 MiB/s

    The ZFS native test seems pretty slow at 21/13, it’s the slowest of the three enclosures I tried. Although that’s not that much slower than the Yottamaster dual enclosure test at 28/18. hdparm -t (a simple sequential test of a single drive) shows 210MB/s, so it’s not totally broken.

    Yottamaster results

    Yottamaster PS200U3. One dual disk USB enclosure at 5Gbit USB 3.0, UAS.

    smartctl works out of the box on Proxmox, needed -d sat in TrueNAS.

    Max temperature: 52°C. Noisy too.

    Kernel errors: while testing over about 90 minutes I got one driver reset of a read command.

    Throughput

    • Proxmox ZFS native: read 28MiB/s, write 18MiB/s, 131 IOPS.
    • OMV run 1: read 27MiB/s, write 15MiB/s, 122 IOPS. (Had a reset!)
    • OMV run 2: read 28MiB/s, write 16MiB/s, 127 IOPS.
    • SMB client of OMV run 1: read 98MiB/s, write 53MiB/s, 443 IOPS.
    • SMB client of OMV run 2: read 104MiB/s, write 55.7MiB/s, 470 IOPS.

    SABRENT results

    Two single disk SABRENT EC-KSL3 USB enclosures at 5Gbit USB 3.0. UAS. I’ve been using a third one of these for a week+ with good luck.

    smartctl requires -d sat in Proxmox

    Max temperature: 41°C. Quietest of the three.

    More errors, but throughput looks good. While testing got an ugly UAS reset in the VM fio run. From the first error it took 54 seconds before a reset came through. OTOH it didn’t seem to affect throughput. Only one disk got reset, don’t know what ZFS does about that. A second reset during the SMB mount test, again not hurting throughput. Weird.

    I also got a new kind of error I’ve not seen before, maybe related to erasing blocks? This error doesn’t end up in the SMART log.

    [Mon Aug 12 01:28:06 2024] sd 1:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
    [Mon Aug 12 01:28:06 2024] sd 1:0:0:0: [sdd] tag#0 Sense Key : Illegal Request [current]
    [Mon Aug 12 01:28:06 2024] sd 1:0:0:0: [sdd] tag#0 Add. Sense: Invalid command operation code
    [Mon Aug 12 01:28:06 2024] sd 1:0:0:0: [sdd] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 02 fe 90 00 00 09 70 00 00
    [Mon Aug 12 01:28:06 2024] critical target error, dev sdd, sector 196240 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
    

    All in all 3 kernel errors compared to just 1 with the Yottamaster for the same amount of testing. Feels bad man. The fio verify didn’t find any problems though. I ran a ZFS scrub and it didn’t find any problems.

    Throughput

    • Proxmox ZFS native: read 58, write 37, 266 IOPS.
    • OMV run 1: read 60.3 write 33.5 , 270 IOPS. (Had a reset!)
    • OMV run 2: read 59.8 write 33.7 , 275 IOPS (No reset).
    • SMB client of OMV run 1: read 128 MiB/s, write 68.4 MiB/s, 578 IOPS.
    • SMB client of OMV run 2: read 145 MiB/s, write 75 MiB/s, 658 IOPS.
    • SMB client of OMV run 3: read 133 write 68. (Reset)

    Note the Proxmox and OMV throughput is better, maybe 2x what I got with the dual bay enclosures. Is that as simple as it being because I’m using two USB ports and two 5Gbit connections? (The max throughput I’ve seen is about 200MB/s or 1.6Gbit, well under a single 5Gb wire, so it’s not as simple as max bandwidth.) OTOH I’ve gotten more errors and more variety of errors. But all the data seems to be correctly being read and written.

    TrueNAS tests

    In addition to testing enclosures I did a couple of tests with TrueNAS managing the ZFS pool using USB devices passed in from Proxmox. This failed catastrophically with the Yottamaster enclosure, far too many errors. These tests are with the SABRENT. I didn’t try the OWC with TrueNAS.

    The main thing I was wondering was what sort of errors I’d see, if any. I saw only one error with three 15 minute runs on TrueNAS, about halfway through a VM test. Write(16) commands failed and it reset about 30 seconds later and seemed to carry on fine.

    Throughput

    1. TrueNAS VM run 1: 122MiB/s read, 64 write, 550 IOPS
    2. TrueNAS VM run 2: 137 read, 72 write
    3. SMB client of TrueNAS VM: 138MiB/s read, 76 write, 630 IOPS

    I’m less worried about throughput comparisons to OMV, I care most about stability. But compared to the Proxmox ZFS + OMV solution it seems both faster and with less jitter. That is not surprising but should be confirmed more.

    SMB and buffering tests

    I tried to understand more about why the SMB clients were faster than local writes. It’s definitely related to O_DIRECT and synchronous write semantics not being honored the same way on network filesystems. Most of my tests were with direct=1 in FIO but that doesn’t really mean anything on an SMB mount. Forcing sync writes slows the network tests way down.

    I ran some 1 minute tests of different configs on the VM writing to its virtual disk. The FIO test set was 1GB which might fit in caches or might not. I noticed the second time I ran the same test it was much faster, which suggests warming up the caches is a big factor in a 1 minute test.

    1. direct=1, end_fsync=1 (like most tests): 15 MiB/s read, 16 MiB/s write.
    2. direct=0, buffered=1, end_fsync=1: 15 MiB/s read, 15 MiB/s write.
    3. fio defaults (buffered and no fsync, I think): 13 MiB/s read, 13 MiB/s write.
    4. sync=1, fsync=1: 3.2 MiB/s read, 3.7 MiB/s write.

    I can’t make much sense of these local results: even with full buffering I can’t get to the 100MiB/s the disk is capable of. Maybe it’s a virtualization issue, or the client workload being on the disk. The VM has 4GB of RAM allocated which I think should be enough to cache a 1GB working set.

    I also re-ran these tests on the SMB host. I ran the first one twice and took the second result.

    1. direct=1, end_fsync=1 (like most tests): 102 MiB/s read, 64 MiB/s write.
    2. direct=0, buffered=1, end_fsync=1: 110 MiB/s read, 72.9 MiB/s write.
    3. fio defaults (buffered and no fsync, I think): 160 MiB/s read, 118 MiB/s write.
    4. sync=1, fsync=1: 3.0 MiB/s read, 3.3 MiB/s write.

    These results suggest that the network filesystem is caching even in most of the IO modes that ask it not to. That’s fair, it’s a real world result for a network client. Forcing sync and fsync does slow it down as expected.
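
    For reference, the four configurations differ only in a handful of fio options. The sketch below generates one job file per variant. It assumes the 1 minute tests reused the main job file from this post with runtime=60, and that each variant set only the options named in the lists above. Both are assumptions, and the variant names are made up.

```python
# Options shared by every variant, taken from the main job file in this post,
# with runtime shortened to 60 seconds (an assumption about the 1 minute tests).
COMMON = {
    "runtime": "60",
    "time_based": "1",
    "numjobs": "2",
    "rw": "randrw",
    "verify": "crc32c",
    "bsrange": "4k-1M",
    "size": "1G",
    "ioengine": "libaio",
    "iodepth": "16",
}

# Only the options named in the result lists; anything else stays at fio's default.
VARIANTS = {
    "direct": {"direct": "1", "end_fsync": "1"},
    "buffered": {"direct": "0", "buffered": "1", "end_fsync": "1"},
    "defaults": {},
    "sync": {"sync": "1", "fsync": "1"},
}


def job_file(variant: str) -> str:
    """Return fio job file text for one named variant."""
    options = {**COMMON, **VARIANTS[variant]}
    lines = [f"[{variant}]"] + [f"{key}={value}" for key, value in options.items()]
    return "\n".join(lines) + "\n"


for name in VARIANTS:
    print(job_file(name))
```

    One thing that may matter when reading the buffered numbers: fio’s documentation notes that on Linux the libaio engine may only queue I/O when it is non-buffered, so iodepth=16 might not take effect in the buffered variants.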

    To be honest, I don’t really understand all of what’s going on here, I don’t know FIO or network filesystems well enough to fully sort this out. Given my main goal was stability testing, eh, I’m OK with my ignorance for now.

    Application testing

    I did a bunch of real world testing, writing via SMB to the OMV VM with the OWC enclosure.

    1. Linux rsync of a music library: 21MB/s.
    2. Linux dd writing 16GB from /dev/urandom: 70MB/s
    3. dd reading 16GB of random: 182MB/s
    4. dd writing 16GB from /dev/zero: 188MB/s
    5. dd reading 16GB of zeros: 718MB/s
    6. WSL (drvfs) reading 16GB of random: 42MB/s
    7. WSL (drvfs) writing 1GB of random: 39MB/s
    8. Windows explorer reading or writing 1GB of random: 112MB/s

    The FIO job file

    [stress]
    name=stress
    description=Nelson small stress test
    
    runtime=900
    numjobs=2
    time_based=1
    
    rw=randrw
    verify=crc32c
    
    bsrange=4k-1M
    size=1G
    
    ioengine=libaio
    iodepth=16
    direct=1
    end_fsync=1
    

    Example output (from an OMV VM run)

    This run had a kernel reset in the first minute or two.

    stress: (g=0): rw=randrw, bs=(R) 4096B-1024KiB, (W) 4096B-1024KiB, (T) 4096B-1024KiB, ioengine=libaio, iodepth=16
    ...
    fio-3.33
    Starting 2 processes
    Jobs: 2 (f=2): [f(2)][100.0%][eta 00m:00s]
    stress: (groupid=0, jobs=1): err= 0: pid=128236: Sun Aug 11 21:42:17 2024
      Description  : [Nelson small stress test]
      read: IOPS=42, BW=13.9MiB/s (14.6MB/s)(12.3GiB/904825msec)
        slat (usec): min=2, max=8473, avg=26.58, stdev=52.04
        clat (usec): min=30, max=8023.4k, avg=158853.13, stdev=208421.15
         lat (usec): min=42, max=8023.4k, avg=158879.70, stdev=208425.03
        clat percentiles (usec):
         |  1.00th=[     66],  5.00th=[    137], 10.00th=[    245],
         | 20.00th=[    742], 30.00th=[   1565], 40.00th=[   3621],
         | 50.00th=[  79168], 60.00th=[ 143655], 70.00th=[ 206570],
         | 80.00th=[ 316670], 90.00th=[ 480248], 95.00th=[ 583009],
         | 99.00th=[ 742392], 99.50th=[ 801113], 99.90th=[ 926942],
         | 99.95th=[1027605], 99.99th=[3439330]
       bw (  KiB/s): min=   24, max=258184, per=47.42%, avg=13000.64, stdev=16851.88, samples=985
       iops        : min=    0, max=  564, avg=38.20, stdev=39.37, samples=985
      write: IOPS=23, BW=8021KiB/s (8213kB/s)(6549MiB/836125msec); 0 zone resets
        slat (usec): min=7, max=3420.8k, avg=343.27, stdev=24216.95
        clat (usec): min=6, max=30381k, avg=292247.93, stdev=1846279.09
         lat (usec): min=58, max=30381k, avg=292591.21, stdev=1846416.62
        clat percentiles (usec):
         |  1.00th=[      70],  5.00th=[     101], 10.00th=[     131],
         | 20.00th=[     192], 30.00th=[     293], 40.00th=[     979],
         | 50.00th=[    5669], 60.00th=[   13173], 70.00th=[   72877],
         | 80.00th=[  183501], 90.00th=[  367002], 95.00th=[  526386],
         | 99.00th=[ 8791262], 99.50th=[17112761], 99.90th=[17112761],
         | 99.95th=[17112761], 99.99th=[17112761]
       bw (  KiB/s): min=    8, max=299704, per=87.37%, avg=13650.88, stdev=19572.55, samples=977
       iops        : min=    0, max=  646, avg=40.49, stdev=45.01, samples=977
      lat (usec)   : 10=0.01%, 50=0.22%, 100=3.47%, 250=12.13%, 500=8.42%
      lat (usec)   : 750=2.49%, 1000=2.13%
      lat (msec)   : 2=7.58%, 4=6.57%, 10=4.21%, 20=4.34%, 50=3.45%
      lat (msec)   : 100=4.07%, 250=19.25%, 500=13.82%, 750=6.52%, 1000=0.65%
      lat (msec)   : 2000=0.08%, >=2000=0.61%
      cpu          : usr=0.51%, sys=0.23%, ctx=44359, majf=0, minf=60
      IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.3%, 16=99.4%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=38221,19954,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=16
    stress: (groupid=0, jobs=1): err= 0: pid=128237: Sun Aug 11 21:42:17 2024
      Description  : [Nelson small stress test]
      read: IOPS=36, BW=12.9MiB/s (13.5MB/s)(11.4GiB/904823msec)
        slat (usec): min=2, max=37758, avg=29.45, stdev=218.32
        clat (usec): min=3, max=5053.6k, avg=184087.63, stdev=201615.07
         lat (usec): min=41, max=5053.7k, avg=184117.07, stdev=201618.12
        clat percentiles (usec):
         |  1.00th=[     65],  5.00th=[    135], 10.00th=[    219],
         | 20.00th=[    490], 30.00th=[   1680], 40.00th=[  58983],
         | 50.00th=[ 137364], 60.00th=[ 187696], 70.00th=[ 256902],
         | 80.00th=[ 362808], 90.00th=[ 484443], 95.00th=[ 583009],
         | 99.00th=[ 725615], 99.50th=[ 775947], 99.90th=[ 926942],
         | 99.95th=[1002439], 99.99th=[1635779]
       bw (  KiB/s): min=   24, max=303480, per=48.60%, avg=13324.66, stdev=15923.38, samples=900
       iops        : min=    0, max=  636, avg=36.52, stdev=34.91, samples=900
      write: IOPS=21, BW=7829KiB/s (8017kB/s)(6208MiB/812003msec); 0 zone resets
        slat (usec): min=8, max=23051, avg=172.05, stdev=269.40
        clat (usec): min=4, max=26223k, avg=345496.71, stdev=1929292.62
         lat (usec): min=56, max=26223k, avg=345668.76, stdev=1929294.26
        clat percentiles (usec):
         |  1.00th=[      68],  5.00th=[     101], 10.00th=[     130],
         | 20.00th=[     192], 30.00th=[     269], 40.00th=[     400],
         | 50.00th=[    2409], 60.00th=[   11469], 70.00th=[   33162],
         | 80.00th=[  214959], 90.00th=[  438305], 95.00th=[  591397],
         | 99.00th=[ 9328133], 99.50th=[17112761], 99.90th=[17112761],
         | 99.95th=[17112761], 99.99th=[17112761]
       bw (  KiB/s): min=   16, max=360360, per=91.60%, avg=14312.77, stdev=19145.59, samples=884
       iops        : min=    0, max=  756, avg=39.49, stdev=41.87, samples=884
      lat (usec)   : 4=0.01%, 10=0.01%, 50=0.24%, 100=3.43%, 250=13.62%
      lat (usec)   : 500=11.35%, 750=3.48%, 1000=1.55%
      lat (msec)   : 2=3.97%, 4=2.64%, 10=2.95%, 20=4.05%, 50=3.18%
      lat (msec)   : 100=3.23%, 250=19.75%, 500=17.91%, 750=7.21%, 1000=0.56%
      lat (msec)   : 2000=0.06%, >=2000=0.80%
      cpu          : usr=0.62%, sys=0.19%, ctx=42436, majf=0, minf=58
      IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=99.3%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=33008,17566,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=16
    
    Run status group 0 (all jobs):
       READ: bw=26.8MiB/s (28.1MB/s), 12.9MiB/s-13.9MiB/s (13.5MB/s-14.6MB/s), io=23.7GiB (25.4GB), run=904823-904825msec
      WRITE: bw=15.3MiB/s (16.0MB/s), 7829KiB/s-8021KiB/s (8017kB/s-8213kB/s), io=12.5GiB (13.4GB), run=812003-836125msec
    
    Disk stats (read/write):
      sdb: ios=71241/37679, merge=0/79, ticks=12088873/12617701, in_queue=25410356, util=98.74%
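
    A note on reading the summary: the group WRITE figure (15.3MiB/s) is a little lower than the sum of the two per-job figures (8021 + 7829 KiB/s, about 15.5MiB/s). The numbers are consistent with fio dividing the total I/O by the longest job runtime. A quick check of that arithmetic, using only values from the output above (the helper name is made up, and the derivation is an inference from the numbers):

```python
def mib_per_s(io_mib: float, runtime_ms: int) -> float:
    """Average bandwidth in MiB/s for io_mib transferred over runtime_ms."""
    return io_mib / (runtime_ms / 1000)


# (MiB written, runtime in ms) for the two jobs in the run above.
write_jobs = [(6549, 836125), (6208, 812003)]

# Per-job bandwidth in KiB/s, as fio prints it for each job.
per_job_kib = [round(mib_per_s(io, ms) * 1024) for io, ms in write_jobs]

# Group bandwidth: total I/O divided by the longest runtime.
total_io = sum(io for io, _ in write_jobs)
longest_ms = max(ms for _, ms in write_jobs)
group_mib = round(mib_per_s(total_io, longest_ms), 1)

print(per_job_kib, group_mib)
```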
    

    Example output (fast OMV SMB)

    stress: (g=0): rw=randrw, bs=(R) 4096B-1024KiB, (W) 4096B-1024KiB, (T) 4096B-1024KiB, ioengine=libaio, iodepth=16
    ...
    fio-3.33
    Starting 2 processes
    Jobs: 2 (f=2): [F(2)][100.0%][eta 00m:00s]
    stress: (groupid=0, jobs=1): err= 0: pid=2695937: Sun Aug 11 22:35:44 2024
      Description  : [Nelson small stress test]
      read: IOPS=145, BW=47.6MiB/s (49.9MB/s)(42.5GiB/913477msec)
        slat (usec): min=2, max=26353, avg=58.40, stdev=206.57
        clat (usec): min=104, max=2047.9k, avg=12816.66, stdev=61461.93
         lat (usec): min=117, max=2047.9k, avg=12875.05, stdev=61461.19
        clat percentiles (usec):
         |  1.00th=[    996],  5.00th=[   2147], 10.00th=[   2769],
         | 20.00th=[   3621], 30.00th=[   4424], 40.00th=[   5407],
         | 50.00th=[   6521], 60.00th=[   7701], 70.00th=[   9110],
         | 80.00th=[  10814], 90.00th=[  13698], 95.00th=[  17171],
         | 99.00th=[ 120062], 99.50th=[ 446694], 99.90th=[ 960496],
         | 99.95th=[1166017], 99.99th=[1484784]
       bw (  KiB/s): min=   33, max=571776, per=100.00%, avg=105476.19, stdev=139877.54, samples=369
       iops        : min=    0, max= 1773, avg=299.33, stdev=402.43, samples=369
      write: IOPS=78, BW=25.4MiB/s (26.7MB/s)(21.9GiB/882111msec); 0 zone resets
        slat (usec): min=8, max=19996, avg=422.84, stdev=490.09
        clat (usec): min=120, max=4478.9k, avg=20844.50, stdev=104407.14
         lat (usec): min=145, max=4479.4k, avg=21267.34, stdev=104398.63
        clat percentiles (usec):
         |  1.00th=[    570],  5.00th=[   2376], 10.00th=[   2966],
         | 20.00th=[   3884], 30.00th=[   4817], 40.00th=[   5997],
         | 50.00th=[   7177], 60.00th=[   8717], 70.00th=[  10552],
         | 80.00th=[  13435], 90.00th=[  22938], 95.00th=[  55837],
         | 99.00th=[ 270533], 99.50th=[ 505414], 99.90th=[1535116],
         | 99.95th=[1954546], 99.99th=[4076864]
       bw (  KiB/s): min=  116, max=588064, per=100.00%, avg=111529.76, stdev=145498.82, samples=361
       iops        : min=    0, max= 1835, avg=319.26, stdev=420.92, samples=361
      lat (usec)   : 250=0.15%, 500=0.39%, 750=0.33%, 1000=0.28%
      lat (msec)   : 2=2.62%, 4=19.68%, 10=49.18%, 20=21.19%, 50=3.53%
      lat (msec)   : 100=1.01%, 250=0.79%, 500=0.38%, 750=0.21%, 1000=0.13%
      lat (msec)   : 2000=0.13%, >=2000=0.02%
      cpu          : usr=1.78%, sys=1.96%, ctx=161455, majf=0, minf=61
      IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.3%, 16=99.4%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=132944,68929,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=16
    stress: (groupid=0, jobs=1): err= 0: pid=2695938: Sun Aug 11 22:35:44 2024
      Description  : [Nelson small stress test]
      read: IOPS=144, BW=51.0MiB/s (53.5MB/s)(45.5GiB/913651msec)
        slat (usec): min=2, max=23315, avg=57.89, stdev=177.90
        clat (usec): min=101, max=1961.1k, avg=12840.81, stdev=60523.78
         lat (usec): min=124, max=1961.2k, avg=12898.70, stdev=60523.08
        clat percentiles (usec):
         |  1.00th=[   1057],  5.00th=[   2212], 10.00th=[   2966],
         | 20.00th=[   3949], 30.00th=[   4752], 40.00th=[   5669],
         | 50.00th=[   6587], 60.00th=[   7767], 70.00th=[   8979],
         | 80.00th=[  10814], 90.00th=[  13566], 95.00th=[  16909],
         | 99.00th=[ 113771], 99.50th=[ 480248], 99.90th=[ 926942],
         | 99.95th=[1098908], 99.99th=[1434452]
       bw (  KiB/s): min=   11, max=535318, per=100.00%, avg=116860.38, stdev=139385.70, samples=352
       iops        : min=    0, max= 1606, avg=310.24, stdev=377.85, samples=352
      write: IOPS=77, BW=27.3MiB/s (28.6MB/s)(23.6GiB/884188msec); 0 zone resets
        slat (usec): min=6, max=26990, avg=438.46, stdev=504.53
        clat (usec): min=132, max=2040.9k, avg=17837.66, stdev=67499.86
         lat (usec): min=162, max=2041.4k, avg=18276.12, stdev=67487.74
        clat percentiles (usec):
         |  1.00th=[    668],  5.00th=[   2704], 10.00th=[   3556],
         | 20.00th=[   4686], 30.00th=[   5669], 40.00th=[   6783],
         | 50.00th=[   7963], 60.00th=[   9241], 70.00th=[  10945],
         | 80.00th=[  13566], 90.00th=[  21365], 95.00th=[  46924],
         | 99.00th=[ 204473], 99.50th=[ 387974], 99.90th=[1019216],
         | 99.95th=[1434452], 99.99th=[1769997]
       bw (  KiB/s): min=   82, max=585474, per=100.00%, avg=126777.70, stdev=149747.38, samples=347
       iops        : min=    0, max= 1630, avg=335.09, stdev=397.89, samples=347
      lat (usec)   : 250=0.12%, 500=0.35%, 750=0.31%, 1000=0.27%
      lat (msec)   : 2=2.34%, 4=14.81%, 10=53.90%, 20=22.06%, 50=3.49%
      lat (msec)   : 100=0.93%, 250=0.67%, 500=0.31%, 750=0.21%, 1000=0.15%
      lat (msec)   : 2000=0.09%, >=2000=0.01%
      cpu          : usr=2.50%, sys=2.06%, ctx=165105, majf=0, minf=58
      IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=99.3%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=132204,68715,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=16
    
    Run status group 0 (all jobs):
       READ: bw=98.6MiB/s (103MB/s), 47.6MiB/s-51.0MiB/s (49.9MB/s-53.5MB/s), io=88.0GiB (94.5GB), run=913477-913651msec
      WRITE: bw=52.7MiB/s (55.2MB/s), 25.4MiB/s-27.3MiB/s (26.7MB/s-28.6MB/s), io=45.5GiB (48.8GB), run=882111-884188msec
    
  • Accessing a Proxmox guest in ZFS from a rescue disk

    Testing disaster recovery: what if my Proxmox server dies and I need to get data off a VM disk stored in its ZFS pool? The simple thing is to boot Proxmox and use it to recover. But what if I don’t want to use Proxmox to get at my files? It’s possible, at least in theory, using generic ZFS and QEMU tools. I tried this and it seemed to work.

    (An aside: I harshly unplugged my ZFS pool disks while Proxmox was running, then plugged them in later. Proxmox didn’t handle this very well, I ended up having to reboot the Proxmox server and then everything was back with no drama.)

    Part 1: import the ZFS pool

    The first step is to get the ZFS pool online in a rescue environment.

    1. Boot systemrescue-zfs on some computer.
    2. Plug the drives from your ZFS pool into the rescue machine.
    3. Run lsblk to see that the drives are present.
    4. Run zpool import to see a list of pools. Your Proxmox pool should be there. Proxmox creates rpool on install, you may have others.
    5. Run zpool import -f poolname to import the pool.
    6. Run zfs list to see what the pool has in it.
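
    The steps above can be collected into a small dry-run script. This is only a sketch: the function names are hypothetical, and the optional altroot (-R) and readonly=on settings are not part of the steps above. They are standard zpool import options that may be worth considering so the pool’s filesystems mount under a separate directory and nothing gets written to the pool.

```python
import shlex
import subprocess


def import_pool_commands(pool, altroot=None, readonly=False):
    """Build the commands for Part 1 as argument lists."""
    import_cmd = ["zpool", "import", "-f"]
    if readonly:
        import_cmd += ["-o", "readonly=on"]  # optional caution, not in the steps above
    if altroot:
        import_cmd += ["-R", altroot]  # optional: mount datasets under this directory
    import_cmd.append(pool)
    return [
        ["lsblk"],            # are the drives present?
        ["zpool", "import"],  # list importable pools
        import_cmd,           # import the chosen pool
        ["zfs", "list"],      # see what the pool has in it
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run(import_pool_commands("rpool"))
```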

    At this point any normal filesystems in Proxmox are mounted somewhere in the filesystem and you can cd to their directories, copy them as files, etc.

    Part 2: access the QEMU virtual disks

    I’m after data that’s in a guest virtual machine’s disk, and those aren’t stored as normal files. Instead each one is a ZFS volume (zvol) with a funny name like rpool/data/vm-100-disk-1, holding a QEMU disk image. (I think raw, not qcow2, but I am not certain; qemu-img info in the steps below will tell you.)

    There are several options for what to do next. You could zfs send the data somewhere safe. In Proxmox you could access the virtual disks with the qm command line tool. There’s probably a way to use qemu-img dd to get your data. But here in systemrescue-zfs I’m going to mount it via a network block device.

    I’m a little out of my depth here but I bashed my way into getting access like this:

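    1. cd /dev/zvol/rpool/data (OpenZFS on Linux normally puts zvol device nodes under /dev/zvol/poolname/, so adjust for your pool). In here is filesystem access to the ZFS volumes (as symlinks to block devices).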
    2. Find the virtual disk you want, say vm-100-disk-1
    3. qemu-img info vm-100-disk-1 to see if there’s a plausible disk image there.
    4. modprobe nbd max_part=8 to load a driver for network block devices
    5. qemu-nbd --connect=/dev/nbd0 vm-100-disk-1 to make the QEMU disk available as /dev/nbd0. Note this is the whole disk, not just one partition.
    6. fdisk -l /dev/nbd0 to see the partition table of the virtual disk.
    7. mkdir -p /mnt/hope, then mount /dev/nbd0p1 /mnt/hope to mount the partition 1 filesystem as /mnt/hope.
    8. cd /mnt/hope and there’s your stuff!
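
    The same sequence as a dry-run sketch, with a cleanup step added for when you’re done. The function names are hypothetical, and the zvol path is an assumption: OpenZFS on Linux normally exposes zvols under /dev/zvol/poolname/, so adjust it to wherever the device shows up in your rescue environment. The cleanup commands (umount, qemu-nbd --disconnect, zpool export) are not part of the steps above.

```python
import shlex
import subprocess


def attach_commands(zvol, nbd="/dev/nbd0", partition=1, mountpoint="/mnt/hope"):
    """Build the commands for Part 2: expose the disk image and mount one partition."""
    return [
        ["qemu-img", "info", zvol],                # is there a plausible disk image?
        ["modprobe", "nbd", "max_part=8"],         # load the network block device driver
        ["qemu-nbd", f"--connect={nbd}", zvol],    # whole virtual disk appears as nbd
        ["fdisk", "-l", nbd],                      # show the partition table
        ["mkdir", "-p", mountpoint],               # make sure the mount point exists
        ["mount", f"{nbd}p{partition}", mountpoint],
    ]


def detach_commands(pool, nbd="/dev/nbd0", mountpoint="/mnt/hope"):
    """Undo the steps above and release the pool (an addition, not from the post)."""
    return [
        ["umount", mountpoint],
        ["qemu-nbd", "--disconnect", nbd],
        ["zpool", "export", pool],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Assumed device path; see the note above.
run(attach_commands("/dev/zvol/rpool/data/vm-100-disk-1"))
run(detach_commands("rpool"))
```

    If qemu-img info reports a raw image, passing --format=raw to qemu-nbd avoids its format probing, and --read-only is available if you only want to copy data off.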