-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Fixes & improvments for using homed-luks on 4k disks #35776
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| return log_error_errno(errno, "Failed to wipe partition table: %m"); | ||
| if ((size_t) n != ssz * 2) | ||
| return log_error_errno(SYNTHETIC_ERRNO(EIO), "Short write while wiping partition table."); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove this? i added this explicitly because libfdisk got confused by the old partition table, and its pmbr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not face an issue with writing out the new partition table. Also I was not able to understand the exact issue that the pmbr is causing from the earlier commit message. I'm not sure why fdisk wouldn't like the existing partition table.
Could you explain the issue in detail, and I will try reproducing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that it should be fine to drop this, if current libfdisk works without it. Maybe it was a bug in an old version, maybe it was a misunderstanding. If the issue pops up again, we'll revert the removal.
|
hmm, i am tempted to say we should use 4k sectors everywhere by default, unless specified in the user record otherwiwse. (and of course the gpt partition table should use the native sector size if we are talking about raw block devices). afaik there's now downside of using larger sector sizes on the luks and fs level, but it makes our images more portable... |
Hi, thanks for the review.
Also my thoughts were that these functions can be rewritten to use an internal systemd-repart library in the future. I think all the partition, luks & fs creation / resizing operations can be handled by the repart library. It will require creation of an repart api and add an additional dependency for homed |
|
Hi @scarlet-storm |
|
I will work on cleaning up some of the code style issues mentioned in this PR. |
|
Hi @scarlet-storm, |
Hello, gentle ping :) any news about it ? thank you |
26e7ac0 to
c19c851
Compare
c19c851 to
b4352e3
Compare
b4352e3 to
eb0380d
Compare
|
I have addressed the code style issues. For the open comments
|
keszybz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those changes look generally reasonable, but I don't think I grok all the implications. Some small comments on style.
| /* Round down to the nearest 1 MiB size. Given that most tools generally align partitions to 1 MiB boundaries, let's align our | ||
| * partitions to that too. In the worst case we'll waste 1 MiB per partition that way, but I think I can live |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you want to do this? The commit describes what is being done, but not really why. Doing something the same as others is not enough motivation, unless the difference is causing problems.
After those changes, the comment doesn't quite make sense. The sentence about wasting 4k bytes was partially a joke. If we're wasting 1MB, a better explanation is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- For standardization. the code does not explicitly set the alignment when using fdisk as 4k, so by default fdisk is already using 1MiB alignment for initial partition creation. So the comment says 4k, but the created partition is already aligned to 1MiB boundary. Moreover during resize, fdisk_partition_size_explicit is called with enable=1 . This means that fdisk no longer will align the partitions. So we have a mix of partitions where the initial partition is aligned to 1MiB & if any resize is applied, the partition is aligned to 4KiB.
- As for the reason why 1MiB is used a standard partition size by all linux tools (fdisk, gparted, gptfdisk), it's probably because it is a multiple of most common sector sizes. Note that even for 4K disk, the sector size of 4k is applicable for read/write operations. Other operations like discard might have much higher granularity like 8k / 16k. Also for nand cells probably write amplification also plays a part.
- wasting 1 MiB is still a joke imo.
eb0380d to
323f040
Compare
Thank you for the review. If there is anything particular you want to discuss in detail, let me know. |
6de4f5d to
329ec52
Compare
|
Sorry for late review, but now this needs to be rebased. |
Ensure we use the same sector size as used in the created fdisk context when converting between sectors and bytes.
Remove zeroing of PMBR before writing out the new partition table. There is no reason, to do this explicitly ?
Auto-probe the luks sector size, if not explicitly specified in the home record
Align partitions to 1MiB for consistency with regular partition tools which use 1MiB alignment by default
If there is an explicit sector size specified in the user record, use the same when probing the file using libblkid. The default is 512 bytes, which will not be able to find the signatures, if the partition table on regular file was created assuming 4k sectors
Ensure we don't create a loop device on top of a physical block device. This leads to huge performance degradation of discard operations if the physical device does not support discard_on_zeroes. - loop device historical semantics dictates that when the device is discarded, it needs to return zero data on read. This can be implemented easily on a filesystem. since fallocate zero-range would return immediately & the holes are handled at the filesystem level to return zero data on read. - For a raw block device, the feature (discard_zeroes_data) depends on the capabilities of the physical device that is exposed to the block layer by the driver. This means that to guarantee that the loop device stacked on a block device returns zero on discarded data, it needs to convert discarded range into write_zero op on the block device. https://github.com/torvalds/linux/blob/63676eefb7a026d04b51dcb7aaf54f358517a2ec/drivers/block/loop.c#L773 For example on one of my local nvme I can see the following: cat /sys/class/block/nvme1n1/queue/write_zeroes_max_bytes 131072 cat /sys/class/block/nvme0n1/queue/discard_max_hw_bytes 2199023255040 This means maximum size of a write_zero operation can be 128KiB & maximum size of discard operation can be 2TiB on the block device. So discarding for example 1TB of data, which would be a single block device operation, gets split into 8.3 million block device operations when issued on top of stacked loop device.
Fix for specific case systemd#30393 where 4k sector luks container is created on a 512b device. In this case the partition table needs to be 512b, else the kernel will not be able to find the partition, and we will have to create a loop device to translate the partition table to 4k.
329ec52 to
e239c23
Compare
Mostly consists of fixes to
homectl create --image-path=does not work on 4Kn drives #30394 , Fixes Cannot login with user created using--luks-sector-size=4096#30393--luks-sector-size=4096#30393Also contains some other improvements with using physical block devices.