NAS-139874 / 26.0.0-BETA.1 / Convert zfs.dataset encryption methods to use truenas_pylibzfs#18291
NAS-139874 / 26.0.0-BETA.1 / Convert zfs.dataset encryption methods to use truenas_pylibzfs#18291creatorcary merged 32 commits intomasterfrom
zfs.dataset encryption methods to use truenas_pylibzfs#18291Conversation
zfs.dataset encryption methods to use truenas_pylibzfszfs.dataset encryption methods to use truenas_pylibzfs
…AS-139874-test
…AS-139874-test
cc75cf9 to
f20e1d2
Compare
There was a problem hiding this comment.
No bugs found, but this PR touches encryption-sensitive code paths across 8 files and depends on unreleased truenas_pylibzfs changes (PR #145). Worth a human review given the scope and security sensitivity.
Extended reasoning...
Overview
This PR replaces the deprecated py-libzfs with truenas_pylibzfs for ZFS dataset encryption methods (load_key, check_key, change_key, change_encryption_root, bulk_check). The old zfs_/dataset_encryption.py (which ran in a process pool) is deleted, and new standalone functions are added in zfs/dataset_encryption.py. Service wrappers are added to resource_crud.py using @pass_thread_local_storage to provide thread-safe libzfs handles. Callers across KMIP, pool dataset encryption info/lock/operations are updated accordingly. The pbkdf2iters default and minimum are raised from 350,000 to 1,300,000 with a from_previous migration hook.
Security risks
This PR directly handles encryption key loading, verification, and rotation for ZFS datasets. While the code patterns look correct (proper error handling, thread-local libzfs handles, no key material logged), the security-critical nature of these operations means any subtle behavioral difference between the old py-libzfs and new truenas_pylibzfs implementations could have serious consequences (failed unlocks, key verification bypass, etc.).
Level of scrutiny
High scrutiny warranted. This is production-critical encryption infrastructure code. The changes span 8 files and replace the underlying ZFS library for all encryption operations. The from_previous migration for pbkdf2iters is a breaking API change that requires careful review.
Other factors
- The PR description states it depends on unreleased truenas_pylibzfs changes (PR #145) and the author is "waiting for next nightly to test", suggesting the PR is not yet fully validated.
- A reviewer (yocalebo) previously raised an architectural concern about libzfs handle management that appears to have been addressed by using
tls.lzhinstead of opening new handles, and the comment is resolved. - The removal of the generic
bulk_processjob (which dispatched by method name) in favor of the specificbulk_checkfunction is a positive safety improvement. - No test changes are included, which is expected since integration tests would cover this, but it means correctness depends on the nightly test run the author mentioned.
yocalebo
left a comment
There was a problem hiding this comment.
This needs to be cleaned up and polished.
src/middlewared/middlewared/plugins/pool_/dataset_encryption_info.py
Outdated
Show resolved
Hide resolved
src/middlewared/middlewared/plugins/pool_/dataset_encryption_lock.py
Outdated
Show resolved
Hide resolved
src/middlewared/middlewared/plugins/pool_/dataset_encryption_operations.py
Outdated
Show resolved
Hide resolved
src/middlewared/middlewared/plugins/pool_/dataset_encryption_operations.py
Outdated
Show resolved
Hide resolved
| @@ -91,16 +95,18 @@ def push_zfs_keys(self, ids=None): | |||
| return failed | |||
There was a problem hiding this comment.
🟣 Pre-existing bug: self.zfs_keys = {k: v for k, v in self.zfs_keys.items() if k in existing_datasets} on lines 94 and 125 always produces an empty dict because existing_datasets is a list[dict] and k is a str. In Python, str in list[dict] checks element-wise equality (str == dict), which is always False, so the entire KMIP key cache is wiped after every push/pull call. Fix by building a set of names first: existing_names = {ds["name"] for ds in existing_datasets} and filtering with if k in existing_names.
Extended reasoning...
What the bug is
get_encrypted_datasets() (line 34-52) returns a list[dict] — it initializes rv = list() and appends datastore record dicts via rv.append(ds_in_db[i["name"]]). On lines 94 and 125, self.zfs_keys is filtered with:
self.zfs_keys = {k: v for k, v in self.zfs_keys.items() if k in existing_datasets}Here k is a string (dataset name like "pool/ds1") and existing_datasets is a list[dict]. The in operator checks element-wise equality, and since str == dict is always False in Python, every key is filtered out.
Step-by-step proof
push_zfs_keysorpull_zfs_keysis called.existing_datasets = self.get_encrypted_datasets(filters)returns e.g.[{"name": "pool/ds1", "id": 1, ...}].- During the loop, keys are added to
self.zfs_keys, e.g.self.zfs_keys["pool/ds1"] = "<hex_key>". - After the loop, the comprehension runs:
k = "pool/ds1","pool/ds1" in [{"name": "pool/ds1", ...}]. - Python evaluates:
"pool/ds1" == {"name": "pool/ds1", ...}→False(string never equals dict). - Result:
self.zfs_keys = {}— all cached keys are lost.
Quick verification:
>>> existing_datasets = [{"name": "pool/ds1", "id": 1}]
>>> "pool/ds1" in existing_datasets
FalseWhy existing code does not prevent it
Python does not warn when comparing strings to dicts via in. The operation silently returns False for every element, producing an empty dict without any error or exception.
Impact
self.zfs_keys is the in-memory cache of ZFS encryption keys managed by KMIP. After every push or pull:
- All cached keys are lost, including keys for datasets not involved in the current operation.
kmip.retrieve_zfs_keysreturns{}, soquery_encrypted_roots_keyscannot find KMIP-managed keys.- This affects
encryption_summary,sync_db_keys,export_keys, and unlock operations for KMIP-managed datasets. zfs_keys_pending_syncalways thinks keys need syncing becauseds["name"] not in self.zfs_keysis alwaysTrueafter the cache is wiped.
How to fix
Build a set of dataset names before filtering:
existing_names = {ds["name"] for ds in existing_datasets}
self.zfs_keys = {k: v for k, v in self.zfs_keys.items() if k in existing_names}This needs to be applied on both line 94 (in push_zfs_keys) and line 125 (in pull_zfs_keys).
There was a problem hiding this comment.
Needs separate PR
|
This PR has been merged and conversations have been locked. |
Replace usage of the deprecated py-libzfs with truenas_pylibzfs for these private methods. This removes another use case of our process pool.
Depends on changes made in truenas/truenas_pylibzfs#145.