
NAS-139874 / 26.0.0-BETA.1 / Convert zfs.dataset encryption methods to use truenas_pylibzfs #18291

Merged
creatorcary merged 32 commits into master from NAS-139874 on Mar 3, 2026

Conversation

@creatorcary
Contributor

creatorcary commented Feb 24, 2026

Replace usage of the deprecated py-libzfs with truenas_pylibzfs for these private methods. This removes another use case of our process pool.

Depends on changes made in truenas/truenas_pylibzfs#145.

@bugclerk bugclerk changed the title Convert zfs.dataset encryption methods to use truenas_pylibzfs NAS-139874 / 26.0.0-BETA.1 / Convert zfs.dataset encryption methods to use truenas_pylibzfs Feb 24, 2026

@creatorcary creatorcary marked this pull request as ready for review February 27, 2026 16:47
@creatorcary creatorcary requested a review from yocalebo February 27, 2026 16:47

claude bot left a comment


No bugs found, but this PR touches encryption-sensitive code paths across 8 files and depends on unreleased truenas_pylibzfs changes (PR #145). Worth a human review given the scope and security sensitivity.

Extended reasoning...

Overview

This PR replaces the deprecated py-libzfs with truenas_pylibzfs for ZFS dataset encryption methods (load_key, check_key, change_key, change_encryption_root, bulk_check). The old zfs_/dataset_encryption.py (which ran in a process pool) is deleted, and new standalone functions are added in zfs/dataset_encryption.py. Service wrappers are added to resource_crud.py using @pass_thread_local_storage to provide thread-safe libzfs handles. Callers across KMIP, pool dataset encryption info/lock/operations are updated accordingly. The pbkdf2iters default and minimum are raised from 350,000 to 1,300,000 with a from_previous migration hook.
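The per-thread handle pattern described above can be illustrated with a small sketch. This is a hypothetical reconstruction, not the actual middleware code: the decorator name is borrowed from the description, while `FakeZFSHandle`, `load_key`, and the `tls.lzh` attribute are stand-ins chosen for illustration. The point is that each worker thread lazily opens its own handle and reuses it, so no libzfs handle is ever shared across threads.

```python
import threading
from functools import wraps

# Module-level thread-local storage; each thread sees its own attributes.
_tls = threading.local()


class FakeZFSHandle:
    """Stand-in for a per-thread libzfs handle (hypothetical)."""

    def __init__(self):
        # Record which thread opened this handle.
        self.opened_in = threading.get_ident()


def pass_thread_local_storage(fn):
    """Inject a thread-local object carrying a lazily opened handle."""

    @wraps(fn)
    def wrapper(*args, **kwargs):
        if not hasattr(_tls, "lzh"):
            _tls.lzh = FakeZFSHandle()  # one handle per thread, opened on first use
        return fn(_tls, *args, **kwargs)

    return wrapper


@pass_thread_local_storage
def load_key(tls, dataset, key):
    # A real implementation would use tls.lzh to load the dataset key;
    # here we just report which thread's handle served the request.
    return tls.lzh.opened_in


if __name__ == "__main__":
    main_handle = load_key("pool/ds1", b"\x00" * 32)
    assert main_handle == threading.get_ident()

    results = []
    t = threading.Thread(
        target=lambda: results.append(load_key("pool/ds1", b"\x00" * 32))
    )
    t.start()
    t.join()
    # The worker thread got its own handle, distinct from the main thread's.
    assert results[0] != main_handle
```

Repeated calls from the same thread reuse the cached handle, while a call from a new thread triggers a fresh open, which is the thread-safety property the review attributes to the `tls.lzh` approach.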

Security risks

This PR directly handles encryption key loading, verification, and rotation for ZFS datasets. While the code patterns look correct (proper error handling, thread-local libzfs handles, no key material logged), the security-critical nature of these operations means any subtle behavioral difference between the old py-libzfs and new truenas_pylibzfs implementations could have serious consequences (failed unlocks, key verification bypass, etc.).

Level of scrutiny

High scrutiny warranted. This is production-critical encryption infrastructure code. The changes span 8 files and replace the underlying ZFS library for all encryption operations. The from_previous migration for pbkdf2iters is a breaking API change that requires careful review.

Other factors

  • The PR description states it depends on unreleased truenas_pylibzfs changes (PR #145) and the author is "waiting for next nightly to test", suggesting the PR is not yet fully validated.
  • A reviewer (yocalebo) previously raised an architectural concern about libzfs handle management that appears to have been addressed by using tls.lzh instead of opening new handles, and the comment is resolved.
  • The removal of the generic bulk_process job (which dispatched by method name) in favor of the specific bulk_check function is a positive safety improvement.
  • No test changes are included, which is expected since integration tests would cover this, but it means correctness depends on the nightly test run the author mentioned.

Contributor

yocalebo left a comment


This needs to be cleaned up and polished.

@creatorcary creatorcary requested a review from yocalebo March 3, 2026 16:24
@creatorcary creatorcary requested a review from yocalebo March 3, 2026 17:02
@@ -91,16 +95,18 @@ def push_zfs_keys(self, ids=None):
return failed


🟣 Pre-existing bug: self.zfs_keys = {k: v for k, v in self.zfs_keys.items() if k in existing_datasets} on lines 94 and 125 always produces an empty dict because existing_datasets is a list[dict] and k is a str. In Python, str in list[dict] checks element-wise equality (str == dict), which is always False, so the entire KMIP key cache is wiped after every push/pull call. Fix by building a set of names first: existing_names = {ds["name"] for ds in existing_datasets} and filtering with if k in existing_names.

Extended reasoning...

What the bug is

get_encrypted_datasets() (lines 34-52) returns a list[dict] — it initializes rv = list() and appends datastore record dicts via rv.append(ds_in_db[i["name"]]). On lines 94 and 125, self.zfs_keys is filtered with:

self.zfs_keys = {k: v for k, v in self.zfs_keys.items() if k in existing_datasets}

Here k is a string (dataset name like "pool/ds1") and existing_datasets is a list[dict]. The in operator checks element-wise equality, and since str == dict is always False in Python, every key is filtered out.

Step-by-step proof

  1. push_zfs_keys or pull_zfs_keys is called.
  2. existing_datasets = self.get_encrypted_datasets(filters) returns e.g. [{"name": "pool/ds1", "id": 1, ...}].
  3. During the loop, keys are added to self.zfs_keys, e.g. self.zfs_keys["pool/ds1"] = "<hex_key>".
  4. After the loop, the comprehension runs: k = "pool/ds1", "pool/ds1" in [{"name": "pool/ds1", ...}].
  5. Python evaluates "pool/ds1" == {"name": "pool/ds1", ...} → False (a string never equals a dict).
  6. Result: self.zfs_keys = {} — all cached keys are lost.

Quick verification:

>>> existing_datasets = [{"name": "pool/ds1", "id": 1}]
>>> "pool/ds1" in existing_datasets
False

Why existing code does not prevent it

Python does not warn when comparing strings to dicts via in. The operation silently returns False for every element, producing an empty dict without any error or exception.

Impact

self.zfs_keys is the in-memory cache of ZFS encryption keys managed by KMIP. After every push or pull:

  • All cached keys are lost, including keys for datasets not involved in the current operation.
  • kmip.retrieve_zfs_keys returns {}, so query_encrypted_roots_keys cannot find KMIP-managed keys.
  • This affects encryption_summary, sync_db_keys, export_keys, and unlock operations for KMIP-managed datasets.
  • zfs_keys_pending_sync always thinks keys need syncing because ds["name"] not in self.zfs_keys is always True after the cache is wiped.

How to fix

Build a set of dataset names before filtering:

existing_names = {ds["name"] for ds in existing_datasets}
self.zfs_keys = {k: v for k, v in self.zfs_keys.items() if k in existing_names}

This needs to be applied on both line 94 (in push_zfs_keys) and line 125 (in pull_zfs_keys).
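The walk-through above can be condensed into a runnable reproduction. The dataset names and record shapes are illustrative stand-ins, not the actual KMIP data, but the membership semantics are exactly what the review describes:

```python
# In-memory key cache keyed by dataset name, and the datastore result
# shaped like get_encrypted_datasets(): a list of record dicts.
zfs_keys = {"pool/ds1": "aa" * 32, "pool/gone": "bb" * 32}
existing_datasets = [{"name": "pool/ds1", "id": 1}]

# Buggy filter: `k in existing_datasets` tests a str against dicts
# element-wise, so every comparison is False and the cache is wiped.
buggy = {k: v for k, v in zfs_keys.items() if k in existing_datasets}
assert buggy == {}  # entire key cache lost, silently

# Fixed filter: test membership against a set of dataset names.
existing_names = {ds["name"] for ds in existing_datasets}
fixed = {k: v for k, v in zfs_keys.items() if k in existing_names}
assert fixed == {"pool/ds1": "aa" * 32}  # only the stale entry is dropped
```

The set also makes each membership test O(1), though the correctness fix is the point: the filter must compare names to names, not names to whole records.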

Contributor Author


Needs separate PR

creatorcary merged commit 0a12235 into master on Mar 3, 2026
3 checks passed
creatorcary deleted the NAS-139874 branch on March 3, 2026 at 19:51
@bugclerk
Contributor

bugclerk commented Mar 3, 2026

This PR has been merged and conversations have been locked.
If you would like to discuss more about this issue please use our forums or raise a Jira ticket.

@truenas truenas locked as resolved and limited conversation to collaborators Mar 3, 2026
3 participants