
fix: Only reload PKCS11 library if key is not found#3795

Merged
Bravo555 merged 2 commits into thin-edge:main from Bravo555:fix/pkcs11-reinit
Sep 23, 2025


@Bravo555 (Member) commented Sep 22, 2025

TODO

  • make the dummy request not reload the library

Optional follow-up

  • add a switch to opt-in/opt-out of the reload
  • more logging (?)

Proposed changes

Reload the PKCS11 library only if the key cannot be found, and re-enable the previously disabled test suite.

This PR re-enables the slot fix that was disabled by #3785, but defers reloading the library until a key cannot be found. As a result, the first request no longer suffers from the Nitrokey slowness, and it completes normally if the key is present and configured properly.

Note that the first request made to ensure the library is loaded is now a new Ping request, answered with a Pong response. It is essentially a no-op, but in systemd socket-activated deployments it causes the server to start if it isn't already running. The library is loaded during server startup, and subsequent Ping requests do NOT trigger a reload of the library.
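A minimal sketch of the Ping/Pong idea, assuming hypothetical `Request`, `Response`, and `Server` types (the real tedge-p11-server protocol types are not shown in this PR and likely differ). It only illustrates that the library is loaded once at startup and that answering a Ping does no work beyond proving the server is up.

```rust
// Hypothetical request/response types; the real tedge-p11-server protocol may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Request {
    Ping,
    ChooseScheme,
    Sign(Vec<u8>),
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Response {
    Pong,
    // Placeholder payloads standing in for real scheme/signature data.
    Scheme(String),
    Signature(Vec<u8>),
}

struct Server {
    // How many times the (simulated) PKCS#11 library has been loaded.
    library_loads: u32,
}

impl Server {
    // The library is loaded once, during server startup.
    fn new() -> Self {
        Server { library_loads: 1 }
    }

    fn handle(&self, request: Request) -> Response {
        match request {
            // Ping does no work: answering it only proves the server has started,
            // which implies `new` has already loaded the library.
            Request::Ping => Response::Pong,
            // Stubs: the real server would look up the key here.
            Request::ChooseScheme => Response::Scheme("stub-scheme".to_string()),
            Request::Sign(_data) => Response::Signature(Vec::new()),
        }
    }
}

fn main() {
    let server = Server::new();
    for _ in 0..3 {
        assert_eq!(server.handle(Request::Ping), Response::Pong);
    }
    // Repeated pings never reload the library.
    assert_eq!(server.library_loads, 1);
}
```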

The library is only reloaded when handling ChooseScheme and Sign requests, and only when the key cannot be found. Other errors like a wrong PIN, objects not having expected attributes, or other generic PKCS11 errors do not trigger a reload.
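The reload rule above can be expressed as a small predicate. This is a sketch under assumptions: `SignerError` and `RequestKind` are hypothetical names, not the server's actual error or request types.

```rust
// Hypothetical error and request types; the real server's variants may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum SignerError {
    KeyNotFound,
    WrongPin,
    MissingAttribute(&'static str),
    Other(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum RequestKind {
    Ping,
    ChooseScheme,
    Sign,
}

/// Reload only for ChooseScheme/Sign, and only when the key was not found.
/// Any other error (wrong PIN, missing attributes, generic PKCS#11 failures)
/// is returned to the caller unchanged.
fn should_reload(kind: RequestKind, err: &SignerError) -> bool {
    matches!(kind, RequestKind::ChooseScheme | RequestKind::Sign)
        && matches!(err, SignerError::KeyNotFound)
}

fn main() {
    assert!(should_reload(RequestKind::Sign, &SignerError::KeyNotFound));
    assert!(should_reload(RequestKind::ChooseScheme, &SignerError::KeyNotFound));
    assert!(!should_reload(RequestKind::Sign, &SignerError::WrongPin));
    assert!(!should_reload(RequestKind::Sign, &SignerError::MissingAttribute("CKA_LABEL")));
    assert!(!should_reload(RequestKind::Ping, &SignerError::KeyNotFound));
}
```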

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

#3766

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s. You can activate automatic signing by running just prepare-dev once)
  • I ran just format as mentioned in CODING_GUIDELINES
  • I used just check as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

codecov bot commented Sep 22, 2025

Codecov Report

❌ Patch coverage is 22.58065% with 24 lines in your changes missing coverage. Please review.

Files with missing lines (patch coverage, lines missing):
...ates/extensions/tedge-p11-server/src/pkcs11/mod.rs: 0.00%, 21 missing ⚠️
...es/extensions/tedge-p11-server/src/proxy/client.rs: 75.00%, 1 missing and 1 partial ⚠️
crates/extensions/tedge-p11-server/src/signer.rs: 0.00%, 1 missing ⚠️


github-actions bot (Contributor) commented Sep 22, 2025

Robot Results

✅ Passed: 695, ❌ Failed: 0, ⏭️ Skipped: 3, Total: 695, Pass %: 100, ⏱️ Duration: 2h8m7.115186999s

@didier-wenzek (Contributor) left a comment

A critical and not-so-obvious piece of code is duplicated. Otherwise, the code looks good.

Comment on lines +123 to +128
// This branch runs only when a client calls TedgeP11Client::with_ready_check.
// It essentially does nothing, but when the server is not yet running and is triggered
// by a systemd socket, at a point client makes the request systemd starts the server,
// Cryptoki object is initialized before TedgeP11Server is constructed (in main) and at
// the point we return the response here, the PKCS11 library is loaded and ready to
// serve the subsequent request.
@didier-wenzek (Contributor) Sep 23, 2025

I find this comment a bit difficult to understand. The content is there (after reading it twice, I got it), but splitting it into several independent sentences might help.

Here is my proposal:

The Ping/Pong request does nothing per se, but can be used by a client to start the service. Since systemd is configured to start the service when a request is received on the associated socket, a Ping/Pong request triggers a service start and ensures the PKCS11 library is loaded and ready to serve signing requests. In practice, this only occurs when a client calls TedgeP11Client::with_ready_check.

@Bravo555 (Member Author) Sep 23, 2025

Indeed, the comment describes what happens but not why, and it assumes context about the problem that a reader will not have. I used your comment and changed the beginning slightly to add more context.
Fixed in c4bf1ad.

}

Err(e) => return Err(e),
};
Contributor

This piece of code is critical and tricky, and must therefore not be duplicated in both choose_scheme and sign.

=> Introduce a get_signing_key(self, uri) method.

Member Author

fixed in bc09733
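A sketch of what a single shared `get_signing_key` helper can look like, assuming a hypothetical in-memory `Library` that maps key URIs to handles (the actual method in bc09733 is not shown here). Keeping the "reload at most once, only if the key is missing" logic in one place means `choose_scheme` and `sign` cannot drift apart.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum SignerError {
    KeyNotFound,
    WrongPin,
}

/// Hypothetical stand-in for a loaded PKCS#11 module, mapping key URIs to key handles.
struct Library {
    keys: HashMap<String, u64>,
    // Keys that only become visible after a reload (e.g. a token in a new slot).
    visible_after_reload: HashMap<String, u64>,
    reloads: u32,
    pin_ok: bool,
}

impl Library {
    fn find_key(&self, uri: &str) -> Result<u64, SignerError> {
        if !self.pin_ok {
            return Err(SignerError::WrongPin);
        }
        self.keys.get(uri).copied().ok_or(SignerError::KeyNotFound)
    }

    fn reload(&mut self) {
        self.reloads += 1;
        self.keys.extend(self.visible_after_reload.drain());
    }

    /// The one place implementing "reload at most once, and only if the key is missing".
    fn get_signing_key(&mut self, uri: &str) -> Result<u64, SignerError> {
        match self.find_key(uri) {
            Err(SignerError::KeyNotFound) => {
                self.reload();
                self.find_key(uri)
            }
            other => other,
        }
    }

    // Stub callers: both go through get_signing_key instead of duplicating it.
    fn choose_scheme(&mut self, uri: &str) -> Result<&'static str, SignerError> {
        self.get_signing_key(uri).map(|_key| "stub-scheme")
    }

    fn sign(&mut self, uri: &str, _data: &[u8]) -> Result<Vec<u8>, SignerError> {
        self.get_signing_key(uri).map(|_key| Vec::new())
    }
}

fn main() {
    let mut lib = Library {
        keys: HashMap::from([("pkcs11:object=a".to_string(), 1)]),
        visible_after_reload: HashMap::from([("pkcs11:object=b".to_string(), 2)]),
        reloads: 0,
        pin_ok: true,
    };
    // Key present: no reload.
    assert_eq!(lib.choose_scheme("pkcs11:object=a"), Ok("stub-scheme"));
    assert_eq!(lib.reloads, 0);
    // Key missing at first: one reload, then found.
    assert!(lib.sign("pkcs11:object=b", b"data").is_ok());
    assert_eq!(lib.reloads, 1);
    // Wrong PIN: returned as-is, no reload.
    lib.pin_ok = false;
    assert_eq!(lib.get_signing_key("pkcs11:object=a"), Err(SignerError::WrongPin));
    assert_eq!(lib.reloads, 1);
}
```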

Comment on lines +108 to +109
// dropping the signing key should drop the session
drop(signing_key);
Contributor

The word "should" makes me wonder. Is the session actually dropped?

Member Author

Indeed, the comment sounds unsure. However, I'm sure the session must be dropped here. The signing_key is a Pkcs11Session which has session: Arc<Mutex<Session>>, but after construction signing_key is immediately dropped without the session being cloned anywhere, so it must be dropped entirely.

I have some in-progress refactoring that removes this Mutex, which will make this clearer. For now, I gave this comment a little more self-confidence in 931ae6b.
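The reasoning above relies on standard `Arc` semantics: when the last strong reference is dropped, the inner `Mutex<Session>` and its `Session` are dropped too. A small sketch, with `Session` and `Pkcs11Session` as simplified hypothetical stand-ins whose `Drop` only records that it ran (a real session would close the PKCS#11 session there):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

// Records whether the session's Drop ran.
static SESSION_CLOSED: AtomicBool = AtomicBool::new(false);

struct Session;

impl Drop for Session {
    fn drop(&mut self) {
        SESSION_CLOSED.store(true, Ordering::SeqCst);
    }
}

// Simplified stand-in for Pkcs11Session as described in the review comment.
struct Pkcs11Session {
    session: Arc<Mutex<Session>>,
}

fn main() {
    let signing_key = Pkcs11Session { session: Arc::new(Mutex::new(Session)) };
    // No clones of the Arc exist, so this is the only strong reference.
    assert_eq!(Arc::strong_count(&signing_key.session), 1);
    // Dropping the signing key drops the last Arc, hence the Mutex and the Session.
    drop(signing_key);
    assert!(SESSION_CLOSED.load(Ordering::SeqCst));
}
```

If the Arc had been cloned somewhere, the session would stay alive until that clone was dropped as well.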

@didier-wenzek didier-wenzek dismissed their stale review September 23, 2025 13:19

The requested change has been addressed

@Bravo555 temporarily deployed to Test Pull Request on September 23, 2025 13:26 with GitHub Actions
@didier-wenzek (Contributor) left a comment

Approved

Signed-off-by: Marcel Guzik <marcel.guzik@cumulocity.com>
TedgeP11Client::with_ready_check used to make a ChooseScheme request
that it immediately discarded, only to make sure the library is loaded.
That performed an unnecessary operation and generated unnecessary output
that made the log harder to read.

Now with_ready_check uses a separate Ping request to which the server
responds by only ensuring the library is loaded and doing nothing else.

Signed-off-by: Marcel Guzik <marcel.guzik@cumulocity.com>
@Bravo555 temporarily deployed to Test Pull Request on September 23, 2025 14:26 with GitHub Actions
@Bravo555 Bravo555 added this pull request to the merge queue Sep 23, 2025
@didier-wenzek didier-wenzek added the theme:hsm Hardware Security Module related topics label Sep 23, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 23, 2025
@Bravo555 Bravo555 added this pull request to the merge queue Sep 23, 2025
Merged via the queue into thin-edge:main with commit 227ba6d Sep 23, 2025
34 checks passed
@Bravo555 Bravo555 deleted the fix/pkcs11-reinit branch September 24, 2025 07:31

Labels

theme:hsm Hardware Security Module related topics


2 participants