Improve performance of encrypted packets with a shared secret cache by jasonbcox · Pull Request #6560 · meshtastic/firmware

jasonbcox · 2025-04-10T17:41:52Z

Add a shared secret cache to dramatically improve performance and battery usage of encrypted communications after the first packet is sent between two nodes.

Using this additional patch jasonbcox@190444f, I logged the following on a RAK4631 when sending DMs between it and another node:
On master:

PKC encryption/decryption: ~96ms
Packet routing: ~107ms
All other modules, total: <5ms
Total packet handling time: 210 - 220ms

With this patch applied, first encrypted packet is same as above. All subsequent packets between the same two nodes perform as follows (times are roughly the same regardless of message length):

PKC encryption/decryption: ~3ms
Other modules (routing, etc): same as above
Total packet handling time: ~120ms

This is a 32x performance improvement of encrypted packet decoding at the cost of (up to) ~2.4kb of RAM.

For users that send DMs often, and assuming that every ms of compute is energy cost equivalent (its not, but we can estimate), this could provide around 45% improvement in battery usage with respect to encrypted communications.

Unit test coverage recently improved to ensure this patch does not regress, in addition to tests added in this PR (see #6485)

Fixes #6426

🤝 Attestations

I have tested that my proposed changes behave as described.
I have tested that my proposed changes do not cause any obvious regressions on the following devices:
- Heltec (Lora32) V3
- LilyGo T-Deck
- LilyGo T-Beam
- RAK WisBlock 4631
- Seeed Studio T-1000E tracker card
- Other (please specify below)

jasonbcox · 2025-04-10T17:46:07Z

I will continue to run this patch on both of my devices throughout the review period. If anyone could test on LilyGo or Heltec devices, that would be appreciated.

Copilot

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

test/test_crypto/test_main.cpp:140

[nitpick] Implementing a mechanism to mock millis() in the tests would allow simulation of delayed cache expiration and better verification of the cache update logic.

// FIXME If tests could mock the millis() time, it would be ideal to mock the time forward by a

jp-bennett · 2025-04-10T23:28:31Z

I definitely approve in principle. Since it's touching the crypto code, it needs more detailed review than I can give tonight.

GUVWAF · 2025-04-11T10:37:53Z

I really appreciate the work you put into this, but I think we should also be realistic about the actual gains of this versus the drawback. 2.4kBytes of RAM is not insignificant, that’s 12 nodes in the NodeDB and ~1% of the total RAM available for nRF52.

Regarding power consumption: I think if a node receives 100 messages directed towards it per day that would be a lot already. In this case we thus save 9.3 seconds of compute per day, which is only 0.01% of the time. However, for nodes connected to a client (it’s those nodes that you’re sending a DM to) we’re not going into a real sleep mode, so we’ll just be longer in mainDelay.delay(). How much this saves on energy depends on how the idle state is implemented on each platform, but e.g. the Arduino core for ESP32 doesn’t use power management (CONFIG_PM_ENABLE is not set).

Even if the power consumption of the CPU could be completely eliminated during 0.01% of the time, it should be considered relative to the LoRa radio that is always in receiving mode consuming about ~4.2mA for SX126x. If we would be able to mitigate only one packet transmission per day, which takes ~1 second on LongFast during which it consumes ~125mA more, I’d argue that’s already more than can be gained from this.

thebentern · 2025-04-11T12:46:14Z

I really appreciate the work you put into this, but I think we should also be realistic about the actual gains of this versus the drawback. 2.4kBytes of RAM is not insignificant, that’s 12 nodes in the NodeDB and ~1% of the total RAM available for nRF52.

Regarding power consumption: I think if a node receives 100 messages directed towards it per day that would be a lot already. In this case we thus save 9.3 seconds of compute per day, which is only 0.01% of the time. However, for nodes connected to a client (it’s those nodes that you’re sending a DM to) we’re not going into a real sleep mode, so we’ll just be longer in mainDelay.delay(). How much this saves on energy depends on how the idle state is implemented on each platform, but e.g. the Arduino core for ESP32 doesn’t use power management (CONFIG_PM_ENABLE is not set).

Even if the power consumption of the CPU could be completely eliminated during 0.01% of the time, it should be considered relative to the LoRa radio that is always in receiving mode consuming about ~4.2mA for SX126x. If we would be able to mitigate only one packet transmission per day, which takes ~1 second on LongFast during which it consumes ~125mA more, I’d argue that’s already more than can be gained from this.

Does it make sense to utilize this only on non-NRF platforms? I was actually thinking the main benefit of this would be significantly increased throughput for network connected nodes instead of improved power consumption, which as you pointed out is insignificant

jp-bennett · 2025-04-11T13:58:35Z

Does it make sense to utilize this only on non-NRF platforms? I was actually thinking the main benefit of this would be significantly increased throughput for network connected nodes instead of improved power consumption, which as you pointed out is insignificant

We can set the size much smaller on the NRF platform. 64 slots is quite a few. I'd set it to 8 on NRF.

Not entirely related, I do wonder if using a dynamic object like the unordered_map will lead to memory fragmentation.

jasonbcox · 2025-04-11T21:45:51Z

It wasn't clear to me how to best profile the RAM usage. Considering it's much tighter than I thought for NRF, it makes sense to both limit to 8 slots for NRF and change the implementation to use a vector so the memory is contiguous. It's not expensive to do O(n) lookups for such a small number of elements.

And considering @GUVWAF's point, I could reduce the default number of slots to 32 or even 24 and I think the cache would perform nearly as well for the vast majority of users.

GUVWAF · 2025-04-12T07:31:10Z

I was actually thinking the main benefit of this would be significantly increased throughput for network connected nodes instead of improved power consumption

Still I think before the decryption process becomes the bottleneck in terms of throughput you'll need to receive a lot of direct messages after each other via MQTT or UDP.

I could reduce the default number of slots to 32 or even 24 and I think the cache would perform nearly as well for the vast majority of users.

Maybe make it dependent on MAX_NUM_NODES?

jasonbcox · 2025-05-13T17:58:07Z

Full disclosure: I haven't tested the latest changes on devices yet, but looking for more feedback on the concept.

I decreased the cache size substantially, especially for boards that have very small memory. The default of 10 puts the cache usage under 400 bytes. On NRF with 8 slots, about 300 bytes.

I could not make it dependent on MAX_NUM_NODES directly since it is not a constant expression at compile time.

Copilot

Pull Request Overview

This PR adds a shared-secret cache to the CryptoEngine to speed up repeated encrypted packet handling by avoiding costly DH operations after the first exchange.

Introduces a fixed-size sharedSecretCache and the setCryptoSharedSecret method to manage lookup, insertion, and eviction of secrets.
Updates encryption/decryption paths to use the cache.
Adds unit tests to validate cache hits and basic insertion behavior.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
test/test_crypto/test_main.cpp	Added `TestCryptoEngine` helper and tests for cache insertion and hits.
src/mesh/CryptoEngine.h	Defined `CachedSharedSecret`, `MAX_CACHED_SHARED_SECRETS`, declared `setCryptoSharedSecret`, and added an unused `<unordered_map>` include.
src/mesh/CryptoEngine.cpp	Wired encryption/decryption to `setCryptoSharedSecret` and implemented the cache lookup/insert logic (with an unintended assignment bug).

Comments suppressed due to low confidence (1)

test/test_crypto/test_main.cpp:140

The current tests verify insertion and hit behavior but do not cover eviction when the cache exceeds MAX_CACHED_SHARED_SECRETS. Consider adding a test that fills the cache and ensures the oldest entry is evicted as expected.

// Calling again should fetch from the cache. Shared secret is the same.

thebentern · 2025-05-13T18:24:13Z

I could not make it dependent on MAX_NUM_NODES directly since it is not a constant expression at compile time.

It's not a compile time constant on all platforms but it is static. I wonder if we can static init the length struct to MAX_NUM_NODES / 10 ?

jasonbcox · 2025-05-13T18:27:17Z

I could not make it dependent on MAX_NUM_NODES directly since it is not a constant expression at compile time.

It's not a compile time constant on all platforms but it is static. I wonder if we can static init the length struct to MAX_NUM_NODES / 10

It needs to be compile-time constant to set the size of a static array. Alternatively, I could use a vector as I first proposed. Then I could reserve() the size in CryptoEngine during construction.

thebentern · 2025-05-13T18:37:29Z

It needs to be compile-time constant to set the size of a static array. Alternatively, I could use a vector as I first proposed. Then I could reserve() the size in CryptoEngine during construction.

I would think a vector approach would be fine, in light of the upstream changes with MAX_NUM_NODES

jasonbcox · 2025-05-13T18:40:20Z

I wasn't aware changes to MAX_NUM_NODES were coming down the pipe. Can you link the most relevant one?

thebentern · 2025-05-13T19:53:05Z

I wasn't aware changes to MAX_NUM_NODES were coming down the pipe. Can you link the most relevant one?

Sorry, I meant upstream prior to your initial PR. The latest change has them reliant on some runtime checks on espressif hardware.

jp-bennett · 2026-04-09T12:25:30Z

The downside of the stale bot is that good PRs like this one get dropped.

jp-bennett · 2026-04-09T12:50:11Z

The way this is currently written, we're only talking a max of 380ish bytes, right? Playing with this patch for some higher-throughput DM experimentation, and it seems to be working as expected so far.

thebentern requested review from GUVWAF, caveman99, Copilot and jp-bennett April 10, 2025 17:47

Copilot AI reviewed Apr 10, 2025

View reviewed changes

Comment thread src/mesh/CryptoEngine.cpp Outdated

jasonbcox added 7 commits May 13, 2025 08:51

Cleanup unnecessary member dereferencing in CryptoEngine

f1b892c

Add shared secret cache

c211384

Add some basic test coverage for the shared secret cache

fa17273

Use the shared secret cache in PKC ops

c7e44a2

Update oldestDelta during cache eviction

d9dc4b7

Set cache set depending on architecture

657eb93

Use static array instead of unordered_map to keep memory contiguous

66560fb

jasonbcox force-pushed the shared-secret-cache branch from e283834 to 66560fb Compare May 13, 2025 17:52

thebentern requested a review from Copilot May 13, 2025 18:08

Copilot AI reviewed May 13, 2025

View reviewed changes

Comment thread src/mesh/CryptoEngine.cpp Outdated

Comment thread src/mesh/CryptoEngine.h Outdated

jasonbcox added 2 commits May 13, 2025 11:20

Fix inadvertent overwrite

cf1e1e5

Remove unused include

f5898e0

Merge branch 'master' into shared-secret-cache

917b6d0

github-actions Bot added the Stale Issues that will be closed if not triaged. label Jun 29, 2025

github-actions Bot closed this Jul 6, 2025

jasonbcox mentioned this pull request Nov 10, 2025

Cleanup unnecessary global dereferencing in CryptoEngine #8611

Merged

8 tasks

jp-bennett reopened this Apr 9, 2026

Merge branch 'master' into shared-secret-cache

af17c3d

jp-bennett added enhancement New feature or request and removed Stale Issues that will be closed if not triaged. labels Apr 9, 2026

jp-bennett approved these changes Apr 9, 2026

View reviewed changes

jp-bennett force-pushed the master branch from 68e8b9c to 41f5317 Compare May 4, 2026 23:08

Uh oh!

Uh oh!

Conversation

jasonbcox commented Apr 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🤝 Attestations

Uh oh!

jasonbcox commented Apr 10, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jp-bennett commented Apr 10, 2025

Uh oh!

GUVWAF commented Apr 11, 2025

Uh oh!

thebentern commented Apr 11, 2025

Uh oh!

jp-bennett commented Apr 11, 2025

Uh oh!

jasonbcox commented Apr 11, 2025

Uh oh!

GUVWAF commented Apr 12, 2025

Uh oh!

jasonbcox commented May 13, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

thebentern commented May 13, 2025

Uh oh!

jasonbcox commented May 13, 2025

Uh oh!

thebentern commented May 13, 2025

Uh oh!

jasonbcox commented May 13, 2025

Uh oh!

thebentern commented May 13, 2025

Uh oh!

jp-bennett commented Apr 9, 2026

Uh oh!

jp-bennett commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

jasonbcox commented Apr 10, 2025 •

edited

Loading