Skip to content

Improve performance of encrypted packets with a shared secret cache#6560

Open
jasonbcox wants to merge 11 commits into
meshtastic:masterfrom
jasonbcox:shared-secret-cache
Open

Improve performance of encrypted packets with a shared secret cache#6560
jasonbcox wants to merge 11 commits into
meshtastic:masterfrom
jasonbcox:shared-secret-cache

Conversation

@jasonbcox

@jasonbcox jasonbcox commented Apr 10, 2025

Copy link
Copy Markdown
Contributor

Add a shared secret cache to dramatically improve performance and battery usage of encrypted communications after the first packet is sent between two nodes.

Using this additional patch jasonbcox@190444f, I logged the following on a RAK4631 when sending DMs between it and another node:
On master:

  • PKC encryption/decryption: ~96ms
  • Packet routing: ~107ms
  • All other modules, total: <5ms
  • Total packet handling time: 210 - 220ms

With this patch applied, first encrypted packet is same as above. All subsequent packets between the same two nodes perform as follows (times are roughly the same regardless of message length):

  • PKC encryption/decryption: ~3ms
  • Other modules (routing, etc): same as above
  • Total packet handling time: ~120ms

This is a 32x performance improvement of encrypted packet decoding at the cost of (up to) ~2.4kb of RAM.

For users that send DMs often, and assuming that every ms of compute is energy cost equivalent (its not, but we can estimate), this could provide around 45% improvement in battery usage with respect to encrypted communications.

Unit test coverage recently improved to ensure this patch does not regress, in addition to tests added in this PR (see #6485)

Fixes #6426

🤝 Attestations

  • I have tested that my proposed changes behave as described.
  • I have tested that my proposed changes do not cause any obvious regressions on the following devices:
    • Heltec (Lora32) V3
    • LilyGo T-Deck
    • LilyGo T-Beam
    • RAK WisBlock 4631
    • Seeed Studio T-1000E tracker card
    • Other (please specify below)

@jasonbcox

Copy link
Copy Markdown
Contributor Author

I will continue to run this patch on both of my devices throughout the review period. If anyone could test on LilyGo or Heltec devices, that would be appreciated.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

test/test_crypto/test_main.cpp:140

  • [nitpick] Implementing a mechanism to mock millis() in the tests would allow simulation of delayed cache expiration and better verification of the cache update logic.
// FIXME If tests could mock the millis() time, it would be ideal to mock the time forward by a

Comment thread src/mesh/CryptoEngine.cpp Outdated
@jp-bennett

Copy link
Copy Markdown
Collaborator

I definitely approve in principle. Since it's touching the crypto code, it needs more detailed review than I can give tonight.

@GUVWAF

GUVWAF commented Apr 11, 2025

Copy link
Copy Markdown
Member

I really appreciate the work you put into this, but I think we should also be realistic about the actual gains of this versus the drawback. 2.4kBytes of RAM is not insignificant, that’s 12 nodes in the NodeDB and ~1% of the total RAM available for nRF52.

Regarding power consumption: I think if a node receives 100 messages directed towards it per day that would be a lot already. In this case we thus save 9.3 seconds of compute per day, which is only 0.01% of the time. However, for nodes connected to a client (it’s those nodes that you’re sending a DM to) we’re not going into a real sleep mode, so we’ll just be longer in mainDelay.delay(). How much this saves on energy depends on how the idle state is implemented on each platform, but e.g. the Arduino core for ESP32 doesn’t use power management (CONFIG_PM_ENABLE is not set).

Even if the power consumption of the CPU could be completely eliminated during 0.01% of the time, it should be considered relative to the LoRa radio that is always in receiving mode consuming about ~4.2mA for SX126x. If we would be able to mitigate only one packet transmission per day, which takes ~1 second on LongFast during which it consumes ~125mA more, I’d argue that’s already more than can be gained from this.

@thebentern

Copy link
Copy Markdown
Contributor

I really appreciate the work you put into this, but I think we should also be realistic about the actual gains of this versus the drawback. 2.4kBytes of RAM is not insignificant, that’s 12 nodes in the NodeDB and ~1% of the total RAM available for nRF52.

Regarding power consumption: I think if a node receives 100 messages directed towards it per day that would be a lot already. In this case we thus save 9.3 seconds of compute per day, which is only 0.01% of the time. However, for nodes connected to a client (it’s those nodes that you’re sending a DM to) we’re not going into a real sleep mode, so we’ll just be longer in mainDelay.delay(). How much this saves on energy depends on how the idle state is implemented on each platform, but e.g. the Arduino core for ESP32 doesn’t use power management (CONFIG_PM_ENABLE is not set).

Even if the power consumption of the CPU could be completely eliminated during 0.01% of the time, it should be considered relative to the LoRa radio that is always in receiving mode consuming about ~4.2mA for SX126x. If we would be able to mitigate only one packet transmission per day, which takes ~1 second on LongFast during which it consumes ~125mA more, I’d argue that’s already more than can be gained from this.

Does it make sense to utilize this only on non-NRF platforms? I was actually thinking the main benefit of this would be significantly increased throughput for network connected nodes instead of improved power consumption, which as you pointed out is insignificant

@jp-bennett

Copy link
Copy Markdown
Collaborator

Does it make sense to utilize this only on non-NRF platforms? I was actually thinking the main benefit of this would be significantly increased throughput for network connected nodes instead of improved power consumption, which as you pointed out is insignificant

We can set the size much smaller on the NRF platform. 64 slots is quite a few. I'd set it to 8 on NRF.

Not entirely related, I do wonder if using a dynamic object like the unordered_map will lead to memory fragmentation.

@jasonbcox

Copy link
Copy Markdown
Contributor Author

It wasn't clear to me how to best profile the RAM usage. Considering it's much tighter than I thought for NRF, it makes sense to both limit to 8 slots for NRF and change the implementation to use a vector so the memory is contiguous. It's not expensive to do O(n) lookups for such a small number of elements.

And considering @GUVWAF's point, I could reduce the default number of slots to 32 or even 24 and I think the cache would perform nearly as well for the vast majority of users.

@GUVWAF

GUVWAF commented Apr 12, 2025

Copy link
Copy Markdown
Member

I was actually thinking the main benefit of this would be significantly increased throughput for network connected nodes instead of improved power consumption

Still I think before the decryption process becomes the bottleneck in terms of throughput you'll need to receive a lot of direct messages after each other via MQTT or UDP.

I could reduce the default number of slots to 32 or even 24 and I think the cache would perform nearly as well for the vast majority of users.

Maybe make it dependent on MAX_NUM_NODES?

@jasonbcox jasonbcox force-pushed the shared-secret-cache branch from e283834 to 66560fb Compare May 13, 2025 17:52
@jasonbcox

Copy link
Copy Markdown
Contributor Author

Full disclosure: I haven't tested the latest changes on devices yet, but looking for more feedback on the concept.

I decreased the cache size substantially, especially for boards that have very small memory. The default of 10 puts the cache usage under 400 bytes. On NRF with 8 slots, about 300 bytes.

I could not make it dependent on MAX_NUM_NODES directly since it is not a constant expression at compile time.

@thebentern thebentern requested a review from Copilot May 13, 2025 18:08

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a shared-secret cache to the CryptoEngine to speed up repeated encrypted packet handling by avoiding costly DH operations after the first exchange.

  • Introduces a fixed-size sharedSecretCache and the setCryptoSharedSecret method to manage lookup, insertion, and eviction of secrets.
  • Updates encryption/decryption paths to use the cache.
  • Adds unit tests to validate cache hits and basic insertion behavior.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
test/test_crypto/test_main.cpp Added TestCryptoEngine helper and tests for cache insertion and hits.
src/mesh/CryptoEngine.h Defined CachedSharedSecret, MAX_CACHED_SHARED_SECRETS, declared setCryptoSharedSecret, and added an unused <unordered_map> include.
src/mesh/CryptoEngine.cpp Wired encryption/decryption to setCryptoSharedSecret and implemented the cache lookup/insert logic (with an unintended assignment bug).
Comments suppressed due to low confidence (1)

test/test_crypto/test_main.cpp:140

  • The current tests verify insertion and hit behavior but do not cover eviction when the cache exceeds MAX_CACHED_SHARED_SECRETS. Consider adding a test that fills the cache and ensures the oldest entry is evicted as expected.
// Calling again should fetch from the cache. Shared secret is the same.

Comment thread src/mesh/CryptoEngine.cpp Outdated
Comment thread src/mesh/CryptoEngine.h Outdated
@thebentern

Copy link
Copy Markdown
Contributor

I could not make it dependent on MAX_NUM_NODES directly since it is not a constant expression at compile time.

It's not a compile time constant on all platforms but it is static. I wonder if we can static init the length struct to MAX_NUM_NODES / 10 ?
image

@jasonbcox

Copy link
Copy Markdown
Contributor Author

I could not make it dependent on MAX_NUM_NODES directly since it is not a constant expression at compile time.

It's not a compile time constant on all platforms but it is static. I wonder if we can static init the length struct to MAX_NUM_NODES / 10

It needs to be compile-time constant to set the size of a static array. Alternatively, I could use a vector as I first proposed. Then I could reserve() the size in CryptoEngine during construction.

@thebentern

Copy link
Copy Markdown
Contributor

It needs to be compile-time constant to set the size of a static array. Alternatively, I could use a vector as I first proposed. Then I could reserve() the size in CryptoEngine during construction.

I would think a vector approach would be fine, in light of the upstream changes with MAX_NUM_NODES

@jasonbcox

Copy link
Copy Markdown
Contributor Author

I wasn't aware changes to MAX_NUM_NODES were coming down the pipe. Can you link the most relevant one?

@thebentern

Copy link
Copy Markdown
Contributor

I wasn't aware changes to MAX_NUM_NODES were coming down the pipe. Can you link the most relevant one?

Sorry, I meant upstream prior to your initial PR. The latest change has them reliant on some runtime checks on espressif hardware.

@github-actions github-actions Bot added the Stale Issues that will be closed if not triaged. label Jun 29, 2025
@github-actions github-actions Bot closed this Jul 6, 2025
@jp-bennett jp-bennett reopened this Apr 9, 2026
@jp-bennett

Copy link
Copy Markdown
Collaborator

The downside of the stale bot is that good PRs like this one get dropped.

@jp-bennett jp-bennett added enhancement New feature or request and removed Stale Issues that will be closed if not triaged. labels Apr 9, 2026
@jp-bennett

Copy link
Copy Markdown
Collaborator

The way this is currently written, we're only talking a max of 380ish bytes, right? Playing with this patch for some higher-throughput DM experimentation, and it seems to be working as expected so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request]: Improve performance of encrypted packet usage

5 participants