Improve performance of encrypted packets with a shared secret cache#6560
Improve performance of encrypted packets with a shared secret cache#6560jasonbcox wants to merge 11 commits into
Conversation
|
I will continue to run this patch on both of my devices throughout the review period. If anyone could test on LilyGo or Heltec devices, that would be appreciated. |
There was a problem hiding this comment.
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
test/test_crypto/test_main.cpp:140
- [nitpick] Implementing a mechanism to mock millis() in the tests would allow simulation of delayed cache expiration and better verification of the cache update logic.
// FIXME If tests could mock the millis() time, it would be ideal to mock the time forward by a
|
I definitely approve in principle. Since it's touching the crypto code, it needs more detailed review than I can give tonight. |
|
I really appreciate the work you put into this, but I think we should also be realistic about the actual gains of this versus the drawback. 2.4kBytes of RAM is not insignificant, that’s 12 nodes in the NodeDB and ~1% of the total RAM available for nRF52. Regarding power consumption: I think if a node receives 100 messages directed towards it per day that would be a lot already. In this case we thus save 9.3 seconds of compute per day, which is only 0.01% of the time. However, for nodes connected to a client (it’s those nodes that you’re sending a DM to) we’re not going into a real sleep mode, so we’ll just be longer in Even if the power consumption of the CPU could be completely eliminated during 0.01% of the time, it should be considered relative to the LoRa radio that is always in receiving mode consuming about ~4.2mA for SX126x. If we would be able to mitigate only one packet transmission per day, which takes ~1 second on LongFast during which it consumes ~125mA more, I’d argue that’s already more than can be gained from this. |
Does it make sense to utilize this only on non-NRF platforms? I was actually thinking the main benefit of this would be significantly increased throughput for network connected nodes instead of improved power consumption, which as you pointed out is insignificant |
We can set the size much smaller on the NRF platform. 64 slots is quite a few. I'd set it to 8 on NRF. Not entirely related, I do wonder if using a dynamic object like the unordered_map will lead to memory fragmentation. |
|
It wasn't clear to me how to best profile the RAM usage. Considering it's much tighter than I thought for NRF, it makes sense to both limit to 8 slots for NRF and change the implementation to use a vector so the memory is contiguous. It's not expensive to do O(n) lookups for such a small number of elements. And considering @GUVWAF's point, I could reduce the default number of slots to 32 or even 24 and I think the cache would perform nearly as well for the vast majority of users. |
Still I think before the decryption process becomes the bottleneck in terms of throughput you'll need to receive a lot of direct messages after each other via MQTT or UDP.
Maybe make it dependent on |
e283834 to
66560fb
Compare
|
Full disclosure: I haven't tested the latest changes on devices yet, but looking for more feedback on the concept. I decreased the cache size substantially, especially for boards that have very small memory. The default of 10 puts the cache usage under 400 bytes. On NRF with 8 slots, about 300 bytes. I could not make it dependent on MAX_NUM_NODES directly since it is not a constant expression at compile time. |
There was a problem hiding this comment.
Pull Request Overview
This PR adds a shared-secret cache to the CryptoEngine to speed up repeated encrypted packet handling by avoiding costly DH operations after the first exchange.
- Introduces a fixed-size
sharedSecretCacheand thesetCryptoSharedSecretmethod to manage lookup, insertion, and eviction of secrets. - Updates encryption/decryption paths to use the cache.
- Adds unit tests to validate cache hits and basic insertion behavior.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/test_crypto/test_main.cpp | Added TestCryptoEngine helper and tests for cache insertion and hits. |
| src/mesh/CryptoEngine.h | Defined CachedSharedSecret, MAX_CACHED_SHARED_SECRETS, declared setCryptoSharedSecret, and added an unused <unordered_map> include. |
| src/mesh/CryptoEngine.cpp | Wired encryption/decryption to setCryptoSharedSecret and implemented the cache lookup/insert logic (with an unintended assignment bug). |
Comments suppressed due to low confidence (1)
test/test_crypto/test_main.cpp:140
- The current tests verify insertion and hit behavior but do not cover eviction when the cache exceeds
MAX_CACHED_SHARED_SECRETS. Consider adding a test that fills the cache and ensures the oldest entry is evicted as expected.
// Calling again should fetch from the cache. Shared secret is the same.
It needs to be compile-time constant to set the size of a static array. Alternatively, I could use a vector as I first proposed. Then I could reserve() the size in CryptoEngine during construction. |
I would think a vector approach would be fine, in light of the upstream changes with MAX_NUM_NODES |
|
I wasn't aware changes to MAX_NUM_NODES were coming down the pipe. Can you link the most relevant one? |
Sorry, I meant upstream prior to your initial PR. The latest change has them reliant on some runtime checks on espressif hardware. |
|
The downside of the stale bot is that good PRs like this one get dropped. |
|
The way this is currently written, we're only talking a max of 380ish bytes, right? Playing with this patch for some higher-throughput DM experimentation, and it seems to be working as expected so far. |

Add a shared secret cache to dramatically improve performance and battery usage of encrypted communications after the first packet is sent between two nodes.
Using this additional patch jasonbcox@190444f, I logged the following on a RAK4631 when sending DMs between it and another node:
On master:
With this patch applied, first encrypted packet is same as above. All subsequent packets between the same two nodes perform as follows (times are roughly the same regardless of message length):
This is a 32x performance improvement of encrypted packet decoding at the cost of (up to) ~2.4kb of RAM.
For users that send DMs often, and assuming that every ms of compute is energy cost equivalent (its not, but we can estimate), this could provide around 45% improvement in battery usage with respect to encrypted communications.
Unit test coverage recently improved to ensure this patch does not regress, in addition to tests added in this PR (see #6485)
Fixes #6426
🤝 Attestations