GCM Speed + Memory Mgmt. Improvements by adrianosela · Pull Request #783 · pion/dtls

adrianosela · 2026-01-22T01:46:35Z

GCM Speed + Memory Mgmt. Improvements

Pretty significant results by doing some minor tweaks:

Pre-calculating the exact output size and having Seal() write encrypted data directly into the final buffer eliminates an intermediate allocation and copy operation.

Improvements Summary

Encrypt latency: -3% to -23% (scales with payload size)
Encrypt throughput: +3% to +31% (up to 2.87 GB/s at 8KB)
Memory (bytes): -20% to -33% reduction
Memory (allocations): -1 alloc (20% reduction: 5 to 4)
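For anyone who wants to check these ranges, the endpoints can be recomputed from the 16 B and 8 KB `BenchmarkGCMEncrypt` rows of the raw benchmark output in this description. The helper name `pctChange` is just for this sketch.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// ns/op, 16 B and 8 KB encrypt rows
	fmt.Printf("latency    16B: %+.1f%%\n", pctChange(198.0, 192.0))     // -3.0%
	fmt.Printf("latency    8KB: %+.1f%%\n", pctChange(3725, 2853))       // -23.4%
	// MB/s
	fmt.Printf("throughput 16B: %+.1f%%\n", pctChange(80.82, 83.33))     // +3.1%
	fmt.Printf("throughput 8KB: %+.1f%%\n", pctChange(2199.06, 2871.83)) // +30.6%
	// B/op
	fmt.Printf("bytes      16B: %+.1f%%\n", pctChange(160, 128))         // -20.0%
	fmt.Printf("bytes      8KB: %+.1f%%\n", pctChange(28448, 18976))     // -33.3%
	// allocs/op
	fmt.Printf("allocs:         %+.1f%%\n", pctChange(5, 4))             // -20.0%
}
```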

Raw benchmark output before and after

Before

17:31 $ go test -tags=bench -bench=GCM* -benchmem
goos: darwin
goarch: arm64
pkg: github.com/pion/dtls/v3/pkg/crypto/ciphersuite
cpu: Apple M1 Pro
BenchmarkGCMEncrypt/016B-8         	 5606710	       198.0 ns/op	  80.82 MB/s	     160 B/op	       5 allocs/op
BenchmarkGCMEncrypt/064B-8         	 5053713	       234.7 ns/op	 272.65 MB/s	     304 B/op	       5 allocs/op
BenchmarkGCMEncrypt/128B-8         	 4941087	       242.1 ns/op	 528.73 MB/s	     496 B/op	       5 allocs/op
BenchmarkGCMEncrypt/256B-8         	 3939876	       303.1 ns/op	 844.53 MB/s	     928 B/op	       5 allocs/op
BenchmarkGCMEncrypt/512B-8         	 2817981	       432.0 ns/op	1185.22 MB/s	    1760 B/op	       5 allocs/op
BenchmarkGCMEncrypt/800B-8         	 2092272	       578.5 ns/op	1382.93 MB/s	    2720 B/op	       5 allocs/op
BenchmarkGCMEncrypt/1KB-8          	 1740795	       699.2 ns/op	1464.49 MB/s	    3488 B/op	       5 allocs/op
BenchmarkGCMEncrypt/1.1KB-8        	 1500366	       784.1 ns/op	1530.46 MB/s	    3872 B/op	       5 allocs/op
BenchmarkGCMEncrypt/1.4KB-8        	 1290439	       935.9 ns/op	1602.67 MB/s	    4896 B/op	       5 allocs/op
BenchmarkGCMEncrypt/4KB-8          	  539301	      2092 ns/op	1958.26 MB/s	   14624 B/op	       5 allocs/op
BenchmarkGCMEncrypt/8KB-8          	  307556	      3725 ns/op	2199.06 MB/s	   28448 B/op	       5 allocs/op
BenchmarkGCMDecrypt/016B-8         	12943551	        93.27 ns/op	 171.54 MB/s	      96 B/op	       3 allocs/op
BenchmarkGCMDecrypt/064B-8         	 9609406	       124.4 ns/op	 514.38 MB/s	     144 B/op	       3 allocs/op
BenchmarkGCMDecrypt/128B-8         	10685313	       109.7 ns/op	1167.32 MB/s	     208 B/op	       3 allocs/op
BenchmarkGCMDecrypt/256B-8         	 8475930	       139.5 ns/op	1835.72 MB/s	     352 B/op	       3 allocs/op
BenchmarkGCMDecrypt/512B-8         	 5819467	       206.4 ns/op	2480.92 MB/s	     608 B/op	       3 allocs/op
BenchmarkGCMDecrypt/800B-8         	 3986389	       294.0 ns/op	2720.75 MB/s	     928 B/op	       3 allocs/op
BenchmarkGCMDecrypt/1KB-8          	 3595134	       341.3 ns/op	3000.47 MB/s	    1184 B/op	       3 allocs/op
BenchmarkGCMDecrypt/1.1KB-8        	 2725264	       406.4 ns/op	2952.90 MB/s	    1312 B/op	       3 allocs/op
BenchmarkGCMDecrypt/1.4KB-8        	 2425977	       499.5 ns/op	3002.77 MB/s	    1824 B/op	       3 allocs/op
BenchmarkGCMDecrypt/4KB-8          	 1000000	      1104 ns/op	3710.43 MB/s	    4896 B/op	       3 allocs/op
BenchmarkGCMDecrypt/8KB-8          	  544194	      2054 ns/op	3989.28 MB/s	    9504 B/op	       3 allocs/op
PASS
ok  	github.com/pion/dtls/v3/pkg/crypto/ciphersuite	32.967s

After

17:32 $ go test -tags=bench -bench=GCM* -benchmem
goos: darwin
goarch: arm64
pkg: github.com/pion/dtls/v3/pkg/crypto/ciphersuite
cpu: Apple M1 Pro
BenchmarkGCMEncrypt/016B-8         	 5811658	       192.0 ns/op	  83.33 MB/s	     128 B/op	       4 allocs/op
BenchmarkGCMEncrypt/064B-8         	 5246335	       232.5 ns/op	 275.28 MB/s	     224 B/op	       4 allocs/op
BenchmarkGCMEncrypt/128B-8         	 5316270	       231.1 ns/op	 553.85 MB/s	     352 B/op	       4 allocs/op
BenchmarkGCMEncrypt/256B-8         	 4403444	       264.1 ns/op	 969.35 MB/s	     640 B/op	       4 allocs/op
BenchmarkGCMEncrypt/512B-8         	 3446043	       367.4 ns/op	1393.75 MB/s	    1184 B/op	       4 allocs/op
BenchmarkGCMEncrypt/800B-8         	 2444048	       484.9 ns/op	1649.68 MB/s	    1824 B/op	       4 allocs/op
BenchmarkGCMEncrypt/1KB-8          	 2029910	       549.0 ns/op	1865.24 MB/s	    2336 B/op	       4 allocs/op
BenchmarkGCMEncrypt/1.1KB-8        	 1972077	       634.7 ns/op	1890.72 MB/s	    2592 B/op	       4 allocs/op
BenchmarkGCMEncrypt/1.4KB-8        	 1605146	       738.7 ns/op	2030.55 MB/s	    3360 B/op	       4 allocs/op
BenchmarkGCMEncrypt/4KB-8          	  697569	      1607 ns/op	2548.82 MB/s	    9760 B/op	       4 allocs/op
BenchmarkGCMEncrypt/8KB-8          	  371834	      2853 ns/op	2871.83 MB/s	   18976 B/op	       4 allocs/op
BenchmarkGCMDecrypt/016B-8         	12969093	        91.70 ns/op	 174.49 MB/s	      96 B/op	       3 allocs/op
BenchmarkGCMDecrypt/064B-8         	 9518325	       124.3 ns/op	 514.89 MB/s	     144 B/op	       3 allocs/op
BenchmarkGCMDecrypt/128B-8         	10690188	       109.9 ns/op	1164.48 MB/s	     208 B/op	       3 allocs/op
BenchmarkGCMDecrypt/256B-8         	 8382956	       141.3 ns/op	1812.00 MB/s	     352 B/op	       3 allocs/op
BenchmarkGCMDecrypt/512B-8         	 5710416	       207.5 ns/op	2467.10 MB/s	     608 B/op	       3 allocs/op
BenchmarkGCMDecrypt/800B-8         	 4093116	       296.6 ns/op	2697.07 MB/s	     928 B/op	       3 allocs/op
BenchmarkGCMDecrypt/1KB-8          	 3458967	       341.8 ns/op	2996.09 MB/s	    1184 B/op	       3 allocs/op
BenchmarkGCMDecrypt/1.1KB-8        	 2837096	       403.1 ns/op	2976.92 MB/s	    1312 B/op	       3 allocs/op
BenchmarkGCMDecrypt/1.4KB-8        	 2384750	       511.9 ns/op	2929.98 MB/s	    1824 B/op	       3 allocs/op
BenchmarkGCMDecrypt/4KB-8          	 1000000	      1095 ns/op	3741.14 MB/s	    4896 B/op	       3 allocs/op
BenchmarkGCMDecrypt/8KB-8          	  549505	      2082 ns/op	3935.15 MB/s	    9504 B/op	       3 allocs/op
PASS
ok  	github.com/pion/dtls/v3/pkg/crypto/ciphersuite	32.424s


codecov · 2026-01-22T01:49:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.65%. Comparing base (199a753) to head (7a5c8de).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #783      +/-   ##
==========================================
+ Coverage   81.63%   81.65%   +0.02%     
==========================================
  Files         105      105              
  Lines        5837     5838       +1     
==========================================
+ Hits         4765     4767       +2     
+ Misses        688      687       -1     
  Partials      384      384

Flag	Coverage Δ
go	`81.65% <100.00%> (+0.02%)`	⬆️


Sean-Der · 2026-01-22T04:37:46Z

pkg/crypto/ciphersuite/gcm.go

 	}
-	encryptedPayload := g.localGCM.Seal(nil, nonce, payload, additionalData)
-	r := make([]byte, len(raw)+len(nonce[4:])+len(encryptedPayload))
+	finalSize := len(raw) + 8 + len(payload) + gcmTagLength


Hot dog nice change!

Would it be safer/easier to read to do make([]byte, 0, finalSize) and then do appends?

That way, if we do get the math wrong, it won't crash. What's your take?

adrianosela · 2026-01-22

Hey, it would be more readable to do what you propose, e.g.

-	r := make([]byte, finalSize)
-	copy(r, raw)
-	copy(r[len(raw):], nonce[4:])
-
-	g.localGCM.Seal(r[len(raw)+8:len(raw)+8], nonce, payload, additionalData)
+	r := make([]byte, 0, finalSize)
+	r = append(r, raw...)
+	r = append(r, nonce[4:]...)
+	r = g.localGCM.Seal(r, nonce, payload, additionalData)

And yes, the current code will panic if we get the math wrong and your proposed change wouldn't, but all values are known at this point and the math is straightforward... so I wouldn't say it's inherently safer.

I tested the change and it introduces a 4% regression at 8KB (I think because append is slower than copy for larger buffers). I'm leaning towards not making the change due to this performance hit; otherwise the readability benefit alone would make it worth it. What do you think?

Sean-Der · 2026-01-22

👍 100% agree.

It is also probably better to crash than to have unexpected behavior; who knows what the impact could be down the line.

I am super impressed by/appreciative of these things. Keep doing great stuff, it's inspiring me to get back into it :)
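To make the "wrong math" scenario concrete, the sketch below injects a deliberately wrong size into both variants using only the standard library. The names `assembleInPlace` and `assembleAppend` and the 13-byte header are assumptions for the demo, not pion/dtls code. One caveat it surfaces: the in-place variant panics when the buffer is too small for the header and nonce, but if the buffer is short only in the ciphertext region, `Seal` (which has append semantics) returns a freshly allocated slice, so a caller that discards the return value would see no panic.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

const explicitNonceLen = 8

// assembleInPlace takes the buffer size as a parameter so a wrong value can
// be injected. A panic is converted into an error for the demo.
func assembleInPlace(aead cipher.AEAD, size int, header, nonce, payload, aad []byte) (out []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	out = make([]byte, size)
	copy(out, header)
	copy(out[len(header):], nonce[4:]) // panics if size < len(header)
	off := len(header) + explicitNonceLen
	aead.Seal(out[off:off], nonce, payload, aad) // return value discarded
	return out, nil
}

// assembleAppend grows the slice as needed, so a wrong capacity only costs
// a reallocation.
func assembleAppend(aead cipher.AEAD, capacity int, header, nonce, payload, aad []byte) []byte {
	out := make([]byte, 0, capacity)
	out = append(out, header...)
	out = append(out, nonce[4:]...)
	return aead.Seal(out, nonce, payload, aad)
}

func newAEAD() cipher.AEAD {
	block, err := aes.NewCipher(make([]byte, 16)) // zero key, demo only
	if err != nil {
		panic(err)
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	return aead
}

// allZero reports whether every byte in b is zero.
func allZero(b []byte) bool {
	for _, v := range b {
		if v != 0 {
			return false
		}
	}
	return true
}

func main() {
	aead := newAEAD()
	header, nonce, payload := make([]byte, 13), make([]byte, 12), []byte("hello dtls")
	right := len(header) + explicitNonceLen + len(payload) + aead.Overhead()

	// Capacity far too small: append still yields a full-length record.
	fmt.Println(len(assembleAppend(aead, 4, header, nonce, payload, header)) == right)

	// Size too small for the header: slicing panics.
	_, err := assembleInPlace(aead, 4, header, nonce, payload, header)
	fmt.Println(err != nil)

	// Size short by the tag length only: no panic, and the ciphertext never
	// lands in out because Seal allocated a new slice for its result.
	out, err := assembleInPlace(aead, right-aead.Overhead(), header, nonce, payload, header)
	fmt.Println(err == nil, allZero(out[len(header)+explicitNonceLen:]))
}
```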

GCM Speed + Memory Mgmt Improvements

7a5c8de


adrianosela requested review from JoTurk, Sean-Der and theodorsm January 22, 2026 01:46

Sean-Der reviewed Jan 22, 2026


Sean-Der approved these changes Jan 22, 2026


adrianosela merged commit 100a3aa into pion:master Jan 22, 2026
19 checks passed

adrianosela deleted the ciphersuite-perf branch January 22, 2026 08:42

adrianosela merged 1 commit into pion:master from adrianosela:ciphersuite-perf
