GCM Speed + Memory Mgmt. Improvements #783

Merged: adrianosela merged 1 commit into pion:master from adrianosela:ciphersuite-perf on Jan 22, 2026
@adrianosela (Contributor) commented Jan 22, 2026

GCM Speed + Memory Mgmt. Improvements

Pretty significant results by doing some minor tweaks:

Pre-calculating the exact output size and having Seal() write encrypted data directly into the final buffer eliminates an intermediate allocation and copy operation.

Improvements Summary

  • Encrypt latency: -1% to -23% (gains grow with payload size; -3% at 16B, -23% at 8KB)
  • Encrypt throughput: +1% to +31% (up to 2.87 GB/s at 8KB)
  • Memory (bytes): -20% to -33% reduction
  • Memory (allocations): -1 alloc (20% reduction: 5 to 4)
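As a quick check, the figures for the smallest and largest payloads can be recomputed from the 16B and 8KB encrypt rows of the raw output. The helper `pctChange` is a name introduced here for illustration; the input numbers are copied from the before and after runs.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Values taken from the BenchmarkGCMEncrypt rows (before, after).
	fmt.Printf("latency  16B: %+.1f%%\n", pctChange(198.0, 192.0))
	fmt.Printf("latency  8KB: %+.1f%%\n", pctChange(3725, 2853))
	fmt.Printf("MB/s     8KB: %+.1f%%\n", pctChange(2199.06, 2871.83))
	fmt.Printf("B/op     16B: %+.1f%%\n", pctChange(160, 128))
	fmt.Printf("B/op     8KB: %+.1f%%\n", pctChange(28448, 18976))
	fmt.Printf("allocs/op   : %+.1f%%\n", pctChange(5, 4))
}
```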

Raw benchmark output before and after

Before

17:31 $ go test -tags=bench -bench=GCM* -benchmem
goos: darwin
goarch: arm64
pkg: github.com/pion/dtls/v3/pkg/crypto/ciphersuite
cpu: Apple M1 Pro
BenchmarkGCMEncrypt/016B-8         	 5606710	       198.0 ns/op	  80.82 MB/s	     160 B/op	       5 allocs/op
BenchmarkGCMEncrypt/064B-8         	 5053713	       234.7 ns/op	 272.65 MB/s	     304 B/op	       5 allocs/op
BenchmarkGCMEncrypt/128B-8         	 4941087	       242.1 ns/op	 528.73 MB/s	     496 B/op	       5 allocs/op
BenchmarkGCMEncrypt/256B-8         	 3939876	       303.1 ns/op	 844.53 MB/s	     928 B/op	       5 allocs/op
BenchmarkGCMEncrypt/512B-8         	 2817981	       432.0 ns/op	1185.22 MB/s	    1760 B/op	       5 allocs/op
BenchmarkGCMEncrypt/800B-8         	 2092272	       578.5 ns/op	1382.93 MB/s	    2720 B/op	       5 allocs/op
BenchmarkGCMEncrypt/1KB-8          	 1740795	       699.2 ns/op	1464.49 MB/s	    3488 B/op	       5 allocs/op
BenchmarkGCMEncrypt/1.1KB-8        	 1500366	       784.1 ns/op	1530.46 MB/s	    3872 B/op	       5 allocs/op
BenchmarkGCMEncrypt/1.4KB-8        	 1290439	       935.9 ns/op	1602.67 MB/s	    4896 B/op	       5 allocs/op
BenchmarkGCMEncrypt/4KB-8          	  539301	      2092 ns/op	1958.26 MB/s	   14624 B/op	       5 allocs/op
BenchmarkGCMEncrypt/8KB-8          	  307556	      3725 ns/op	2199.06 MB/s	   28448 B/op	       5 allocs/op
BenchmarkGCMDecrypt/016B-8         	12943551	        93.27 ns/op	 171.54 MB/s	      96 B/op	       3 allocs/op
BenchmarkGCMDecrypt/064B-8         	 9609406	       124.4 ns/op	 514.38 MB/s	     144 B/op	       3 allocs/op
BenchmarkGCMDecrypt/128B-8         	10685313	       109.7 ns/op	1167.32 MB/s	     208 B/op	       3 allocs/op
BenchmarkGCMDecrypt/256B-8         	 8475930	       139.5 ns/op	1835.72 MB/s	     352 B/op	       3 allocs/op
BenchmarkGCMDecrypt/512B-8         	 5819467	       206.4 ns/op	2480.92 MB/s	     608 B/op	       3 allocs/op
BenchmarkGCMDecrypt/800B-8         	 3986389	       294.0 ns/op	2720.75 MB/s	     928 B/op	       3 allocs/op
BenchmarkGCMDecrypt/1KB-8          	 3595134	       341.3 ns/op	3000.47 MB/s	    1184 B/op	       3 allocs/op
BenchmarkGCMDecrypt/1.1KB-8        	 2725264	       406.4 ns/op	2952.90 MB/s	    1312 B/op	       3 allocs/op
BenchmarkGCMDecrypt/1.4KB-8        	 2425977	       499.5 ns/op	3002.77 MB/s	    1824 B/op	       3 allocs/op
BenchmarkGCMDecrypt/4KB-8          	 1000000	      1104 ns/op	3710.43 MB/s	    4896 B/op	       3 allocs/op
BenchmarkGCMDecrypt/8KB-8          	  544194	      2054 ns/op	3989.28 MB/s	    9504 B/op	       3 allocs/op
PASS
ok  	github.com/pion/dtls/v3/pkg/crypto/ciphersuite	32.967s

After

17:32 $ go test -tags=bench -bench=GCM* -benchmem
goos: darwin
goarch: arm64
pkg: github.com/pion/dtls/v3/pkg/crypto/ciphersuite
cpu: Apple M1 Pro
BenchmarkGCMEncrypt/016B-8         	 5811658	       192.0 ns/op	  83.33 MB/s	     128 B/op	       4 allocs/op
BenchmarkGCMEncrypt/064B-8         	 5246335	       232.5 ns/op	 275.28 MB/s	     224 B/op	       4 allocs/op
BenchmarkGCMEncrypt/128B-8         	 5316270	       231.1 ns/op	 553.85 MB/s	     352 B/op	       4 allocs/op
BenchmarkGCMEncrypt/256B-8         	 4403444	       264.1 ns/op	 969.35 MB/s	     640 B/op	       4 allocs/op
BenchmarkGCMEncrypt/512B-8         	 3446043	       367.4 ns/op	1393.75 MB/s	    1184 B/op	       4 allocs/op
BenchmarkGCMEncrypt/800B-8         	 2444048	       484.9 ns/op	1649.68 MB/s	    1824 B/op	       4 allocs/op
BenchmarkGCMEncrypt/1KB-8          	 2029910	       549.0 ns/op	1865.24 MB/s	    2336 B/op	       4 allocs/op
BenchmarkGCMEncrypt/1.1KB-8        	 1972077	       634.7 ns/op	1890.72 MB/s	    2592 B/op	       4 allocs/op
BenchmarkGCMEncrypt/1.4KB-8        	 1605146	       738.7 ns/op	2030.55 MB/s	    3360 B/op	       4 allocs/op
BenchmarkGCMEncrypt/4KB-8          	  697569	      1607 ns/op	2548.82 MB/s	    9760 B/op	       4 allocs/op
BenchmarkGCMEncrypt/8KB-8          	  371834	      2853 ns/op	2871.83 MB/s	   18976 B/op	       4 allocs/op
BenchmarkGCMDecrypt/016B-8         	12969093	        91.70 ns/op	 174.49 MB/s	      96 B/op	       3 allocs/op
BenchmarkGCMDecrypt/064B-8         	 9518325	       124.3 ns/op	 514.89 MB/s	     144 B/op	       3 allocs/op
BenchmarkGCMDecrypt/128B-8         	10690188	       109.9 ns/op	1164.48 MB/s	     208 B/op	       3 allocs/op
BenchmarkGCMDecrypt/256B-8         	 8382956	       141.3 ns/op	1812.00 MB/s	     352 B/op	       3 allocs/op
BenchmarkGCMDecrypt/512B-8         	 5710416	       207.5 ns/op	2467.10 MB/s	     608 B/op	       3 allocs/op
BenchmarkGCMDecrypt/800B-8         	 4093116	       296.6 ns/op	2697.07 MB/s	     928 B/op	       3 allocs/op
BenchmarkGCMDecrypt/1KB-8          	 3458967	       341.8 ns/op	2996.09 MB/s	    1184 B/op	       3 allocs/op
BenchmarkGCMDecrypt/1.1KB-8        	 2837096	       403.1 ns/op	2976.92 MB/s	    1312 B/op	       3 allocs/op
BenchmarkGCMDecrypt/1.4KB-8        	 2384750	       511.9 ns/op	2929.98 MB/s	    1824 B/op	       3 allocs/op
BenchmarkGCMDecrypt/4KB-8          	 1000000	      1095 ns/op	3741.14 MB/s	    4896 B/op	       3 allocs/op
BenchmarkGCMDecrypt/8KB-8          	  549505	      2082 ns/op	3935.15 MB/s	    9504 B/op	       3 allocs/op
PASS
ok  	github.com/pion/dtls/v3/pkg/crypto/ciphersuite	32.424s
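For readers unfamiliar with the columns: ns/op is time per iteration, MB/s appears only when the benchmark calls `b.SetBytes`, B/op and allocs/op come from `-benchmem`, and the `-8` suffix is the GOMAXPROCS value. The sketch below shows how such numbers can be produced with the standard `testing` package. It is a stand-in, not the real `BenchmarkGCMEncrypt` (whose body is not shown in this thread); `benchSeal`, the key, and the chosen sizes are assumptions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
	"testing"
)

// benchSeal measures a bare AES-GCM Seal for one payload size.
// Hypothetical helper; it does not reproduce the DTLS record framing.
func benchSeal(size int) testing.BenchmarkResult {
	block, err := aes.NewCipher(make([]byte, 16)) // all-zero demo key
	if err != nil {
		panic(err)
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	nonce := make([]byte, gcm.NonceSize())
	payload := make([]byte, size)

	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(size)) // enables the MB/s column
		b.ReportAllocs()        // reports B/op and allocs/op
		for i := 0; i < b.N; i++ {
			_ = gcm.Seal(nil, nonce, payload, nil)
		}
	})
}

func main() {
	for _, size := range []int{16, 1024} {
		r := benchSeal(size)
		fmt.Printf("%5dB\t%s\t%s\n", size, r.String(), r.MemString())
	}
}
```

Absolute numbers from this sketch will differ from the PR's, since it measures only the `Seal` call on whatever machine runs it.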

codecov bot commented Jan 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.65%. Comparing base (199a753) to head (7a5c8de).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #783      +/-   ##
==========================================
+ Coverage   81.63%   81.65%   +0.02%     
==========================================
  Files         105      105              
  Lines        5837     5838       +1     
==========================================
+ Hits         4765     4767       +2     
+ Misses        688      687       -1     
  Partials      384      384              
Flag Coverage Δ
go 81.65% <100.00%> (+0.02%) ⬆️


        }
-       encryptedPayload := g.localGCM.Seal(nil, nonce, payload, additionalData)
-       r := make([]byte, len(raw)+len(nonce[4:])+len(encryptedPayload))
+       finalSize := len(raw) + 8 + len(payload) + gcmTagLength
Member


Hot dog nice change!

Would it be safer/easier to read to do make([]byte, 0, finalSize) and then do appends?

That way, if we do get the math wrong, it won't crash. What's your take?

Contributor Author


Hey, it would be more readable to do what you propose, e.g.:

-       r := make([]byte, finalSize)
-       copy(r, raw)
-       copy(r[len(raw):], nonce[4:])
-
-       g.localGCM.Seal(r[len(raw)+8:len(raw)+8], nonce, payload, additionalData)
+       r := make([]byte, 0, finalSize)
+       r = append(r, raw...)
+       r = append(r, nonce[4:]...)
+       r = g.localGCM.Seal(r, nonce, payload, additionalData)

And yes, the current code will panic if we get the math wrong and your proposed change wouldn't, but all values are known at this point and the math is straightforward, so I wouldn't say it's inherently safer.

I tested the change and it introduces a 4% regression at 8KB (I think because append is slower than copy for larger buffers). I'm leaning towards not making the change because of this performance hit; otherwise the readability benefit alone would make it worth it. What do you think?

Member


👍 100% agree.

It's also probably better to crash than to have unexpected behavior; who knows what the impact could be down the line.

I am super impressed by and appreciative of these things. Keep doing great stuff, you're inspiring me to get back into it :)

@adrianosela adrianosela merged commit 100a3aa into pion:master Jan 22, 2026
19 checks passed
@adrianosela adrianosela deleted the ciphersuite-perf branch January 22, 2026 08:42

2 participants