bpf: optmize builtin functions before we fallback to them#11089
Conversation
force-pushed from 013032a to 16e925f
tklauser
left a comment
Small nit: s/optmize/optimize/ in commit message and PR title.
This PR has been marked as a draft since it had a WIP label. Please click "Ready for review" below once the PR is ready to be reviewed. CI will still run for draft PRs.
force-pushed from f038a99 to 9a3a578
force-pushed from 9a3a578 to 5a1d08b
test-me-please
force-pushed from 5a1d08b to da30f2f
test-me-please
force-pushed from da30f2f to 83510c4
test-me-please
force-pushed from 83510c4 to 1909dd1
test-me-please
force-pushed from 1909dd1 to 8ffe63d
test-me-please
pchaigno
left a comment
This is great! One nit and a couple questions below.
Did you get a chance to check the complexity and program size impact? It looks like it could have a bigger impact than other mitigations we've discussed, and on all kernels 🎉
Convert all direct use of __builtin_mem{cpy,set}() over to the
regular mem{cpy,set}() function as we're going to have a custom
implementation of the latter two.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
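One common way to enforce such a conversion project-wide is a macro that shadows memcpy() so call sites can never reach the builtin directly. The sketch below is illustrative only; __bpf_memcpy and its byte-wise body are hypothetical stand-ins, not Cilium's actual helpers:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical custom routine; in BPF code this would be an
 * always-inline function emitting word-sized loads and stores. */
static void *__bpf_memcpy(void *d, const void *s, size_t n)
{
	unsigned char *dp = d;
	const unsigned char *sp = s;

	while (n--)
		*dp++ = *sp++;
	return d;
}

/* Shadow plain memcpy() so every call site picks up the custom
 * implementation instead of the compiler builtin. */
#define memcpy(d, s, n)	__bpf_memcpy(d, s, n)
```

With this in a shared header, only the custom implementation itself needs to spell out __builtin_memcpy() (if it uses it at all).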
While checking some of the xlated code from XDP side, I recently
noticed __builtin_memcpy() patterns like:
[...]
14: 55 04 ee 00 00 00 00 00 if r4 != 0 goto +238 <LBB0_2>
15: 71 32 2f 00 00 00 00 00 r2 = *(u8 *)(r3 + 47)
16: 67 02 00 00 08 00 00 00 r2 <<= 8
17: 71 31 2e 00 00 00 00 00 r1 = *(u8 *)(r3 + 46)
18: 4f 12 00 00 00 00 00 00 r2 |= r1
19: 7b 2a 78 ff 00 00 00 00 *(u64 *)(r10 - 136) = r2
20: 71 32 37 00 00 00 00 00 r2 = *(u8 *)(r3 + 55)
21: 67 02 00 00 08 00 00 00 r2 <<= 8
22: 71 31 36 00 00 00 00 00 r1 = *(u8 *)(r3 + 54)
23: 4f 12 00 00 00 00 00 00 r2 |= r1
24: 7b 2a b0 ff 00 00 00 00 *(u64 *)(r10 - 80) = r2
[...]
This is bad since we end up doing a byte-wise copy of the data. LLVM
is not aware that unaligned access is efficient on x86-64, arm64,
etc., so it cannot make any assumptions about the access.
Implement optimized routines and make sure we don't blindly use
plain __builtin_*() directly in the code.
[...]
15: 79 31 38 00 00 00 00 00 r1 = *(u64 *)(r3 + 56)
16: 7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1
17: 79 31 30 00 00 00 00 00 r1 = *(u64 *)(r3 + 48)
18: 7b 1a f0 ff 00 00 00 00 *(u64 *)(r10 - 16) = r1
19: 79 31 28 00 00 00 00 00 r1 = *(u64 *)(r3 + 40)
20: 7b 1a e8 ff 00 00 00 00 *(u64 *)(r10 - 24) = r1
21: 79 31 20 00 00 00 00 00 r1 = *(u64 *)(r3 + 32)
22: 7b 1a e0 ff 00 00 00 00 *(u64 *)(r10 - 32) = r1
23: 79 31 18 00 00 00 00 00 r1 = *(u64 *)(r3 + 24)
24: 7b 1a d8 ff 00 00 00 00 *(u64 *)(r10 - 40) = r1
[...]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
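The word-sized pattern in the dump above can be sketched in plain C. This is a minimal illustration of the technique, not Cilium's actual routine (the name copy_u64_words is made up): a memcpy() of a constant 8 bytes is the portable way to express an unaligned u64 load/store, which the compiler collapses into a single instruction on targets that support it.

```c
#include <stdint.h>
#include <string.h>

/* Copy len bytes in u64-sized chunks, mirroring the xlated output
 * above (one load/store pair per 8 bytes) instead of the byte-wise
 * pattern. */
static void copy_u64_words(void *d, const void *s, unsigned int len)
{
	unsigned int off;
	uint64_t tmp;

	for (off = 0; off + 8 <= len; off += 8) {
		/* Constant-size memcpy() compiles down to a single
		 * (possibly unaligned) u64 load and store. */
		memcpy(&tmp, (const char *)s + off, 8);
		memcpy((char *)d + off, &tmp, 8);
	}
	/* Tail bytes, if len is not a multiple of 8. */
	for (; off < len; off++)
		((char *)d)[off] = ((const char *)s)[off];
}
```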
force-pushed from 8ffe63d to 742047a
Place them into their own directory instead of directly into the
lib/ code; adding more into lib/ would convolute it too much. Also
add -Wno-builtin-declaration-mismatch for GCC.

Tested via: # make unit-tests TESTPKGS=

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
test-me-please
Add various new test cases for mem{cpy,set} and run them as part
of the bpf unit tests. I've added barrier_data() from [0] to
prevent the compiler from any optimizations on the test data.
Tested via: # make unit-tests TESTPKGS=
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7829fb09a2b4268b30dd9bc782fa5ebee278b137
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
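barrier_data() from the referenced kernel commit is an empty asm statement with a "memory" clobber that takes the buffer address as an input operand, forcing the compiler to assume the pointed-to data is consumed. A minimal GCC/Clang-style sketch; scrub() is a hypothetical caller added here only to show the intended use:

```c
#include <stddef.h>
#include <string.h>

/* Empty asm with a "memory" clobber and the pointer as input: the
 * compiler must assume the data behind ptr is read afterwards, so
 * preceding stores to it (e.g. a memset()) cannot be elided as
 * dead stores. */
#define barrier_data(ptr) __asm__ __volatile__("" : : "r"(ptr) : "memory")

/* Hypothetical helper: zero a buffer and keep the stores alive. */
static void scrub(char *buf, size_t len)
{
	memset(buf, 0, len);
	barrier_data(buf);
}
```

In the unit tests this serves the same purpose: the test data cannot be constant-folded or optimized away before the mem{cpy,set} under test runs.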
force-pushed from fe9837d to fa17377
test-me-please
test-with-kernel
restart-gke
The GKE failure is the known Hubble flake, see #11141 (the prior run before the rebase was green on GKE as well).
See commit msg.