-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
bpf-firewall: custom BPF programs through IP(Ingress|Egress)FilterPath= #12419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
One more note about boolean arguments: How would they work with |
poettering
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good. a number of comments though
I think so, yes. I mean, the reason the IPAddressAllow= and IPAddressDeny= lines are gracefully skipped is that we want people to use them by default in upstream packaging (for example to limit traffic to localhost), and things shall still work when these unit files are used in environments where bpf is not available (because the kernel doesn't do it, or running in a container or such). But since these kind of manual bpf programs don't make much sense in similar upstream scenarios, since they always need manual setup, i would say these ones should be fatal by default, if they can't be installed. |
db90ec5 to
1f0f3e7
Compare
|
Thank you for the review and good hints! I have addressed the issues and think it is in a better state now. Actually I removed the entry in I wanted to introduce error handling for |
The idea is to get the tests that are skipped in docker containers to run in VMs to catch more issues. For example, there is a memory leak in systemd#12419 that can't be caugth because test-bpf isn't run there under ASan/UBsan: ``` ==11144== HEAP SUMMARY: ==11144== in use at exit: 16,425 bytes in 5 blocks ==11144== total heap usage: 7,265 allocs, 7,260 frees, 2,659,529 bytes allocated ==11144== ==11144== 41 (16 direct, 25 indirect) bytes in 1 blocks are definitely lost in loss record 2 of 5 ==11144== at 0x4838748: malloc (vg_replace_malloc.c:308) ==11144== by 0x483AD63: realloc (vg_replace_malloc.c:836) ==11144== by 0x49897B4: strv_push (strv.c:398) ==11144== by 0x4989ADD: strv_consume (strv.c:473) ==11144== by 0x4989BD4: strv_extend (strv.c:512) ==11144== by 0x45A71A: config_parse_ip_filter_bpf_progs (load-fragment.c:4501) ==11144== by 0x41A565: main (test-bpf.c:206) ==11144== ==11144== LEAK SUMMARY: ==11144== definitely lost: 16 bytes in 1 blocks ==11144== indirectly lost: 25 bytes in 1 blocks ==11144== possibly lost: 0 bytes in 0 blocks ==11144== still reachable: 16,384 bytes in 3 blocks ==11144== suppressed: 0 bytes in 0 blocks ==11144== Reachable blocks (those to which a pointer was found) are not shown. ```
9883630 to
a302ddb
Compare
|
Have addressed the issues (also ASAN), except moving the check from |
|
@pothos as it turns out, I was too optimistic about how easy it is to run test-bpf on Travis CI under ASan/UBsan :-) I hope @mrc0mmand will help to bring them to Arch (run on CentOS CI with vagrant) so that it could be covered. In the meantime, I ran the test under Valgrind manually and the memory leak seems to be gone. By the way, there was an attempt to start using libbpf in #12151. I'm wondering if it would be possible to somehow sync these two PRs. cc @wat-ze-hex @rdna. |
|
@evverx this PR could save a few lines of code if libbpf was in systemd, specifically it has both Admittedly not too many though so IMO it should not be blocked by #12151 and it'll be easier to follow up later to switch to those libbpf functions if libbpf support is merged. |
poettering
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good, we are coming close!
a302ddb to
00745c3
Compare
|
Hi, |
poettering
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks excellent! just trivialities. one more round and we should be done!
00745c3 to
7078fe2
Compare
|
@poettering Have addressed it, thanks for bundling the review with actual code suggestions 👍 |
poettering
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great, just one buglet
7078fe2 to
735220d
Compare
|
Thanks, done (: |
735220d to
57d5e18
Compare
Takes a single /sys/fs/bpf/pinned_prog string as argument, but may be specified multiple times. An empty assignment resets all previous filters. Closes systemd#10227
57d5e18 to
8550c24
Compare
|
Excellent! Thanks! |
|
Actually forgot to add the |
Takes a single
/sys/fs/bpf/pinned_progstring as argument but may bespecified multiple times. An empty assignment resets all previous filters.
Closes #10227
I propose to use filter paths and not a single boolean with a predefined path to lookup the pinned BPF program. With the chosen approach of this PR multiple filters can be combined, e.g., to attach a filter for HTTP contents and a separate filter for certain IP addresses. It also allows to reuse filters between units where, e.g., multiple units could depend on a oneshot unit that loads the filter at a certain place.
I decided for absolute paths since it makes reading and using these values elsewhere easy. It is more clear for users to see what should be pinned where in order to work (and other directives do also not strip away
/dev/in/dev/sda1).Does this make sense? I can change it to use relative paths etc. Did I forget something such as cleanup macros?
Currently, as with
IPAddressDenyit is not a failure if the BPF program cannot be attached orbpf_program_newfails. The error is logged but the unit will still start. Should this be more strict since theIPAddressDenyoptions are defaults somewhere but this one here not?What about weight 500 in the security analysis, is it too high? It's not recommended to have a custom filter but having it present may indicate a stronger firewalling. If egress, ingress or both are firewalled depends on the situation, so there is varying score, just on or off.