Conversation
|
|
||
| DEFINE_STRING_TABLE_LOOKUP(nfproto, int); | ||
|
|
||
| #define NFT_SET_MSGS 3 |
There was a problem hiding this comment.
why a macro for this? why not just specify the size literally in the variable declaration, and then use ELEMENTSOF to reference the size when you need it?
There was a problem hiding this comment.
I have no preference, but similar #defines are used elsewhere in this file.
There was a problem hiding this comment.
I agree that there seems to be little reason to make a define for this, which survives beyond the end of the function, and only use it once in the function.
|
Hmm it seems that in more recent e.g.: |
|
I was wondering why the filter wouldn't work as expected even though the cgroup numbers are correctly inserted to the set. Looking at here, here and here, it seems that The cgroupv2 version, Maybe cgroupv2 could be made to work with bit more invasive approach: PID1 would install filters which match cgroups (once they are realized). The filter would then set socket marks. Firewall rules would then be able to use the socket marks as they please. |
The path is resolved when the firewall rules are loaded and if the cgroup path doesn't exist (since the service hasn't been started yet), it will fail. This makes them equally useless directly. Perhaps there's a way (PID1 would manage cgroupv2 rules which use socket marks and store the marks in sets so that rules can reference them). |
|
Yeh sounds to me like upstreaming something along the lines of https://github.com/mk-fg/systemd-cgroup-nftables-policy-manager seems to be the way to go. Also if it turns out that |
It works by loading arbitrary NFT filters when a service starts. I think this level of complexity was rejected earlier, leading to simpler Perhaps network namespace support could be beefed up so that interfaces would be available inside the namespace without
This is indeed the case. |
|
The |
|
Updated with |
78fb9d1 to
614ee4e
Compare
|
/cc @poettering Removed also GitHub review request. |
|
Split commits for easier review, updated from lessons learned of #23269 and rebased to it. |
|
mkosi / ci (arch, rolling) fails with Unit tests / build (CLANG, gcrypt): and CentOS CI (Arch Linux): I don't think these are related (nothing to do with networking). @mrc0mmand do they look familiar? |
|
Hah, you've hit the bingo of known issues:
That's #23485.
Spurious, but known timeout.
Aaand that's #23424.
I'll retrigger the jobs and will cross my fingers for better luck :-) |
|
@mrc0mmand All green! @yuwata @poettering please review. |
|
I converted my firewalls to use cgroups for examining packets: first by cgroup level 1 ( This kind of setup allows very fine firewall control over individual services since all NFT features can be used at that point. This can be contrasted with The feature doesn't work well for user services since the cgroups are realized by user manager, which doesn't have privileges to fill in the NFT sets. That doesn't matter much for server systems. I suppose a trusted SUID program like Firejail could manage the sets for desktop apps. Flushing the NFT rulesets also clears the NFT sets. Then PID1 only repopulates them when the cgroup is restarted and that isn't ever going to happen for |
48a13b4 to
1e330cf
Compare
New directives `NFTSet=`, `IPv4NFTSet=` and `IPv6NFTSet=` provide a method for
integrating configuration of dynamic networks into firewall rules with NFT
sets.
/etc/systemd/network/eth.network
```
[DHCPv4]
...
NFTSet=netdev:filter:eth_ipv4_address
```
```
table netdev filter {
set eth_ipv4_address {
type ipv4_addr
flags interval
}
chain eth_ingress {
type filter hook ingress device "eth0" priority filter; policy drop;
ip saddr != @eth_ipv4_address drop
accept
}
}
```
```
sudo nft list set netdev filter eth_ipv4_address
table netdev filter {
set eth_ipv4_address {
type ipv4_addr
flags interval
elements = { 10.0.0.0/24 }
}
}
```
New directive `ControlGroupNFTSet=` provides a method for integrating services
into firewall rules with NFT sets.
Example:
```
table inet filter {
...
set timesyncd {
type cgroupsv2
}
chain ntp_output {
socket cgroupv2 != @timesyncd counter drop
accept
}
...
}
```
/etc/systemd/system/systemd-timesyncd.service.d/override.conf
```
[Service]
ControlGroupNFTSet=inet:filter:timesyncd
```
```
$ sudo nft list set inet filter timesyncd
table inet filter {
set timesyncd {
type cgroupsv2
elements = { "system.slice/systemd-timesyncd.service" }
}
}
```
New directive `DynamicUserNFTSet=` provides a method for integrating
configuration of dynamic users into firewall rules with NFT sets.
Example:
```
table inet filter {
set u {
typeof meta skuid
}
chain service_output {
meta skuid != @U drop
accept
}
}
```
```
/etc/systemd/system/dunft.service
[Service]
DynamicUser=yes
DynamicUserNFTSet=inet:filter:u
ExecStart=/bin/sleep 1000
[Install]
WantedBy=multi-user.target
```
```
$ sudo nft list set inet filter u
table inet filter {
set u {
typeof meta skuid
elements = { 64864 }
}
}
$ ps -n --format user,group,pid,command -p `pgrep sleep`
USER GROUP PID COMMAND
64864 64864 55158 /bin/sleep 1000
```
|
@topimiettinen Did this get another look by @poettering or @yuwata? |
Not that I know of. |
|
Same here. This needs final review. Temporary setting 'merged-but-needs-fixing' label. |
| <para>These settings provide a method for integrating dynamic network configuration into firewall | ||
| rules with NFT sets. These options expect a whitespace separated list of NFT set definitions. Each |
There was a problem hiding this comment.
Please link to some explanation of NFT.
| if ((c >= 'a' && c <= 'z') || | ||
| (c >= 'A' && c <= 'Z') || | ||
| (c >= '0' && c <= '9') || | ||
| c == '/' || c == '\\' || c == '_' || c == '.') | ||
| return false; | ||
| return true; |
There was a problem hiding this comment.
Maybe call this next_char_good() (positive conditions are easier to grok, we generally prefer positive booleans and checks), and just write
return (c >= 'a' && c <= 'z') ||
(c >= 'A' && c <= 'Z') ||
...;| r = nft_set_element_op_in_addr(nfnl, table, set, | ||
| add, nfproto, af, address, prefixlen); | ||
|
|
||
| (void) in_addr_prefix_to_string(af, address, prefixlen, &addr_str); | ||
|
|
||
| log_debug("%s NFT family %s table %s set %s IP addresss %s", add? "Added" : "Deleted", | ||
| nfproto_to_string(nfproto), table, set, strna(addr_str)); |
There was a problem hiding this comment.
Hmm, if r < 0, this debug stmt is wrong.
| [NFPROTO_ARP] = "arp", | ||
| [NFPROTO_BRIDGE] = "bridge", | ||
| [NFPROTO_INET] = "inet", | ||
| [NFPROTO_IPV4] = "ip", | ||
| [NFPROTO_IPV6] = "ip6", | ||
| [NFPROTO_NETDEV] = "netdev", |
There was a problem hiding this comment.
Please align the strings to the same columns.
|
|
||
| DEFINE_STRING_TABLE_LOOKUP(nfproto, int); | ||
|
|
||
| #define NFT_SET_MSGS 3 |
There was a problem hiding this comment.
I agree that there seems to be little reason to make a define for this, which survives beyond the end of the function, and only use it once in the function.
| ++tsize; | ||
| assert(tsize < ELEMENTSOF(transaction)); | ||
| r = sd_nfnl_message_batch_end(nfnl, &transaction[tsize]); | ||
| if (r < 0) | ||
| return r; | ||
|
|
||
| ++tsize; |
There was a problem hiding this comment.
See similar uses in for example above line 1091, tsize is used to keep track of used space in the array and assert()s guard that the array size isn't exceeded at any point.
| } | ||
|
|
||
| if (nft_identifier_bad(table)) | ||
| return log_syntax(unit, LOG_WARNING, filename, line, 0, "Invalid table name %s, ignoring", table); |
There was a problem hiding this comment.
Here too, cunescape probably needs to be used.
| return log_syntax(unit, LOG_WARNING, filename, line, 0, "Invalid table name %s, ignoring", table); | ||
|
|
||
| if (nft_identifier_bad(set)) | ||
| return log_syntax(unit, LOG_WARNING, filename, line, 0, "Invalid set name %s, ignoring", set); |
Add links to documentation to NetLabel, LSM, Netlabel Fallback Peer Labeling and NFT. Use a positive boolean function nft_identifier_good() to help grok the checks more easily than nft_identifier_bad(). Escape logged invalid items. Add an ASSERT_PTR() for 'data' in config_parse_netlabel(). Use a different debug message when NFT operation fails. Align nfproto_table strings. Remove NFT_SET_MSGS, it's only used once.
|
Created #23712 for the fixes. |
This reverts PR systemd#22587 and its follow-up commit. More specifically, 2299b1c (partially), e176f85, ceb46a3, and 51bb907. The PR was merged without final approval, and has several issues: - OSS fuzz reported issues in the conf parser, - It calls synchrnous netlink call, it should not be especially in PID1, - The importance of NFTSet for CGroup and DynamicUser may be questionable, at least, there was no justification PID1 should support it. - For networkd, it should be implemented with Request object, - There is no test for the feature. Fixes systemd#23711. Fixes systemd#23717. Fixes systemd#23719. Fixes systemd#23720. Fixes systemd#23721. Fixes systemd#23759.
poettering
left a comment
There was a problem hiding this comment.
Hmpf, this doesn't look ready to me yet.
Is there a test for this?
|
|
||
| <varlistentry> | ||
| <term><varname>IPv4NFTSet=</varname><replaceable>family</replaceable>:<replaceable>table</replaceable>:<replaceable>set</replaceable></term> | ||
| <term><varname>IPv6NFTSet=</varname><replaceable>family</replaceable>:<replaceable>table</replaceable>:<replaceable>set</replaceable></term> |
There was a problem hiding this comment.
BTW, I think it would make sense to implicitly use the network file name as the set name, if not specified, to push people to a clear nomenclature where each .network file gets its own set. i.e. if NFTSet= is set to just "netdev:filter" in a file foobar.network then this would be equivalent to "netdev:filter:foobar"
There was a problem hiding this comment.
BTW, does it really make sense to have two distinct settings for IPv4 and IPv6 here? why is this in the lvalue instead of the rvalue? Can you explain?
Can't the two kind of addresses be added to the same set? (no nft pro here, as you might guess).
it appears icky to me to allow ipv4 and ipv configuration here to be entirely orthogonal.
| </listitem> | ||
| </varlistentry> | ||
| <varlistentry> | ||
| <term><varname>ControlGroupNFTSet=</varname><replaceable>family</replaceable>:<replaceable>table</replaceable>:<replaceable>set</replaceable></term> |
There was a problem hiding this comment.
As in the .network patch: I think the set part should be optional, and we should imply the unit name there, so that set names are by default derived from the unit names.
And I am not sure this should carry "Controlgroup" in the name really. that is an implementation detail I think → #24255
| nfproto_to_string(s->nfproto), | ||
| s->table, | ||
| s->set, | ||
| u->cgroup_id); |
There was a problem hiding this comment.
you must validate u->cgroup_id first here, it's not guaranteed to be initialized, as we require privs for that, we might not have
| nfproto_to_string(s->nfproto), | ||
| s->table, | ||
| s->set, | ||
| u->cgroup_id); |
| (c >= 'A' && c <= 'Z')) | ||
| return false; | ||
| return true; | ||
| } |
There was a problem hiding this comment.
this is weird. In all other cases where we validate strings we keep lists of what is good, not what is bad, and the functions hence are called xyz_valid() or xyz_is_valid() or valid_xyz() or so. This here is the only one that is inverted. Could you please swap this around to match the general style?
| CGroupContext *c; | ||
|
|
||
| assert(u); | ||
|
|
There was a problem hiding this comment.
this should not even be attempted if we aren't in cgroupv2 unified mode
| <variablelist class='unit-directives'> | ||
|
|
||
| <varlistentry> | ||
| <term><varname>DynamicUserNFTSet=</varname><replaceable>family</replaceable>:<replaceable>table</replaceable>:<replaceable>set</replaceable></term> |
There was a problem hiding this comment.
this sounds awfully specific to me. I mean, I can see the usefulness, but why limit this to dynamic users? Why not do that for any UID here? it's really unnecessarily restricting to just say dynamic users.
Given this leaks into the unit file syntax, this really should be fixed before 252
Also, as in the other case I am sure the third part of the expression should be derived from the unit name automatically, if not specified, to push people to a "standardized" name of sets. And the examples provided should always make use of these default namings, instead of making up their own
| if (l < 0) | ||
| return log_error_errno(l, "Failed to unescape %s= value: %s", field, eq); | ||
|
|
||
| r = sd_bus_message_append(m, "(iss)", family, word, eq); |
| return log_error_errno(r, "Failed to parse %s: %m", field); | ||
| if (isempty(word) || isempty(eq)) { | ||
| log_error("Failed to parse %s", field); | ||
| return 0; |
There was a problem hiding this comment.
return 0?
should be return log_error_errno(SYNTHETIC_ERRNO(EINVAL), …
| if (dcreds->user) | ||
| username = dcreds->user->name; | ||
|
|
||
| exec_add_dynamic_user_nft_set(context, uid); |
There was a problem hiding this comment.
please don't do this from the child, but PID 1, i.e. we send the resolved UID/GID to PID 1 further down, look for send_user_lookup(). Put the registration in the NFT tables at the receiving end of that.
New directives
ControlGroupNFTSet=,DynamicUserNFTSet=,IPv4NFTSet=andIPv6NFTSet=provide methods for integrating configuration of control groups,dynamic users and dynamic networks into firewall rules with NFT sets.
Examples:
ControlGroupNFTSet:
/etc/systemd/system/systemd-timesyncd.service.d/override.conf
DynamicUser:
/etc/systemd/system/dunft.service
Network:
/etc/systemd/network/lan.network
Closes: #22527.