endpoint: Write headerfile under full regeneration#12524
endpoint: Write headerfile under full regeneration#12524qmonnet merged 1 commit intocilium:masterfrom
Conversation
|
I ran into this bug while debugging #12482 and looked into it. I'm not 100% sure on the implementation of this, but it does seem reasonable to me. I'd like to get feedback from those who have more context on the regeneration of endpoints. @joestringer The code touches recently changed parts by you. It'd be great to have your feedback. |
|
test-me-please |
joestringer
left a comment
There was a problem hiding this comment.
Minor nit, I think we can have this be a little more concise but the change LGTM.
There was a problem hiding this comment.
We could probably tweak this to avoid duplicating the writeHeaderfile() call by instead having the headerfileChanged check above only set the regenerationLevel then have this check do the following instead:
| } else if datapathRegenCtxt.regenerationLevel == regeneration.RegenerateWithDatapathRebuild { | |
| if datapathRegenCtxt.regenerationLevel >= regeneration.RegenerateWithDatapathRewrite { |
Then I'm not sure whether the comment below provides any additional context; basically if either a rewrite or rebuild is required, we'll be sure to write the headerfile again.
|
May be worth considering whether we should backport this. In an ideal circumstance we should never need this, but even for development purposes for backports and so on it may turn out to be useful. I note that the referred PR that this marks as "Fixes" was also attempting to fix a broader version of this bug and was backported to v1.6; hopefully this will bring the saga to a close :-) |
I was thinking that as well. It's not a crucial bug, but the backports should be trivial as the change is small and code paths are consistent in all relevant trees. Marking for 1.{6,7,8}. |
Previously, regenerating an endpoint manually via CLI resulted in
compilation errors as seen below. These errors occurred because the
endpoint headerfile (ep_config.h or lxc_config.h) was not written to the
`<endpoint ID>_next` directory (will be reffered to as "next"), where
the BPF compilation takes place.
The reason the headerfile was not written to the "next" directory was
because Cilium only wrote the headerfile if it was changed (via hash).
However, a manual regeneration triggered through the API, sets the
regeneration level to "compile+load" (a full regeneration), where Cilium
expects all relevant files, including the headerfile, to be present in
the "next" directory (`<endpoint ID>_next`). Hence, the compilation
errors.
This commit fixes this issue by checking whether there has been a
request for a full regeneration, in which case, we write the headerfile
to the "next" directory.
```
root@k8s2:/var/run/cilium/state# clang -emit-llvm -O2 -target bpf -std=gnu89 -nostdinc -D__NR_CPUS__=2 -Wall -Wextra -Werror -Wshadow -Wno-address-of-packed-member -Wno-unknown-warning-option -Wno-gnu-variable-s
ized-type-not-at-end -Wdeclaration-after-statement -I/var/run/cilium/state/globals -I1285_next -I/var/lib/cilium/bpf -I/var/lib/cilium/bpf/include -c /var/lib/cilium/bpf/bpf_lxc.c -o /tmp/c
In file included from /var/lib/cilium/bpf/bpf_lxc.c:22:
/var/lib/cilium/bpf/lib/icmp6.h:50:29: error: use of undeclared identifier 'NODE_MAC'
union macaddr smac, dmac = NODE_MAC;
^
/var/lib/cilium/bpf/lib/icmp6.h:359:30: error: use of undeclared identifier 'NODE_MAC'
union macaddr router_mac = NODE_MAC;
^
In file included from /var/lib/cilium/bpf/bpf_lxc.c:26:
/var/lib/cilium/bpf/lib/lxc.h:67:29: error: use of undeclared identifier 'NODE_MAC'
union macaddr router_mac = NODE_MAC;
^
/var/lib/cilium/bpf/bpf_lxc.c:159:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
info = lookup_ip6_remote_endpoint(&orig_dip);
^
/var/lib/cilium/bpf/bpf_lxc.c:159:10: note: did you mean 'lookup_ip6_endpoint'?
/var/lib/cilium/bpf/lib/eps.h:13:1: note: 'lookup_ip6_endpoint' declared here
lookup_ip6_endpoint(struct ipv6hdr *ip6)
^
/var/lib/cilium/bpf/bpf_lxc.c:159:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
info = lookup_ip6_remote_endpoint(&orig_dip);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:529:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
info = lookup_ip4_remote_endpoint(orig_dip);
^
/var/lib/cilium/bpf/bpf_lxc.c:529:10: note: did you mean 'lookup_ip4_endpoint'?
/var/lib/cilium/bpf/lib/eps.h:35:1: note: 'lookup_ip4_endpoint' declared here
lookup_ip4_endpoint(const struct iphdr *ip4)
^
/var/lib/cilium/bpf/bpf_lxc.c:529:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
info = lookup_ip4_remote_endpoint(orig_dip);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:1027:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
info = lookup_ip6_remote_endpoint(src);
^
/var/lib/cilium/bpf/bpf_lxc.c:1027:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
info = lookup_ip6_remote_endpoint(src);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:1256:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
info = lookup_ip4_remote_endpoint(ip4->saddr);
^
/var/lib/cilium/bpf/bpf_lxc.c:1256:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
info = lookup_ip4_remote_endpoint(ip4->saddr);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
11 errors generated.
```
Fixes cilium#10630
Fixes cilium#12005
Signed-off-by: Chris Tarazi <chris@isovalent.com>
d5d54e3 to
d138143
Compare
|
@christarazi Did you check if this fix also works for the host endpoint? I can have a look as a follow up if not. |
|
test-me-please |
Previously, regenerating an endpoint manually via CLI resulted in
compilation errors as seen below. These errors occurred because the
endpoint headerfile (ep_config.h or lxc_config.h) was not written to the
<endpoint ID>_nextdirectory (will be reffered to as "next"), wherethe BPF compilation takes place.
The reason the headerfile was not written to the "next" directory was
because Cilium only wrote the headerfile if it was changed (via hash).
However, a manual regeneration triggered through the API, sets the
regeneration level to "compile+load" (a full regeneration), where Cilium
expects all relevant files, including the headerfile, to be present in
the "next" directory (
<endpoint ID>_next). Hence, the compilationerrors.
This commit fixes this issue by checking whether there has been a
request for a full regeneration, in which case, we write the headerfile
to the "next" directory.
Fixes #10630
Fixes #12005