Conversation
|
Release note label not set, please set the appropriate release note. |
2 similar comments
|
Release note label not set, please set the appropriate release note. |
|
Release note label not set, please set the appropriate release note. |
|
Release note label not set, please set the appropriate release note. |
|
Hi @qmonnet , I think this approach cannot be applied to case that the package is out of order. We cannot guarantee that fragmented packets with a L4 header will arrive first. |
|
@zhiyuan0x Correct. If the fragments are not received in the logical order, we drop them until we get the first (logical) one for the datagram. This probably means that the whole packet is lost, I think TCP or UDP have no way to ask for a fragment resend. But if fragments do arrive in the correct order, we can process them. So this is not a perfect solution, but it should be an improvement over the current situation, as L4-related lookups for fragments are not possible at all. What do you think? |
3694d5f to
07bf0f2
Compare
07bf0f2 to
3b4ce5e
Compare
3b4ce5e to
5f0c381
Compare
pchaigno
left a comment
There was a problem hiding this comment.
A couple comments on the map creation, on the Go side.
7e53022 to
1305509
Compare
|
Commit 13055090b35565ad328f20803750802a31fba134 does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/contributing/#developer-s-certificate-of-origin |
1305509 to
cb64a71
Compare
|
JSON output fix in #10904 |
borkmann
left a comment
There was a problem hiding this comment.
Just small nits I can comment later on, great work!
| if (ret < 0) | ||
| return ret; | ||
|
|
||
| map_update_elem(&IPV4_FRAG_DATAGRAMS_MAP, frag_id, ports, BPF_ANY); |
There was a problem hiding this comment.
Can we add a metric to the Cilium metrics map to have some introspection that fragments were seen? Would be super useful to have this visible in Prometheus as a graph later to see when/if potential configuration issues appeared.
There was a problem hiding this comment.
Maybe also a metric if the map update failed to be able to debug that case when it could not be handled.
|
Can we extend the test cases so that the request is done from the third host which is not Cilium managed? Otherwise the service requests should always go via socket load balancing but not tc ingress/xdp and then we cannot check whether they'd work with DSR/SNAT as well (I presume they do not right now). Question is also whether we would want outside world fragments coming in and be processed instead of dropped early, meaning, only Cilium managed nodes would be allowed to send fragmented traffic among themselves which might be okay actually. Either way, we should probably have a test for it, and if we drop have a metric there for visibility. |
|
I will probably wait before adding a configurable option for fragment map size, unless you think this is something we need. I'm posting a first follow-up PR to address the minor nits: #10927 I will look into metrics, extended tests, and DSR/SNAT in future follow-ups. |
|
Sounds good, thanks! |
Support for IPv4 was recently introduced, but a few possible improvements were raised after the merge (see #10264). Let's fix them. Signed-off-by: Quentin Monnet <quentin@isovalent.com>
A new IP frag. map was introduced, but the list of symbols to skip in the
ELF wasn't updated, resulting in the following warning:
2020-04-10T15:08:21.073661843Z level=warning msg="Skipping symbol substitution" subsys=elf symbol=cilium_ipv4_frag_datagrams
Fixes: #10264
Signed-off-by: Paul Chaignon <paul@cilium.io>
A new IP frag. map was introduced, but the list of symbols to skip in the
ELF wasn't updated, resulting in the following warning:
2020-04-10T15:08:21.073661843Z level=warning msg="Skipping symbol substitution" subsys=elf symbol=cilium_ipv4_frag_datagrams
Fixes: #10264
Signed-off-by: Paul Chaignon <paul@cilium.io>
Support for IPv4 was recently introduced, but a few possible improvements were raised after the merge (see #10264). Let's fix them. Signed-off-by: Quentin Monnet <quentin@isovalent.com>
A new IP frag. map was introduced, but the list of symbols to skip in the
ELF wasn't updated, resulting in the following warning:
2020-04-10T15:08:21.073661843Z level=warning msg="Skipping symbol substitution" subsys=elf symbol=cilium_ipv4_frag_datagrams
Fixes: #10264
Signed-off-by: Paul Chaignon <paul@cilium.io>
This PR is work in progress and is submitted for reviewers to validate the approach. It contains the BPF-related changes. They were successfully compiled, but have not been otherwise tested yet. I am still working on the relevant parts in cilium-agent (creation of the BPF map) and on the tests. The two patches have a number of observations especially regarding the location of the changes in the code, please feel free to have a look and comment.
High-level description: To add support for IPv4 datagram fragments when doing L4 load balancer or L4/L3+L4 policy lookups, we need a way to retrieve the L4 ports for the datagram the fragment belongs to. We can avoid reassembling packets (and disassembling it again on egress) by adding another eBPF table to do the association between the datagram (“identification” field) and the L4 ports related to the datagram, i.e. that can be extracted from the first fragment. The table is used as follows:
When the first fragment reaches the pipeline, fragmentation is detected (“identification”, “more fragments” flag). The 5-tuple is extracted as usual, but an update is also performed in the fragmentation table (key: src IP, dst IP, L4 proto, datagram id; value: L4 ports).
When the following fragments reach the pipeline, fragmentation is detected too (“identification”, “fragment offset”). As L4 ports are not present, the regular key extraction is not performed. Instead, a lookup in the fragmentation table is realised for retrieving the L4 ports associated with the datagram. Once this information is known, the normal processing can resume (starting with the lookup in the conntrack table). Packets cross the pipeline without being reassembled/disassembled again.
IPv4 fragment supports would be enabled by default, with maybe a flag to disable it (
#ifdef IPV4_FRAGMENTS). It would not be available on older kernels that do not support LRU maps (e.g. 4.9).Fixes: #10076
(in part)
Edit 2020-02-21 Now with the code to add the map in cilium-agent. Compiles, but not tested yet.
This change is