Skip to content

Support for IPv4 fragments#10264

Merged
borkmann merged 8 commits intomasterfrom
pr/qmonnet/ipv4_fragments
Apr 9, 2020
Merged

Support for IPv4 fragments#10264
borkmann merged 8 commits intomasterfrom
pr/qmonnet/ipv4_fragments

Conversation

@qmonnet
Copy link
Copy Markdown
Member

@qmonnet qmonnet commented Feb 20, 2020

This PR is work in progress and is submitted for reviewers to validate the approach. It contains the BPF-related changes. They were successfully compiled, but have not been otherwise tested yet. I am still working on the relevant parts in cilium-agent (creation of the BPF map) and on the tests. The two patches have a number of observations especially regarding the location of the changes in the code, please feel free to have a look and comment.

High-level description: To add support for IPv4 datagram fragments when doing L4 load balancer or L4/L3+L4 policy lookups, we need a way to retrieve the L4 ports for the datagram the fragment belongs to. We can avoid reassembling packets (and disassembling it again on egress) by adding another eBPF table to do the association between the datagram (“identification” field) and the L4 ports related to the datagram, i.e. that can be extracted from the first fragment. The table is used as follows:

  • When the first fragment reaches the pipeline, fragmentation is detected (“identification”, “more fragments” flag). The 5-tuple is extracted as usual, but an update is also performed in the fragmentation table (key: src IP, dst IP, L4 proto, datagram id; value: L4 ports).

  • When the following fragments reach the pipeline, fragmentation is detected too (“identification”, “fragment offset”). As L4 ports are not present, the regular key extraction is not performed. Instead, a lookup in the fragmentation table is realised for retrieving the L4 ports associated with the datagram. Once this information is known, the normal processing can resume (starting with the lookup in the conntrack table). Packets cross the pipeline without being reassembled/disassembled again.

IPv4 fragment supports would be enabled by default, with maybe a flag to disable it (#ifdef IPV4_FRAGMENTS). It would not be available on older kernels that do not support LRU maps (e.g. 4.9).

Fixes: #10076
(in part)

Edit 2020-02-21 Now with the code to add the map in cilium-agent. Compiles, but not tested yet.


This change is Reviewable

@qmonnet qmonnet added kind/enhancement This would improve or streamline existing functionality. wip area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. labels Feb 20, 2020
@maintainer-s-little-helper
Copy link
Copy Markdown

Release note label not set, please set the appropriate release note.

2 similar comments
@maintainer-s-little-helper
Copy link
Copy Markdown

Release note label not set, please set the appropriate release note.

@maintainer-s-little-helper
Copy link
Copy Markdown

Release note label not set, please set the appropriate release note.

@maintainer-s-little-helper
Copy link
Copy Markdown

Release note label not set, please set the appropriate release note.

@coveralls
Copy link
Copy Markdown

coveralls commented Feb 20, 2020

Coverage Status

Coverage decreased (-0.005%) to 46.915% when pulling 3683dd5 on pr/qmonnet/ipv4_fragments into d8fee7c on master.

@zhiyuan0x
Copy link
Copy Markdown
Contributor

zhiyuan0x commented Feb 21, 2020

Hi @qmonnet , I think this approach cannot be applied to case that the package is out of order. We cannot guarantee that fragmented packets with a L4 header will arrive first.

Comment thread bpf/lib/ipv4.h
@qmonnet
Copy link
Copy Markdown
Member Author

qmonnet commented Feb 21, 2020

@zhiyuan0x Correct. If the fragments are not received in the logical order, we drop them until we get the first (logical) one for the datagram. This probably means that the whole packet is lost, I think TCP or UDP have no way to ask for a fragment resend. But if fragments do arrive in the correct order, we can process them. So this is not a perfect solution, but it should be an improvement over the current situation, as L4-related lookups for fragments are not possible at all. What do you think?

@qmonnet qmonnet force-pushed the pr/qmonnet/ipv4_fragments branch from 3694d5f to 07bf0f2 Compare February 21, 2020 17:16
Comment thread pkg/maps/fragmap/fragmap.go Outdated
Comment thread pkg/maps/fragmap/fragmap.go Outdated
@qmonnet qmonnet force-pushed the pr/qmonnet/ipv4_fragments branch from 07bf0f2 to 3b4ce5e Compare February 21, 2020 17:18
Comment thread pkg/maps/fragmap/fragmap.go Outdated
Comment thread pkg/maps/fragmap/fragmap.go Outdated
@qmonnet qmonnet force-pushed the pr/qmonnet/ipv4_fragments branch from 3b4ce5e to 5f0c381 Compare February 21, 2020 17:19
Copy link
Copy Markdown
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple comments on the map creation, on the Go side.

Comment thread pkg/maps/fragmap/fragmap.go Outdated
Comment thread pkg/datapath/linux/config/config.go Outdated
@qmonnet qmonnet force-pushed the pr/qmonnet/ipv4_fragments branch 2 times, most recently from 7e53022 to 1305509 Compare February 26, 2020 15:47
@maintainer-s-little-helper
Copy link
Copy Markdown

Commit 13055090b35565ad328f20803750802a31fba134 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/contributing/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper Bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Feb 26, 2020
@qmonnet qmonnet force-pushed the pr/qmonnet/ipv4_fragments branch from 1305509 to cb64a71 Compare February 26, 2020 15:48
@qmonnet qmonnet requested a review from joestringer April 8, 2020 21:56
@qmonnet
Copy link
Copy Markdown
Member Author

qmonnet commented Apr 8, 2020

JSON output fix in #10904

Copy link
Copy Markdown
Member

@borkmann borkmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just small nits I can comment later on, great work!

@borkmann borkmann merged commit 7e5f30d into master Apr 9, 2020
@borkmann borkmann deleted the pr/qmonnet/ipv4_fragments branch April 9, 2020 19:36
Comment thread Documentation/architecture.rst
Comment thread Documentation/cmdref/cilium-agent.md
Comment thread bpf/bpf_lxc.c
Comment thread bpf/lib/ipv4.h
Comment thread bpf/lib/ipv4.h
Comment thread bpf/lib/ipv4.h
Comment thread bpf/lib/ipv4.h
Comment thread bpf/lib/ipv4.h
if (ret < 0)
return ret;

map_update_elem(&IPV4_FRAG_DATAGRAMS_MAP, frag_id, ports, BPF_ANY);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a metric to the Cilium metrics map to have some introspection that fragments were seen? Would be super useful to have this visible in Prometheus as a graph later to see when/if potential configuration issues appeared.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also a metric if the map update failed to be able to debug that case when it could not be handled.

@borkmann
Copy link
Copy Markdown
Member

borkmann commented Apr 9, 2020

Can we extend the test cases so that the request is done from the third host which is not Cilium managed? Otherwise the service requests should always go via socket load balancing but not tc ingress/xdp and then we cannot check whether they'd work with DSR/SNAT as well (I presume they do not right now). Question is also whether we would want outside world fragments coming in and be processed instead of dropped early, meaning, only Cilium managed nodes would be allowed to send fragmented traffic among themselves which might be okay actually. Either way, we should probably have a test for it, and if we drop have a metric there for visibility.

@qmonnet
Copy link
Copy Markdown
Member Author

qmonnet commented Apr 10, 2020

I will probably wait before adding a configurable option for fragment map size, unless you think this is something we need.

I'm posting a first follow-up PR to address the minor nits: #10927

I will look into metrics, extended tests, and DSR/SNAT in future follow-ups.

@borkmann
Copy link
Copy Markdown
Member

Sounds good, thanks!

qmonnet added a commit that referenced this pull request Apr 10, 2020
Support for IPv4 was recently introduced, but a few possible
improvements were raised after the merge (see #10264). Let's fix them.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
pchaigno added a commit that referenced this pull request Apr 10, 2020
A new IP frag. map was introduced, but the list of symbols to skip in the
ELF wasn't updated, resulting in the following warning:

    2020-04-10T15:08:21.073661843Z level=warning msg="Skipping symbol substitution" subsys=elf symbol=cilium_ipv4_frag_datagrams

Fixes: #10264
Signed-off-by: Paul Chaignon <paul@cilium.io>
pchaigno added a commit that referenced this pull request Apr 10, 2020
A new IP frag. map was introduced, but the list of symbols to skip in the
ELF wasn't updated, resulting in the following warning:

    2020-04-10T15:08:21.073661843Z level=warning msg="Skipping symbol substitution" subsys=elf symbol=cilium_ipv4_frag_datagrams

Fixes: #10264
Signed-off-by: Paul Chaignon <paul@cilium.io>
borkmann pushed a commit that referenced this pull request Apr 10, 2020
Support for IPv4 was recently introduced, but a few possible
improvements were raised after the merge (see #10264). Let's fix them.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
joestringer pushed a commit that referenced this pull request Apr 13, 2020
A new IP frag. map was introduced, but the list of symbols to skip in the
ELF wasn't updated, resulting in the following warning:

    2020-04-10T15:08:21.073661843Z level=warning msg="Skipping symbol substitution" subsys=elf symbol=cilium_ipv4_frag_datagrams

Fixes: #10264
Signed-off-by: Paul Chaignon <paul@cilium.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. kind/enhancement This would improve or streamline existing functionality. release-note/major This PR introduces major new functionality to Cilium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fragment tracking

10 participants