[swss]: Listen for undeliverable IPinIP packets #9348

Merged: theasianpianist merged 10 commits into sonic-net:master from theasianpianist:ipinip_ping on Dec 14, 2021

Contributor

@theasianpianist theasianpianist commented Nov 23, 2021

Why I did it

In dual ToR setups, the standby ToR may encapsulate a server-bound packet and forward it to the active ToR while the active ToR has no neighbor information for the packet's destination IP. In this case, the packet is trapped to the CPU and dropped with no further action taken, which can result in blackholed traffic.

How I did it

Create a script in the orchagent docker container which listens for these trapped encapsulated packets. When such a packet is received, the script will issue a ping command to the packet's inner destination IP to start the neighbor learning process.

This script is also resilient to portchannel status changes (i.e. interface going up or down). An interface going down does not affect traffic sniffing on interfaces which are still up. When an interface comes back up, we restart the sniffer to start capturing traffic on that interface again.
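As a rough illustration of the first step (finding the inner destination IP of a trapped packet), here is a minimal sketch that parses the raw bytes directly. It is not the code from this PR, which sniffs with scapy; the function name `inner_destination` is hypothetical, and the sketch assumes the buffer starts at the outer IPv4 header with no Ethernet header in front.

```python
import ipaddress

IPPROTO_IPIP = 4   # outer protocol number when the inner packet is IPv4
IPPROTO_IPV6 = 41  # outer protocol number when the inner packet is IPv6


def inner_destination(outer_packet):
    """Return the inner destination IP of an IPv4-encapsulated packet, or None.

    Hypothetical helper: `outer_packet` is assumed to be raw bytes that
    start at the outer IPv4 header.
    """
    # The outer header must be IPv4 and at least 20 bytes long.
    if len(outer_packet) < 20 or outer_packet[0] >> 4 != 4:
        return None
    header_len = (outer_packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if header_len < 20 or len(outer_packet) < header_len:
        return None
    proto = outer_packet[9]
    inner = outer_packet[header_len:]
    # IPv4 destination sits at bytes 16..19 of the inner header.
    if proto == IPPROTO_IPIP and len(inner) >= 20:
        return ipaddress.ip_address(inner[16:20])
    # IPv6 destination sits at bytes 24..39 of the inner header.
    if proto == IPPROTO_IPV6 and len(inner) >= 40:
        return ipaddress.ip_address(inner[24:40])
    return None
```

The returned address is what the listener would then hand to `ping` to kick off neighbor learning.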

How to verify it

Send an IPinIP packet to the active ToR:

<Ether  dst=94:8e:d3:04:eb:28 type=0x800 |<IP  proto=ipv6 src=10.1.0.33 dst=10.1.0.32 |<IPv6  src=2603:10b0:10b:8a8e::a56:e825 dst=fc02:1000::99 |>>>

This packet must meet the following requirements:

  1. Destination MAC matches the ToR's device MAC (which may be different from the VLAN MAC)
  2. The outer source IP should be the PEER_SWITCH IP defined in the PEER_SWITCH table in config DB
  3. The outer destination IP should be the ToR's Loopback0 IP
  4. The inner destination IP should be contained within the ToR's VLAN subnet
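The four requirements above can be checked before sending a test packet. This is only a sketch: `is_valid_test_packet` and its parameters are hypothetical names, and the VLAN subnet `fc02:1000::/64` in the usage example is an assumption, not a value taken from this PR.

```python
import ipaddress


def is_valid_test_packet(dst_mac, outer_src, outer_dst, inner_dst,
                         device_mac, peer_switch_ip, loopback0_ip,
                         vlan_subnet):
    """Return True if a test packet meets all four requirements (hypothetical helper)."""
    checks = (
        # 1. Destination MAC matches the ToR's device MAC.
        dst_mac.lower() == device_mac.lower(),
        # 2. Outer source IP is the PEER_SWITCH IP.
        ipaddress.ip_address(outer_src) == ipaddress.ip_address(peer_switch_ip),
        # 3. Outer destination IP is the ToR's Loopback0 IP.
        ipaddress.ip_address(outer_dst) == ipaddress.ip_address(loopback0_ip),
        # 4. Inner destination IP lies inside the ToR's VLAN subnet.
        ipaddress.ip_address(inner_dst) in ipaddress.ip_network(vlan_subnet),
    )
    return all(checks)


# Usage with the example packet; the VLAN subnet here is assumed.
ok = is_valid_test_packet(
    dst_mac='94:8e:d3:04:eb:28',
    outer_src='10.1.0.33',
    outer_dst='10.1.0.32',
    inner_dst='fc02:1000::99',
    device_mac='94:8E:D3:04:EB:28',
    peer_switch_ip='10.1.0.33',
    loopback0_ip='10.1.0.32',
    vlan_subnet='fc02:1000::/64',
)
```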

Verify that the inner destination IP is added to the kernel neighbor table:

admin@sonic:/var/log/swss$ ip -6 neigh
...
fc02:1000::99 dev Vlan1000  FAILED

Verify via packet capture that neighbor solicitation messages are sent by the kernel on the VLAN interface:

admin@sonic:~$ sudo tcpdump -i Vlan1000 -nnl
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on Vlan1000, link-type EN10MB (Ethernet), capture size 262144 bytes
01:40:09.581164 IP6 fc02:1000::1 > ff02::1:ff00:99: ICMP6, neighbor solicitation, who has fc02:1000::99, length 32
01:40:10.604827 IP6 fc02:1000::1 > ff02::1:ff00:99: ICMP6, neighbor solicitation, who has fc02:1000::99, length 32
01:40:11.628845 IP6 fc02:1000::1 > ff02::1:ff00:99: ICMP6, neighbor solicitation, who has fc02:1000::99, length 32

Verify that the ping command was recorded in the syslog:

Nov 23 01:40:09.576360 sonic INFO swss#ipinip_ping.py: Running command 'ping -c1 -W1 -6 fc02:1000::99'
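For reference, a command like the one in the log line above could be assembled as follows. This is a sketch rather than the script's actual code: `build_ping_command` is a hypothetical name, and omitting `-6` for IPv4 destinations is an assumption, since the log only shows the IPv6 case.

```python
import ipaddress


def build_ping_command(inner_dst):
    """Build a one-shot ping command for the inner destination IP (hypothetical helper)."""
    ip = ipaddress.ip_address(inner_dst)  # raises ValueError on malformed input
    cmd = ['ping', '-c1', '-W1']          # one probe, one-second wait
    if ip.version == 6:
        cmd.append('-6')                  # assumed: flag only added for IPv6
    cmd.append(str(ip))
    return cmd


command = ' '.join(build_ping_command('fc02:1000::99'))
```

Validating the address with `ipaddress` before building the command also keeps arbitrary packet contents out of the argument list.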

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
logger.log_notice('Starting IPinIP listener for IPs {} and {}'
                  .format(self_ip, peer_ip))
while True:
    sniff(
Contributor

Could you check the CPU utilization of this, especially when a lot of packets are trapped to the Loopback0 IP?

Contributor Author

The maximum rate at which scapy can send packets from the PTF is about 170 KB/s. Sending IPinIP packets at this rate causes CPU utilization to hover around 2%, never exceeding about 2.6%:

admin@str2-7050cx3-acs-08:~$ ps ax | grep ipinip_ping | grep -v grep
  91073 pts/0    S      0:03 python3 /usr/bin/ipinip_ping.py
admin@str2-7050cx3-acs-08:~$ top | grep 91073
  91073 root      20   0   38704  28308   9376 S   2.3   0.2   0:03.65 python3
  91073 root      20   0   38960  28364   9376 S   2.0   0.2   0:03.71 python3
  91073 root      20   0   38960  28432   9376 S   1.7   0.2   0:03.76 python3
  91073 root      20   0   38960  28508   9376 S   2.6   0.2   0:03.84 python3
  91073 root      20   0   38960  28568   9376 S   1.7   0.2   0:03.89 python3
  91073 root      20   0   38960  28628   9376 S   2.0   0.2   0:03.95 python3
  91073 root      20   0   39216  28692   9376 S   2.0   0.2   0:04.01 python3
  91073 root      20   0   39216  28772   9376 S   2.0   0.2   0:04.07 python3
  91073 root      20   0   39216  28840   9376 S   2.0   0.2   0:04.13 python3
  91073 root      20   0   39348  28904   9376 S   2.3   0.2   0:04.20 python3

- Use STATE_DB INTERFACE_TABLE to check portchannel status
- Cache portchannel members from config DB
- Wrap `ping` command with `timeout` for graceful exit
- Verify that the tunnel type in config DB is IPinIP

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
"""
start = datetime.now()

while (datetime.now() - start).seconds < 60:
Contributor

this should be timeout, correct?
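One way to make the wait configurable is to pass the timeout in as an argument, roughly as sketched below. `wait_until`, `predicate`, and `interval` are hypothetical names, not code from this PR. The sketch uses `time.monotonic()` for the deadline; note also that `timedelta.seconds` holds only the seconds component of the duration (0 to 86399), whereas `total_seconds()` gives the full elapsed time.

```python
import time


def wait_until(predicate, timeout, interval=1.0):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Hypothetical helper: returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout  # monotonic clock ignores wall-clock jumps
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would then write something like `wait_until(tunnel_is_configured, timeout=args.timeout)`, where both names are placeholders.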


self_ip, peer_ip = self.get_ipinip_tunnel_addrs()
if self_ip is None or peer_ip is None:
    logger.log_error('Could not get IPinIP tunnel addresses from '
Contributor

This need not be an error on a single ToR. Let's log it as a notice.

@@ -0,0 +1,233 @@
#! /usr/bin/env python3
"""
Adds neighbor to kernel for undeliverable IPinIP packets
Contributor

Suggest renaming this to tunnel_packet_handler.py so we can extend it to VXLAN in the future.

- Use argument instead of hardcoded value for timeout
- Reduce some logging severity

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
prsunny previously approved these changes Dec 2, 2021
@theasianpianist
Contributor Author

/Azp run

@azure-pipelines

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@theasianpianist
Contributor Author

/Azp run all

@azure-pipelines

No pipelines are associated with this pull request.

pc_index_map = self.get_portchannel_index_mapping()
for msg in messages:
    if msg['index'] in pc_index_map:
        if msg['state'] == 'up':
Contributor

Let's check for interface flap cases as a future enhancement.
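For context, the restart decision in the snippet under review boils down to the check sketched here. This is an illustration only: `sniffer_restart_needed` is a hypothetical name, the message dicts are assumed to carry just the `index` and `state` keys seen in the snippet, and `pc_index_map` is assumed to map interface index to portchannel name.

```python
def sniffer_restart_needed(messages, pc_index_map):
    """Return True if any portchannel in `pc_index_map` reported state 'up'.

    Hypothetical helper: down events are ignored because sniffing on the
    remaining interfaces is unaffected.
    """
    return any(
        msg['index'] in pc_index_map and msg['state'] == 'up'
        for msg in messages
    )


# Assumed sample data: index 10 is a portchannel, index 3 is not.
pc_index_map = {10: 'PortChannel0001'}
messages = [
    {'index': 3, 'state': 'up'},
    {'index': 10, 'state': 'down'},
    {'index': 10, 'state': 'up'},
]
restart = sniffer_restart_needed(messages, pc_index_map)
```

A flap (down followed quickly by up) shows up as two messages for the same index, so this check would trigger a restart on each such pair.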

@theasianpianist
Contributor Author

/Azp run

@azure-pipelines

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@theasianpianist
Contributor Author

/Azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@theasianpianist theasianpianist merged commit 7bd0a2a into sonic-net:master Dec 14, 2021
lguohan pushed a commit that referenced this pull request Dec 16, 2021
- Create a script in the orchagent docker container which listens for these encapsulated packets which are trapped to CPU (indicating that they cannot be routed/no neighbor info exists for the inner packet). When such a packet is received, the script will issue a ping command to the packet's inner destination IP to start the neighbor learning process.
- This script is also resilient to portchannel status changes (i.e. interface going up or down). An interface going down does not affect traffic sniffing on interfaces which are still up. When an interface comes back up, we restart the sniffer to start capturing traffic on that interface again.
theasianpianist added a commit to theasianpianist/sonic-buildimage that referenced this pull request Jan 5, 2022