When bringing down the netdevice or during system shutdown, a panic can be
triggered while accessing the sysfs path because the device has already
been removed.
[ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called
[ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called
...
[ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280
crash> bt
...
PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd"
...
#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
[exception RIP: dma_pool_alloc+0x1ab]
RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046
RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090
RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00
R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0
R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92
crash> net_device.state ffff89443b0c0000
state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER)
To prevent this scenario, we also make sure that the netdevice is present.
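The state word in the crash output above (0x5) has __LINK_STATE_START set
but __LINK_STATE_PRESENT clear, which is why a "running" check alone lets
the read through. A minimal userspace sketch of the two guards follows. It
assumes bit positions that mirror the kernel's enum netdev_state_t; the
struct and all model_*/may_query_* names are hypothetical and are not the
kernel code itself.

```c
#include <stdbool.h>

/* Bit positions assumed to mirror the kernel's enum netdev_state_t. */
enum {
	LINK_STATE_START     = 0,
	LINK_STATE_PRESENT   = 1,
	LINK_STATE_NOCARRIER = 2,
};

/* Hypothetical stand-in for struct net_device; only the state word is modeled. */
struct model_netdev {
	unsigned long state;
};

/* Models netif_running(): the device has been brought up. */
static inline bool model_netif_running(const struct model_netdev *dev)
{
	return ((dev->state >> LINK_STATE_START) & 1UL) != 0;
}

/* Models netif_device_present(): the underlying device is still there. */
static inline bool model_netif_device_present(const struct model_netdev *dev)
{
	return ((dev->state >> LINK_STATE_PRESENT) & 1UL) != 0;
}

/* Guard that only checks whether the device is running. */
static inline bool may_query_speed_old(const struct model_netdev *dev)
{
	return model_netif_running(dev);
}

/* Guard that additionally requires the device to be present. */
static inline bool may_query_speed_new(const struct model_netdev *dev)
{
	return model_netif_running(dev) && model_netif_device_present(dev);
}
```

With state = 0x5, may_query_speed_old() lets the read proceed into the
driver, while may_query_speed_new() rejects it.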
Signed-off-by: suresh kumar <suresh2514@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
HW counters for soft devices
Petr says:
Offloading switch device drivers may be able to collect statistics of the
traffic taking place in the HW datapath that pertains to a certain soft
netdevice, such as a VLAN. In this patch set, add the necessary
infrastructure to allow exposing these statistics to the offloaded
netdevice in question, and add mlxsw offload.
Across HW platforms, the counter itself very likely constitutes a limited
resource, and the act of counting may have a performance impact. Therefore
this patch set makes the HW statistics collection opt-in and togglable from
userspace on a per-netdevice basis.
Additionally, HW devices may have various limiting conditions under which
they can realize the counter. Therefore it is also possible to query
whether the requested counter is realized by any driver. In TC parlance,
which is to a degree reused in this patch set, two values are recognized:
"request" tracks whether the user enabled collecting HW statistics, and
"used" tracks whether any HW statistics are actually collected.
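The relationship between the two values can be sketched as below. This is
an illustrative model only: the struct, its field names and the helper are
hypothetical and do not appear in the patch set. It also assumes that
nothing is collected unless the user asked for it.

```c
#include <stdbool.h>

/* Hypothetical per-netdevice model of the two values described above. */
struct l3_stats_model {
	bool request;         /* user enabled collecting HW statistics */
	bool driver_realized; /* some driver managed to install a counter */
};

/*
 * "used" is reported only when collection was requested and at least one
 * driver actually realized the counter (assumption of this sketch).
 */
static inline bool l3_stats_used(const struct l3_stats_model *s)
{
	return s->request && s->driver_realized;
}
```

A netdevice with request on but no capable driver would then report
"request on used off", which is the case the query is meant to expose.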
In the past, this author has expressed the opinion that `a typical user
doing "ip -s l sh", including various scripts, wants to see the full
picture and not worry what's going on where'. While that would be nice,
unfortunately it cannot work:
- Packets that trap from the HW datapath to the SW datapath would be
  double counted.

  For a given netdevice, some traffic can be purely a SW artifact, and some
  may flow through the HW object corresponding to the netdevice. But some
  traffic can also get trapped to the SW datapath after bumping the HW
  counter. It is not clear how to make sure double-counting does not occur
  in the SW datapath in that case, while still making sure that a possibly
  divergent SW forwarding path gets bumped as appropriate.

  So simply adding HW and SW stats may work roughly, most of the time, but
  there are scenarios where the result is nonsensical.

- HW devices will have limitations as to what type of traffic they can
  count.

  In case of mlxsw, which is part of this patch set, there is no reasonable
  way to count all traffic going through a certain netdevice, such as a
  VLAN netdevice enslaved to a bridge. It is however very simple to count
  traffic flowing through an L3 object, such as a VLAN netdevice with an IP
  address.

  Similarly for physical netdevices, the L3 object at which the counter is
  installed is the subport carrying untagged traffic.
These are not "just counters". It is important that the user understands
what is being counted. It would be incorrect to conflate these statistics
with another existing statistics suite.
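The double-counting problem described above can be illustrated with a small
arithmetic model. This is only a sketch: the three traffic classes and all
names are hypothetical, and it assumes every trapped packet bumps the HW
counter once and the SW counter once.

```c
#include <stdint.h>

/* Hypothetical split of the packets seen by one netdevice. */
struct traffic_model {
	uint64_t hw_only; /* forwarded entirely in the HW datapath */
	uint64_t trapped; /* bumped the HW counter, then trapped to SW */
	uint64_t sw_only; /* purely a SW artifact */
};

/* What the HW counter reports. */
static inline uint64_t hw_counter(const struct traffic_model *t)
{
	return t->hw_only + t->trapped;
}

/* What the SW counter reports. */
static inline uint64_t sw_counter(const struct traffic_model *t)
{
	return t->sw_only + t->trapped;
}

/* Naive "full picture": counts every trapped packet twice. */
static inline uint64_t naive_total(const struct traffic_model *t)
{
	return hw_counter(t) + sw_counter(t);
}

/* Number of packets that actually passed through the netdevice. */
static inline uint64_t true_total(const struct traffic_model *t)
{
	return t->hw_only + t->trapped + t->sw_only;
}
```

In this model the naive sum overshoots by exactly the number of trapped
packets, a quantity that neither counter exposes on its own.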
To that end, this patch set introduces a statistics suite called "L3
stats". This label should make it easy to understand what is being counted,
and to decide whether a given device can or cannot implement this suite for
some type of netdevice. At the same time, the code is written to make
future extensions easy, should a device pop up that can implement a
different flavor of statistics suite (say L2, or an address-family-specific
suite).
For example, using a work-in-progress iproute2[1], to turn on and then list
the counters on a VLAN netdevice:
    # ip stats set dev swp1.200 l3_stats on
    # ip stats show dev swp1.200 group offload subgroup l3_stats
    56: swp1.200: group offload subgroup l3_stats on used on
        RX:  bytes packets errors dropped  missed   mcast
                 0       0      0       0       0       0
        TX:  bytes packets errors dropped carrier collsns
                 0       0      0       0       0       0
The patchset progresses as follows:
- Patch #1 is a cleanup.

- In patch #2, remove the assumption that all LINK_OFFLOAD_XSTATS are
  dev-backed.

  The only attribute defined under the nest is currently
  IFLA_OFFLOAD_XSTATS_CPU_HIT. L3_STATS differs from CPU_HIT in that the
  driver that supplies the statistics is not the same as the driver that
  implements the netdevice. Make the code compatible with this in patch #2.

- In patch #3, add the possibility to filter inside nests.

  The filter_mask field of RTM_GETSTATS header determines which
  top-level attributes should be included in the netlink response. This
  saves processing time by only including the bits that the user cares
  about instead of always dumping everything. This is doubly important
  for HW-backed statistics that would typically require a trip to the
  device to fetch the stats. In this patch, the UAPI is extended to
  allow filtering inside IFLA_STATS_LINK_OFFLOAD_XSTATS in particular,
  but the scheme is easily extensible to other nests as well.

- In patch #4, propagate extack where we need it.
  In patch #5, make it possible to propagate errors from drivers to the
  user.

- In patch #6, add the in-kernel APIs for keeping track of the new stats
  suite, and the notifiers that the core uses to communicate with the
  drivers.

- In patch #7, add UAPI for obtaining the new stats suite.

- In patch #8, add a new UAPI message, RTM_SETSTATS, which will carry
  the message to toggle the newly-added stats suite.
  In patch #9, add the toggle itself.

At this point the core is ready for drivers to add support for the new
stats suite.

- In patches #10, #11 and #12, apply small tweaks to mlxsw code.

- In patch #13, add support for L3 stats, which are realized as RIF
  counters.

- Finally in patch #14, a selftest is added to the net/forwarding
  directory. Technically this is a HW-specific test, in that without a HW
  implementing the counters, it just will not pass. But devices that
  support L3 statistics at all are likely to be able to reuse this
  selftest, so it seems appropriate to put it in the general forwarding
  directory.
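The filtering inside nests that patch #3 describes can be pictured as a
per-nest bitmask, built the same way as the top-level filter_mask, where
bit (attr - 1) selects attribute attr. The sketch below is a userspace
model under that assumption. The MODEL_* names and the attribute values
are hypothetical and chosen for illustration; they are not the UAPI
definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit (attr - 1) selects attribute attr, as with the top-level filter. */
#define MODEL_FILTER_BIT(attr) (1u << ((attr) - 1))

/* Hypothetical attribute IDs inside the offload xstats nest. */
enum model_offload_xstats {
	MODEL_XSTATS_UNSPEC,
	MODEL_XSTATS_CPU_HIT,
	MODEL_XSTATS_L3_STATS,
	MODEL_XSTATS_COUNT,
};

/* True if the nested filter asks for this attribute. */
static inline bool model_attr_selected(uint32_t mask, int attr)
{
	return (mask & MODEL_FILTER_BIT(attr)) != 0;
}

/*
 * Number of attributes a dump would have to produce. For HW-backed
 * statistics each one may mean a trip to the device, so a narrow mask
 * saves that work.
 */
static inline int model_count_selected(uint32_t mask)
{
	int n = 0;

	for (int attr = MODEL_XSTATS_CPU_HIT; attr < MODEL_XSTATS_COUNT; attr++)
		if (model_attr_selected(mask, attr))
			n++;
	return n;
}
```

A request whose nested mask is MODEL_FILTER_BIT(MODEL_XSTATS_L3_STATS)
would then select only the L3 stats and skip the CPU-hit counters.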
We also have a netdevsim implementation, and a corresponding selftest that
verifies specifically some of the core code. We intend to contribute these
later. Interested parties can take a look at the raw code at [2].
[1] https://github.com/pmachata/iproute2/commits/soft_counters
[2] https://github.com/pmachata/linux_mlxsw/commits/petrm_soft_counters_2
v2:
- Patch #3:
    - Do not declare strict_start_type at the new policies, since they are
      used with nla_parse_nested() (sans _deprecated).
    - Use NLA_POLICY_NESTED to declare what the nest contents should be
    - Use NLA_POLICY_MASK instead of BITFIELD32 for the filtering
      attribute.
- Patch #6:
    - s/monotonous/monotonic/ in commit message
    - Use a newly-added struct rtnl_hw_stats64 for stats transfer
- Patch #7:
    - Use a newly-added struct rtnl_hw_stats64 for stats transfer
- Patch #8:
    - Do not declare strict_start_type at the new policies, since they are
      used with nla_parse_nested() (sans _deprecated).
- Patch #13:
    - Use a newly-added struct rtnl_hw_stats64 for stats transfer
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
branch: bpf_test
base: bpf
version: 3df9d80