
Commit 353407d

idosch authored and kuba-moo committed
ethtool: Add ability to control transceiver modules' power mode
Add a pair of new ethtool messages, 'ETHTOOL_MSG_MODULE_SET' and
'ETHTOOL_MSG_MODULE_GET', that can be used to control transceiver
modules parameters and retrieve their status.

The first parameter to control is the power mode of the module. It is
only relevant for paged memory modules, as flat memory modules always
operate in low power mode.

When a paged memory module is in low power mode, its power consumption
is reduced to the minimum, the management interface towards the host is
available and the data path is deactivated.

User space can choose to put modules that are not currently in use in
low power mode and transition them to high power mode before putting the
associated ports administratively up. This is useful for user space that
favors reduced power consumption and lower temperatures over reduced
link up times. In QSFP-DD modules the transition from low power mode to
high power mode can take a few seconds and this transition is only
expected to get longer with future / more complex modules.

User space can control the power mode of the module via the power mode
policy attribute ('ETHTOOL_A_MODULE_POWER_MODE_POLICY'). Possible
values:

* high: Module is always in high power mode.

* auto: Module is transitioned by the host to high power mode when the
  first port using it is put administratively up and to low power mode
  when the last port using it is put administratively down.

The operational power mode of the module is available to user space via
the 'ETHTOOL_A_MODULE_POWER_MODE' attribute. The attribute is not
reported to user space when a module is not plugged-in.

The user API is designed to be generic enough so that it could be used
for modules with different memory maps (e.g., SFF-8636, CMIS).

The only implementation of the device driver API in this series is for a
MAC driver (mlxsw) where the module is controlled by the device's
firmware, but it is designed to be generic enough so that it could also
be used by implementations where the module is controlled by the CPU.

CMIS testing
============

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x03 (ModuleReady)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : Off

The module is not in low power mode, as it is not forced by hardware
(LowPwrAllowRequestHW is off) or by software (LowPwrRequestSW is off).

The power mode can be queried from the kernel. In case
LowPwrAllowRequestHW was on, the kernel would need to take into account
the state of the LowPwrRequestHW signal, which is not visible to user
space.

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy high
 power-mode high

Change the power mode policy to 'auto':

 # ethtool --set-module swp11 power-mode-policy auto

Query the power mode again:

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy auto
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x01 (ModuleLowPwr)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : On

Put the associated port administratively up which will instruct the host
to transition the module to high power mode:

 # ip link set dev swp11 up

Query the power mode again:

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy auto
 power-mode high

Verify with the data read from the EEPROM:

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x03 (ModuleReady)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : Off

Put the associated port administratively down which will instruct the
host to transition the module to low power mode:

 # ip link set dev swp11 down

Query the power mode again:

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy auto
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x01 (ModuleLowPwr)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : On

SFF-8636 testing
================

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 ...
 Extended identifier description           : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled
 Power set                                 : Off
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.7733 mW / -1.12 dBm
 Transmit avg optical power (Channel 2)    : 0.7649 mW / -1.16 dBm
 Transmit avg optical power (Channel 3)    : 0.7790 mW / -1.08 dBm
 Transmit avg optical power (Channel 4)    : 0.7837 mW / -1.06 dBm
 Rcvr signal avg optical power(Channel 1)  : 0.9302 mW / -0.31 dBm
 Rcvr signal avg optical power(Channel 2)  : 0.9079 mW / -0.42 dBm
 Rcvr signal avg optical power(Channel 3)  : 0.8993 mW / -0.46 dBm
 Rcvr signal avg optical power(Channel 4)  : 0.8778 mW / -0.57 dBm

The module is not in low power mode, as it is not forced by hardware
(Power override is on) or by software (Power set is off).

The power mode can be queried from the kernel. In case Power override
was off, the kernel would need to take into account the state of the
LPMode signal, which is not visible to user space.

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy high
 power-mode high

Change the power mode policy to 'auto':

 # ethtool --set-module swp13 power-mode-policy auto

Query the power mode again:

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy auto
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 Extended identifier description           : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled
 Power set                                 : On
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 2)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 3)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 4)    : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 1)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 2)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 3)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 4)  : 0.0000 mW / -inf dBm

Put the associated port administratively up which will instruct the host
to transition the module to high power mode:

 # ip link set dev swp13 up

Query the power mode again:

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy auto
 power-mode high

Verify with the data read from the EEPROM:

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 ...
 Extended identifier description           : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled
 Power set                                 : Off
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.7934 mW / -1.01 dBm
 Transmit avg optical power (Channel 2)    : 0.7859 mW / -1.05 dBm
 Transmit avg optical power (Channel 3)    : 0.7885 mW / -1.03 dBm
 Transmit avg optical power (Channel 4)    : 0.7985 mW / -0.98 dBm
 Rcvr signal avg optical power(Channel 1)  : 0.9325 mW / -0.30 dBm
 Rcvr signal avg optical power(Channel 2)  : 0.9034 mW / -0.44 dBm
 Rcvr signal avg optical power(Channel 3)  : 0.9086 mW / -0.42 dBm
 Rcvr signal avg optical power(Channel 4)  : 0.8885 mW / -0.51 dBm

Put the associated port administratively down which will instruct the
host to transition the module to low power mode:

 # ip link set dev swp13 down

Query the power mode again:

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy auto
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 ...
 Extended identifier description           : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled
 Power set                                 : On
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 2)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 3)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 4)    : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 1)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 2)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 3)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 4)  : 0.0000 mW / -inf dBm

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
1 parent 9cbfc51 commit 353407d

8 files changed, +335 -3 lines changed

Documentation/networking/ethtool-netlink.rst

Lines changed: 69 additions & 2 deletions
@@ -41,6 +41,11 @@ In the message structure descriptions below, if an attribute name is suffixed
 with "+", parent nest can contain multiple attributes of the same type. This
 implements an array of entries.
 
+Attributes that need to be filled-in by device drivers and that are dumped to
+user space based on whether they are valid or not should not use zero as a
+valid value. This avoids the need to explicitly signal the validity of the
+attribute in the device driver API.
+
 
 Request header
 ==============
@@ -179,7 +184,7 @@ according to message purpose:
 
 Userspace to kernel:
 
-  ===================================== ================================
+  ===================================== =================================
   ``ETHTOOL_MSG_STRSET_GET``            get string set
   ``ETHTOOL_MSG_LINKINFO_GET``          get link settings
   ``ETHTOOL_MSG_LINKINFO_SET``          set link settings
@@ -213,7 +218,9 @@ Userspace to kernel:
   ``ETHTOOL_MSG_MODULE_EEPROM_GET``     read SFP module EEPROM
   ``ETHTOOL_MSG_STATS_GET``             get standard statistics
   ``ETHTOOL_MSG_PHC_VCLOCKS_GET``       get PHC virtual clocks info
-  ===================================== ================================
+  ``ETHTOOL_MSG_MODULE_SET``            set transceiver module parameters
+  ``ETHTOOL_MSG_MODULE_GET``            get transceiver module parameters
+  ===================================== =================================
 
 Kernel to userspace:
 
@@ -252,6 +259,7 @@ Kernel to userspace:
   ``ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY``  read SFP module EEPROM
   ``ETHTOOL_MSG_STATS_GET_REPLY``          standard statistics
   ``ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY``    PHC virtual clocks info
+  ``ETHTOOL_MSG_MODULE_GET_REPLY``         transceiver module parameters
   ======================================== =================================
 
 ``GET`` requests are sent by userspace applications to retrieve device
@@ -1521,6 +1529,63 @@ Kernel response contents:
   ``ETHTOOL_A_PHC_VCLOCKS_INDEX``       s32     PHC index array
   ====================================  ======  ==========================
 
+MODULE_GET
+==========
+
+Gets transceiver module parameters.
+
+Request contents:
+
+  =====================================  ======  ==========================
+  ``ETHTOOL_A_MODULE_HEADER``            nested  request header
+  =====================================  ======  ==========================
+
+Kernel response contents:
+
+  ======================================  ======  ==========================
+  ``ETHTOOL_A_MODULE_HEADER``             nested  reply header
+  ``ETHTOOL_A_MODULE_POWER_MODE_POLICY``  u8      power mode policy
+  ``ETHTOOL_A_MODULE_POWER_MODE``         u8      operational power mode
+  ======================================  ======  ==========================
+
+The optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute encodes the
+transceiver module power mode policy enforced by the host. The default policy
+is driver-dependent, but "auto" is the recommended default and it should be
+implemented by new drivers and drivers where conformance to a legacy behavior
+is not critical.
+
+The optional ``ETHTOOL_A_MODULE_POWER_MODE`` attribute encodes the operational
+power mode of the transceiver module. It is only reported when a module
+is plugged-in. Possible values are:
+
+.. kernel-doc:: include/uapi/linux/ethtool.h
+    :identifiers: ethtool_module_power_mode
+
+MODULE_SET
+==========
+
+Sets transceiver module parameters.
+
+Request contents:
+
+  ======================================  ======  ==========================
+  ``ETHTOOL_A_MODULE_HEADER``             nested  request header
+  ``ETHTOOL_A_MODULE_POWER_MODE_POLICY``  u8      power mode policy
+  ======================================  ======  ==========================
+
+When set, the optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute is used
+to set the transceiver module power policy enforced by the host. Possible
+values are:
+
+.. kernel-doc:: include/uapi/linux/ethtool.h
+    :identifiers: ethtool_module_power_mode_policy
+
+For SFF-8636 modules, low power mode is forced by the host according to table
+6-10 in revision 2.10a of the specification.
+
+For CMIS modules, low power mode is forced by the host according to table 6-12
+in revision 5.0 of the specification.
+
 Request translation
 ===================
 

@@ -1620,4 +1685,6 @@ are netlink only.
   n/a                                 ``ETHTOOL_MSG_CABLE_TEST_TDR_ACT``
   n/a                                 ``ETHTOOL_MSG_TUNNEL_INFO_GET``
   n/a                                 ``ETHTOOL_MSG_PHC_VCLOCKS_GET``
+  n/a                                 ``ETHTOOL_MSG_MODULE_GET``
+  n/a                                 ``ETHTOOL_MSG_MODULE_SET``
   =================================== =====================================
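
The documentation hunk above asks that driver-filled attributes not use zero as a valid value, so that a zeroed field can stand for "not reported". A minimal userspace sketch of that convention follows. The enum values mirror the uAPI header in this commit, while `struct demo_reply`, `demo_get_power_mode()` and `demo_fill_reply()` are hypothetical stand-ins and not the kernel's netlink code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values mirror the uAPI enums added by this commit; both start at 1. */
enum ethtool_module_power_mode_policy {
	ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH = 1,
	ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO,
};

enum ethtool_module_power_mode {
	ETHTOOL_MODULE_POWER_MODE_LOW = 1,
	ETHTOOL_MODULE_POWER_MODE_HIGH,
};

struct ethtool_module_power_mode_params {
	enum ethtool_module_power_mode_policy policy;
	enum ethtool_module_power_mode mode;
};

/* Hypothetical stand-in for a reply message: a presence flag per attribute. */
struct demo_reply {
	bool has_policy;
	bool has_mode;
	uint8_t policy;
	uint8_t mode;
};

/* Hypothetical driver callback: the mode is filled only if a module is present. */
static void demo_get_power_mode(bool plugged_in,
				struct ethtool_module_power_mode_params *params)
{
	assert(params != NULL);
	params->policy = ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO;
	if (plugged_in)
		params->mode = ETHTOOL_MODULE_POWER_MODE_LOW;
}

/* Zero the parameters, call the driver, then emit only the non-zero fields. */
static struct demo_reply demo_fill_reply(bool plugged_in)
{
	struct ethtool_module_power_mode_params params;
	struct demo_reply reply;

	memset(&params, 0, sizeof(params));
	memset(&reply, 0, sizeof(reply));
	demo_get_power_mode(plugged_in, &params);

	if (params.policy) {
		reply.has_policy = true;
		reply.policy = (uint8_t)params.policy;
	}
	if (params.mode) {
		reply.has_mode = true;
		reply.mode = (uint8_t)params.mode;
	}
	return reply;
}
```

Because no enumerator is zero, the driver does not need a separate "valid" flag: leaving `mode` untouched is enough to keep the attribute out of the reply.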

include/linux/ethtool.h

Lines changed: 22 additions & 0 deletions
@@ -415,6 +415,17 @@ struct ethtool_module_eeprom {
 	u8 *data;
 };
 
+/**
+ * struct ethtool_module_power_mode_params - module power mode parameters
+ * @policy: The power mode policy enforced by the host for the plug-in module.
+ * @mode: The operational power mode of the plug-in module. Should be filled by
+ *	device drivers on get operations.
+ */
+struct ethtool_module_power_mode_params {
+	enum ethtool_module_power_mode_policy policy;
+	enum ethtool_module_power_mode mode;
+};
+
 /**
  * struct ethtool_ops - optional netdev operations
  * @cap_link_lanes_supported: indicates if the driver supports lanes
@@ -580,6 +591,11 @@ struct ethtool_module_eeprom {
  * @get_eth_ctrl_stats: Query some of the IEEE 802.3 MAC Ctrl statistics.
  * @get_rmon_stats: Query some of the RMON (RFC 2819) statistics.
  *	Set %ranges to a pointer to zero-terminated array of byte ranges.
+ * @get_module_power_mode: Get the power mode policy for the plug-in module
+ *	used by the network device and its operational power mode, if
+ *	plugged-in.
+ * @set_module_power_mode: Set the power mode policy for the plug-in module
+ *	used by the network device.
  *
  * All operations are optional (i.e. the function pointer may be set
  * to %NULL) and callers must take this into account. Callers must
@@ -705,6 +721,12 @@ struct ethtool_ops {
 	void (*get_rmon_stats)(struct net_device *dev,
 			       struct ethtool_rmon_stats *rmon_stats,
 			       const struct ethtool_rmon_hist_range **ranges);
+	int (*get_module_power_mode)(struct net_device *dev,
+				     struct ethtool_module_power_mode_params *params,
+				     struct netlink_ext_ack *extack);
+	int (*set_module_power_mode)(struct net_device *dev,
+				     const struct ethtool_module_power_mode_params *params,
+				     struct netlink_ext_ack *extack);
 };
 
 int ethtool_check_ops(const struct ethtool_ops *ops);
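
The two callbacks added above are optional, so a caller has to check for `NULL` before invoking them. The following is a rough userspace model of that calling pattern, not kernel code: `struct demo_ethtool_ops`, the simplified `struct net_device` and `struct netlink_ext_ack`, the `demo_*` functions and the error strings are all assumptions made for illustration; only the callback signatures and enum values follow this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum ethtool_module_power_mode_policy {
	ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH = 1,
	ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO,
};

enum ethtool_module_power_mode {
	ETHTOOL_MODULE_POWER_MODE_LOW = 1,
	ETHTOOL_MODULE_POWER_MODE_HIGH,
};

struct ethtool_module_power_mode_params {
	enum ethtool_module_power_mode_policy policy;
	enum ethtool_module_power_mode mode;
};

/* Simplified stand-ins for kernel types (assumptions for this sketch). */
struct netlink_ext_ack { const char *msg; };
struct net_device;

struct demo_ethtool_ops {
	int (*get_module_power_mode)(struct net_device *dev,
				     struct ethtool_module_power_mode_params *params,
				     struct netlink_ext_ack *extack);
	int (*set_module_power_mode)(struct net_device *dev,
				     const struct ethtool_module_power_mode_params *params,
				     struct netlink_ext_ack *extack);
};

struct net_device {
	const struct demo_ethtool_ops *ethtool_ops;
	enum ethtool_module_power_mode_policy policy; /* hypothetical driver state */
};

/* Hypothetical driver "get": reports the stored policy. */
static int demo_get_module_power_mode(struct net_device *dev,
				      struct ethtool_module_power_mode_params *params,
				      struct netlink_ext_ack *extack)
{
	(void)extack;
	params->policy = dev->policy;
	/* params->mode stays zero: this model has no module plugged in. */
	return 0;
}

/* Hypothetical driver "set": rejects unknown policies via extack. */
static int demo_set_module_power_mode(struct net_device *dev,
				      const struct ethtool_module_power_mode_params *params,
				      struct netlink_ext_ack *extack)
{
	if (params->policy != ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH &&
	    params->policy != ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO) {
		extack->msg = "Unsupported power mode policy";
		return -EINVAL;
	}
	dev->policy = params->policy;
	return 0;
}

static const struct demo_ethtool_ops demo_ops = {
	.get_module_power_mode = demo_get_module_power_mode,
	.set_module_power_mode = demo_set_module_power_mode,
};

/* Caller side: the callback may be NULL, so check before calling it. */
static int demo_set_policy(struct net_device *dev,
			   enum ethtool_module_power_mode_policy policy,
			   struct netlink_ext_ack *extack)
{
	struct ethtool_module_power_mode_params params = { .policy = policy };
	const struct demo_ethtool_ops *ops = dev->ethtool_ops;

	assert(extack != NULL);
	if (!ops || !ops->set_module_power_mode) {
		extack->msg = "Setting power mode policy is not supported";
		return -EOPNOTSUPP;
	}
	return ops->set_module_power_mode(dev, &params, extack);
}
```

A device without the callbacks gets `-EOPNOTSUPP` plus a message in `extack`, while a device that provides them has its stored policy updated.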

include/uapi/linux/ethtool.h

Lines changed: 23 additions & 0 deletions
@@ -706,6 +706,29 @@ enum ethtool_stringset {
 	ETH_SS_COUNT
 };
 
+/**
+ * enum ethtool_module_power_mode_policy - plug-in module power mode policy
+ * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
+ * @ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO: Module is transitioned by the host
+ *	to high power mode when the first port using it is put administratively
+ *	up and to low power mode when the last port using it is put
+ *	administratively down.
+ */
+enum ethtool_module_power_mode_policy {
+	ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH = 1,
+	ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO,
+};
+
+/**
+ * enum ethtool_module_power_mode - plug-in module power mode
+ * @ETHTOOL_MODULE_POWER_MODE_LOW: Module is in low power mode.
+ * @ETHTOOL_MODULE_POWER_MODE_HIGH: Module is in high power mode.
+ */
+enum ethtool_module_power_mode {
+	ETHTOOL_MODULE_POWER_MODE_LOW = 1,
+	ETHTOOL_MODULE_POWER_MODE_HIGH,
+};
+
 /**
  * struct ethtool_gstrings - string set for data tagging
  * @cmd: Command number = %ETHTOOL_GSTRINGS
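
The kernel-doc for the policy enum above describes "auto" in terms of the first port going up and the last port going down. One way to picture that rule is a counter of administratively-up ports per module, sketched below. `struct demo_module` and the `demo_*` helpers are hypothetical, and the model assumes a plugged-in paged memory module; it is not how mlxsw or its firmware implements the policy.

```c
#include <assert.h>

enum ethtool_module_power_mode_policy {
	ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH = 1,
	ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO,
};

enum ethtool_module_power_mode {
	ETHTOOL_MODULE_POWER_MODE_LOW = 1,
	ETHTOOL_MODULE_POWER_MODE_HIGH,
};

/* Hypothetical model of one module shared by one or more ports. */
struct demo_module {
	enum ethtool_module_power_mode_policy policy;
	unsigned int ports_up; /* ports using the module that are admin up */
};

/* Derive the operational power mode from the policy and the port count. */
static enum ethtool_module_power_mode
demo_module_mode(const struct demo_module *m)
{
	if (m->policy == ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH)
		return ETHTOOL_MODULE_POWER_MODE_HIGH;
	/* "auto": high while at least one port is up, low otherwise. */
	return m->ports_up ? ETHTOOL_MODULE_POWER_MODE_HIGH :
			     ETHTOOL_MODULE_POWER_MODE_LOW;
}

/* A port using the module is put administratively up. */
static void demo_port_up(struct demo_module *m)
{
	m->ports_up++;
}

/* A port using the module is put administratively down. */
static void demo_port_down(struct demo_module *m)
{
	assert(m->ports_up > 0);
	m->ports_up--;
}
```

With two ports sharing a module under "auto", the mode stays high after the first port goes down and drops to low only after the second one does.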

include/uapi/linux/ethtool_netlink.h

Lines changed: 17 additions & 0 deletions
@@ -47,6 +47,8 @@ enum {
 	ETHTOOL_MSG_MODULE_EEPROM_GET,
 	ETHTOOL_MSG_STATS_GET,
 	ETHTOOL_MSG_PHC_VCLOCKS_GET,
+	ETHTOOL_MSG_MODULE_GET,
+	ETHTOOL_MSG_MODULE_SET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
@@ -90,6 +92,8 @@ enum {
 	ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY,
 	ETHTOOL_MSG_STATS_GET_REPLY,
 	ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY,
+	ETHTOOL_MSG_MODULE_GET_REPLY,
+	ETHTOOL_MSG_MODULE_NTF,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_KERNEL_CNT,
@@ -833,6 +837,19 @@ enum {
 	ETHTOOL_A_STATS_RMON_MAX = (__ETHTOOL_A_STATS_RMON_CNT - 1)
 };
 
+/* MODULE */
+
+enum {
+	ETHTOOL_A_MODULE_UNSPEC,
+	ETHTOOL_A_MODULE_HEADER,		/* nest - _A_HEADER_* */
+	ETHTOOL_A_MODULE_POWER_MODE_POLICY,	/* u8 */
+	ETHTOOL_A_MODULE_POWER_MODE,		/* u8 */
+
+	/* add new constants above here */
+	__ETHTOOL_A_MODULE_CNT,
+	ETHTOOL_A_MODULE_MAX = (__ETHTOOL_A_MODULE_CNT - 1)
+};
+
 /* generic netlink info */
 #define ETHTOOL_GENL_NAME "ethtool"
 #define ETHTOOL_GENL_VERSION 1
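
The attribute enum above carries both power mode values as `u8`. As a hedged illustration of how a consumer might range-check those values and map them to the strings shown in the commit message's `ethtool --show-module` output, consider the sketch below. The `demo_*` helpers are hypothetical and are not part of the kernel or of the ethtool utility; the enums are copied from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
	ETHTOOL_A_MODULE_UNSPEC,
	ETHTOOL_A_MODULE_HEADER,		/* nest - _A_HEADER_* */
	ETHTOOL_A_MODULE_POWER_MODE_POLICY,	/* u8 */
	ETHTOOL_A_MODULE_POWER_MODE,		/* u8 */

	__ETHTOOL_A_MODULE_CNT,
	ETHTOOL_A_MODULE_MAX = (__ETHTOOL_A_MODULE_CNT - 1)
};

enum ethtool_module_power_mode_policy {
	ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH = 1,
	ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO,
};

enum ethtool_module_power_mode {
	ETHTOOL_MODULE_POWER_MODE_LOW = 1,
	ETHTOOL_MODULE_POWER_MODE_HIGH,
};

/* Hypothetical check: is this u8 value in range for the given attribute? */
static bool demo_module_attr_ok(unsigned int type, uint8_t value)
{
	switch (type) {
	case ETHTOOL_A_MODULE_POWER_MODE_POLICY:
		return value >= ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH &&
		       value <= ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO;
	case ETHTOOL_A_MODULE_POWER_MODE:
		return value >= ETHTOOL_MODULE_POWER_MODE_LOW &&
		       value <= ETHTOOL_MODULE_POWER_MODE_HIGH;
	default:
		return false; /* header, unspec and unknown types carry no u8 */
	}
}

/* Map a policy value to its display string, or NULL if out of range. */
static const char *demo_policy_name(uint8_t value)
{
	switch (value) {
	case ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH:
		return "high";
	case ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO:
		return "auto";
	default:
		return NULL;
	}
}

/* Map an operational mode value to its display string, or NULL. */
static const char *demo_mode_name(uint8_t value)
{
	switch (value) {
	case ETHTOOL_MODULE_POWER_MODE_LOW:
		return "low";
	case ETHTOOL_MODULE_POWER_MODE_HIGH:
		return "high";
	default:
		return NULL;
	}
}
```

Zero is rejected for both attributes, which matches the enums starting at 1.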

net/ethtool/Makefile

Lines changed: 1 addition & 1 deletion
@@ -7,4 +7,4 @@ obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o
 ethtool_nl-y := netlink.o bitset.o strset.o linkinfo.o linkmodes.o \
 		linkstate.o debug.o wol.o features.o privflags.o rings.o \
 		channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
-		tunnels.o fec.o eeprom.o stats.o phc_vclocks.o
+		tunnels.o fec.o eeprom.o stats.o phc_vclocks.o module.o
