[teamd] Port-channel interface is down after system/config reload #4070

@volodymyrsamotiy

Description
The port-channel goes down after a few system/config reloads: one member port is operationally down in the kernel although it is up in APP_DB and SDK/SAI.

  • The issue is not always reproducible, but it can be observed after all types of reboots/reloads: 1) config reload; 2) cold/fast/warm reboot; 3) minigraph load config.
  • It was also observed on both topologies with port-channels: T1-LAG and T0.
  • It looks like the operational state of such a LAG member port in the kernel is out of sync with all the other components.
  • The netdev event that updates the operational state of the member port to UP is always received by the kernel (all other callbacks/events are also called/received, and the operational state is correct in SDK/SAI/APP_DB).
  • It happens only with a port-channel and its member port, so it looks like a teamd-related issue (it is timing-sensitive, since it is not always reproducible).
  • The easiest way to reproduce the issue is to deploy the T1-LAG topology and perform 5-10 config reloads.

Steps to reproduce the issue:

  1. Deploy the T1-LAG topology
  2. Execute the config reload command a few times (usually 5-10)
  3. Observe that one port-channel is down
  4. Check that one member port of the port-channel is down in the kernel:
    cat /sys/class/net/Ethernet<num>/operstate
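
The check in step 4 can be run across all front-panel ports at once. The following is a minimal sketch, not part of SONiC: the function name `kernel_down_ports` is hypothetical, and it assumes only that each port exposes the standard Linux `operstate` file under `/sys/class/net`, as used in the step above.

```python
import os


def kernel_down_ports(sysfs_root="/sys/class/net", prefix="Ethernet"):
    """Return (name, state) pairs for interfaces whose kernel operstate is not 'up'.

    Hypothetical helper for illustration: it reads the same sysfs file that
    `cat /sys/class/net/Ethernet<num>/operstate` reads, once per matching interface.
    """
    down = []
    if not os.path.isdir(sysfs_root):
        return down
    for name in sorted(os.listdir(sysfs_root)):
        if not name.startswith(prefix):
            continue
        path = os.path.join(sysfs_root, name, "operstate")
        try:
            with open(path) as f:
                state = f.read().strip()
        except OSError:
            # The interface may disappear during a reload; report it explicitly.
            state = "unreadable"
        if state != "up":
            down.append((name, state))
    return down
```

Calling `kernel_down_ports()` after each config reload and printing the result shows which member port, if any, the kernel still considers down.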

Describe the results you received:
Once the issue is reproduced, the following is observed:

  • The port-channel is permanently down:
show in po
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available,
       S - selected, D - deselected, * - not synced
  No.  Team Dev         Protocol     Ports
-----  ---------------  -----------  ---------------------------
 0002  PortChannel0002  LACP(A)(Up)  Ethernet0(S) Ethernet4(S)
 0005  PortChannel0005  LACP(A)(Up)  Ethernet8(S) Ethernet12(S)
 0008  PortChannel0008  LACP(A)(Up)  Ethernet20(S) Ethernet16(S)
 0011  PortChannel0011  LACP(A)(Up)  Ethernet28(S) Ethernet24(S)
 0014  PortChannel0014  LACP(A)(Up)  Ethernet32(S) Ethernet36(S)
 0017  PortChannel0017  LACP(A)(Up)  Ethernet44(S) Ethernet40(S)
 0020  PortChannel0020  LACP(A)(Dw)  Ethernet48(S*)
 0023  PortChannel0023  LACP(A)(Up)  Ethernet60(S) Ethernet56(S)
  • The operational state of the LAG member is down in the kernel:
cat /sys/class/net/Ethernet52/operstate
down
  • All ports are operationally up in SONiC (and also up in SDK/SAI):
show int sta
      Interface            Lanes    Speed    MTU    Alias             Vlan    Oper    Admin             Type    Asym PFC
---------------  ---------------  -------  -----  -------  ---------------  ------  -------  ---------------  ----------
      Ethernet0          0,1,2,3     100G   9100     etp1  PortChannel0002      up       up  QSFP28 or later         off
      Ethernet4          4,5,6,7     100G   9100     etp2  PortChannel0002      up       up   QSFP+ or later         off
      Ethernet8        8,9,10,11     100G   9100     etp3  PortChannel0005      up       up  QSFP28 or later         off
     Ethernet12      12,13,14,15     100G   9100     etp4  PortChannel0005      up       up  QSFP28 or later         off
     Ethernet16      16,17,18,19     100G   9100     etp5  PortChannel0008      up       up   QSFP+ or later         off
     Ethernet20      20,21,22,23     100G   9100     etp6  PortChannel0008      up       up   QSFP+ or later         off
     Ethernet24      24,25,26,27     100G   9100     etp7  PortChannel0011      up       up  QSFP28 or later         off
     Ethernet28      28,29,30,31     100G   9100     etp8  PortChannel0011      up       up   QSFP+ or later         off
     Ethernet32      32,33,34,35     100G   9100     etp9  PortChannel0014      up       up   QSFP+ or later         off
     Ethernet36      36,37,38,39     100G   9100    etp10  PortChannel0014      up       up   QSFP+ or later         off
     Ethernet40      40,41,42,43     100G   9100    etp11  PortChannel0017      up       up   QSFP+ or later         off
     Ethernet44      44,45,46,47     100G   9100    etp12  PortChannel0017      up       up   QSFP+ or later         off
     Ethernet48      48,49,50,51     100G   9100    etp13  PortChannel0020      up       up   QSFP+ or later         off
     Ethernet52      52,53,54,55     100G   9100    etp14  PortChannel0020      up       up   QSFP+ or later         off
     Ethernet56      56,57,58,59     100G   9100    etp15  PortChannel0023      up       up   QSFP+ or later         off
     Ethernet60      60,61,62,63     100G   9100    etp16  PortChannel0023      up       up   QSFP+ or later         off
     Ethernet64      64,65,66,67     100G   9100    etp17           routed      up       up   QSFP+ or later         off
     Ethernet68      68,69,70,71     100G   9100    etp18           routed      up       up   QSFP+ or later         off
     Ethernet72      72,73,74,75     100G   9100    etp19           routed      up       up   QSFP+ or later         off
     Ethernet76      76,77,78,79     100G   9100    etp20           routed      up       up   QSFP+ or later         off
     Ethernet80      80,81,82,83     100G   9100    etp21           routed      up       up   QSFP+ or later         off
     Ethernet84      84,85,86,87     100G   9100    etp22           routed      up       up   QSFP+ or later         off
     Ethernet88      88,89,90,91     100G   9100    etp23           routed      up       up   QSFP+ or later         off
     Ethernet92      92,93,94,95     100G   9100    etp24           routed      up       up   QSFP+ or later         off
     Ethernet96      96,97,98,99     100G   9100    etp25           routed      up       up   QSFP+ or later         off
    Ethernet100  100,101,102,103     100G   9100    etp26           routed      up       up   QSFP+ or later         off
    Ethernet104  104,105,106,107     100G   9100    etp27           routed      up       up   QSFP+ or later         off
    Ethernet108  108,109,110,111     100G   9100    etp28           routed      up       up   QSFP+ or later         off
    Ethernet112  112,113,114,115     100G   9100    etp29           routed      up       up   QSFP+ or later         off
    Ethernet116  116,117,118,119     100G   9100    etp30           routed      up       up   QSFP+ or later         off
    Ethernet120  120,121,122,123      50G   9100    etp31           routed      up       up  QSFP28 or later         off
    Ethernet124  124,125,126,127      50G   9100    etp32           routed      up       up  QSFP28 or later         off
PortChannel0002              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0005              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0008              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0011              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0014              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0017              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0020              N/A     200G   9100      N/A           routed    down       up              N/A         N/A
PortChannel0023              N/A     200G   9100      N/A           routed      up       up              N/A         N/A
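
The mismatch shown above (Ethernet52 is up in SONiC but down in the kernel) can be detected automatically by comparing the two views. This is a rough sketch under stated assumptions: the helper names `parse_oper_states` and `find_mismatches` are hypothetical, and the parser assumes the ten-column layout of the output pasted above, with columns separated by at least two spaces. Other SONiC releases may print different columns.

```python
import re


def parse_oper_states(show_output):
    """Map interface name -> Oper column from 'show interfaces status' text.

    Assumes the ten-column layout shown in this report
    (Interface, Lanes, Speed, MTU, Alias, Vlan, Oper, Admin, Type, Asym PFC).
    """
    states = {}
    for line in show_output.splitlines():
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) == 10 and fields[0].startswith(("Ethernet", "PortChannel")):
            states[fields[0]] = fields[6]
    return states


def find_mismatches(sonic_states, kernel_states):
    """Return sorted names whose SONiC oper state differs from the kernel operstate."""
    return sorted(
        name
        for name, oper in sonic_states.items()
        if name in kernel_states and kernel_states[name] != oper
    )
```

Here `kernel_states` is expected to be a dictionary built from the `operstate` files, for example `{"Ethernet52": "down"}`. With the data in this report, the comparison would flag Ethernet52.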

Describe the results you expected:

  • The port-channel should be operationally up after a system/config reload.
  • The member port of the port-channel should be operationally up in the kernel after a system/config reload (it should be in sync with the rest of the system).

Additional information you deem important (e.g. issue happens only occasionally):

  • The issue is reproducible on both master and `201911` images.
  • For example, it was observed on version SONiC.HEAD.17-e884e583.
