[mellanox] SONiC fails to start on t0 #4367

@nazariig

Description

The latest 201911 branch is broken: SONiC fails to start on the t0 topology.
The last known-good version is SONiC-OS-HEAD.62-a5a11f6e.
The likely culprit is #3888.

Steps to reproduce the issue:

  1. Install latest 201911 image
  2. Deploy t0
  3. Reload config

Describe the results you received:

root@sonic:/home/admin# show ip interfaces
Interface        Master    IPv4 address/mask    Admin/Oper    BGP Neighbor    Neighbor IP
---------------  --------  -------------------  ------------  --------------  -------------
Loopback0                  10.1.0.32/32         up/up         N/A             N/A
PortChannel0001            10.0.0.56/31         up/up         ARISTA01T1      10.0.0.57
PortChannel0002            10.0.0.58/31         up/up         ARISTA02T1      10.0.0.59
PortChannel0003            10.0.0.60/31         up/up         ARISTA03T1      10.0.0.61
PortChannel0004            10.0.0.62/31         up/up         ARISTA04T1      10.0.0.63
docker0                    240.127.1.1/24       up/down       N/A             N/A
eth0                       10.210.25.3/22       up/up         N/A             N/A
lo                         127.0.0.1/8          up/up         N/A             N/A

root@sonic:/home/admin# docker exec -ti swss bash
root@sonic:/# supervisorctl status
arp_update                       RUNNING   pid 149, uptime 0:00:46
buffermgrd                       RUNNING   pid 125, uptime 0:00:52
enable_counters                  RUNNING   pid 132, uptime 0:00:51
intfmgrd                         RUNNING   pid 101, uptime 0:00:54
nbrmgrd                          RUNNING   pid 135, uptime 0:00:50
neighsyncd                       RUNNING   pid 61, uptime 0:00:58
orchagent                        RUNNING   pid 37, uptime 0:01:01
portmgrd                         FATAL     Exited too quickly (process log may have details)
portsyncd                        RUNNING   pid 56, uptime 0:01:00
restore_neighbors                EXITED    Apr 03 07:01 PM
rsyslogd                         RUNNING   pid 32, uptime 0:01:03
start.sh                         EXITED    Apr 03 07:02 PM
supervisor-proc-exit-listener    RUNNING   pid 18, uptime 0:01:04
swssconfig                       EXITED    Apr 03 07:01 PM
vlanmgrd                         RUNNING   pid 86, uptime 0:00:55
vrfmgrd                          RUNNING   pid 73, uptime 0:00:57
vxlanmgrd                        RUNNING   pid 142, uptime 0:00:48
root@sonic:/# supervisorctl status portmgrd
portmgrd                         FATAL     Exited too quickly (process log may have details)
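
To spot the failing daemon without reading the whole table, the `supervisorctl status` output can be filtered by state. This is a minimal sketch: the sample lines are copied from the output above, while the helper name `unhealthy_processes` and the choice to treat `EXITED` as healthy (one-shot helpers such as `swssconfig` are expected to exit) are assumptions made for illustration.

```python
# Sample lines copied from the `supervisorctl status` output in this report.
SAMPLE = """\
orchagent                        RUNNING   pid 37, uptime 0:01:01
portmgrd                         FATAL     Exited too quickly (process log may have details)
portsyncd                        RUNNING   pid 56, uptime 0:01:00
swssconfig                       EXITED    Apr 03 07:01 PM
"""

# Assumption: RUNNING and EXITED are both fine here, since one-shot
# helpers (start.sh, swssconfig, restore_neighbors) exit after running.
HEALTHY_STATES = {"RUNNING", "EXITED"}


def unhealthy_processes(status_text):
    """Return (name, state) pairs for processes whose state is not healthy."""
    result = []
    for line in status_text.splitlines():
        # Columns are: process name, state, free-form description.
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue
        name, state = parts[0], parts[1]
        if state not in HEALTHY_STATES:
            result.append((name, state))
    return result


if __name__ == "__main__":
    print(unhealthy_processes(SAMPLE))
```

On the sample above, only `portmgrd` is reported.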

Describe the results you expected:

root@sonic:/home/admin# show ip interfaces
Interface        Master    IPv4 address/mask    Admin/Oper    BGP Neighbor    Neighbor IP
---------------  --------  -------------------  ------------  --------------  -------------
Loopback0                  10.1.0.32/32         up/up         N/A             N/A
PortChannel0001            10.0.0.56/31         up/up         ARISTA01T1      10.0.0.57
PortChannel0002            10.0.0.58/31         up/up         ARISTA02T1      10.0.0.59
PortChannel0003            10.0.0.60/31         up/up         ARISTA03T1      10.0.0.61
PortChannel0004            10.0.0.62/31         up/up         ARISTA04T1      10.0.0.63
Vlan1000                   192.168.0.1/21       up/up         N/A             N/A
docker0                    240.127.1.1/24       up/down       N/A             N/A
eth0                       10.210.25.3/22       up/up         N/A             N/A
lo                         127.0.0.1/8          up/up         N/A             N/A

root@sonic:/home/admin# docker exec -ti swss bash
root@sonic:/# supervisorctl status
arp_update                       RUNNING   pid 142, uptime 0:02:39
buffermgrd                       RUNNING   pid 126, uptime 0:02:45
enable_counters                  RUNNING   pid 129, uptime 0:02:44
intfmgrd                         RUNNING   pid 104, uptime 0:02:48
nbrmgrd                          RUNNING   pid 132, uptime 0:02:42
neighsyncd                       RUNNING   pid 61, uptime 0:02:52
orchagent                        RUNNING   pid 37, uptime 0:02:55
portmgrd                         RUNNING   pid 123, uptime 0:02:47
portsyncd                        RUNNING   pid 56, uptime 0:02:54
restore_neighbors                EXITED    Apr 03 06:54 PM
rsyslogd                         RUNNING   pid 32, uptime 0:02:57
start.sh                         EXITED    Apr 03 06:54 PM
supervisor-proc-exit-listener    RUNNING   pid 18, uptime 0:02:58
swssconfig                       EXITED    Apr 03 06:54 PM
vlanmgrd                         RUNNING   pid 86, uptime 0:02:49
vrfmgrd                          RUNNING   pid 72, uptime 0:02:50
vxlanmgrd                        RUNNING   pid 135, uptime 0:02:40
root@sonic:/# supervisorctl status portmgrd
portmgrd                         RUNNING   pid 123, uptime 0:03:39
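
The two `show ip interfaces` listings differ in one row only. A rough sketch for comparing them is shown here: the rows are abbreviated copies of the outputs above, and the helper name `interface_names` is hypothetical.

```python
def interface_names(show_output):
    """Extract the first column (interface name) from `show ip interfaces` text."""
    names = []
    for line in show_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if not fields or fields[0] == "Interface" or set(fields[0]) == {"-"}:
            continue
        names.append(fields[0])
    return names


# Abbreviated copies of the received and expected outputs from this report.
RECEIVED = """\
Interface        Master    IPv4 address/mask    Admin/Oper    BGP Neighbor    Neighbor IP
---------------  --------  -------------------  ------------  --------------  -------------
Loopback0                  10.1.0.32/32         up/up         N/A             N/A
PortChannel0001            10.0.0.56/31         up/up         ARISTA01T1      10.0.0.57
docker0                    240.127.1.1/24       up/down       N/A             N/A
"""

EXPECTED = """\
Interface        Master    IPv4 address/mask    Admin/Oper    BGP Neighbor    Neighbor IP
---------------  --------  -------------------  ------------  --------------  -------------
Loopback0                  10.1.0.32/32         up/up         N/A             N/A
PortChannel0001            10.0.0.56/31         up/up         ARISTA01T1      10.0.0.57
Vlan1000                   192.168.0.1/21       up/up         N/A             N/A
docker0                    240.127.1.1/24       up/down       N/A             N/A
"""

# Interfaces present in the expected output but absent from the received one.
missing = sorted(set(interface_names(EXPECTED)) - set(interface_names(RECEIVED)))

if __name__ == "__main__":
    print(missing)
```

The missing entry is `Vlan1000`, which matches the failed `ip address add ... dev "Vlan1000"` command in the syslog excerpt.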

Additional information you deem important (e.g. issue happens only occasionally):

Output of show version:

SONiC Software Version: SONiC.HEAD.63-aa30030f
Distribution: Debian 9.12
Kernel: 4.9.0-11-2-amd64
Build commit: aa30030f
Build date: Fri Apr  3 04:00:37 UTC 2020
Built by: johnar@jenkins-worker-8

Platform: x86_64-mlnx_msn3700c-r0
HwSKU: ACS-MSN3700C
ASIC: mellanox
Uptime: 18:42:01 up 8 min,  1 user,  load average: 3.40, 3.11, 1.68

Docker images:
REPOSITORY                    TAG                 IMAGE ID            SIZE
docker-syncd-mlnx             HEAD.63-aa30030f    1af78a807fb8        382MB
docker-syncd-mlnx             latest              1af78a807fb8        382MB
docker-router-advertiser      HEAD.63-aa30030f    dfd40ee96097        283MB
docker-router-advertiser      latest              dfd40ee96097        283MB
docker-sonic-mgmt-framework   HEAD.63-aa30030f    9b9e7bd7c9c2        420MB
docker-sonic-mgmt-framework   latest              9b9e7bd7c9c2        420MB
docker-platform-monitor       HEAD.63-aa30030f    eb66728b0b6d        628MB
docker-platform-monitor       latest              eb66728b0b6d        628MB
docker-fpm-frr                HEAD.63-aa30030f    56be72464acf        327MB
docker-fpm-frr                latest              56be72464acf        327MB
docker-sflow                  HEAD.63-aa30030f    3c9b99d175c1        307MB
docker-sflow                  latest              3c9b99d175c1        307MB
docker-lldp-sv2               HEAD.63-aa30030f    fd4fdd3e6f73        304MB
docker-lldp-sv2               latest              fd4fdd3e6f73        304MB
docker-dhcp-relay             HEAD.63-aa30030f    f6c7bc2d4a67        293MB
docker-dhcp-relay             latest              f6c7bc2d4a67        293MB
docker-database               HEAD.63-aa30030f    42aa405e949f        283MB
docker-database               latest              42aa405e949f        283MB
docker-teamd                  HEAD.63-aa30030f    0251b0682e04        307MB
docker-teamd                  latest              0251b0682e04        307MB
docker-snmp-sv2               HEAD.63-aa30030f    65741245707e        340MB
docker-snmp-sv2               latest              65741245707e        340MB
docker-orchagent              HEAD.63-aa30030f    440f378fcfe2        325MB
docker-orchagent              latest              440f378fcfe2        325MB
docker-nat                    HEAD.63-aa30030f    7d31d76aa298        309MB
docker-nat                    latest              7d31d76aa298        309MB
docker-sonic-telemetry        HEAD.63-aa30030f    78c16dcf3cc3        344MB
docker-sonic-telemetry        latest              78c16dcf3cc3        344MB

Attach debug file sudo generate_dump:

Apr  3 18:37:15.144592 sonic ERR swss#intfmgrd: :- exec: /sbin/ip address "add" "192.168.0.1/21" broadcast "192.168.7.255" dev "Vlan1000": Success
Apr  3 18:37:15.144643 sonic ERR swss#intfmgrd: :- setIntfIp: Command '/sbin/ip address "add" "192.168.0.1/21" broadcast "192.168.7.255" dev "Vlan1000"' failed with rc 256
Apr  3 18:37:16.412159 sonic ERR swss#portmgrd: :- exec: /sbin/ip link set dev "Ethernet0" mtu "9100": Success
Apr  3 18:37:16.412300 sonic ERR swss#portmgrd: :- main: Runtime error: /sbin/ip link set dev "Ethernet0" mtu "9100" :
Apr  3 18:37:16.703704 sonic INFO swss#supervisord: start.sh portmgrd: ERROR (spawn error)
Apr  3 18:37:17.995162 sonic ERR swss#portmgrd: :- exec: /sbin/ip link set dev "Ethernet0" mtu "9100": Success
Apr  3 18:37:17.995162 sonic ERR swss#portmgrd: :- main: Runtime error: /sbin/ip link set dev "Ethernet0" mtu "9100" :
Apr  3 18:37:20.732584 sonic ERR swss#portmgrd: :- exec: /sbin/ip link set dev "Ethernet0" mtu "9100": Success
Apr  3 18:37:20.732584 sonic ERR swss#portmgrd: :- main: Runtime error: /sbin/ip link set dev "Ethernet0" mtu "9100" :
Apr  3 18:37:24.589801 sonic ERR swss#portmgrd: :- exec: /sbin/ip link set dev "Ethernet0" mtu "9100": Success
Apr  3 18:37:24.589864 sonic ERR swss#portmgrd: :- main: Runtime error: /sbin/ip link set dev "Ethernet0" mtu "9100" :
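
A note on `rc 256` in the intfmgrd line: this looks like a raw wait status rather than a plain exit code. Assuming the value comes unmodified from `pclose()` or `system()` (an assumption, not confirmed in this report), it decodes to a normal exit with code 1. A minimal sketch using the standard `os` helpers, where `decode_wait_status` is a hypothetical name:

```python
import os


def decode_wait_status(status):
    """Decode a raw POSIX wait status into (kind, value).

    Assumption: `status` is the unmodified value returned by
    pclose()/system()/wait(), as the logged rc 256 appears to be.
    """
    if os.WIFEXITED(status):
        # Normal termination: the exit code is stored in the high byte.
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        # Terminated by a signal: the signal number is in the low bits.
        return ("signaled", os.WTERMSIG(status))
    return ("unknown", status)


if __name__ == "__main__":
    print(decode_wait_status(256))
```

Under that assumption, the `ip address add` command exited with status 1, which is a generic failure code and does not by itself explain why the command failed.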
