[warm-reboot] Orchagent will crash during startup, when we execute warm-reboot after pfcwd detects storm #2888

@leoli-nps

Description

  1. Topology
    (SW1)Ethernet120 ---- Ixia
  1. Enable PFC on queue 0 of Ethernet120
    "PORT_QOS_MAP": {
        "Ethernet120": {
            "pfc_to_queue_map": "[MAP_PFC_PRIORITY_TO_QUEUE|AZURE]",
            "pfc_enable": "0"
        }
    }
  1. Configure pfcwd on Ethernet120
    "PFC_WD_TABLE": {
        "Ethernet120": {
            "action": "drop",
            "detection_time": "500",
            "restoration_time": "5000"
        }
    }
  1. Send PFC frames to Ethernet120 from the Ixia to trigger a PFC storm, then perform a warm-reboot on SW1. After the reboot, orchagent crashes, and the log shows the following messages:
May 10 11:01:20.856475 sonic NOTICE swss#orchagent: :- syncd_apply_view: Notify syncd APPLY_VIEW
May 10 11:01:20.856490 sonic NOTICE swss#orchagent: :- sai_redis_notify_syncd: sending syncd APPLY view
May 10 11:01:20.857630 sonic NOTICE swss#orchagent: :- sai_redis_internal_notify_syncd: wait for notify response
May 10 11:01:20.859381 sonic WARNING syncd#syncd: :- notifySyncd: syncd received APPLY VIEW, will translate
May 10 11:01:20.884649 sonic NOTICE syncd#syncd: :- dump: getting took 0.018019 sec
May 10 11:01:20.885501 sonic ERR syncd#syncd: :- sai_deserialize_attr_id: invalid attr id: SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS
May 10 11:01:20.886200 sonic NOTICE syncd#syncd: :- redisGetAsicView: get asic view from ASIC_STATE took 0.020048 sec
May 10 11:01:20.886316 sonic ERR syncd#syncd: :- syncdApplyView: Exception: :- sai_deserialize_attr_id: invalid attr id: SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS
May 10 11:01:20.886843 sonic NOTICE syncd#syncd: :- syncdApplyView: apply took 0.027272 sec
May 10 11:01:20.886843 sonic NOTICE syncd#syncd: :- sendNotifyResponse: sending response: SAI_STATUS_FAILURE
May 10 11:01:20.887118 sonic NOTICE swss#orchagent: :- sai_redis_internal_notify_syncd: notify response: SAI_STATUS_FAILURE
May 10 11:01:20.887118 sonic ERR swss#orchagent: :- sai_redis_notify_syncd: notify syncd failed: SAI_STATUS_FAILURE
May 10 11:01:20.887129 sonic ERR swss#orchagent: :- syncd_apply_view: Failed to notify syncd APPLY_VIEW -1
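
To pick the failure chain out of a longer syslog, the ERR lines can be filtered with a small script. This is only a sketch: the line layout in `LINE_RE` is inferred from the excerpt above, and `error_lines` is a hypothetical helper, not part of SONiC.

```python
import re

# Assumed layout of one log line, inferred from the excerpt above:
# "<Mon DD HH:MM:SS.ffffff> <host> <LEVEL> <container>#<process>: :- <function>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:.]+) (?P<host>\S+) (?P<level>[A-Z]+) "
    r"(?P<container>[\w-]+)#(?P<process>[\w-]+): :- (?P<func>\w+): (?P<msg>.*)$"
)


def error_lines(log_text):
    """Return (process, function, message) for every ERR-level line."""
    found = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "ERR":
            found.append(
                (match.group("process"), match.group("func"), match.group("msg"))
            )
    return found


# Three lines copied from the log in this report.
SAMPLE = """\
May 10 11:01:20.859381 sonic WARNING syncd#syncd: :- notifySyncd: syncd received APPLY VIEW, will translate
May 10 11:01:20.885501 sonic ERR syncd#syncd: :- sai_deserialize_attr_id: invalid attr id: SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS
May 10 11:01:20.887129 sonic ERR swss#orchagent: :- syncd_apply_view: Failed to notify syncd APPLY_VIEW -1
"""

for process, func, msg in error_lines(SAMPLE):
    print(f"{process}: {func}: {msg}")
```

On the full excerpt this keeps the `sai_deserialize_attr_id` error raised in syncd and the resulting `syncd_apply_view` failure in orchagent, which is the sequence that matters here.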

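We also find the following in the databases. The first query reads COUNTERS_DB (database 2) to get the priority-group OID; the second reads that object from ASIC_DB (database 1), where the STAT fields named in the error appear: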

admin@sonic:~$ redis-cli -n 2 hget "COUNTERS_PG_NAME_MAP" "Ethernet120:0"
"oid:0x1a000000000552"
admin@sonic:~$ redis-cli -n 1 hgetall "ASIC_STATE:SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP:oid:0x1a000000000552"
1) "NULL"
2) "NULL"
3) "SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE"
4) "oid:0x19000000000665"
5) "SAI_INGRESS_PRIORITY_GROUP_STAT_PACKETS"
6) "0"
7) "SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS"
8) "0"
admin@sonic:~$
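
The same check can be scripted to spot other affected objects. The sketch below flags fields of an `ASIC_STATE` entry that do not look like attribute ids. It rests on two assumptions that are not taken from syncd itself: that valid attribute ids for this object type start with `SAI_INGRESS_PRIORITY_GROUP_ATTR_`, and that the `NULL`/`NULL` pair is a placeholder that can be ignored. `non_attr_fields` is a hypothetical helper, and the prefix test is a heuristic, not the deserializer syncd uses.

```python
def non_attr_fields(hgetall_reply, object_type="INGRESS_PRIORITY_GROUP"):
    """Return field names that are not SAI_<object_type>_ATTR_* ids.

    hgetall_reply is the flat [field, value, field, value, ...] list that
    redis-cli prints for HGETALL. The "NULL" placeholder field is skipped
    (assumption: it carries no attribute).
    """
    attr_prefix = f"SAI_{object_type}_ATTR_"
    fields = hgetall_reply[0::2]  # even positions hold the field names
    return [f for f in fields if f != "NULL" and not f.startswith(attr_prefix)]


# The HGETALL reply shown above, as a flat list.
REPLY = [
    "NULL", "NULL",
    "SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE", "oid:0x19000000000665",
    "SAI_INGRESS_PRIORITY_GROUP_STAT_PACKETS", "0",
    "SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS", "0",
]

print(non_attr_fields(REPLY))
```

For the entry above, this reports the two `_STAT_` fields, including the `SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS` id that `sai_deserialize_attr_id` rejects in the log.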

Steps to reproduce the issue:

  1. As described in the Description

Describe the results you received:
As described in the Description

Describe the results you expected:
The switch starts up normally after warm-reboot, without orchagent crashing.

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**
admin@sonic:~$ show version
SONiC Software Version: SONiC.origin_201811.0-dirty-20190418.223441
Distribution: Debian 9.8
Kernel: 4.9.0-8-amd64
Build commit: 051bb23
Build date: Fri Apr 19 06:33:08 UTC 2019
Built by: simon@nps65

Docker images:
REPOSITORY                 TAG                                     IMAGE ID            SIZE
docker-syncd-nephos        latest                                  1c3500846360        326MB
docker-syncd-nephos        origin_201811.0-dirty-20190418.223441   1c3500846360        326MB
docker-orchagent-nephos    latest                                  f9c367fb5fc5        368MB
docker-orchagent-nephos    origin_201811.0-dirty-20190418.223441   f9c367fb5fc5        368MB
docker-teamd               latest                                  8a6898e1dfa7        353MB
docker-teamd               origin_201811.0-dirty-20190418.223441   8a6898e1dfa7        353MB
docker-fpm-quagga          latest                                  de4a2a321623        372MB
docker-fpm-quagga          origin_201811.0-dirty-20190418.223441   de4a2a321623        372MB
docker-lldp-sv2            latest                                  7c53844507f0        294MB
docker-lldp-sv2            origin_201811.0-dirty-20190418.223441   7c53844507f0        294MB
docker-dhcp-relay          latest                                  903f08df67cf        258MB
docker-dhcp-relay          origin_201811.0-dirty-20190418.223441   903f08df67cf        258MB
docker-database            latest                                  2b048aa0fe97        255MB
docker-database            origin_201811.0-dirty-20190418.223441   2b048aa0fe97        255MB
docker-snmp-sv2            latest                                  b42a83fc56f8        330MB
docker-snmp-sv2            origin_201811.0-dirty-20190418.223441   b42a83fc56f8        330MB
docker-router-advertiser   latest                                  b6b8150e559a        254MB
docker-router-advertiser   origin_201811.0-dirty-20190418.223441   b6b8150e559a        254MB
docker-platform-monitor    latest                                  f8442c4d55a8        297MB
docker-platform-monitor    origin_201811.0-dirty-20190418.223441   f8442c4d55a8        297MB

admin@sonic:~$ 
**Attach debug file `sudo generate_dump`:**

sonic_dump_sonic_20190510_110308.tar.gz

Signed-off-by: leo.li leo.li@nephosinc.com
