[sanity-check]: Show failed check items in stack trace and error logs by theasianpianist · Pull Request #2242 · sonic-net/sonic-mgmt

theasianpianist · 2020-09-18T21:16:42Z

Signed-off-by: Lawrence Lee lawlee@microsoft.com

Description of PR

Summary:
When the sanity check fails, add failed items in error log and include them in the stack trace.

Type of change

Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Approach

What is the motivation for this PR?

Currently when the sanity check fails, the stack trace is not able to show all of the check results since the results are too long. To see the failed items it is necessary to view the test logs, which can be annoying since they are not always displayed together (esp. in Jenkins)

How did you do it?

Select the failed check items and include them in the error log. Then also add these items to the pytest.fail message so they are printed in the stack trace.

How did you verify/test it?

Force a sanity check failure on a device and run some tests. Verified that the stack trace included the failed check items. Verify that the error log included the failed check items.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Previous output (no check-items in stack trace)

localhost = <tests.common.devices.Localhost object at 0x7f9e38bb7cd0>
duthost = <tests.common.devices.SonicHost object at 0x7f9e3a4d0750>
request = <SubRequest 'sanity_check' for <Function test_bgp_facts>>
fanouthosts = <omitted>
testbed = <omitted>

    @pytest.fixture(scope="module", autouse=True)
    def sanity_check(localhost, duthost, request, fanouthosts, testbed):
        logger.info("Start pre-test sanity check")
    
        skip_sanity = False
        allow_recover = False
        recover_method = "adaptive"
        check_items = set(copy.deepcopy(constants.DEFAULT_CHECK_ITEMS))  # Default check items
        post_check = False
    
        customized_sanity_check = None
        for m in request.node.iter_markers():
            logger.info("Found marker: m.name=%s, m.args=%s, m.kwargs=%s" % (m.name, m.args, m.kwargs))
            if m.name == "sanity_check":
                customized_sanity_check = m
                break
    
        if customized_sanity_check:
            logger.info("Process marker %s in script. m.args=%s, m.kwargs=%s" % (m.name, str(m.args), str(m.kwargs)))
            skip_sanity = customized_sanity_check.kwargs.get("skip_sanity", False)
            allow_recover = customized_sanity_check.kwargs.get("allow_recover", False)
            recover_method = customized_sanity_check.kwargs.get("recover_method", "adaptive")
            if allow_recover and recover_method not in constants.RECOVER_METHODS:
                pytest.warning("Unsupported recover method")
                logger.info("Fall back to use default recover method 'config_reload'")
                recover_method = "config_reload"
    
            check_items = _update_check_items(check_items,
                                              customized_sanity_check.kwargs.get("check_items", []),
                                              constants.SUPPORTED_CHECK_ITEMS)
            post_check = customized_sanity_check.kwargs.get("post_check", False)
    
        if request.config.option.skip_sanity:
            skip_sanity = True
        if request.config.option.allow_recover:
            allow_recover = True
        items = request.config.getoption("--check_items")
        if items:
            items_array=str(items).split(',')
            check_items = _update_check_items(check_items, items_array, constants.SUPPORTED_CHECK_ITEMS)
    
        # ignore BGP check for particular topology type
        if testbed['topo']['type'] == 'ptf' and 'bgp' in check_items:
            check_items.remove('bgp')
    
        logger.info("Sanity check settings: skip_sanity=%s, check_items=%s, allow_recover=%s, recover_method=%s, post_check=%s" % \
            (skip_sanity, check_items, allow_recover, recover_method, post_check))
    
        if skip_sanity:
            logger.info("Skip sanity check according to command line argument or configuration of test script.")
            yield
            return
    
        if not check_items:
            logger.info("No sanity check item is specified, no pre-test sanity check")
            yield
            logger.info("No sanity check item is specified, no post-test sanity check")
            return
    
        print_logs(duthost, constants.PRINT_LOGS)
        check_results = do_checks(duthost, check_items)
        logger.info("!!!!!!!!!!!!!!!! Pre-test sanity check results: !!!!!!!!!!!!!!!!\n%s" % \
                    json.dumps(check_results, indent=4))
        if any([result["failed"] for result in check_results]):
            if not allow_recover:
                pytest.fail("Pre-test sanity check failed, allow_recover=False {}".format(check_results))
                return
    
            logger.info("Pre-test sanity check failed, try to recover, recover_method=%s" % recover_method)
            recover(duthost, localhost, fanouthosts, check_results, recover_method)
            logger.info("Run sanity check again after recovery")
            new_check_results = do_checks(duthost, check_items)
            logger.info("!!!!!!!!!!!!!!!! Pre-test sanity check after recovery results: !!!!!!!!!!!!!!!!\n%s" % \
                        json.dumps(new_check_results, indent=4))
            if any([result["failed"] for result in new_check_results]):
>               pytest.fail("Pre-test sanity check failed again after recovered by '%s'" % recover_method)
E               Failed: Pre-test sanity check failed again after recovered by 'adaptive'

allow_recover = True
check_items = set(['bgp', 'dbmemory', 'interfaces', 'processes', 'services'])
check_results = [{'check_item': 'services', 'failed': False, 'services_status': {'bgp': True, 'database': True, 'lldp': True, 'pmon': ...tatus': {'bgp': True, 'database': True, 'lldp': True, 'pmon': True, ...}}, {'check_item': 'dbmemory', 'failed': False}]
customized_sanity_check = None
duthost    = <tests.common.devices.SonicHost object at 0x7f9e3a4d0750>
fanouthosts = <omitted>
items      = False
localhost  = <tests.common.devices.Localhost object at 0x7f9e38bb7cd0>
m          = Mark(name='device_type', args=('vs',), kwargs={})
new_check_results = [{'check_item': 'services', 'failed': False, 'services_status': {'bgp': True, 'database': True, 'lldp': True, 'pmon': ...tatus': {'bgp': True, 'database': True, 'lldp': True, 'pmon': True, ...}}, {'check_item': 'dbmemory', 'failed': False}]
post_check = False
recover_method = 'adaptive'
request    = <SubRequest 'sanity_check' for <Function test_bgp_facts>>
result     = {'check_item': 'dbmemory', 'failed': False}
skip_sanity = False
testbed    = <omitted>

common/plugins/sanity_check/__init__.py:118: Failed

New output:

<same as previous>
...
if any([result["failed"] for result in new_check_results]):
                failed_items = json.dumps([result for result in new_check_results if result["failed"]], indent=4)
                logger.error("Failed check items:\n{}".format(failed_items))
>               pytest.fail("Pre-test sanity check failed again after recovered by '{}' with failed items:\n{}".format(recover_method, failed_items))
E               Failed: Pre-test sanity check failed again after recovered by 'adaptive' with failed items:
E               [
E                   {
E                       "check_item": "bgp",
E                       "failed": true,
E                       "down_neighbors": [
E                           "10.0.0.59",
E                           "10.0.0.61",
E                           "10.0.0.63",
E                           "fc00::7e",
E                           "fc00::7a",
E                           "10.0.0.57",
E                           "fc00::76",
E                           "fc00::72"
E                       ]
E                   },
E                   {
E                       "check_item": "interfaces",
E                       "failed": true,
E                       "down_ports": [
E                           "PortChannel0001",
E                           "PortChannel0003",
E                           "PortChannel0002",
E                           "PortChannel0004"
E                       ]
E                   }
E               ]
...
<same as previous>

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>

daall

Can you add an example of the before/after of the output?

…#12292) linkmgrd: * a5ac7f6 2022-10-05 | [Active-Active] Post link prober stats to state db (sonic-net#140) (HEAD -> 202205, github/202205) [Jing Zhang] * f4b0e53 2022-10-05 | [Active-Active] Retry config mux mode standby (sonic-net#139) [Jing Zhang] utilities: * a255838 2022-10-04 | [minigraph] new workflow for golden path (sonic-net#2396) (HEAD -> 202205, github/202205) [jingwenxie] * 99425a8 2022-10-03 | [actions] Support Semgrep by Github Actions (sonic-net#2417) [Mai Bui] * f41e4d1 2022-09-30 | Fix for show vxlan tunnel command display issue sonic-net#11902 (sonic-net#2391) [Senthil Bhava] * e1d827e 2022-09-29 | [VxLAN]Fix Vxlan delete command to throw error when there are references (sonic-net#2404) [Sudharsan Dhamal Gopalarathnam] * d77acf8 2022-09-28 | [doc] add documentation on automatic techsupport based on memory (sonic-net#2411) [Stepan Blyshchak] * 2cfc75a 2022-09-28 | [doc] update "config feature" section with "--block" option (sonic-net#2409) [Stepan Blyshchak] * 9dc8471 2022-09-28 | [Vxlanmgrd] [CPA] Update the vxlan_tunnel name len to be under IFNAMIZ to overcome netdev creation failure (sonic-net#2398) [Vivek] * 342589e 2022-10-03 | Added cisco config platform commands (sonic-net#2242) (sonic-net#2418) [yucgu] swss: * 9d9f395 2022-10-04 | [intfmgr]: Enable `accept_untracked_na` kernel param (sonic-net#2436) (HEAD -> 202205, github/202205) [Lawrence Lee] * 6b6d25d 2022-10-04 | [orchdaemon]: Fixed sairedis record file rotation (sonic-net#2480) [Bryan Crossland] Signed-off-by: Ying Xie <ying.xie@microsoft.com> Signed-off-by: Ying Xie <ying.xie@microsoft.com>

[sanity-check]: Show failed check items in stack trace and error logs

3c0e8ef

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>

theasianpianist requested review from a team and daall September 18, 2020 21:16

daall reviewed Sep 19, 2020

View reviewed changes

daall requested review from wangxin and yxieca September 19, 2020 00:38

yxieca approved these changes Sep 19, 2020

View reviewed changes

theasianpianist merged commit a8c79bf into sonic-net:master Sep 21, 2020

theasianpianist deleted the sanity-check-log branch September 21, 2020 16:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[sanity-check]: Show failed check items in stack trace and error logs#2242

[sanity-check]: Show failed check items in stack trace and error logs#2242
theasianpianist merged 1 commit intosonic-net:masterfrom
theasianpianist:sanity-check-log

theasianpianist commented Sep 18, 2020 •

edited

Loading

Uh oh!

daall left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

theasianpianist commented Sep 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description of PR

Type of change

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Uh oh!

daall left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

theasianpianist commented Sep 18, 2020 •

edited

Loading