
Merge Azure Master into IGNW Master #12

Merged
joeslazaro merged 62 commits into IGNW:master from sonic-net:master
Jul 25, 2019

@joej164 joej164 commented Jul 25, 2019

Merging the current state of Azure / Master into our branch before I fork and get t0-8 and t1-8 added.

prsunny and others added 30 commits June 4, 2019 14:22
* Extend warm-reboot test to include the BGP sad pass
Create MIRRORV6 ACL table by default

Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
* Do not crash if the data plane never stops during fast-reboot
Stabilize the test by adding a pause after the route change

Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
* preboot LAG sad path automation for neigh_lag_down and dut_lag_down scenarios
* [vnet_vxlan]: Enhance vnet_vxlan to test ipv6 vxlan tunnels
Signed-off-by: Anish Narsian <anish.narsian@microsoft.com>
* Fix testbed_mtu for tasks that invoke fib_test

* Set socket buffer size to 16k
In case the config reload operation takes longer than the PTF
script's running time, checking PTF script PID after config reload
may fail. This change is to improve the robustness of checking
PTF script PID after config reload.
- Improve the data test warm up code:
  Let the data plane IO stabilize for 30 seconds before testing.
  We observed ptf instability causing the test to fail.
- Remove config_db.json when fast-reboot into a new image.
  We want the new image to reload minigraph in this case.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* [platform] Implement platform phase 1 cases

Signed-off-by: Xin Wang <xinw@mellanox.com>

* [platform] Add mellanox_psu_controller.py

Changes:
* Add mellanox_psu_controller.py which has Mellanox implementation of PSU controller.
* Increase the delay between reset SFP and checking SFP presence for SFP to be fully recovered.
* Improve the checking of PSU status.
* Correct spelling errors.

Signed-off-by: Xin Wang <xinw@mellanox.com>

* [platform] Improve scripts according to review comments

* Replace inline command strings with predefined variables
* Add test case for testing SFP low power mode

Signed-off-by: Xin Wang <xinw@mellanox.com>

* [platform] Fix the issue of comparing syseeprom output

The order of the information output by the command "show platform syseeprom"
is not guaranteed. This commit improves the method of comparing the
content output by the syseeprom plugin with that of the show command, to avoid
failures caused by inconsistent output order.

Signed-off-by: Xin Wang <xinw@mellanox.com>
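
An order-insensitive comparison like the one described in the commit message above could be sketched as follows. This is an illustration only, not the code from the commit: the helper names are hypothetical, and it assumes both outputs carry one field per line.

```python
def parse_eeprom_lines(text):
    """Return the set of non-empty lines, with runs of whitespace collapsed."""
    return {" ".join(line.split()) for line in text.splitlines() if line.strip()}


def same_eeprom_content(plugin_output, show_output):
    """Hypothetical helper: compare two outputs while ignoring line order."""
    return parse_eeprom_lines(plugin_output) == parse_eeprom_lines(show_output)
```

Comparing sets rather than raw strings makes the check independent of the order in which the fields are printed. It also ignores repeated lines, which is assumed to be acceptable here.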
…analyzer.yml (#963)

The copy files task ran after the task that fails the test. In case of failure, the
copy task would never get a chance to run. This commit
adjusts the task sequence: in case of failure, copy the files, then
fail the test.

The original copy task copied files with a deep folder structure.
This commit also fixes that issue.

Signed-off-by: Xin Wang <xinw@mellanox.com>
…' state (#951)

Recently the bgp test has frequently failed due to a non-established neighbor state.
We strongly suspect a topo deployment issue
that leaves the DUT unable to learn the arp/nd of its bgp neighbor.
Displaying the ip neigh info makes it easy to distinguish
this cause from others, which saves diagnosis effort.
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
* [fdb_mac_expire.yml]: FDB MAC Expire test case.
[fdb_mac_expire_test.py]: PTF helper to add Mac in L2 table.
[fdb.yml]: include fdb_mac_expire.yml.

This test case verifies that a MAC entry expires within 10 mins if no
traffic is flowing through it.

Signed-off-by: Praveen Chaudhary<pchaudhary@linkedin.com>

* [fdb_mac_expire.yml]: FDB MAC Expire test case.
[fdb_mac_expire_test.py]: PTF helper to add Mac in L2 table.
[testcases.yml]: include fdb_mac_expire.yml.

This test case verifies that a MAC entry expires within 10 mins if no
traffic is flowing through it.

Signed-off-by: Praveen Chaudhary<pchaudhary@linkedin.com>

* [fdb_mac_expire.yml]: Incorporate swssconfig step to set fdb_aging_timer in fdb_mac_expire.yml

Signed-off-by: Praveen Chaudhary<pchaudhary@linkedin.com>

* [fdb_mac_expire.yml]: minor changes in logs

Signed-off-by: Praveen Chaudhary<pchaudhary@linkedin.com>

* [fdb_mac_expire.yml]: minor log changes to show time correctly.
Example:
"MAC Entires are Cleared within 100 secs."
instead of
"MAC Entires are Cleared within 2*50 secs."

Signed-off-by: Praveen Chaudhary<pchaudhary@linkedin.com>

* [fdb_mac_expire.yml]: Address review comments related to sonic-clear, -it option and block-always.

Signed-off-by: Praveen Chaudhary<pchaudhary@linkedin.com>

* [fdb_mac_expire.yml]: Change "sonic-clear fdb all" to "Clear FDB table".

Signed-off-by: Praveen Chaudhary<pchaudhary@linkedin.com>
…ore rebooting (#975)

- fast-reboot script is an adapted version from 201811 branch. The change is around syncd
  stop: in 201803 branch, if it is Broadcom platform, request syncd to perform cold shutdown.
- Mellanox 201803 branch has a vlan FDB issue causing all vlan IO to flood. Add a knob
  allow_vlan_flooding to ignore this symptom and continue with fast-reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
…iC update (#970)

* Added New test case to verify MAC addr is correct after SONiC to SONiC update.
* Added fixes and additional verifications.
* Added missed fix.
* Increased reboot wait timeout in test case to 300 sec.
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
…#968)

* fix grep ipv6 addr issue

* Add Mellanox onyx fanout switch deploy yml and template

* fix typo

* remove debug code

* revert the change to check_pfcwd_fanout.yml and deploy_pfcwd_fanout.yml

* fix typo
stcheng and others added 28 commits July 5, 2019 10:37
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
Running the dir_bcast test case with the topology 't0-16' or 't0-56' causes the error below:
task path: /var/sonic/sonic-mgmt/ansible/roles/test/tasks/dir_bcast.yml:9

fatal: [str-dut-01]: FAILED! => {"changed": false, "failed": true, "invocation": {"module_args": {"msg": "testbed_type t0-64-32 is invalid."}, "module_name": "fail"}, "msg": "testbed_type t0-56is invalid."}
* [continuous_link_flap.yml]: Continuous link flap test.
This is a continuous link flap test. In this test:
 1.) Flap all interfaces one by one to cause BGP flaps (3 iterations).
 2.) Flap all interfaces on the peer (FanOutLeaf) one by one to cause BGP flaps (3 iterations).
 3.) Watch memory (show system-memory), orchagent CPU utilization and Redis_memory.

Pass Criteria: All routes must be re-learned with < 5% increase in Redis memory
 and with Orchagent CPU consumption below 10% after 3 mins of stopping flaps.
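
The pass criteria above can be expressed as a small check. This is a sketch under stated assumptions, not the logic in continuous_link_flap.yml: the function and parameter names are hypothetical, and the caller is assumed to take the "after" measurements 3 minutes after the flaps stop.

```python
def link_flap_passed(routes_relearned, redis_mem_before, redis_mem_after,
                     orchagent_cpu_percent):
    """Hypothetical helper: True only when every pass criterion holds."""
    if redis_mem_before <= 0:
        raise ValueError("redis_mem_before must be positive")
    # Relative growth of Redis memory, in percent of the baseline.
    increase_percent = (redis_mem_after - redis_mem_before) * 100.0 / redis_mem_before
    return (routes_relearned
            and increase_percent < 5.0
            and orchagent_cpu_percent < 10.0)
```

Both thresholds are strict, matching the "< 5%" and "below 10%" wording.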
* [continuous_link_flap.yml]: Address review comments for orchagent cpu.
* [continuous_link_flap_helper.yml]: use config cli instead of ifdown/ifup.
[testcases.yml]: run only on t1 to bring l3 interfaces down.
* [testcases.yml]: Add T0 topo to supported topology for Continuous link flap test.
* [loganalyzer] Fix the IOError of opening match file
* [loganalyzer] Improve the logic of generating match file option
…minutes (#962)

* [vm_set] Improve the start-vms performance

The original approach starts and configures the VMs sequentially.
It takes more than 3 hours to start 32 virtual machines. This
change is to start all the VMs, then configure them one by one.
With this change, starting 32 VMs needs around 30-40 minutes.
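
The reordering described above (start everything first, then configure one by one) might look like the sketch below. It is illustrative only: `start_vm` and `configure_vm` are hypothetical callables supplied by the caller, not functions from the vm_set role.

```python
def start_then_configure(vm_names, start_vm, configure_vm):
    """Start every VM first, then configure them one at a time."""
    for name in vm_names:
        start_vm(name)       # request boot for each VM before configuring any
    for name in vm_names:
        configure_vm(name)   # the remaining VMs keep booting during this step
```

The time saved comes from overlapping boot time across VMs: while one VM is being configured, the others are already booting instead of waiting their turn.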

Another change in this commit is to configure 'autostart' for
VMs so that the VMs will automatically start running after host
server is rebooted.

Signed-off-by: Xin Wang <xinw@mellanox.com>

* Add batch_size support, make autostart optional
The dhcp_relay daemon uses the port interface's alias, not the port interface's name, to fill the option82 circuit ID.
The current yml uses the members of minigraph_vlans as the client port alias names, but minigraph_vlans contains only port interface names. Therefore, use minigraph_port_name_to_alias_map to obtain the port interface's alias.
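
The name-to-alias lookup described above amounts to a dictionary translation. The sketch below is an assumption about the data shape (a list of port names and a name-to-alias dict), and the function name is hypothetical; only minigraph_vlans and minigraph_port_name_to_alias_map come from the commit message.

```python
def vlan_member_aliases(vlan_members, port_name_to_alias_map):
    """Translate VLAN member port names into their aliases.

    Raises KeyError when a member has no alias entry, so that a missing
    mapping is noticed instead of silently falling back to the name.
    """
    return [port_name_to_alias_map[name] for name in vlan_members]
```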
…#997)

By default, the log analyzer generates a dump that collects all the
available log files in case of failure. This is unnecessary, and
the dump file could be too big.
This fix generates a dump that collects logs from the last 1 hour by default.
If more logs are needed, the parameter 'dump_since' can be used.

Signed-off-by: Xin Wang <xinw@mellanox.com>
* Upgrade FW for mellanox before fast-reboot

* Move some condition check to the main file
* [warm/fast reboot] make sure that /etc/sonic/config_db.json exists after upgrade

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* [warm reboot] save config after warm reboot into new image

When a new image is defined, the test removes /host/config_db.json
before warm rebooting. So after the device boots up, it is
missing /etc/sonic/config_db.json. This is not an issue while the
device stays up, but it becomes an issue when the device reboots
again (cold or fast).

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* review comments
Signed-off-by: Yuriy Volynets <yuriyv@mellanox.com>
When the test failed due to a dataplane disruption issue, config save was
skipped, leaving the device in a vulnerable state. Move config save to
the always block.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
…2.5 (#1005)

* Adding the ability to use yaml plugin with stdout content

Signed-off-by: Zhiqian Wu <zhiqian.wu@nephosinc.com>
The check is to gate removing a line in known_hosts file, so the check
needs to be checking /root/.ssh/known_hosts.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
As the hard drive permits, we keep a few past images so
that we can easily go back to them.

A recent test failure shows a downside of that decision. When a test fails
and leaves an installed image in a broken state, we likely restore the system
by booting into another working image. However, if the broken image is not
removed before installation happens again, then because the target image already exists,
the installation could be skipped or might not fix the existing issue. So
when we boot into the image again, the device is still in a broken state.

Removing all non-current and non-next images gives the DUT a better chance
to start a clean test.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
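
The selection rule from the commit message above (keep only the current and next images) can be sketched as a filter. The function name and the idea of passing image names as plain strings are assumptions for illustration; the commit's actual implementation is not shown on this page.

```python
def images_to_remove(installed, current, next_image):
    """Hypothetical helper: every installed image that is neither the
    running image nor the one selected for the next boot."""
    keep = {current, next_image}
    return [image for image in installed if image not in keep]
```

When the current and next images are the same, the set collapses to one entry and every other image is selected for removal.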
…#1019)

* [minigraph] allow generating minigraph without data plane acl defined

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* Change the default behavior to enable data plane acl
…r function (#996)

* [bgp-gr-helper] Add bgp-gr-helper test case

Add script for testing the BGP graceful restart helper function.

Signed-off-by: Xin Wang <xinw@mellanox.com>

* [bgp-gr-helper] Add supported topo t1-64-lag

* [bgp-gr-helper] Improve the wording

* Add checking IPv6 route

* [bgp-gr-helper] Enable graceful restart for t1 topo

* [bgp-gr-helper] Improve script structure

* Add more comments
* Organize the code to make the two test cases more obvious
* Remove the unnecessary configuration change of graceful-restart stalepath-time

Signed-off-by: Xin Wang <xinw@mellanox.com>
* [platform] Implement platform phase 2 cases

Implement the SONiC platform phase 2 test cases using the
pytest-ansible framework.

Signed-off-by: Xin Wang <xinw@mellanox.com>

* [platform] Add interface status checking using the interface_facts module

* [platform] Fix some minor issues

* Run the reboot command in the background to avoid the command failure caused by
  the SSH connection breaking before the command returns
* Fine tune the reboot wait timeout values
* Add a delay before checking interface status because the intfutil
  command may not produce output in time

Signed-off-by: Xin Wang <xinw@mellanox.com>
Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
Signed-off-by: Neetha John <nejo@microsoft.com>
* Add Preboot n BGP member down and n Lag down tests

Signed-off-by: Neetha John <nejo@microsoft.com>
@joeslazaro joeslazaro merged commit 59dd36b into IGNW:master Jul 25, 2019