Skip to content

[portsorch] process PortsOrch tables in specific order#1108

Merged
qiluo-msft merged 4 commits intosonic-net:masterfrom
stepanblyschak:3iteration
Jan 15, 2020
Merged

[portsorch] process PortsOrch tables in specific order#1108
qiluo-msft merged 4 commits intosonic-net:masterfrom
stepanblyschak:3iteration

Conversation

@stepanblyschak
Copy link
Copy Markdown
Contributor

@stepanblyschak stepanblyschak commented Oct 28, 2019

What I did

Why I did it

How I verified it

Details if related

To support hitless warm reboot on Mellanox we need to add lag member before any other objects are created that depend on LAG, like RIF

@wendani wendani requested review from lguohan and qiluo-msft October 28, 2019 18:08
Comment thread orchagent/portsorch.cpp
}
}

void PortsOrch::doTask()
Copy link
Copy Markdown
Contributor

@qiluo-msft qiluo-msft Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doTask [](start = 16, length = 6)

Why you override Orch::doTask()? It will leave tasks in other tables undone. #Closed

Copy link
Copy Markdown
Contributor Author

@stepanblyschak stepanblyschak Oct 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which tables? everything that m_consumerMap has will be processed #Closed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you are trying to specify drain order. How about reuse Orch::doTask for your second half?


In reply to: 339998153 [](ancestors = 339998153)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I reuse Orch::doTask failed items from APP_PORT_TABLE will be processed again, thats why a condition if (consumer == portConsumer) to skip port table was added in second half.

Another scenario - when portsorch processes "PortInitDone" it explicitly returns because other orchs can start their operations. Orch::doTask reuse will make PortsOrch proceed after PortinitDone and configure port attributes like admin status, mtu, speed and so on, before BufferOrch apply buffers so there will be a chance that ports will be operationaly up when buffers are applied.

@stepanblyschak stepanblyschak force-pushed the 3iteration branch 3 times, most recently from 21e4061 to 7475254 Compare November 27, 2019 18:54
@stepanblyschak stepanblyschak changed the title [orchdaemon][portsorch] warm restoration requires 3 iterations [portsorch] process PortsOrch tables in specific order Nov 27, 2019
@qiluo-msft qiluo-msft self-assigned this Dec 3, 2019
Comment thread orchagent/portsorch.cpp Outdated
{
auto tableName = it.first;
auto consumer = it.second.get();
if (find(tableOrder.begin(), tableOrder.end(), tableName) != tableOrder.end())
Copy link
Copy Markdown
Contributor

@qiluo-msft qiluo-msft Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find(tableOrder.begin(), tableOrder.end(), tableName) != tableOrder.end() [](start = 12, length = 73)

The code shows if tableName is found in tableOrder, it will be drained.

I guess you wanted to drain ones not found in tableOrder. #Closed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my guess is correct, why your test pass?


In reply to: 353389068 [](ancestors = 353389068)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, this is a bug. I'm fixing it.
Test passes because tableOrder contains all tables which PortsOrch uses, so no more tables had to be drained.

@stepanblyschak
Copy link
Copy Markdown
Contributor Author

retest this please

@sonic-net sonic-net deleted a comment from stepanblyschak Dec 4, 2019
qiluo-msft
qiluo-msft previously approved these changes Dec 4, 2019
@stepanblyschak
Copy link
Copy Markdown
Contributor Author

retest this please

Comment thread orchagent/orchdaemon.cpp
*/

for (auto it = 0; it < 4; it++)
for (auto it = 0; it < 3; it++)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this iteration count as define ? or constexpr ? with proper name

@stepanblyschak
Copy link
Copy Markdown
Contributor Author

test_vlan.py::TestVlan::test_RemoveVlanWithRouterInterface fails, while on my VS setup:

root@arc-host74-009:~/tests_stepanb/tests/tests# sudo pytest -v --dvsname=vs test_vlan.py 
================================================================================== test session starts ===================================================================================
platform linux2 -- Python 2.7.12, pytest-3.7.1, py-1.5.4, pluggy-0.7.1 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /root/tests_stepanb/tests/tests, inifile:
collected 25 items                                                                                                                                                                       

test_vlan.py::TestVlan::test_VlanAddRemove PASSED                                                                                                                                  [  4%]
test_vlan.py::TestVlan::test_MultipleVlan PASSED                                                                                                                                   [  8%]
test_vlan.py::TestVlan::test_VlanIncrementalConfig PASSED                                                                                                                          [ 12%]
test_vlan.py::TestVlan::test_AddVlanWithIncorrectKeyPrefix[test_input0-0] PASSED                                                                                                   [ 16%]
test_vlan.py::TestVlan::test_AddVlanWithIncorrectKeyPrefix[test_input1-0] PASSED                                                                                                   [ 20%]
test_vlan.py::TestVlan::test_AddVlanWithIncorrectKeyPrefix[test_input2-0] PASSED                                                                                                   [ 24%]
test_vlan.py::TestVlan::test_AddVlanWithIncorrectKeyPrefix[test_input3-1] PASSED                                                                                                   [ 28%]
test_vlan.py::TestVlan::test_AddVlanWithIncorrectValueType[test_input0-0] PASSED                                                                                                   [ 32%]
test_vlan.py::TestVlan::test_AddVlanWithIncorrectValueType[test_input1-0] PASSED                                                                                                   [ 36%]
test_vlan.py::TestVlan::test_AddVlanWithIncorrectValueType[test_input2-0] PASSED                                                                                                   [ 40%]
test_vlan.py::TestVlan::test_AddVlanWithIncorrectValueType[test_input3-1] PASSED                                                                                                   [ 44%]
test_vlan.py::TestVlan::test_AddPortChannelToVlan PASSED                                                                                                                           [ 48%]
test_vlan.py::TestVlan::test_AddVlanMemberWithNonExistVlan PASSED                                                                                                                  [ 52%]
test_vlan.py::TestVlan::test_RemoveNonexistentVlan PASSED                                                                                                                          [ 56%]
test_vlan.py::TestVlan::test_VlanMemberTaggingMode[test_input0-expected0] PASSED                                                                                                   [ 60%]
test_vlan.py::TestVlan::test_VlanMemberTaggingMode[test_input1-expected1] PASSED                                                                                                   [ 64%]
test_vlan.py::TestVlan::test_VlanMemberTaggingMode[test_input2-expected2] PASSED                                                                                                   [ 68%]
test_vlan.py::TestVlan::test_VlanMemberTaggingMode[test_input3-expected3] PASSED                                                                                                   [ 72%]
test_vlan.py::TestVlan::test_VlanMemberTaggingMode[test_input4-expected4] PASSED                                                                                                   [ 76%]
test_vlan.py::TestVlan::test_AddMaxVlan SKIPPED                                                                                                                                    [ 80%]
test_vlan.py::TestVlan::test_RemoveVlanWithRouterInterface PASSED                                                                                                                  [ 84%]
test_vlan.py::TestVlan::test_VlanDbData PASSED                                                                                                                                     [ 88%]
test_vlan.py::TestVlan::test_VlanMemberDbData[test_input0-expected0] PASSED                                                                                                        [ 92%]
test_vlan.py::TestVlan::test_VlanMemberDbData[test_input1-expected1] PASSED                                                                                                        [ 96%]
test_vlan.py::TestVlan::test_VlanMemberDbData[test_input2-expected2] PASSED                                                                                                        [100%]

========================================================================= 24 passed, 1 skipped in 242.49 seconds =========================================================================
root@arc-host74-009:~/tests_stepanb/tests/tests# docker ps
CONTAINER ID        IMAGE                                      COMMAND                  CREATED             STATUS              PORTS               NAMES
4b3e0ce549a2        docker-sonic-vs:sonic-swss-build-pr.1201   "/usr/bin/supervisord"   12 minutes ago      Up 4 minutes                            vs
36c76e7944bd        debian                                     "bash"                   13 minutes ago      Up 13 minutes                           sw

According to log from test run on jenkins:

Dec  5 11:18:37.179896 4a2e10c63990 NOTICE #root: === start test test_RemoveVlanWithRouterInterface ===
Dec  5 11:18:37.383982 4a2e10c63990 NOTICE #root: === start marker 2019-12-05T11:18:37.229293 ===
Dec  5 11:18:37.446020 4a2e10c63990 NOTICE #orchagent: :- addVlan: Create an empty VLAN Vlan100 vid:100
Dec  5 11:18:38.436335 4a2e10c63990 NOTICE #orchagent: :- addRouterIntfs: Create router interface Vlan100 MTU 9100
Dec  5 11:18:38.438816 4a2e10c63990 NOTICE #orchagent: :- addIp2MeRoute: Create IP2me route ip:20.0.0.8
Dec  5 11:18:38.438984 4a2e10c63990 NOTICE #orchagent: :- addDirectedBroadcast: Add broadcast route for ip:20.0.0.15
Dec  5 11:18:35.843411 4a2e10c63990 NOTICE #syncd: message repeated 5 times: [ :- onMsg: received RTM_DELLINK for Vlan2]
Dec  5 11:18:39.442714 4a2e10c63990 NOTICE #syncd: :- onMsg: received RTM_DELLINK for Vlan100
Dec  5 11:18:39.479044 4a2e10c63990 ERR #orchagent: :- removeVlan: Failed to remove ref count 1 VLAN Vlan100
Dec  5 11:18:40.445081 4a2e10c63990 INFO #supervisord: intfmgrd Device "Vlan100" does not exist.
Dec  5 11:18:40.448031 4a2e10c63990 INFO #supervisord: intfmgrd Device "Vlan100" does not exist.
Dec  5 11:18:40.454443 4a2e10c63990 INFO #supervisord: intfmgrd Cannot find device "Vlan100"
Dec  5 11:18:40.454559 4a2e10c63990 ERR #intfmgrd: :- exec: /sbin/ip link set "Vlan100" nomaster: Success
Dec  5 11:18:40.454578 4a2e10c63990 ERR #intfmgrd: :- setIntfVrf: Command '/sbin/ip link set "Vlan100" nomaster' failed with rc 256
Dec  5 11:18:40.458669 4a2e10c63990 ERR #intfmgrd: :- exec: /sbin/ip address "del" "20.0.0.8/29" broadcast "20.0.0.15" dev "Vlan100": Success
Dec  5 11:18:40.458688 4a2e10c63990 ERR #intfmgrd: :- setIntfIp: Command '/sbin/ip address "del" "20.0.0.8/29" broadcast "20.0.0.15" dev "Vlan100"' failed with rc 256
Dec  5 11:18:40.455018 4a2e10c63990 ERR #orchagent: message repeated 2 times: [ :- removeVlan: Failed to remove ref count 1 VLAN Vlan100]
Dec  5 11:18:40.459208 4a2e10c63990 NOTICE #orchagent: :- removeIp2MeRoute: Remove packet action trap route ip:20.0.0.8
Dec  5 11:18:40.459368 4a2e10c63990 NOTICE #orchagent: :- removeDirectedBroadcast: Remove broadcast route ip:20.0.0.15
Dec  5 11:18:40.459379 4a2e10c63990 ERR #orchagent: :- removeVlan: Failed to remove ref count 1 VLAN Vlan100
Dec  5 11:18:40.459773 4a2e10c63990 NOTICE #orchagent: :- removeRouterIntfs: Remove router interface for port Vlan100
Dec  5 11:18:40.460108 4a2e10c63990 NOTICE #syncd: :- removeRif: Trying to remove nonexisting router interface counter from Id 0x60000000005db
Dec  5 11:18:42.604982 4a2e10c63990 NOTICE #root: === finish test test_RemoveVlanWithRouterInterface ===
Dec  5 11:18:42.782008 4a2e10c63990 NOTICE #root: === start test test_VlanDbData ===
Dec  5 11:18:42.846858 4a2e10c63990 NOTICE #orchagent: :- removeVlan: Remove VLAN Vlan100 vid:100
Dec  5 11:18:42.847126 4a2e10c63990 NOTICE #orchagent: :- addVlan: Create an empty VLAN Vlan2 vid:2

The vlan was actually removed but 2 seconds later, by that time the test already finishes with failure.
I think additional sleeps in this test are needed.

@kcudnik
Copy link
Copy Markdown
Contributor

kcudnik commented Dec 12, 2019

if its building against sairedis-master, then it can be related to this change: sonic-net/sonic-sairedis#543, its causing vs test to fail, but seems to fail in different test than yours, im investigaiting to fix this

@liat-grozovik
Copy link
Copy Markdown
Collaborator

@kcudnik any update? can we move forward with this PR?

@kcudnik
Copy link
Copy Markdown
Contributor

kcudnik commented Dec 24, 2019

Hey, you closed this PR ??

@qiluo-msft qiluo-msft removed the request for review from stcheng January 8, 2020 16:58
@qiluo-msft qiluo-msft reopened this Jan 8, 2020
@qiluo-msft
Copy link
Copy Markdown
Contributor

qiluo-msft commented Jan 8, 2020

Please resolve the conflict. #Closed

stepanb and others added 3 commits January 9, 2020 17:22
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>

Conflicts:
	tests/mock_tests/portsorch_ut.cpp
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
Don't care if PORT_TABLE was not processed fully,
we care about LAG, LAG_TABLE, VLAN and VLAN_TABLE.

Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
@stepanblyschak stepanblyschak force-pushed the 3iteration branch 2 times, most recently from 5dd2275 to 2a71444 Compare January 9, 2020 16:06
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
@stepanblyschak
Copy link
Copy Markdown
Contributor Author

retest this please

@liat-grozovik
Copy link
Copy Markdown
Collaborator

retest vs please

@liat-grozovik
Copy link
Copy Markdown
Collaborator

retest this please

@qiluo-msft qiluo-msft merged commit 1eac91e into sonic-net:master Jan 15, 2020
abdosi pushed a commit that referenced this pull request Jan 21, 2020
* [portsorch] process PortsOrch tables in specific order
* [portsorch] fix other tables not drained and make tableOrder constexpr
* [portsorch_ut] fix test according to fix in portsorch
Don't care if PORT_TABLE was not processed fully,
we care about LAG, LAG_TABLE, VLAN and VLAN_TABLE.
* [tests] fix ut after unsuccessful conflict resolution
prsunny pushed a commit that referenced this pull request Jan 28, 2020
* [portsorch] process PortsOrch tables in specific order
* [portsorch] fix other tables not drained and make tableOrder constexpr
* [portsorch_ut] fix test according to fix in portsorch
Don't care if PORT_TABLE was not processed fully,
we care about LAG, LAG_TABLE, VLAN and VLAN_TABLE.
* [tests] fix ut after unsuccessful conflict resolution
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants