Skip to content

Linkmgrd subscribing State DB route event #13

Merged
zjswhhh merged 23 commits intosonic-net:masterfrom
zjswhhh:subscribeRouteEvent
Jan 19, 2022
Merged

Linkmgrd subscribing State DB route event #13
zjswhhh merged 23 commits intosonic-net:masterfrom
zjswhhh:subscribeRouteEvent

Conversation

@zjswhhh
Copy link
Copy Markdown
Collaborator

@zjswhhh zjswhhh commented Dec 15, 2021

Description of PR

Summary:
Fixes # (issue)

This PR is to make linkmgrd subscribe events from ROUTE_TABLE in State DB, and react accordingly:

  • If any of the two default route state appears to be 'na', linkmgrd should switch to standby.
  • If both are 'ok', there will be a mux state probing and what happens next depends on linkmgrd state machine and the probing response.

Type of change

  • Bug fix
  • Test case(new/improvement)
  • New feature

Approach

What is the motivation for this PR?

To make linkmgrd subscribe state DB route event, and handle the switchovers.

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Related PR: sonic-net/sonic-swss#2009

zjswhhh and others added 11 commits December 15, 2021 03:58
…net#8)

Description of PR
Summary:
Fixes # (issue)

Fix unstable unit tests.

People observed that linkmgrd unit tests appeared to be flaky, and kept failing PR checks. After checking the build logs, I found that in most of the failures, the state change handler didn't seem to be invoked: Linkmgrd's composite state remained as it was before the state change was post.

This PR updates code to invoke io_service::reset(), sets the io_service to no longer be in a stopped state, allowing subsequent calls to run_one() to invoke handlers. (When io_service is stopped, any call to run_one() will return immediately without invoking any handler. )

Signed-off-by: Jing Zhang zhangjing@microsoft.com

Type of change
 Bug fix

Approach
What is the motivation for this PR?
Stabilize linkmgrd unit tests.

How did you do it?
Reset stopped io_service before calling to run_one(), to guarantee state change handlers are invoked as expected.

How did you verify/test it?
Tested building locally.

This issue is hard to reproduce, the current build failure data is 2.98% (56/1877 in last 30 days). We should expect to see improvement after this change.
Add pull request template for linkmgrd repo.

sign-off: Jing Zhang zhangjing@microsoft.com
}

std::string nextState = "na";
if (mIpv4DefaultRouteState == "ok" && mIpv6DefaultRouteState == "ok") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lguohan do we have concern if either IPv4 or IPv6 BGP is not configured? Can we assume we will have them both eventually?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated per discussion in Gemini sync-up meeting.

{
MUXLOGWARNING(boost::format("%s: state db default route state: %s") % mMuxPortConfig.getPortName() % routeState);

if(mComponentInitState.test(MuxStateComponent)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add space after if

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated accordingly.


if(mComponentInitState.test(MuxStateComponent)) {
if (ms(mCompositeState) == mux_state::MuxState::Label::Active &&
routeState == "na") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest putting '{' on a new line, especially when if condition expands multiple lines. new line helps to identify the beginning of the if body.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated accordingly.

}

PortMapIterator portMapIterator = mPortMap.begin();
while(portMapIterator != mPortMap.end()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after while please.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated accordingly.

MUXLOGWARNING(boost::format("%s: state db default route state: %s") % mMuxPortConfig.getPortName() % routeState);

if (mComponentInitState.test(MuxStateComponent)) {
if (ms(mCompositeState) == mux_state::MuxState::Label::Active && routeState == "na") {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we should check for != Standby. What if mux is in a transient state and we got this event? What do you think?

Copy link
Copy Markdown
Collaborator Author

@zjswhhh zjswhhh Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think this is a good point. I will evaluate the behavior and update accordingly.
Updated, thanks!

if (ms(mCompositeState) == mux_state::MuxState::Label::Active && routeState == "na") {
CompositeState nextState = mCompositeState;
enterLinkProberState(nextState, link_prober::LinkProberState::Wait);
switchMuxState(nextState, mux_state::MuxState::Label::Standby);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it behave if both ToRs have default route as "na" ?

Copy link
Copy Markdown
Collaborator Author

@zjswhhh zjswhhh Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both ToRs will switch to standby, the one switches first will be "active" at the end.

Do we expect both ToRs to be "standby" in this scenario? This is not supported currently by linkmgrd as I know, we might need find other workaround if so.

@zjswhhh zjswhhh requested a review from prsunny January 14, 2022 22:27
lolyu and others added 3 commits January 19, 2022 02:32
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
@zjswhhh
Copy link
Copy Markdown
Collaborator Author

zjswhhh commented Jan 19, 2022

boost::asio::ip::make_address is unable to parse "::/0" and "0.0.0.0/0", and considering the key will always be these two strings, I updated the code and removed make_address(). Changes have been tested on virtual dual testbeds and worked as expected.

@prsunny could you review again? Thanks!

@zjswhhh zjswhhh requested a review from prsunny January 19, 2022 17:15
@zjswhhh zjswhhh merged commit 0c23756 into sonic-net:master Jan 19, 2022
@zjswhhh zjswhhh deleted the subscribeRouteEvent branch January 19, 2022 20:59
zjswhhh added a commit to zjswhhh/sonic-linkmgrd that referenced this pull request Jan 20, 2022
Summary:
Fixes # (issue)

This PR is to make linkmgrd subscribe events from ROUTE_TABLE in State DB, and react accordingly:
- If any of the two default route state appears to be 'na', linkmgrd should switch to standby.
- If both are 'ok', there will be a mux state probing and what happens next depends on linkmgrd state machine and the probing response.

Type of change:
New feature

Motivation for this PR:
To make linkmgrd subscribe state DB route event, and handle the switchovers.

Documentation:
Related PR: sonic-net/sonic-swss#2009
zjswhhh added a commit to zjswhhh/sonic-linkmgrd that referenced this pull request Jan 24, 2022
Summary:
Fixes # (issue)

This PR is to make linkmgrd subscribe events from ROUTE_TABLE in State DB, and react accordingly:  
- If any of the two default route state appears to be 'na', linkmgrd should switch to standby. 
- If both are 'ok', there will be a mux state probing and what happens next depends on linkmgrd state machine and the probing response.  

Type of change:
New feature

Motivation for this PR:
To make linkmgrd subscribe state DB route event, and handle the switchovers. 

Documentation: 
Related PR: sonic-net/sonic-swss#2009
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants