Skip to content

config:connected state across hot restarts#6153

Merged
mattklein123 merged 1 commit intoenvoyproxy:masterfrom
ramaraochavali:fix/connected_state
Mar 6, 2019
Merged

config:connected state across hot restarts#6153
mattklein123 merged 1 commit intoenvoyproxy:masterfrom
ramaraochavali:fix/connected_state

Conversation

@ramaraochavali
Copy link
Copy Markdown
Contributor

Description: Some times during hot restart connected state stat has inconsistent value (probably because of how this gauge is updated in shared memory during hot restarts) and is not set to correct value till Envoy is disconnected and reconnected to management server. This PR ensures this is set on receiveMessage so that it stays consistent.
Risk Level: Low
Testing: Existing automated tests
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@dio dio requested review from fredlas and htuch March 3, 2019 14:15
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, makes sense.

/wait

// Some times during hot restarts this stat's value becomes inconsistent and will continue to
// have 0 till it is reconnected. Setting here ensures that it is consistent with the state of
// management server connection.
control_plane_stats_.connected_state_.set(1);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where was it set previously? Should we remove it from there? Also is it possible to somehow test this change? NBD if not.

Copy link
Copy Markdown
Contributor Author

@ramaraochavali ramaraochavali Mar 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is set in establishNewStream. I think testing it may be difficult simulating hot restart if that is what you mean.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I missed your other question - I think it is OK to leave it there because if there are no messages exchanged /delay in exchanges after stream establishment (which does not happen today) we will still have technically correct state of this stat that it is connected.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK sounds good.

@mattklein123 mattklein123 self-assigned this Mar 4, 2019
@fredlas
Copy link
Copy Markdown
Contributor

fredlas commented Mar 4, 2019

Having read the Slack thread, I understand better what's going on here. Whenever this line you're adding actually overwrites a "false", that means that the rest of the logic has broken in some way. Definitely good to patch up that inconsistency when it's found, but it should be possible to never let it happen in the first place, which #5910 ought to achieve. Maybe once it's in, this should be replaced with a if (it was false) LOG(ERROR) or something?

@ramaraochavali
Copy link
Copy Markdown
Contributor Author

ramaraochavali commented Mar 5, 2019

Maybe once it's in, this should be replaced with a if (it was false) LOG(ERROR) or something?

I think once it is in and properly validated, we do not expect inconsistency and these kind of errors may not happen right? However, for this specific stat, I think having this here even after the fix does not harm.

@fredlas
Copy link
Copy Markdown
Contributor

fredlas commented Mar 5, 2019

Right, it certainly doesn't hurt. Either it helps, or if the rest of the code is correct, is a no-op. If there is such an error logged, I agree that the error being "this state is wrong but I'm not going to fix it even though I could" is not optimally helpful!

@ramaraochavali
Copy link
Copy Markdown
Contributor Author

@mattklein123 are we good with this or do I have to make any changes?

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123 mattklein123 merged commit 908213c into envoyproxy:master Mar 6, 2019
@ramaraochavali ramaraochavali deleted the fix/connected_state branch March 6, 2019 10:26
fredlas pushed a commit to fredlas/envoy that referenced this pull request Mar 6, 2019
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants