Improve the stability of SocketModeClient implementations#1114
Improve the stability of SocketModeClient implementations#1114seratch merged 2 commits intoslackapi:mainfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1114 +/- ##
==========================================
- Coverage 85.37% 84.93% -0.44%
==========================================
Files 110 110
Lines 10517 10570 +53
==========================================
- Hits 8979 8978 -1
- Misses 1538 1592 +54
Continue to review full report at Codecov.
|
| if len(elements) == 2: | ||
| if ( | ||
| len(elements) == 2 | ||
| and elements[0] == "sdk-ping-pong" |
| self.logger.debug(f"Sending a message: {message}") | ||
| await self.current_session.send_str(message) | ||
| try: | ||
| await self.current_session.send_str(message) |
There was a problem hiding this comment.
A race condition where the background job can disconnect the current session if it's not working any more. In this scenario, reconnection will be completed shortly by the background job. By acquiring the connect_operation_lock before retrying, this method can wait for the newly established connection.
| try: | ||
| client.send_message("foo") | ||
| except SlackClientNotConnectedError as _: | ||
| pass |
There was a problem hiding this comment.
Ideally we should verify if there are expected logs in this test but I manually checked logs/pytest.log.
There was a problem hiding this comment.
It is hard for me to understand the intention of this test. The name of the test, "test sending messages", could be interpreted as validating a variety of behaviours that make up the overall message sending functionality. In context of this PR, I assume that this test checks for the single-retry-when-underlying-WS-connection-is-re-established behaviour, but perhaps a week or two from now I will no longer remember this (my memory is not what it used to be 👴 ).
Two questions:
- could we rename the test to be more specific as to what we are validating?
- can we add assertions to the test that make explicit what we expect to happen? I assume that in this test we want to trigger the
SlackClientNotConnectedError, but because wepasson it, if the error is not raised, then the test passes. So in effect this test may - or may not - validate behaviour, in that if the exception is raised, wepass, but if it is not, we implicitlypass.
There was a problem hiding this comment.
could we rename the test to be more specific as to what we are validating?
You are definitely right here. Perhaps, a better test name would be test_send_message_while_disconnection
if the error is not raised, then the test passes.
Good point. we can add an explicit failure for the case where the exception is not propagated. I will update the tests and the typo in comments later on.
|
As we are disabling the WebSocket related tests on GitHub Actions due to instability there, the test coverage for the change is not great. But the newly added tests cover the scenario. |
11a221c to
47c783a
Compare
filmaj
left a comment
There was a problem hiding this comment.
Looks good, though the flow with the retries and connection locks are getting confusing!
I wonder if it would be a good use of time to create some manner of flow chart for the reconnection logic as a kind of internal documentation? Without a visual document, for me personally, it becomes difficult to reason about race conditions. I would gladly collaborate on such an effort if it is considered to be an Efficient use of time 😄
| except Exception as e: | ||
| # In most cases, we want to retry this operation with a newly established connection. | ||
| # Getting this exception means that this connection has been replaced with a new one | ||
| # and it's no loner usable. |
There was a problem hiding this comment.
| # and it's no loner usable. | |
| # and it's no longer usable. |
| try: | ||
| client.send_message("foo") | ||
| except SlackClientNotConnectedError as _: | ||
| pass |
There was a problem hiding this comment.
It is hard for me to understand the intention of this test. The name of the test, "test sending messages", could be interpreted as validating a variety of behaviours that make up the overall message sending functionality. In context of this PR, I assume that this test checks for the single-retry-when-underlying-WS-connection-is-re-established behaviour, but perhaps a week or two from now I will no longer remember this (my memory is not what it used to be 👴 ).
Two questions:
- could we rename the test to be more specific as to what we are validating?
- can we add assertions to the test that make explicit what we expect to happen? I assume that in this test we want to trigger the
SlackClientNotConnectedError, but because wepasson it, if the error is not raised, then the test passes. So in effect this test may - or may not - validate behaviour, in that if the exception is raised, wepass, but if it is not, we implicitlypass.
| self.server.stop() | ||
| self.server.close() | ||
|
|
||
| def test_sending_messages(self): |
There was a problem hiding this comment.
Same comment here as my last comment about the test above.
|
@filmaj Thanks for the comment! I've implemented this Socket Mode client for the sake of a good balance between reasonable runtime performance and safety. To achieve this, some parts of the code became a bit complex. But basically the patterns that we should be aware of are not so complicated. I have been thinking that the key goals here are safely reconnecting, avoiding being rate-limited for reconnection API calls, and never losing an active connection along with reasonable runtime performance. If we desire to make the implementation much simpler, an alternative easy way is to synchronize with lock objects much more in the code. But, needless to say, it affects the performance pretty badly. If you already have some ideas about a good way to visualize the flow chart along with the purpose of the code, I'm happy to work together on the initiative! 🙌 Even having some simple documents may help us and future maintainers easily recall how it works. Anyways, thanks for your prompt review here! |
|
Sounds good. The goals you have listed are 💯 . I will put a bit of time into mapping out the reconnection flow and trying to visualize/document that. I think it would be good if only for my own learning, and if the team finds it useful, perhaps we can include it in this project's documentation somehow in the future. |
47c783a to
abd7d60
Compare
Summary
This pull request improves the internals of built-in SocketModeClient implementations to properly handle the situations like slackapi/bolt-python#461 I will add a few comments for reviewers.
Category (place an
xin each of the[ ])/docs-src(Documents, have you run./scripts/docs.sh?)/docs-src-v2(Documents, have you run./scripts/docs-v2.sh?)/tutorial(PythOnBoardingBot tutorial)tests/integration_tests(Automated tests for this library)Requirements (place an
xin each[ ])python3 -m venv .venv && source .venv/bin/activate && ./scripts/run_validation.shafter making the changes.