Skip to content

Improve the stability of SocketModeClient implementations#1114

Merged
seratch merged 2 commits intoslackapi:mainfrom
seratch:socket-mode-more-stability
Sep 13, 2021
Merged

Improve the stability of SocketModeClient implementations#1114
seratch merged 2 commits intoslackapi:mainfrom
seratch:socket-mode-more-stability

Conversation

@seratch
Copy link
Copy Markdown
Contributor

@seratch seratch commented Sep 10, 2021

Summary

This pull request improves the internals of built-in SocketModeClient implementations to properly handle the situations like slackapi/bolt-python#461 I will add a few comments for reviewers.

Category (place an x in each of the [ ])

  • slack_sdk.web.WebClient (sync/async) (Web API client)
  • slack_sdk.webhook.WebhookClient (sync/async) (Incoming Webhook, response_url sender)
  • slack_sdk.socket_mode (Socket Mode client)
  • slack_sdk.signature (Request Signature Verifier)
  • slack_sdk.oauth (OAuth Flow Utilities)
  • slack_sdk.models (UI component builders)
  • slack_sdk.scim (SCIM API client)
  • slack_sdk.audit_logs (Audit Logs API client)
  • slack_sdk.rtm_v2 (RTM client)
  • /docs-src (Documents, have you run ./scripts/docs.sh?)
  • /docs-src-v2 (Documents, have you run ./scripts/docs-v2.sh?)
  • /tutorial (PythOnBoardingBot tutorial)
  • tests/integration_tests (Automated tests for this library)

Requirements (place an x in each [ ])

  • I've read and understood the Contributing Guidelines and have done my best effort to follow them.
  • I've read and agree to the Code of Conduct.
  • I've run python3 -m venv .venv && source .venv/bin/activate && ./scripts/run_validation.sh after making the changes.

@seratch seratch added bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented Version: 3x socket-mode labels Sep 10, 2021
@seratch seratch added this to the 3.11.0 milestone Sep 10, 2021
@seratch seratch requested review from filmaj and srajiang September 10, 2021 05:51
@seratch seratch self-assigned this Sep 10, 2021
@codecov
Copy link
Copy Markdown

codecov bot commented Sep 10, 2021

Codecov Report

Merging #1114 (abd7d60) into main (12188d3) will decrease coverage by 0.43%.
The diff coverage is 4.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1114      +/-   ##
==========================================
- Coverage   85.37%   84.93%   -0.44%     
==========================================
  Files         110      110              
  Lines       10517    10570      +53     
==========================================
- Hits         8979     8978       -1     
- Misses       1538     1592      +54     
Impacted Files Coverage Δ
slack_sdk/socket_mode/aiohttp/__init__.py 43.41% <0.00%> (-2.70%) ⬇️
slack_sdk/socket_mode/builtin/connection.py 34.13% <0.00%> (-2.00%) ⬇️
slack_sdk/socket_mode/websockets/__init__.py 56.66% <7.69%> (-5.38%) ⬇️
slack_sdk/socket_mode/builtin/client.py 74.68% <9.09%> (-5.19%) ⬇️
slack_sdk/socket_mode/websocket_client/__init__.py 56.83% <9.09%> (-3.94%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12188d3...abd7d60. Read the comment docs.

if len(elements) == 2:
if (
len(elements) == 2
and elements[0] == "sdk-ping-pong"
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a minor improvement for #1112

self.logger.debug(f"Sending a message: {message}")
await self.current_session.send_str(message)
try:
await self.current_session.send_str(message)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A race condition where the background job can disconnect the current session if it's not working any more. In this scenario, reconnection will be completed shortly by the background job. By acquiring the connect_operation_lock before retrying, this method can wait for the newly established connection.

try:
client.send_message("foo")
except SlackClientNotConnectedError as _:
pass
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we should verify if there are expected logs in this test but I manually checked logs/pytest.log.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard for me to understand the intention of this test. The name of the test, "test sending messages", could be interpreted as validating a variety of behaviours that make up the overall message sending functionality. In context of this PR, I assume that this test checks for the single-retry-when-underlying-WS-connection-is-re-established behaviour, but perhaps a week or two from now I will no longer remember this (my memory is not what it used to be 👴 ).

Two questions:

  • could we rename the test to be more specific as to what we are validating?
  • can we add assertions to the test that make explicit what we expect to happen? I assume that in this test we want to trigger the SlackClientNotConnectedError, but because we pass on it, if the error is not raised, then the test passes. So in effect this test may - or may not - validate behaviour, in that if the exception is raised, we pass, but if it is not, we implicitly pass.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we rename the test to be more specific as to what we are validating?

You are definitely right here. Perhaps, a better test name would be test_send_message_while_disconnection

if the error is not raised, then the test passes.

Good point. we can add an explicit failure for the case where the exception is not propagated. I will update the tests and the typo in comments later on.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@seratch
Copy link
Copy Markdown
Contributor Author

seratch commented Sep 10, 2021

As we are disabling the WebSocket related tests on GitHub Actions due to instability there, the test coverage for the change is not great. But the newly added tests cover the scenario.

Copy link
Copy Markdown
Contributor

@filmaj filmaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, though the flow with the retries and connection locks are getting confusing!

I wonder if it would be a good use of time to create some manner of flow chart for the reconnection logic as a kind of internal documentation? Without a visual document, for me personally, it becomes difficult to reason about race conditions. I would gladly collaborate on such an effort if it is considered to be an Efficient use of time 😄

except Exception as e:
# In most cases, we want to retry this operation with a newly established connection.
# Getting this exception means that this connection has been replaced with a new one
# and it's no loner usable.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# and it's no loner usable.
# and it's no longer usable.

try:
client.send_message("foo")
except SlackClientNotConnectedError as _:
pass
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard for me to understand the intention of this test. The name of the test, "test sending messages", could be interpreted as validating a variety of behaviours that make up the overall message sending functionality. In context of this PR, I assume that this test checks for the single-retry-when-underlying-WS-connection-is-re-established behaviour, but perhaps a week or two from now I will no longer remember this (my memory is not what it used to be 👴 ).

Two questions:

  • could we rename the test to be more specific as to what we are validating?
  • can we add assertions to the test that make explicit what we expect to happen? I assume that in this test we want to trigger the SlackClientNotConnectedError, but because we pass on it, if the error is not raised, then the test passes. So in effect this test may - or may not - validate behaviour, in that if the exception is raised, we pass, but if it is not, we implicitly pass.

self.server.stop()
self.server.close()

def test_sending_messages(self):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here as my last comment about the test above.

@seratch
Copy link
Copy Markdown
Contributor Author

seratch commented Sep 13, 2021

@filmaj Thanks for the comment! I've implemented this Socket Mode client for the sake of a good balance between reasonable runtime performance and safety. To achieve this, some parts of the code became a bit complex. But basically the patterns that we should be aware of are not so complicated. I have been thinking that the key goals here are safely reconnecting, avoiding being rate-limited for reconnection API calls, and never losing an active connection along with reasonable runtime performance. If we desire to make the implementation much simpler, an alternative easy way is to synchronize with lock objects much more in the code. But, needless to say, it affects the performance pretty badly.

If you already have some ideas about a good way to visualize the flow chart along with the purpose of the code, I'm happy to work together on the initiative! 🙌 Even having some simple documents may help us and future maintainers easily recall how it works. Anyways, thanks for your prompt review here!

@filmaj
Copy link
Copy Markdown
Contributor

filmaj commented Sep 13, 2021

Sounds good. The goals you have listed are 💯 .

I will put a bit of time into mapping out the reconnection flow and trying to visualize/document that. I think it would be good if only for my own learning, and if the team finds it useful, perhaps we can include it in this project's documentation somehow in the future.

@seratch seratch force-pushed the socket-mode-more-stability branch from 47c783a to abd7d60 Compare September 13, 2021 22:21
@seratch seratch merged commit 55786c2 into slackapi:main Sep 13, 2021
@seratch seratch deleted the socket-mode-more-stability branch September 13, 2021 22:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented socket-mode Version: 3x

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants