Skip to content

rtm_v2 api spins in infinite loop under gevent#1246

Merged
seratch merged 5 commits intoslackapi:mainfrom
mattbillenstein:main
Jul 25, 2022
Merged

rtm_v2 api spins in infinite loop under gevent#1246
seratch merged 5 commits intoslackapi:mainfrom
mattbillenstein:main

Conversation

@mattbillenstein
Copy link
Copy Markdown
Contributor

@mattbillenstein mattbillenstein commented Jul 25, 2022

Summary

Fix websocket handling code - test case at: https://github.com/mattbillenstein/slack-rtm_v2-recv-bug

From the README.txt there:

When listening on a websocket with the regular rtm_v2 lib using gevent, the
socket will after a time become unresponsive, and the cpu spin at 100%. This is
because the socket.recv() call doesn't return any data if the other end closed
the connection and the code gets into an infinite loop between _fetch and
receive because the error handling logic just returns an empty bytes object

  • so the code whereever it is consuming data can't make progress.

Per the python3 sockets howto:

https://docs.python.org/3/howto/sockets.html:

"When a recv returns 0 bytes, it means the other side has closed (or is in
the process of closing) the connection. You will not receive any more data
on this connection. Ever. You may be able to send data successfully..."

In this fix if we receive no data from sock.recv, just raise OSError so the
consuming code closes and establishes a new connecition.

You can run this code using the two requirements.txt - just connecting to slack
and waiting a time will cause this error to happen with the mainline lib.

$ SLACK_BOT_TOKEN=... ./slack-bot.py

As to why this happens so quickly - I'm guessing there might be other errors in the fetch function that are causing protocol errors on the server and the server to prematurely close the connection - worth investigating further.

Tested on Python 3.10.4, Ubuntu 22.04 x86_64

Also, I might suggest replacing this code with the websocket-client lib (like
the async code uses the websockets lib) - it seems like a better implementation
of consuming data from websockets.

Category (place an x in each of the [ ])

  • slack_sdk.web.WebClient (sync/async) (Web API client)
  • slack_sdk.webhook.WebhookClient (sync/async) (Incoming Webhook, response_url sender)
  • slack_sdk.socket_mode (Socket Mode client)
  • slack_sdk.signature (Request Signature Verifier)
  • slack_sdk.oauth (OAuth Flow Utilities)
  • slack_sdk.models (UI component builders)
  • slack_sdk.scim (SCIM API client)
  • slack_sdk.audit_logs (Audit Logs API client)
  • slack_sdk.rtm_v2 (RTM client)
  • /docs-src (Documents, have you run ./scripts/docs.sh?)
  • /docs-src-v2 (Documents, have you run ./scripts/docs-v2.sh?)
  • /tutorial (PythOnBoardingBot tutorial)
  • tests/integration_tests (Automated tests for this library)

Requirements (place an x in each [ ])

  • I've read and understood the Contributing Guidelines and have done my best effort to follow them.
  • I've read and agree to the Code of Conduct.
  • I've run python3 -m venv .venv && source .venv/bin/activate && ./scripts/run_validation.sh after making the changes.

==== 752 passed, 1 skipped, 1144 warnings in 278.15s (0:04:38) ========

Returning and empty bytes object causes an infinite loop between this and the
fetch function which in some cases can cause 100% cpu utilization (this happens
under gevent).

Also the exception handling here can't just return an empty bytes for the same
reason - the now dead connection cannot be replaced by code that uses this
code.
No data means the other side has closed the connection:

"When a recv returns 0 bytes, it means the other side has closed (or is in the
process of closing) the connection. You will not receive any more data on this
connection. Ever. You may be able to send data successfully"
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Jul 25, 2022

CLA assistant check
All committers have signed the CLA.

@seratch seratch self-assigned this Jul 25, 2022
@seratch seratch added bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented rtm-client Version: 3x socket-mode labels Jul 25, 2022
@seratch seratch added this to the 3.18.1 milestone Jul 25, 2022
@codecov
Copy link
Copy Markdown

codecov bot commented Jul 25, 2022

Codecov Report

Merging #1246 (8917093) into main (e24c270) will increase coverage by 0.03%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##             main    #1246      +/-   ##
==========================================
+ Coverage   86.61%   86.64%   +0.03%     
==========================================
  Files         111      111              
  Lines       11026    11025       -1     
==========================================
+ Hits         9550     9553       +3     
+ Misses       1476     1472       -4     
Impacted Files Coverage Δ
slack_sdk/socket_mode/builtin/internals.py 72.24% <66.66%> (-1.88%) ⬇️
slack_sdk/socket_mode/builtin/client.py 90.24% <0.00%> (+0.60%) ⬆️
slack_sdk/socket_mode/builtin/connection.py 67.03% <0.00%> (+2.59%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us.

@seratch
Copy link
Copy Markdown
Contributor

seratch commented Jul 25, 2022

Hi @mattbillenstein, thank you so much for taking the time to improve the code and we are sorry for the trouble you encountered in the use case with gevent. I've tried to reproduce your situation on my end using your repo but haven't seen the same situation yet. That being said, your improvement here makes sense and the change needs to be released regardless of my local testing results.

Also, I might suggest replacing this code with the websocket-client lib (like the async code uses the websockets lib) - it seems like a better implementation of consuming data from websockets.

Indeed, the websocket-client library is a long-lived and stable library. There were a few reasons not to depend on specific community libraries at that time (zero required dependency policy of this SDK, the inactiveness of the webscoket-client project at that time although new maintainers revitalized the project after that, etc.). Thus, we are not planning to change this at least in the short term.

If you prefer using websocket-client library, it's already supported. You can switch to websocket-client just by using a different wrapper interface: https://slack.dev/python-slack-sdk/socket-mode/index.html#supported-libraries

Thanks again for your contribution here!

@seratch seratch merged commit fcb137b into slackapi:main Jul 25, 2022
@mattbillenstein
Copy link
Copy Markdown
Contributor Author

Thanks! I didn't know the websocket-client wrapper existed, I'll give it a try, I'd been just instantiating websocket.WebSocketApp directly in my code using the url from WebClient.rtm_connect()

seratch added a commit to seratch/python-slack-sdk that referenced this pull request Sep 29, 2022
@seratch seratch mentioned this pull request Sep 29, 2022
16 tasks
seratch added a commit that referenced this pull request Sep 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented rtm-client socket-mode Version: 3x

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants