rtm_v2 api spins in infinite loop under gevent#1246
Conversation
Returning and empty bytes object causes an infinite loop between this and the fetch function which in some cases can cause 100% cpu utilization (this happens under gevent). Also the exception handling here can't just return an empty bytes for the same reason - the now dead connection cannot be replaced by code that uses this code.
No data means the other side has closed the connection: "When a recv returns 0 bytes, it means the other side has closed (or is in the process of closing) the connection. You will not receive any more data on this connection. Ever. You may be able to send data successfully"
Codecov Report
@@ Coverage Diff @@
## main #1246 +/- ##
==========================================
+ Coverage 86.61% 86.64% +0.03%
==========================================
Files 111 111
Lines 11026 11025 -1
==========================================
+ Hits 9550 9553 +3
+ Misses 1476 1472 -4
Help us with your feedback. Take ten seconds to tell us how you rate us. |
|
Hi @mattbillenstein, thank you so much for taking the time to improve the code and we are sorry for the trouble you encountered in the use case with gevent. I've tried to reproduce your situation on my end using your repo but haven't seen the same situation yet. That being said, your improvement here makes sense and the change needs to be released regardless of my local testing results.
Indeed, the websocket-client library is a long-lived and stable library. There were a few reasons not to depend on specific community libraries at that time (zero required dependency policy of this SDK, the inactiveness of the webscoket-client project at that time although new maintainers revitalized the project after that, etc.). Thus, we are not planning to change this at least in the short term. If you prefer using websocket-client library, it's already supported. You can switch to websocket-client just by using a different wrapper interface: https://slack.dev/python-slack-sdk/socket-mode/index.html#supported-libraries Thanks again for your contribution here! |
|
Thanks! I didn't know the websocket-client wrapper existed, I'll give it a try, I'd been just instantiating websocket.WebSocketApp directly in my code using the url from WebClient.rtm_connect() |
Summary
Fix websocket handling code - test case at: https://github.com/mattbillenstein/slack-rtm_v2-recv-bug
From the README.txt there:
When listening on a websocket with the regular rtm_v2 lib using gevent, the
socket will after a time become unresponsive, and the cpu spin at 100%. This is
because the socket.recv() call doesn't return any data if the other end closed
the connection and the code gets into an infinite loop between _fetch and
receive because the error handling logic just returns an empty bytes object
Per the python3 sockets howto:
https://docs.python.org/3/howto/sockets.html:
In this fix if we receive no data from sock.recv, just raise OSError so the
consuming code closes and establishes a new connecition.
You can run this code using the two requirements.txt - just connecting to slack
and waiting a time will cause this error to happen with the mainline lib.
$ SLACK_BOT_TOKEN=... ./slack-bot.py
As to why this happens so quickly - I'm guessing there might be other errors in the fetch function that are causing protocol errors on the server and the server to prematurely close the connection - worth investigating further.
Tested on Python 3.10.4, Ubuntu 22.04 x86_64
Also, I might suggest replacing this code with the websocket-client lib (like
the async code uses the websockets lib) - it seems like a better implementation
of consuming data from websockets.
Category (place an
xin each of the[ ])/docs-src(Documents, have you run./scripts/docs.sh?)/docs-src-v2(Documents, have you run./scripts/docs-v2.sh?)/tutorial(PythOnBoardingBot tutorial)tests/integration_tests(Automated tests for this library)Requirements (place an
xin each[ ])python3 -m venv .venv && source .venv/bin/activate && ./scripts/run_validation.shafter making the changes.==== 752 passed, 1 skipped, 1144 warnings in 278.15s (0:04:38) ========