test: attempt to stabilize socket mode unit tests #2198

WilliamBergamin · 2025-03-21T14:16:56Z

Summary

Aims to fix #2138

Since #2187 I've notices that flaky tests error with the following log

  1) Integration tests with a WebSocket server
       unexpected socket messages sent to client
         should maintain one serial reconnection attempt if WSS server sends unexpected HTTP response during handshake, like a 409:

      unexpected number of times `close` event was raised during reconnection!
      + expected - actual

      -3
      +2
      
      at Context.<anonymous> (test/integration.spec.js:217:14)
      at processTicksAndRejections (node:internal/process/task_queues:95:5)

should maintain one serial reconnection attempt if WSS server sends unexpected HTTP response during handshake, like a 409 is a real time system unit test, this means that real time elapse dictate the behavior of the unit test. Javascript in combination with event driven design patterns can be used for real time system applications but the implementation of the socket mode client may not be ideal for this.

Regardless, the tests sets clientPingTimeout to 20ms and then waits 50ms to expect 2 retry attempts. But it fails to account for the overhead of setting up the time out and executing retry requests, if this overhead takes more then 10ms a race condition it created between the unit test the socketModeClient and the mock server.

In order to stabilize the test we could increase the clientPingTimeout and let the test wait longer, but this increases the time elapsed to run the test. To avoid this, this PR aims to poll the number of retries, if 2 retries are detected within 50ms then we can consider the unit test to be successful.

If we see that this change does not improve the stability of the test in our CI pipeline we can revisit it, see this PR as an experiment.

Requirements (place an `x` in each `[ ]`)

I've read and understood the Contributing Guidelines and have done my best effort to follow them.
I've read and agree to the Code of Conduct.

codecov · 2025-03-21T14:17:37Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.99%. Comparing base (0101781) to head (26f0b6e).
Report is 1 commits behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2198   +/-   ##
=======================================
  Coverage   91.99%   91.99%           
=======================================
  Files          38       38           
  Lines       10386    10386           
  Branches      657      657           
=======================================
  Hits         9555     9555           
  Misses        819      819           
  Partials       12       12

Flag	Coverage Δ
cli-hooks	`95.23% <ø> (ø)`
cli-test	`94.76% <ø> (ø)`
oauth	`77.39% <ø> (ø)`
socket-mode	`61.82% <ø> (ø)`
web-api	`96.93% <ø> (ø)`
webhook	`96.65% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

zimeg

@WilliamBergamin This is such a promising change! Right now this test flakes >8% of the time... 🚀

I'm excited for it, but left one comment about adding an extra assert with this change to avoid possible regressions of retries without waiting.

That's not so blocking, but I think it's worth a quick discussion since the exacts of this test have been somewhat confusing to me in this meantime 👾

zimeg · 2025-03-24T18:52:21Z

packages/socket-mode/test/integration.spec.js

+      const start_time = Date.now();
+      let retries = 0;
+      do {
+        await sleep(2);


🤖 A shorter sleep seems great for more accurate timings!

zimeg · 2025-03-24T19:02:10Z

packages/socket-mode/test/integration.spec.js

+      do {
+        await sleep(2);
+        retries = closed;
+      } while (retries < 2 && Date.now() - start_time < 50);


if 2 retries are detected within 50ms then we can consider the unit test to be successful.

🤔 I'm not sure if breaking on both of these conditions still tests what we're hoping for without other changes, but please let me know if I'm thinking about this odd!

IMO the break makes sense to avoid stalled tests, but we might also want to check that the end time is within a certain range in later asserts? Something between 40ms to 50ms makes sense to me as a start?

But it fails to account for the overhead of setting up the time out and executing retry requests, if this overhead takes more then 10ms a race condition it created between the unit test the socketModeClient and the mock server.

AFAICT the exact timing of this test before changes were made was a big cause of flakes. I think with the above assert we can remain confident that time between retires is still respected without increasing the test duration 🙏

I agree with this proposition 💯 this was something I originally considered and figured it may not be needed

But I believe we need to assert against a range of 25ms to 50ms, on my local machine the time elapse for this task seems to be between 29 and 35ms, which is not what I expected 🤔

zimeg

LGTM! Thanks for increasing our test confidences around this!

The 25ms lower bound is so interesting but also makes sense with the other setups happening within client.start after the timer starts.

Finding a time below 40ms is most interesting though! I wonder if you've uncovered the cause of flakes 👁️‍🗨️

In either case, I'm hoping this resolves flakes of this test. If not, we should revisit this later!

packages/socket-mode/test/integration.spec.js

Co-authored-by: Eden Zimbelman <eden.zimbelman@salesforce.com>

test: attempt to stabilize socket mode unit tests

6ef1fc3

WilliamBergamin added tests M-T: Testing work only pkg:socket-mode applies to `@slack/socket-mode` labels Mar 21, 2025

WilliamBergamin requested a review from zimeg March 21, 2025 14:16

WilliamBergamin self-assigned this Mar 21, 2025

WilliamBergamin added 2 commits March 21, 2025 10:45

Update integration.spec.js

1bc3b82

Update integration.spec.js

6507152

WilliamBergamin requested a review from mwbrooks March 21, 2025 20:34

zimeg requested changes Mar 24, 2025

View reviewed changes

WilliamBergamin added 2 commits March 24, 2025 15:26

Update integration.spec.js

f7039e4

Update integration.spec.js

fe00d09

WilliamBergamin requested a review from zimeg March 24, 2025 19:36

zimeg approved these changes Mar 24, 2025

View reviewed changes

packages/socket-mode/test/integration.spec.js Outdated Show resolved Hide resolved

WilliamBergamin and others added 2 commits March 24, 2025 15:45

Update packages/socket-mode/test/integration.spec.js

b2feadb

Co-authored-by: Eden Zimbelman <eden.zimbelman@salesforce.com>

Merge branch 'main' into attempt-to-stabalize-unit-tests

26f0b6e

WilliamBergamin merged commit d3fe084 into main Mar 24, 2025
57 checks passed

WilliamBergamin deleted the attempt-to-stabalize-unit-tests branch March 24, 2025 19:49

zimeg mentioned this pull request Mar 26, 2025

health: flaky socket mode unit tests #2138

Closed

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test: attempt to stabilize socket mode unit tests #2198

test: attempt to stabilize socket mode unit tests #2198

Uh oh!

WilliamBergamin commented Mar 21, 2025 •

edited

Loading

Uh oh!

codecov bot commented Mar 21, 2025 •

edited

Loading

Uh oh!

zimeg left a comment

Uh oh!

zimeg Mar 24, 2025

Uh oh!

zimeg Mar 24, 2025

Uh oh!

WilliamBergamin Mar 24, 2025

Uh oh!

zimeg left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

test: attempt to stabilize socket mode unit tests #2198

test: attempt to stabilize socket mode unit tests #2198

Uh oh!

Conversation

WilliamBergamin commented Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Requirements (place an x in each [ ])

Uh oh!

codecov bot commented Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

zimeg left a comment

Choose a reason for hiding this comment

Uh oh!

zimeg Mar 24, 2025

Choose a reason for hiding this comment

Uh oh!

zimeg Mar 24, 2025

Choose a reason for hiding this comment

Uh oh!

WilliamBergamin Mar 24, 2025

Choose a reason for hiding this comment

Uh oh!

zimeg left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

WilliamBergamin commented Mar 21, 2025 •

edited

Loading

Requirements (place an `x` in each `[ ]`)

codecov bot commented Mar 21, 2025 •

edited

Loading