
[Tech debt] Upgrade Starscream 3.1.1 → 4.x to fix _outputStreamCallbackFunc EXC_BAD_ACCESS crashes #109

@mixc6763-prog


Summary

Current pin: pod 'Starscream','3.1.1' (Podfile, released 2019-10-14).

Starscream 3.x uses CFStream and has a known race-condition crash in the output-stream callback, fixed by the v4.x rewrite that switched the transport to Network.framework. A real crash matching this signature was just collected from a v1.1.2.1 install.

Evidence

Local crash on 1.1.2.1 (build 1001002001, macOS 26.3.1, arm64):

Exception: EXC_BAD_ACCESS / SIGSEGV  subtype=KERN_INVALID_ADDRESS at 0x3b0400
Crashed thread queue: com.vluxe.starscream.websocket
  #0 libobjc.dylib              objc_retain
  #1 CoreFoundation             _outputStreamCallbackFunc
  #2 CoreFoundation             _signalEventSync
  #3 CoreFoundation             ___signalEventQueue_block_invoke
  #4 libdispatch.dylib          _dispatch_call_block_and_release
  ...

Upstream issue is daltoniam/Starscream#508 — title is literally "EXC_BAD_ACCESS on objc_msgSend in _outputStreamCallbackFunc", same com.vluxe.starscream.websocket queue, same stack shape. Closed as completed in the 4.0 rewrite.

Why it became visible now

The traffic / log websocket reconnect path got more aggressive in v1.1.1:

  • 1e092489 add 10s traffic watchdog to recover from half-dead websocket
  • a7ff6267 reconnect traffic/log websocket on clean disconnect
  • 2d24187c reset traffic stream on IP change for all websocket modes

More disconnect + reconnect cycles widen the window in which the 3.1.1 callback can fire on a half-deallocated stream. Our reconnect code itself is correct; the underlying library is race-prone.
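To make the extra churn concrete, here is a minimal sketch of what a silence-based watchdog like the one from 1e092489 might look like. The type `TrafficWatchdog` and its members are hypothetical names for illustration, not the code in ApiRequest.swift; only the 10 s threshold comes from the commit message.

```swift
import Foundation

/// Hypothetical watchdog: decides when a silent websocket should be torn down.
/// Names and structure are illustrative, not taken from ApiRequest.swift.
struct TrafficWatchdog {
    /// Maximum silence tolerated before forcing a reconnect (10 s per commit 1e092489).
    let timeout: TimeInterval
    private(set) var lastMessageAt: Date

    init(timeout: TimeInterval = 10, now: Date = Date()) {
        self.timeout = timeout
        self.lastMessageAt = now
    }

    /// Call whenever a traffic frame arrives.
    mutating func recordMessage(at now: Date = Date()) {
        lastMessageAt = now
    }

    /// True once the socket has been silent for longer than `timeout`.
    func shouldReconnect(at now: Date = Date()) -> Bool {
        return now.timeIntervalSince(lastMessageAt) > timeout
    }
}
```

Every time `shouldReconnect` fires, the caller disconnects and creates a fresh socket, which is one more teardown during which the 3.1.1 stream callback can race.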

Upgrade target

Starscream 4.0.8 (latest, 2024-03-07). The last 3.x release was 3.1.1; v4.0 is a full rewrite with a breaking API.

Known 4.x breaking changes that affect ApiRequest.swift:

  • WebSocket(url:) → WebSocket(request:) (URLRequest required)
  • socket.delegate = self + WebSocketDelegate.websocketDidConnect/Disconnect/... → unified WebSocketDelegate.didReceive(event:client:) enum-driven callback
  • socket.disconnect(forceTimeout:) → socket.disconnect() (timeout no longer a parameter)
  • TLS / pinning APIs changed

ApiRequest.swift is the only consumer (1 file, ~50 LOC touching WebSocket), so the blast radius is small.
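A rough sketch of the 4.x shape, for orientation only. `TrafficSocket`, `isConnected`, and the 5 s request timeout are hypothetical and not taken from ApiRequest.swift; the Starscream calls (`WebSocket(request:)`, `didReceive(event:client:)`, `disconnect()`) follow the 4.0.x README. The `client:` parameter is typed `WebSocketClient` in recent 4.0.x releases, while early 4.0 releases used `WebSocket`, so check it against the version that actually gets pinned.

```swift
import Foundation
import Starscream

/// Hypothetical wrapper showing the 4.x API shape; the real code may differ.
final class TrafficSocket: WebSocketDelegate {
    private var socket: WebSocket?
    private(set) var isConnected = false

    func connect(to url: URL) {
        // 4.x requires a URLRequest instead of a bare URL.
        var request = URLRequest(url: url)
        request.timeoutInterval = 5 // illustrative value
        let socket = WebSocket(request: request)
        socket.delegate = self
        self.socket = socket
        socket.connect()
    }

    func disconnect() {
        // 4.x: no forceTimeout: parameter any more.
        socket?.disconnect()
    }

    // Single enum-driven callback replaces websocketDidConnect/Disconnect/...
    func didReceive(event: WebSocketEvent, client: WebSocketClient) {
        switch event {
        case .connected:
            isConnected = true
        case .disconnected(let reason, let code):
            isConnected = false
            print("disconnected: \(reason) (code \(code))")
        case .text(let string):
            print("received text of length \(string.count)")
        case .binary(let data):
            print("received \(data.count) bytes")
        case .cancelled:
            isConnected = false
        case .error(let error):
            isConnected = false
            print("websocket error: \(String(describing: error))")
        default:
            // ping, pong, viabilityChanged, reconnectSuggested, and any
            // newer cases are ignored in this sketch.
            break
        }
    }
}
```

Using `default:` instead of listing every case keeps the sketch compiling across 4.0.x point releases that added events; the real migration may prefer an exhaustive switch so new events are not silently dropped.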

Implementation tasks

  • Bump pod 'Starscream' to ~> 4.0 in Podfile, run pod install, commit Podfile.lock
  • Migrate ApiRequest.swift WebSocket usages:
    • Switch initializer to URLRequest-based
    • Replace per-event delegate methods with didReceive(event:client:) switch
    • Update disconnect calls (drop forceTimeout:)
  • Smoke test:
    • Traffic stream connects, reports up/down
    • Log stream connects, lines flow
    • Sleep/wake cycle still recovers (verify watchdog still triggers reconnect)
    • Network change (Wi-Fi → cellular hotspot) still resets stream cleanly
    • Config reload still tears down and re-creates streams without crashing
  • Leave running for a few hours on the dev machine; confirm no com.vluxe.starscream.websocket queue crash in DiagnosticReports
  • Update RELEASE_NOTES.md
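The DiagnosticReports check in the soak step could be scripted along these lines. This is a sketch under assumptions: the helper `crashReports(mentioning:in:)` is a hypothetical name, it assumes user-level reports land in ~/Library/Logs/DiagnosticReports as `.ips` or `.crash` files, and it does a plain substring match on the queue name.

```swift
import Foundation

/// Returns crash report files in `directory` whose text mentions `needle`.
/// Hypothetical helper for the soak test; not part of the app.
func crashReports(mentioning needle: String, in directory: URL) -> [URL] {
    let fm = FileManager.default
    guard let entries = try? fm.contentsOfDirectory(at: directory,
                                                    includingPropertiesForKeys: nil) else {
        return [] // directory missing or unreadable
    }
    return entries
        .filter { ["ips", "crash"].contains($0.pathExtension) }
        .filter { url in
            guard let text = try? String(contentsOf: url, encoding: .utf8) else { return false }
            return text.contains(needle)
        }
        .sorted { $0.lastPathComponent < $1.lastPathComponent }
}

// Assumed location of per-user crash reports on macOS.
let reportsDir = FileManager.default.homeDirectoryForCurrentUser
    .appendingPathComponent("Library/Logs/DiagnosticReports")
let hits = crashReports(mentioning: "com.vluxe.starscream.websocket", in: reportsDir)
print(hits.isEmpty ? "no Starscream queue crashes found"
                   : hits.map { $0.path }.joined(separator: "\n"))
```

Run it before and after the soak period and compare the output; the match is on the whole directory, so reports that predate the upgrade will also show up unless they are cleared or filtered by date first.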

Risk / rollback

Medium. Starscream 4.x has been stable since 2021, but the delegate API change touches every place we receive websocket events. If the migration introduces regressions, the easiest rollback is to revert the Podfile bump and the ApiRequest.swift migration commit together. Keep the upgrade in its own PR so the revert is one click.
