ModalConnection closed when running on modal through TURN server #1858

digaobarbosa · 2025-12-29T20:05:51Z

Description

Bug fix for the connection closing unexpectedly when running on Modal through a TURN server.

Added an ACK mechanism in which the client is responsible for reporting the last frame it received.
This should limit the number of frames (and the amount of data) in transit on the TURN server.

The belief is that sending frames through the data channel hits some limit on the TURN server; for example, the inference data is probably being sent faster than the client can download it.

With the ACK, the server gets information from the client about where it is actually reading.

Details:

only enabled for realtime_processing=false (video files)
default window of 20 frames
backwards compatible (a client can send ACKs to an old server version, and the server keeps the old behaviour if it is not receiving ACKs)

Type of change

Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

locally/modal

Any specific deployment considerations

modal

Docs

Docs updated? What were the changes:

digaobarbosa · 2025-12-30T14:07:11Z

inference/core/interfaces/webrtc_worker/modal.py

            "LOG_LEVEL": LOG_LEVEL,
            "ONNXRUNTIME_EXECUTION_PROVIDERS": "[CUDAExecutionProvider,CPUExecutionProvider]",
            "PROJECT": PROJECT,
+            "PYTHONASYNCIODEBUG": str(os.getenv("PYTHONASYNCIODEBUG", "0")),


Adds the option to enable asyncio debug mode.
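
For context, a minimal sketch of what asyncio debug mode reports, using only the standard library. The function names are hypothetical, and debug mode is switched on explicitly with `asyncio.run(..., debug=True)` rather than through the environment. Note that the CPython documentation describes `PYTHONASYNCIODEBUG` as enabling debug mode when it is set to any non-empty string, so it may be worth checking how the `"0"` default is interpreted inside the container.

```python
import asyncio
import time


async def report_debug_flag() -> bool:
    loop = asyncio.get_running_loop()
    # In debug mode asyncio logs any task step slower than this (seconds).
    loop.slow_callback_duration = 0.05
    # Deliberately block the event loop; debug mode logs a warning about it.
    time.sleep(0.1)
    return loop.get_debug()


def run_with_debug() -> bool:
    # Same effect as enabling debug mode via the environment, but explicit.
    return asyncio.run(report_debug_flag(), debug=True)
```

The "slow callback" warnings produced this way are the signal used in the comments below to judge how much each change reduced event-loop blocking.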

digaobarbosa · 2025-12-30T14:08:12Z

inference/core/interfaces/webrtc_worker/modal.py

            )
            cls_with_options = cls_with_options.with_options(
-                ram=requested_ram_mb,
+                memory=requested_ram_mb,


Discovered that the correct parameter is memory:
https://modal.com/docs/reference/modal.Cls#with_options


digaobarbosa · 2025-12-30T14:08:52Z

inference/core/interfaces/webrtc_worker/webrtc.py

-            return frame
-        except StopIteration:
+        loop = asyncio.get_running_loop()
+        frame = await loop.run_in_executor(None, lambda: next(self._iterator, None))


efforts to reduce main loop cpu pressure.
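
A self-contained sketch of the pattern in the hunk above, assuming a blocking iterator (for example a frame decoder). The helper names `next_in_executor` and `collect` are hypothetical. Using `next(iterator, None)` matters here because a `StopIteration` raised inside the executor cannot be set on an asyncio future, so exhaustion is signalled with `None` instead.

```python
import asyncio
from typing import Iterator, List, Optional, TypeVar

T = TypeVar("T")


async def next_in_executor(iterator: Iterator[T]) -> Optional[T]:
    """Pull one item from a blocking iterator without blocking the event loop."""
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, lambda: next(iterator, None))


async def collect(iterator: Iterator[T]) -> List[T]:
    items: List[T] = []
    while True:
        item = await next_in_executor(iterator)
        if item is None:  # iterator exhausted
            break
        items.append(item)
    return items
```

This assumes the iterator never yields `None` as a real value; if it can, a dedicated sentinel object would be the safer default.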

digaobarbosa · 2025-12-30T14:09:28Z

inference/core/interfaces/webrtc_worker/webrtc.py

-            heartbeat_callback()
-        await asyncio.sleep(WEBRTC_DATA_CHANNEL_BUFFER_DRAINING_DELAY)
+
+    async def wait_for_buffer_drain() -> None:


efforts to reduce main loop cpu pressure.
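
The body of `wait_for_buffer_drain` is not shown in this hunk, so the following is only a guess at the general shape of such a helper: wait, with a timeout, until the channel's send buffer drops below a threshold. `FakeChannel`, `wait_until_drained`, and all parameters are hypothetical; the only thing assumed about the real data channel is that it exposes a `bufferedAmount` attribute.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeChannel:
    """Hypothetical stand-in for a data channel; only bufferedAmount is modelled."""
    bufferedAmount: int = 0


async def wait_until_drained(
    channel: FakeChannel,
    high_water_mark: int,
    poll_interval: float = 0.01,
    timeout: float = 5.0,
) -> bool:
    """Return True once the buffer is at or below the mark, False on timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while channel.bufferedAmount > high_water_mark:
        if loop.time() >= deadline:
            return False
        # Yield to the event loop so sends and heartbeats keep running.
        await asyncio.sleep(poll_interval)
    return True


async def demo() -> bool:
    channel = FakeChannel(bufferedAmount=3000)

    async def drain() -> None:
        # Simulate the transport flushing 1000 bytes per tick.
        while channel.bufferedAmount > 0:
            await asyncio.sleep(0.005)
            channel.bufferedAmount -= 1000

    drainer = asyncio.create_task(drain())
    drained = await wait_until_drained(channel, high_water_mark=1000)
    await drainer
    return drained
```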

digaobarbosa · 2025-12-30T14:09:57Z

inference/core/interfaces/webrtc_worker/webrtc.py

        self.realtime_processing = realtime_processing

+        # Optional receiver-paced flow control (enabled only after first ACK is received)
+        self._ack_last: int = 0


Properties related to the ack window behaviour.

digaobarbosa · 2025-12-30T14:10:43Z

inference/core/interfaces/webrtc_worker/webrtc.py

+        workflow_output: Dict[str, Any],
+        data_output_mode: DataOutputMode,
+    ) -> Tuple[Dict[str, Any], List[str]]:
+        """Serialize workflow outputs in a thread to avoid blocking the event loop."""


efforts to reduce main loop cpu pressure.
This is the change that reduced the asyncio warnings the most.
I believe that is because of the base64 JPEG conversion.
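
A rough sketch of the idea, not the PR's implementation: CPU-heavy serialization is handed to the default thread pool so the event loop stays responsive. The functions `serialize_outputs` and `serialize_off_loop` and the payload shape are assumptions; the real method also takes a `data_output_mode` and returns a tuple, which is omitted here.

```python
import asyncio
import base64
import json
from typing import Any, Dict


def serialize_outputs(outputs: Dict[str, Any]) -> str:
    """Blocking step: base64-encode byte payloads, then dump everything to JSON."""
    encoded = {
        key: base64.b64encode(value).decode("ascii")
        if isinstance(value, (bytes, bytearray))
        else value
        for key, value in outputs.items()
    }
    return json.dumps(encoded)


async def serialize_off_loop(outputs: Dict[str, Any]) -> str:
    loop = asyncio.get_running_loop()
    # Run the blocking serializer in the default thread pool.
    return await loop.run_in_executor(None, serialize_outputs, outputs)
```

Offloading to a thread helps most when the heavy part runs in native code that releases the GIL, as image encoding libraries often do; pure-Python work would still compete with the event loop for the interpreter.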

grzegorz-roboflow · 2025-12-30T14:45:49Z

inference/core/interfaces/webrtc_worker/modal.py

            )
            cls_with_options = cls_with_options.with_options(
-                ram=requested_ram_mb,
+                memory=requested_ram_mb,


Interesting! Modal was definitely allocating requested amount of RAM - maybe something changed in their docs

digaobarbosa added 8 commits December 18, 2025 15:12

async changes

7857e74

trying a new sleep

c6d1dde

testing with logs

976baa4

test

025d47c

Merge branch 'main' into modal-killed

c6b0024

serialization on thread

9a35627

fix with ack on datachannel

64477a0

simplifying logs

3d528e5

digaobarbosa self-assigned this Dec 29, 2025

digaobarbosa added 6 commits December 29, 2025 17:13

env variable

faac3f4

Merge branch 'main' into modal-killed

f2770c3

env variable with channel config

4885e9b

fixes for PR

9772cc2

realtime_processing=false only ack behaviour

a2b3bef

20 as default is good enough

9c63a35

digaobarbosa commented Dec 30, 2025

digaobarbosa marked this pull request as ready for review December 30, 2025 14:14

digaobarbosa requested review from PawelPeczek-Roboflow, grzegorz-roboflow, hansent, probicheaux and yeldarby as code owners December 30, 2025 14:14

digaobarbosa mentioned this pull request Dec 30, 2025

ack implementation on client side roboflow/inference-sdk-js#11

Merged

2 tasks

grzegorz-roboflow approved these changes Dec 30, 2025

digaobarbosa merged commit 63297ae into main Dec 30, 2025
51 checks passed

digaobarbosa deleted the modal-killed branch December 30, 2025 14:54

digaobarbosa mentioned this pull request Jan 2, 2026

Implement ACK-based flow control for Python SDK WebRTC batch processing #1865

Merged

2 tasks
