Sporadic connection failures due to 'ping timeout' in grpcio==1.68.0 and newer.

The Python gRPC release `grpcio==1.68.0` introduces a regression that I can consistently reproduce in an Apache Beam pipeline running on Cloud Dataflow.

In the reproduction, I have two processes: a Python process (client) and a C++ process (server), each running in its own Docker container. The processes establish bidirectional streaming RPC channels with each other, and both run on the same VM, connecting to a `localhost:someport` address.

The client process writes ~15-50 GB of data over the network to GCS in a separate thread, while the connection channels with the server are owned by other threads.

If the amount of data written in the side thread crosses a certain threshold (between 10 and 15 GB), the gRPC connections between client and server start to terminate with errors like:

`UNKNOWN:Error received from peer ipv6:%5B::1%5D:12371 {created_time:\"2024-12-03T13:53:05.992753213+00:00\", grpc_status:14, grpc_message:"ping timeout"`

**Mitigation**:

Set the environment variable `GRPC_EXPERIMENTS="-event_engine_client"` in the environment of the Python process, or downgrade to an earlier version of grpc. We are sticking with `grpc<1.66.0` in Apache Beam for now and don't reproduce this error with it.
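
For anyone who cannot change the container environment directly, here is a minimal sketch of applying the same mitigation from inside the Python process. It assumes that `GRPC_EXPERIMENTS` is a comma-separated list and that it has to be set before `grpc` is first imported; the helper name `with_experiment_disabled` is hypothetical and not part of any gRPC API.

```python
import os

# Experiment toggle from the mitigation above; the leading "-" disables it.
EXPERIMENT_FLAG = "-event_engine_client"


def with_experiment_disabled(env, flag=EXPERIMENT_FLAG):
    """Return a copy of `env` with `flag` present in GRPC_EXPERIMENTS.

    Assumption: GRPC_EXPERIMENTS is a comma-separated list, so any value
    that is already set is kept and the flag is appended to it.
    """
    updated = dict(env)
    raw = updated.get("GRPC_EXPERIMENTS", "")
    current = [item.strip() for item in raw.split(",") if item.strip()]
    if flag not in current:
        current.append(flag)
    updated["GRPC_EXPERIMENTS"] = ",".join(current)
    return updated


# Apply to this process. Assumption: this runs before the first
# `import grpc` anywhere in the program, so that gRPC sees the value
# when it initializes.
os.environ["GRPC_EXPERIMENTS"] = with_experiment_disabled(os.environ)["GRPC_EXPERIMENTS"]
```

Setting the variable in the container or job configuration is likely the more robust option, since it does not depend on import order.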

cc: @drfloob, @yashykt, @XuanWang-Amos, who have started investigating this and might be able to add details and/or a root cause once more information becomes available.
 
### What operating system (Linux, Windows,...) and version?
Linux.

Reproducible on `grpcio==1.68.0` and newer, including the current latest version (`grpcio==1.71.0`).
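
To check whether an environment falls in the range reported above, a rough sketch like the following may help. The helper names are hypothetical, the version parsing is deliberately simplistic (it only looks at the leading digits of the first three components), and the `1.68.0` lower bound comes only from this report, not from a confirmed root cause.

```python
from importlib import metadata

# First release in which the regression was observed, per this report.
FIRST_REPORTED_AFFECTED = (1, 68, 0)


def parse_version(text):
    """Parse 'X.Y.Z' into a tuple of ints, ignoring any non-digit suffix."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_reported_affected(version_text):
    """True if the version is at or above the first reported affected release."""
    return parse_version(version_text) >= FIRST_REPORTED_AFFECTED


def installed_grpcio_version():
    """Return the installed grpcio version string, or None if not installed."""
    try:
        return metadata.version("grpcio")
    except metadata.PackageNotFoundError:
        return None


version = installed_grpcio_version()
if version is None:
    print("grpcio is not installed")
else:
    print(f"grpcio {version}: reported affected = {is_reported_affected(version)}")
```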

### What runtime / compiler are you using (e.g. python version or version of gcc)
Python 3.10


