We currently optimize flushes for unary requests on the client side by delaying the flush until the end of the RPC. Looking at the code, I realized we don't appear to be doing the same on the server side.
Using Wireshark with the interop client/server and the empty_unary test case, we can see a single packet for the request but three packets for the response.

We should optimize the response flow so that all three frames are sent with a single flush.
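To illustrate the difference, here is a minimal sketch of flushing after every frame versus flushing once at the end. It is not the actual transport code: `CountingTransport`, `send_response_eager`, and `send_response_batched` are hypothetical names, and the three frames are assumed to be the response headers, the message data, and the trailers.

```python
import io


class CountingTransport:
    """Hypothetical stand-in for a socket: each non-empty flush becomes one 'packet'."""

    def __init__(self):
        self.packets = []
        self._buffer = io.BytesIO()

    def write(self, frame: bytes) -> None:
        # Buffer the frame; nothing reaches the wire until flush() is called.
        self._buffer.write(frame)

    def flush(self) -> None:
        data = self._buffer.getvalue()
        if data:
            self.packets.append(data)
            self._buffer = io.BytesIO()


# Assumed frame sequence for a unary response (placeholder payloads).
RESPONSE_FRAMES = [b"HEADERS", b"DATA", b"TRAILERS"]


def send_response_eager(transport: CountingTransport) -> None:
    # Behavior described above: every frame is flushed on its own.
    for frame in RESPONSE_FRAMES:
        transport.write(frame)
        transport.flush()


def send_response_batched(transport: CountingTransport) -> None:
    # Proposed behavior: write all frames, then flush once at the end of the RPC.
    for frame in RESPONSE_FRAMES:
        transport.write(frame)
    transport.flush()


if __name__ == "__main__":
    eager = CountingTransport()
    send_response_eager(eager)
    batched = CountingTransport()
    send_response_batched(batched)
    print(len(eager.packets), len(batched.packets))
```

In this sketch the eager path produces one packet per frame, while the batched path produces a single packet containing all three frames, which mirrors what the client side already does for unary requests.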