
gRPC load balancing #14011

@lrotim

Description


When using a gRPC service behind a Knative service, I do not observe the expected load-balancing pattern. To reproduce this I used your example https://github.com/knative/docs/tree/main/code-samples/serving/grpc-ping-go

I patched service.yaml to pin the service at 2 replicas:

    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "2"
        autoscaling.knative.dev/max-scale: "2"

This, as with any Knative service, creates a k8s Service named grpc-ping. When I use the client to send multiple pings to the service behind the DNS name grpc-ping.default:80, all of the ping requests are routed to the same pod. I would expect to observe fair load balancing of the ping requests across both replicas.
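
For context, the sample's client presumably dials the plain Service address more or less like the sketch below (a simplified assumption on my part; the real client lives in the repo linked above, and the address comes from the Job manifest further down). With no resolver scheme and no balancing configuration, grpc-go opens a single long-lived HTTP/2 connection and multiplexes every ping over it, which would explain why connection-level load balancing in the cluster never gets a chance to spread the requests:

    package main

    import (
        "log"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
    )

    func main() {
        // No resolver scheme and no balancing config: grpc-go opens one
        // HTTP/2 connection to this address and multiplexes every RPC over
        // it, so all pings land on whichever pod accepted that connection.
        conn, err := grpc.Dial(
            "grpc-ping.default:80",
            grpc.WithTransportCredentials(insecure.NewCredentials()),
        )
        if err != nil {
            log.Fatalf("dial: %v", err)
        }
        defer conn.Close()
        // ... issue ping RPCs on conn
    }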

I didn't have time to dive deeper into the source, but what I would expect is an option to declare that my service is gRPC-based. You could then create headless services for those kinds of apps.
For example, at the moment the knative controller creates something like a <service-name>-<orderedNum?>-private service to resolve the actual IPs to send the requests to once the pods are up and running; in this case we have grpc-ping-00001-private. When dealing with gRPC, you could additionally create grpc-ping-00002-private-headless and simply have the proxy routing the traffic use gRPC's dns:/// lookup to do round-robin load balancing.
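
To make the proposal concrete, here is a minimal client-side sketch of the mechanism, assuming the hypothetical grpc-ping-00002-private-headless service existed and exposed the pods' h2c port (8080) directly. The dns:/// resolver would return all pod IPs behind the headless service, and the round_robin policy would spread RPCs across them:

    package main

    import (
        "log"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
    )

    func main() {
        // dns:/// resolves the headless service to the individual pod IPs;
        // round_robin then keeps a connection to each endpoint and
        // alternates RPCs between them.
        conn, err := grpc.Dial(
            "dns:///grpc-ping-00002-private-headless.default.svc.cluster.local:8080",
            grpc.WithTransportCredentials(insecure.NewCredentials()),
            grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
        )
        if err != nil {
            log.Fatalf("dial: %v", err)
        }
        defer conn.Close()
        // ... ping RPCs on conn would now alternate between the two pods
    }

In practice the proxy that routes the traffic would do this lookup rather than the client, but the client-side version shows the behavior I am after.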

I know there are probably good reasons why you don't want to special-case this, but it would be a great feature that gives users much more control.

What version of Knative?

knative-v1.8.6

Expected Behavior

Traffic from the client is spread across both replicas.

Actual Behavior

All ping requests are routed to the same pod.

Steps to Reproduce the Problem

Use the simple grpc service from official knative docs
https://github.com/knative/docs/tree/main/code-samples/serving/grpc-ping-go

In the changes below I use my own images (built following the README, but published on my Docker Hub).

To replicate the issue, change service.yaml to:

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: grpc-ping
      namespace: default
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/min-scale: "2"
            autoscaling.knative.dev/max-scale: "2"
        spec:
          containers:
          - image: docker.io/lrotim/grpc-ping-go
            ports:
              - name: h2c
                containerPort: 8080

And use the following manifest to spin up pods that ping the service:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: go-grpc-client
      namespace: default
    spec:
      parallelism: 10  # Number of client pods to run in parallel
      completions: 10  # Number of pods that must complete successfully
      template:
        metadata:
          labels:
            app: go-grpc-client
        spec:
          containers:
          - name: go-grpc-client
            image: lrotim/grpc-ping-go
            command:
            - '/client'
            args:
              - --insecure
              - --skip_verify
              - --server_addr=grpc-ping.default:80
            resources:
              requests:
                cpu: "10m"
                memory: "50Mi"
          restartPolicy: Never

When I inspect the pod logs, I see that all traffic was routed to a single pod.

Labels

kind/bug, triage/accepted
