[Bug]: Proxy response from Bedrock model upon context window length error has 200 status code, unexpected format

### What happened?

When a **streaming** request exceeds the context window of a Bedrock model invoked through the proxy's Azure OpenAI endpoint, the proxy's error response differs from the Azure OpenAI one in several ways that break some clients (compare the two outputs under "Relevant log output" below):

* The status code is 200 (Azure OpenAI model invocations return 400).
* The response is sent as `text/event-stream` (Azure OpenAI returns a single JSON body as `application/json`).
* The body format differs: the error object arrives as a single SSE `data:` event, and its message omits the `model=` and fallback details that the Azure error includes.

This breaks the [Azure OpenAI client SDK for Java](https://learn.microsoft.com/en-us/java/api/overview/azure/ai-openai-readme?view=azure-java-preview): instead of raising an error, it keeps waiting for further streaming chunks.
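Until the proxy returns a proper 400, a client could work around this by inspecting each SSE event for an `error` key. The sketch below is a minimal, hypothetical helper (not part of any SDK): `iter_sse_chunks` and `StreamErrorEvent` are illustrative names, and it assumes one JSON object per `data:` line plus the OpenAI-style `[DONE]` sentinel.

```python
import json
from typing import Iterable, Iterator


class StreamErrorEvent(Exception):
    """Hypothetical exception raised when an SSE stream carries an error payload."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "unknown streaming error"))
        self.error = error


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON chunks from SSE `data:` lines, raising on an error event.

    Assumes one JSON object per `data:` line and a `data: [DONE]` terminator.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        if "error" in chunk:
            # Surface the error instead of waiting for chunks that will never arrive
            raise StreamErrorEvent(chunk["error"])
        yield chunk
```

With httpx, this could wrap `response.iter_lines()` inside a `with httpx.stream("POST", url, ...) as response:` block, so the Bedrock case above would raise on the first event rather than hang.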


This was tested on the following setup with the repro script below:
* LiteLLM Proxy v1.52.0
* Talking to Claude Sonnet 3.5 (v1/v2) via Bedrock
* The "good" example below was talking to Azure OpenAI for gpt-4o

```python
import os
import httpx

def test_model(model):
    print(f"\nTesting {model}")
    print("=" * 40)
    
    messages = [
        {"role": "user", "content": "This is a long test message. " * 10000},
        {"role": "assistant", "content": "This is a long response message. " * 10000},
        {"role": "user", "content": "This is another long test message. " * 10000},
    ]
    
    response = httpx.post(
        f"{os.getenv('LITELLM_BASE_URL')}/openai/deployments/{model}/chat/completions?api-version=2024-09-01-preview",
        headers={
            "Authorization": f"Bearer {os.getenv('LITELLM_API_KEY')}",
            "Content-Type": "application/json"
        },
        json={
            "model": model,
            "messages": messages,
            "stream": True
        },
        timeout=None
    )
    
    print(f"Status: {response.status_code}")
    print(f"Content-Type: {response.headers.get('content-type')}")
    print(f"Content-Length: {response.headers.get('content-length')}")
    print(f"Transfer-Encoding: {response.headers.get('transfer-encoding')}")
    print(f"Body: {response.text}")

if not os.getenv('LITELLM_BASE_URL') or not os.getenv('LITELLM_API_KEY'):
    raise ValueError("Please set LITELLM_BASE_URL and LITELLM_API_KEY environment variables")

test_model("claude-3-5-sonnet-20240620-v1:0")
test_model("gpt-4o")
```

### Relevant log output

```shell
Testing claude-3-5-sonnet-20240620-v1:0
========================================
Status: 200
Content-Type: text/event-stream; charset=utf-8
Content-Length: None
Transfer-Encoding: chunked
Body: data: {"error": {"message": "litellm.BadRequestError: litellm.ContextWindowExceededError: BedrockException: Context Window Error - Bad response code, expected 200: {'status_code': 400, 'headers': {':exception-type': 'validationException', ':content-type': 'application/json', ':message-type': 'exception'}, 'body': b'{\"message\":\"The model returned the following errors: Input is too long for requested model.\"}'}", "type": null, "param": null, "code": "400"}}



Testing gpt-4o
========================================
Status: 400
Content-Type: application/json
Content-Length: 658
Transfer-Encoding: None
Body: {"error":{"message":"litellm.BadRequestError: litellm.ContextWindowExceededError: AzureException ContextWindowExceededError - Error code: 400 - {'error': {'message': \"This model's maximum context length is 128000 tokens. However, your messages resulted in 210018 tokens. Please reduce the length of the messages.\", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}\nmodel=gpt-4o. context_window_fallbacks=None. fallbacks=None.\n\nSet 'context_window_fallback' - https://docs.litellm.ai/docs/routing#fallbacks\nReceived Model Group=gpt-4o\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"400"}}
```
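One plausible reading of this output (not confirmed against the LiteLLM source) is that the Bedrock error is raised only once the proxy starts iterating the stream, after the 200 status and `text/event-stream` headers have already been committed, so the error can only be delivered as an SSE `data:` event. The framework-agnostic sketch below illustrates pulling the first chunk before choosing the response type; `prepare_streaming_response` and `UpstreamBadRequest` are hypothetical names, not LiteLLM APIs.

```python
import itertools
import json
from typing import Iterator, Tuple, Union


class UpstreamBadRequest(Exception):
    """Hypothetical stand-in for a provider error such as a context window overflow."""


def prepare_streaming_response(
    chunks: Iterator[str],
) -> Tuple[int, str, Union[str, Iterator[str]]]:
    """Decide status, content type, and body before any headers are sent.

    Pulls the first chunk eagerly: if the upstream call fails at that point,
    return a 400 JSON error; otherwise stream the first chunk plus the rest as SSE.
    """
    try:
        first = next(chunks)
    except StopIteration:
        return 200, "text/event-stream", iter(["data: [DONE]\n\n"])
    except UpstreamBadRequest as exc:
        # Same error shape as the Azure example in the log above
        body = json.dumps(
            {"error": {"message": str(exc), "type": None, "param": None, "code": "400"}}
        )
        return 400, "application/json", body

    def sse() -> Iterator[str]:
        # Re-attach the already-consumed first chunk ahead of the remaining ones
        for chunk in itertools.chain([first], chunks):
            yield f"data: {chunk}\n\n"
        yield "data: [DONE]\n\n"

    return 200, "text/event-stream", sse()
```

This only helps for errors raised before the first chunk, which matches the context window case here; an error occurring mid-stream would still have to be sent as an SSE event.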


### Twitter / LinkedIn details

_No response_

[Bug]: Proxy response from Bedrock model upon context window length error has 200 status code, unexpected format #6629

What happened?

Relevant log output

Twitter / LinkedIn details

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[Bug]: Proxy response from Bedrock model upon context window length error has 200 status code, unexpected format #6629

Description

What happened?

Relevant log output

Twitter / LinkedIn details

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions