[Bug]: Media Understanding providers don't respect proxy settings #13299

@lu7shuo

Description

Media understanding providers (audio transcription with Groq/OpenAI, image/video description) don't use any configured proxy settings. This causes API calls to fail in environments that require a proxy for internet access.

Steps to Reproduce

  1. Configure environment variable proxy (not used by media providers):
     export HTTPS_PROXY=http://127.0.0.1:10808
     export HTTP_PROXY=http://127.0.0.1:10808
  2. Or configure in openclaw.json (also not used):
{
  "env": {
    "HTTPS_PROXY": "http://127.0.0.1:10808",
    "HTTP_PROXY": "http://127.0.0.1:10808"
  }
}
  3. Configure Telegram proxy (works for Telegram, but not for media providers):
{
  "channels": {
    "telegram": {
      "proxy": "http://127.0.0.1:10808"
    }
  }
}
  4. Configure audio transcription:
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [{
          "provider": "groq",
          "model": "whisper-large-v3-turbo"
        }]
      }
    }
  },
  "env": {
    "GROQ_API_KEY": "gsk-..."
  }
}
  5. Send a voice message to the Telegram bot

Expected Behavior

Audio transcription should work. The Groq API call (https://api.groq.com/openai/v1/audio/transcriptions) should use one of the configured proxy settings:

  • Environment variables HTTPS_PROXY / HTTP_PROXY
  • Or a dedicated media proxy configuration
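
To illustrate the first option, here is a minimal sketch of picking a proxy URL for a request from the environment. `resolveProxyFromEnv` is a hypothetical helper, not an existing OpenClaw function, and the NO_PROXY handling is a simplified assumption (plain host-suffix matching only).

```typescript
// Hypothetical helper (not part of OpenClaw): picks a proxy URL for a target
// URL from an env-like object, using the common HTTPS_PROXY / HTTP_PROXY /
// NO_PROXY names in upper or lower case.
type Env = Record<string, string | undefined>;

function resolveProxyFromEnv(targetUrl: string, env: Env): string | undefined {
  const { protocol, hostname } = new URL(targetUrl);
  const host = hostname.toLowerCase();

  // NO_PROXY: comma-separated host suffixes that bypass the proxy; "*" bypasses all.
  const noProxy = (env.NO_PROXY || env.no_proxy || "")
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0);
  for (const entry of noProxy) {
    if (entry === "*") return undefined;
    const suffix = entry.startsWith(".") ? entry.slice(1) : entry;
    if (host === suffix || host.endsWith("." + suffix)) return undefined;
  }

  // Empty strings are treated the same as unset variables.
  if (protocol === "https:") return env.HTTPS_PROXY || env.https_proxy || undefined;
  if (protocol === "http:") return env.HTTP_PROXY || env.http_proxy || undefined;
  return undefined;
}
```

Called with `process.env` and the Groq transcription URL, this would return `http://127.0.0.1:10808` under the configuration from the reproduction steps.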

Actual Behavior

  • The Telegram getFile API call works (it uses grammY's proxy, configured via channels.telegram.proxy)
  • The Groq API call fails (it uses neither the environment-variable proxy nor the Telegram proxy)
  • The agent receives a media:audio placeholder instead of the transcribed text
  • No error message is shown (silent failure)

Only workaround: enable a system-level TUN-mode proxy that intercepts all network traffic at the OS level.

Root Cause

In src/media-understanding/runner.ts:925-937, the call to provider.transcribeAudio() does not pass a fetchFn:

const result = await provider.transcribeAudio({
  buffer: media.buffer,
  fileName: media.fileName,
  mime: media.mime,
  apiKey,
  baseUrl,
  headers,
  model,
  language: ...,
  prompt,
  query: providerQuery,
  timeoutMs,
  // ❌ No fetchFn parameter passed
});

Then in src/media-understanding/providers/openai/audio.ts:16:

const fetchFn = params.fetchFn ?? fetch;
// Falls back to global fetch, which by default doesn't respect HTTPS_PROXY/HTTP_PROXY env vars
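
The fallback above means a caller-supplied fetchFn would be used if one were passed. The sketch below shows that injection point with a stub provider. `transcribeAudioSketch`, `TranscribeParams`, and `FetchLike` are hypothetical stand-ins, and the request body is omitted. This is not the actual code in audio.ts.

```typescript
// Minimal stand-ins for the fetch types, to keep the sketch self-contained.
type FetchResult = { ok: boolean; status: number; json(): Promise<unknown> };
type FetchLike = (url: string, init?: Record<string, unknown>) => Promise<FetchResult>;

// Hypothetical subset of the provider params; the real type lives in OpenClaw.
interface TranscribeParams {
  baseUrl: string;
  apiKey: string;
  fetchFn?: FetchLike;
}

// Hypothetical provider stub mirroring the `params.fetchFn ?? fetch` fallback.
async function transcribeAudioSketch(params: TranscribeParams): Promise<string> {
  const globalFetch = (globalThis as unknown as { fetch: FetchLike }).fetch;
  const fetchFn = params.fetchFn ?? globalFetch;

  // The audio upload body is left out; only the injection point matters here.
  const res = await fetchFn(`${params.baseUrl}/audio/transcriptions`, {
    method: "POST",
    headers: { Authorization: `Bearer ${params.apiKey}` },
  });
  if (!res.ok) {
    // Surface the failure instead of silently returning a placeholder.
    throw new Error(`Transcription request failed with status ${res.status}`);
  }
  const body = (await res.json()) as { text?: string };
  return body.text ?? "";
}
```

With this shape, the runner could pass a proxy-aware fetchFn and the provider code would not need to change.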

Issue:

  • Node.js's built-in fetch (undici, used by OpenClaw) doesn't read the HTTPS_PROXY / HTTP_PROXY environment variables by default
  • Media understanding providers have no proxy configuration option
  • The Telegram proxy is applied only to the grammY client, not to media provider API calls
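
One possible direction, sketched under assumptions: wrap a base fetch so that every request carries a proxy dispatcher, then hand the wrapper to providers as fetchFn. `createProxyFetch` and `FetchLike` are hypothetical names. The dispatcher factory is injected so the sketch has no dependencies.

```typescript
// Minimal stand-ins for the fetch types, to keep the sketch self-contained.
type FetchResult = { ok: boolean; status: number; json(): Promise<unknown> };
type FetchLike = (url: string, init?: Record<string, unknown>) => Promise<FetchResult>;

// Hypothetical wrapper: returns a fetch-like function that attaches a proxy
// dispatcher to every request made through it.
function createProxyFetch(
  proxyUrl: string,
  makeDispatcher: (proxyUrl: string) => unknown,
  baseFetch: FetchLike,
): FetchLike {
  // Create the dispatcher once and reuse it across calls.
  const dispatcher = makeDispatcher(proxyUrl);
  return (url, init) => baseFetch(url, { ...init, dispatcher });
}
```

In a real setup, `makeDispatcher` might be `(url) => new ProxyAgent(url)` with `ProxyAgent` and `fetch` imported from the undici package, which accepts a `dispatcher` request option. That wiring is an assumption and has not been checked against OpenClaw.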


Labels: bug, stale
