[9.3] (backport #12205) kube-stack: Update OTel collector gateway to use OTEL_K8S_POD_IP instead of MY_POD_IP #12256

Merged: osullivandonal merged 1 commit into 9.3 from mergify/bp/9.3/pr-12205 on Jan 15, 2026

mergify bot commented on Jan 15, 2026

What does this PR do?

This PR fixes the undefined environment variable MY_POD_IP in the gateway collector configuration.

The gateway collector's OTLP receivers were referencing ${env:MY_POD_IP}, which is not defined, causing warning logs:
2026-01-12T09:55:43.240Z warn Configuration references unset environment variable {"name": "MY_POD_IP"}

This change replaces ${env:MY_POD_IP} with ${env:OTEL_K8S_POD_IP}, which is properly defined via the downward API by the OpenTelemetry Helm chart.

Changes:

  • Updated gateway OTLP gRPC endpoint from ${env:MY_POD_IP}:4317 to ${env:OTEL_K8S_POD_IP}:4317
  • Updated gateway OTLP HTTP endpoint from ${env:MY_POD_IP}:4318 to ${env:OTEL_K8S_POD_IP}:4318

This prevents warning logs and ensures the gateway collector binds to the correct pod IP address.
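
The warning and the fix can be illustrated with a simplified model of `${env:NAME}` substitution. The sketch below is not the collector's implementation: `expand_env` is a hypothetical helper, and it assumes that an unset variable expands to an empty string while a warning is logged, which is consistent with the warning text quoted above. The pod IP `10.244.0.7` is a made-up example value.

```python
import logging
import re
from typing import Mapping

logger = logging.getLogger("envprovider-sketch")

# Matches references of the form ${env:NAME}.
_ENV_REF = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Mapping[str, str]) -> str:
    """Replace each ${env:NAME} in value with env[NAME].

    Assumption: an unset variable becomes an empty string and a
    warning is logged, mirroring the log line quoted in this PR.
    """

    def _substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in env:
            logger.warning(
                "Configuration references unset environment variable: %s", name
            )
            return ""
        return env[name]

    return _ENV_REF.sub(_substitute, value)


# Before the fix: MY_POD_IP is not defined in the gateway pod.
print(expand_env("${env:MY_POD_IP}:4317", {}))

# After the fix: OTEL_K8S_POD_IP is set (example value, not from the PR).
print(expand_env("${env:OTEL_K8S_POD_IP}:4317", {"OTEL_K8S_POD_IP": "10.244.0.7"}))
```

Under these assumptions, the first endpoint degrades to `:4317` with no host part, while the second resolves to the pod's own address and port.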

Why is it important?

Currently the kube-stack logs a warning about an undefined env var for the gateway collector:

…/edot-collector/kube-stack main  ✗ kubectl logs -n opentelemetry-operator-system opentelemetry-kube-stack-gateway-collector-6f6b564574-5kct5 | grep -i "MY_POD_IP\|undefined\|fallback"
2026-01-12T09:55:43.240Z	warn	envprovider@v1.45.0/provider.go:61	Configuration references unset environment variable	{"resource": {"service.instance.id": "b7876758-5d19-4751-ad40-5515b501f70f", "service.name": "elastic-agent", "service.version": "9.2.3"}, "name": "MY_POD_IP"}
2026-01-12T09:55:43.240Z	warn	envprovider@v1.45.0/provider.go:61	Configuration references unset environment variable	{"resource": {"service.instance.id": "b7876758-5d19-4751-ad40-5515b501f70f", "service.name": "elastic-agent", "service.version": "9.2.3"}, "name": "MY_POD_IP"}

This PR uses the correct address for the gateway, OTEL_K8S_POD_IP (an earlier revision of the change used 0.0.0.0).

The issue suggests using OTEL_K8S_NODE_NAME, but this doesn't work and causes the gateway pod to error out:

…/edot-collector/kube-stack main  ❯  kubectl logs -n opentelemetry-operator-system opentelemetry-kube-stack-gateway-collector-cc8944cdd-hrs8x | grep -i -E "error|fail|fatal|panic|unable|invalid"
2026-01-12T10:54:30.920Z	error	graph/graph.go:439	Failed to start component	{"resource": {"service.instance.id": "dec011ce-015a-4816-a7d0-e96843207fed", "service.name": "elastic-agent", "service.version": "9.2.3"}, "error": "listen tcp 172.18.0.2:4317: bind: cannot assign requested address", "type": "Receiver", "id": "otlp"}
cannot start pipelines: failed to start "otlp" receiver: listen tcp 172.18.0.2:4317: bind: cannot assign requested address

The issue:

  • listen tcp 172.18.0.2:4317: bind: cannot assign requested address
  • The gateway pod is trying to bind to 172.18.0.2:4317, but that IP address doesn't belong to this specific gateway pod.
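
The same class of failure can be reproduced outside Kubernetes, since a process can only listen on an address assigned to one of its own network interfaces. The following sketch is illustrative only: `try_bind` is a hypothetical helper, and `192.0.2.1` is a documentation-only address (TEST-NET-1) that is assumed not to be configured on the machine running the code.

```python
import errno
import socket


def try_bind(host: str, port: int = 0) -> str:
    """Try to bind a TCP socket to host:port.

    Returns "ok" on success, otherwise the symbolic errno name
    (for example "EADDRNOTAVAIL"). Port 0 lets the OS pick a free port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


# Wildcard address: listens on every local interface.
print("0.0.0.0   ->", try_bind("0.0.0.0"))

# An address that is assumed not to belong to this machine.
print("192.0.2.1 ->", try_bind("192.0.2.1"))
```

On a typical Linux host the second call is expected to report `EADDRNOTAVAIL`, the condition behind "cannot assign requested address" in the log above.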

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

How to test this PR locally

To test the kube-stack update, you can use kind and follow the steps below.

  • kind create cluster
  • kubectl create namespace opentelemetry-operator-system
  • Add your secrets for Elastic Cloud:
kubectl create -n opentelemetry-operator-system secret generic elastic-secret-otel \
  --from-literal=elastic_endpoint='YOUR_ELASTICSEARCH_ENDPOINT' \
  --from-literal=elastic_api_key='YOUR_ELASTICSEARCH_API_KEY'
  • Deploy the updated Helm chart:
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm upgrade --install --namespace opentelemetry-operator-system opentelemetry-kube-stack open-telemetry/opentelemetry-kube-stack --values ./values.yaml --version 0.3.3
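
After deploying, one way to confirm the fix is to scan the gateway collector logs for the unset-variable warning. The sketch below assumes the log line layout quoted earlier in this PR (a message followed by a JSON object containing a `name` field); `unset_env_vars` is a hypothetical helper, not part of any Elastic or OpenTelemetry tooling.

```python
import json
from typing import Iterable, List

UNSET_MSG = "Configuration references unset environment variable"


def unset_env_vars(log_lines: Iterable[str]) -> List[str]:
    """Return the distinct variable names reported as unset, in first-seen order."""
    names: List[str] = []
    for line in log_lines:
        if UNSET_MSG not in line:
            continue
        # The structured fields are assumed to start at the first "{".
        start = line.find("{")
        if start == -1:
            continue
        try:
            fields = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # Skip lines whose trailing JSON is malformed.
        name = fields.get("name")
        if isinstance(name, str) and name not in names:
            names.append(name)
    return names


# Example with a shortened version of the warning line from this PR.
sample = (
    "2026-01-12T09:55:43.240Z\twarn\tenvprovider@v1.45.0/provider.go:61\t"
    "Configuration references unset environment variable\t"
    '{"resource": {"service.name": "elastic-agent"}, "name": "MY_POD_IP"}'
)
print(unset_env_vars([sample, sample]))
```

When fed the lines produced by the `kubectl logs` command shown earlier, an empty result would suggest that the gateway configuration no longer references an undefined variable.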

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

This is an automatic backport of pull request #12205 done by [Mergify](https://mergify.com).

…ead of MY_POD_IP (#12205)

* Update OTel collector gateway to use 0.0.0.0 instead of MY_POD_IP

This allows the gateway to listen on all network interfaces for
traffic, this follows the same pattern that the daemon collector uses.
This prevents warnings being logged in k8s

* Add changelog for OTel gateway endpoint update

* Update gateway collector binding address to use specific K8s OTEL_K8S_POD_IP

This is done as using 0.0.0.0 can be more risky in the case of DDOS
attacks, see https://opentelemetry.io/docs/security/config-best-practices/#protect-against-denial-of-service-attacks

* Update OTel gateway otlp endpoint to be OTEL_K8S_POD_IP in k8s tests

* Update changelog with OTEL_K8S_POD_IP for kube-stack gateway OTLP update

* Revert update to k8s integration test for OTel gateway

These tests are automatically updated from upstream

* docs: update security url to link to central policy (#12241)

* docs: update security url to link to central policy

* docs: update security url to link to central policy

* [main][Automation] Update elastic/beats to eff88abc6dc8 (#12237)

Co-authored-by: swiatekm <93588780+swiatekm@users.noreply.github.com>
Co-authored-by: Mikołaj Świątek <mail@mikolajswiatek.com>

* Update elastic-agent-libs 0.31.0 -> 0.32.0 (#12240)

Co-authored-by: Mikołaj Świątek <mail@mikolajswiatek.com>

* List available rollbacks (#11751)

* add new messages and operation

* List available rollbacks from CLI

* Add output flag for human, json, yaml

* assert list rollback command in standalone manual rollback tests

* bump beats

* fixup! assert list rollback command in standalone manual rollback tests

* fixup! fixup! assert list rollback command in standalone manual rollback tests

---------

Co-authored-by: Eric Beahan <eric.beahan@elastic.co>

* [main][Automation] Update elastic/beats to 990735bb782a (#12252)

Co-authored-by: swiatekm <93588780+swiatekm@users.noreply.github.com>

---------

Co-authored-by: Mikołaj Świątek <mail@mikolajswiatek.com>
Co-authored-by: Paul McCann <paul.mccann@elastic.co>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: swiatekm <93588780+swiatekm@users.noreply.github.com>
Co-authored-by: Michal Pristas <michal.pristas@elastic.co>
Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Co-authored-by: Eric Beahan <eric.beahan@elastic.co>
(cherry picked from commit 6ae71f4)
mergify bot requested a review from a team as a code owner on January 15, 2026 13:29
mergify bot added the backport label on Jan 15, 2026
github-actions bot added the bug and Team:Elastic-Agent-Control-Plane labels on Jan 15, 2026
@elasticmachine

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

osullivandonal merged commit 496bec2 into 9.3 on Jan 15, 2026
24 checks passed
osullivandonal deleted the mergify/bp/9.3/pr-12205 branch on January 15, 2026 16:10