Elasticsearch output does not recover after connection failure #40705

@belimawr

Description

For confirmed bugs, please report:

  • Version: main, v8.15.1
  • Operating System: Linux

Update: The problematic part of the change was reverted in #40776 and is targeted for release in 8.15.2.

Steps to reproduce

  1. Deploy a Filebeat sending data to a remote Elasticsearch using a domain name in the configuration
  2. Confirm Filebeat is running and sending data
  3. Disable the network
  4. Wait for a DNS lookup error and the publisher errors:
    {
      "log.level": "warn",
      "@timestamp": "2024-09-06T08:30:33.240-0400",
      "log.logger": "transport",
      "log.origin": {
        "function": "github.com/elastic/elastic-agent-libs/transport.TestNetDialer.func1",
        "file.name": "transport/tcp.go",
        "file.line": 53
      },
      "message": "DNS lookup failure \"remote-es.elastic.cloud\": lookup remote-es.elastic.cloud: Temporary failure in name resolution",
      "service.name": "filebeat",
      "ecs.version": "1.6.0"
    }
    {
      "log.level": "error",
      "@timestamp": "2024-09-06T15:50:15.140-0400",
      "log.logger": "publisher_pipeline_output",
      "log.origin": {
        "function": "github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*netClientWorker).run",
        "file.name": "pipeline/client_worker.go",
        "file.line": 148
      },
      "message": "Failed to connect to backoff(elasticsearch(https://remote-es.elastic.cloud:443)): Get \"https://remote-es.elastic.cloud:443\": context canceled",
      "service.name": "filebeat",
      "ecs.version": "1.6.0"
    }
    {
      "log.level": "info",
      "@timestamp": "2024-09-06T15:50:15.140-0400",
      "log.logger": "publisher_pipeline_output",
      "log.origin": {
        "function": "github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*netClientWorker).run",
        "file.name": "pipeline/client_worker.go",
        "file.line": 139
      },
      "message": "Attempting to reconnect to backoff(elasticsearch(https://remote-es.elastic.cloud:443)) with 475 reconnect attempt(s)",
      "service.name": "filebeat",
      "ecs.version": "1.6.0"
    }
  5. Enable the network
  6. Ensure the machine can reach the internet and the remote Elasticsearch
  7. Filebeat keeps logging the same publisher errors and does not resume sending data
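
To check for the stuck state in step 7 without reading the log by eye, a small script along these lines may help. It is only a sketch: it assumes the Filebeat log has one JSON object per line (the entries above are pretty-printed for readability), and `summarize_reconnects` is a hypothetical helper name, not part of Beats. It reports the highest reconnect-attempt count seen and how many "Failed to connect" errors ended in "context canceled".

```python
import json
import re

# Matches the counter in the "Attempting to reconnect ..." message shown above.
ATTEMPT_RE = re.compile(r"with (\d+) reconnect attempt\(s\)")


def summarize_reconnects(lines):
    """Return (max_attempts, canceled_errors) for publisher_pipeline_output entries.

    Assumption: each element of `lines` is one JSON-encoded log entry.
    Lines that are not JSON objects are skipped.
    """
    max_attempts = 0
    canceled_errors = 0
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        if entry.get("log.logger") != "publisher_pipeline_output":
            continue
        message = entry.get("message", "")
        match = ATTEMPT_RE.search(message)
        if match:
            max_attempts = max(max_attempts, int(match.group(1)))
        elif message.startswith("Failed to connect to") and message.endswith(
            "context canceled"
        ):
            canceled_errors += 1
    return max_attempts, canceled_errors


# Two entries modelled on the log lines in this issue.
sample = [
    json.dumps({
        "log.level": "error",
        "log.logger": "publisher_pipeline_output",
        "message": 'Failed to connect to backoff(elasticsearch('
                   'https://remote-es.elastic.cloud:443)): '
                   'Get "https://remote-es.elastic.cloud:443": context canceled',
    }),
    json.dumps({
        "log.level": "info",
        "log.logger": "publisher_pipeline_output",
        "message": "Attempting to reconnect to backoff(elasticsearch("
                   "https://remote-es.elastic.cloud:443)) "
                   "with 475 reconnect attempt(s)",
    }),
]

print(summarize_reconnects(sample))  # (475, 1)
```

If both numbers keep growing after the network is back and the host is reachable again, the output is most likely in the state described here.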

The configuration I used:

filebeat.inputs:
  - type: filestream
    id: my-filestream-id
    paths:
      - /tmp/some-logs.txt

output.elasticsearch:
  hosts: ["https://remote-es.elastic.cloud:443"]
  username: elastic
  password: some-very-secret-password
