splunkmetric removal of "event" field seems to break Splunk heavy forwarder #8761
Description
Relevant telegraf.conf:
[[outputs.http]]
url = "<URL>"
# insecure_skip_verify = false
data_format = "splunkmetric"
splunkmetric_hec_routing = true
[outputs.http.headers]
Content-Type = "application/json"
Authorization = "Splunk <HEC TOKEN>"
[[inputs.cpu]]
percpu = false
totalcpu = true
collect_cpu_time = false
report_active = false
System info:
Telegraf 1.17.0
Splunk 8.0.2
Heavy forwarder
Indexer cluster
Steps to reproduce:
Use the splunkmetric data format with the http output (config above), pointed at the HEC input of a Splunk heavy forwarder.
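For reference, with splunkmetric_hec_routing = true, 1.15.3 serialized each metric roughly like the payload below (field names and values illustrative, following the splunkmetric serializer's documented format); 1.17.0 emits the same shape minus the "event": "metric" key:

```json
{
  "time": 1611780013,
  "event": "metric",
  "host": "redacted",
  "fields": {
    "_value": 0.42,
    "cpu": "cpu-total",
    "metric_name": "cpu.usage_idle"
  }
}
```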
Expected behavior:
Splunk should receive the metrics via HEC without problems. This worked fine on 1.15.3.
Actual behavior:
The Splunk forwarder chokes on the events produced. Metrics show up in Splunk at first, but the forwarder soon logs warnings like these:
Jan 27 20:40:14 01-27-2021 20:40:13.925 +0000 WARN TcpOutputProc - Read operation timed out expecting ACK from 10.0.1.26:29997 in 300 seconds.
Jan 27 20:40:14 01-27-2021 20:40:13.925 +0000 WARN TcpOutputProc - Possible duplication of events with channel=source::http:telegraf|host::redacted|httpevent|, streamId=1618, offset=0 on host=10.0.1.26:29997
Soon after that, TcpOutputProc locks up and is unable to send anything to the indexers at all. Worse, because we have Splunk's persistentQueueSize option set on our HEC input, the problematic events persist across a restart of the forwarder, even when no new problematic events are arriving. We had to wipe the forwarder entirely and rebuild it to recover.
Additional info:
We carefully pared down variables until we isolated the problem: the removal of the "event": "metric" field in #8039. Starting from a fresh, working forwarder, we can reproduce the problems above by using curl to send events without the "event" field to the forwarder over HEC (sketched below). Sending the exact same events with "event": "metric" included does not cause the problem.
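A minimal reproduction sketch with curl, assuming a forwarder HEC endpoint at https://forwarder.example.com:8088 and a placeholder token; the first request (no "event" key) triggers the forwarder problems, the second (identical payload plus "event": "metric") does not:

```sh
# Payload WITHOUT the "event" field -- wedges TcpOutputProc on our forwarder
curl https://forwarder.example.com:8088/services/collector \
  -H "Authorization: Splunk <HEC TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"time": 1611780013, "host": "redacted", "fields": {"_value": 0.42, "cpu": "cpu-total", "metric_name": "cpu.usage_idle"}}'

# Same payload WITH "event": "metric" -- works fine
curl https://forwarder.example.com:8088/services/collector \
  -H "Authorization: Splunk <HEC TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"time": 1611780013, "event": "metric", "host": "redacted", "fields": {"_value": 0.42, "cpu": "cpu-total", "metric_name": "cpu.usage_idle"}}'
```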
I'm honestly not at all clear on why Splunk hates these events. I also don't have a good explanation for why Splunk Support said that the "event" field is unnecessary in #8039. Perhaps there's something else in the OP's configuration that obviates the need for the "event" field?
For now, we've reverted to 1.15.3, pending a fix in Telegraf. Perhaps the "event" field should be optional, defaulting to present?
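For example, the serializer could grow a flag along these lines (option name hypothetical, not an existing Telegraf setting):

```toml
[[outputs.http]]
  url = "<URL>"
  data_format = "splunkmetric"
  splunkmetric_hec_routing = true
  ## Hypothetical option: include "event": "metric" in each HEC payload.
  ## Defaulting to true would restore the pre-1.17.0 behavior.
  splunkmetric_include_event = true
```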