Buffer flushing blocked by EncodingError - not valid UTF-8 #2741

@ceecko

Describe the bug
We run multiple apps in Docker and use td-agent to collect logs.
Sometimes an app produces a log message that blocks buffer flushing, and no logs are flushed until td-agent is restarted.

To Reproduce
Not sure. An app sends a log message containing bytes that are not valid UTF-8 (see the error log below).

Expected behavior
It's OK if the offending message cannot be flushed. Ideally td-agent would skip it and continue flushing the rest of the buffer.

Your Environment
td-agent 1.7.4

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Kernel 3.10.0-1062.4.3.el7.x86_64

Your Configuration

<source>
  @type forward
  port 24224
  bind 127.0.0.1

  @label @docker
</source>

<label @docker>
  <filter docker.a docker.b>
    @type grep
    <exclude>
      key log
      pattern /./
    </exclude>
  </filter>

  <filter docker.*>
    @type throttle
    group_key app_id
    group_bucket_period_s   60
    group_bucket_limit    6000
    group_reset_rate_s     -1
    group_warning_delay_s 60
  </filter>
  
  <filter docker.*>
    @type record_transformer
    remove_keys app_id,container_name
  </filter>
  
  <match docker.*>
    @type mongo_replset
    remove_tag_prefix docker.
    num_retries 500
    
    nodes a,b
    replica_set rs0
    
    user logs
    password xxx
    
    database logs
    collection ${tag}
    capped
    capped_size 2m

    <buffer tag>
      chunk_limit_size 2MB
      total_limit_size 1GB
      flush_mode interval
      flush_interval 2s
      retry_max_interval 60s
      overflow_action drop_oldest_chunk
    </buffer>
  </match>
  <match **>
    @type mongo_replset
    remove_tag_prefix docker.
    num_retries 500
    
    nodes a,b
    replica_set rs0
    
    user logs
    password xxx
    
    database logs
    collection ${tag}
    capped
    capped_size 2m

    <buffer tag>
      chunk_limit_size 2MB
      total_limit_size 1GB
      flush_mode interval
      flush_interval 2s
      retry_max_interval 60s
      overflow_action drop_oldest_chunk
    </buffer>
  </match>
</label>

Your Error Log

The error message is as follows (the log message itself is redacted due to length):

[warn]: #0 failed to flush the buffer. retry_time=1099 next_retry_seconds=2019-12-17 19:00:32 +0100 chunk="599dad3237f37154fa06d035975603c8" error_class=EncodingError error="log message starts here.....and ends \xE7\x8A is not valid UTF-8: truncated multi-byte sequence"

There are plenty of escaped characters in the middle of the message as well.
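
For reference, the failure mode can be reproduced in plain Ruby. The following is a minimal sketch, not td-agent's or the Mongo plugin's actual code. It assumes the log field is a UTF-8 string that was cut in the middle of a multi-byte character, as the trailing `\xE7\x8A` in the error suggests, and shows how `String#scrub` could replace such bytes. The sample text and the helper name `sanitize_log` are hypothetical.

```ruby
# Sketch only: reproduces a truncated multi-byte sequence and scrubs it.

# A valid UTF-8 string ending in a three-byte character (bytes E7 8A B6).
full = "log message \xE7\x8A\xB6"

# Drop the last byte, leaving the incomplete sequence \xE7\x8A at the end.
truncated = full.byteslice(0, full.bytesize - 1)

# Hypothetical helper: replace invalid byte sequences so the value is valid UTF-8.
# Non-string values are returned unchanged.
def sanitize_log(value, replacement = "?")
  return value unless value.is_a?(String)
  value.dup.force_encoding(Encoding::UTF_8).scrub(replacement)
end

clean = sanitize_log(truncated)

puts full.valid_encoding?       # true
puts truncated.valid_encoding?  # false
puts clean.valid_encoding?      # true
```

Something equivalent would need to run before the record reaches the output plugin. Where exactly that belongs in the pipeline is left open here.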

Additional context

I'm willing to provide any further information required for debugging.
