Skip to content

in_tail throws error and crashes process #3327

@davidbhoward

Description

@davidbhoward

Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.

Describe the bug
We are seeing an exception being thrown while Fluentd is starting up, which is causing Fluentd process to crash. We suspect these are caused by short-lived, often run, K8s CronJobs where the Docker logs no longer exists, but the symlinks are still there and end up causing in_tail to pick it up and throw this error. We were trying to fix the in_tail bugs discussed in #3239, which is why we are specifically trying to deploy this version.

To Reproduce
Have several CronJobs that run every 1 min on a node and only live for a very short duration.

Expected behavior
Fluentd would continue to run if the in_tail file doesn't exist and this condition is hit that causes this error or bug is fixed.

Your Environment

  • Fluentd or td-agent version: v1.12.2.
    Note: We are building and installing from source following this guide. We have been doing so before this release without issues.
  • Ruby: 2.7.2
  • Operating system: CentOS 7
  • Kernel version: 4.15.0-70-generic

If you hit the problem with older fluentd version, try latest version first.
This happens on the latest version of Fluentd.

Your Configuration

    <source>
      @type tail
      @log_level debug
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      follow_inodes true
      <parse>
        @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

Your Error Log

2021-04-12 16:00:21 -0700 [warn]: /var/log/containers/obfuscated_container_xxx_1.log not found. Continuing without tailing it.
2021-04-12 16:00:21 -0700 [warn]: stat() for /var/log/containers/obfuscated_container_xxx_1.log failed with ENOENT. Drop tail watcher for now.
2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean?  each_slice"
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:428:in `stop_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:422:in `rescue in block in start_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:416:in `block in start_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `each_value'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `start_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:359:in `refresh_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:234:in `start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:200:in `block in start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:189:in `block (2 levels) in lifecycle'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:188:in `each'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:188:in `block in lifecycle'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:175:in `each'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:175:in `lifecycle'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:199:in `start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/engine.rb:248:in `start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/engine.rb:147:in `run'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:700:in `block in run_worker'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:951:in `main_process'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:691:in `run_worker'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/command/fluentd.rb:365:in `<top (required)>'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:72:in `require'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:72:in `require'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/bin/fluentd:8:in `<top (required)>'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/bin/fluentd:23:in `load'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/bin/fluentd:23:in `<main>'
2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean?  each_slice"
  2021-04-12 16:00:21 -0700 [error]: suppressed same stacktrace

Additional context

We have hundreds of Fluentd instances running and there is a single node that seems to be hitting this problem. While looking at the node, I suspect it is caused by many short-lived, often run (every 1 min), CronJobs where the log files don't exist because the containers are now gone but symlinks still exist. Please let me know if there is more information I can provide here to help troubleshoot this issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions