in_tail throws error and crashes process #3327

Check [CONTRIBUTING guideline](https://github.com/fluent/fluentd/blob/master/CONTRIBUTING.md) first and here is the list to help us investigate the problem.

**Describe the bug**
An exception is thrown while Fluentd is starting up, which crashes the Fluentd process. We suspect it is caused by short-lived, frequently run K8s CronJobs: the Docker log files no longer exist, but the symlinks to them are still there, so in_tail picks them up and throws this error. We are deploying this version specifically to pick up the fixes for the in_tail bugs discussed in https://github.com/fluent/fluentd/issues/3239.

**To Reproduce**
Run several CronJobs on a single node, each scheduled every minute and each living only for a very short time.

**Expected behavior**
Fluentd should continue to run when a file targeted by in_tail doesn't exist, either by handling the condition that triggers this error or by fixing the underlying bug.

**Your Environment**
- Fluentd or td-agent version: `v1.12.2`. 
  Note: We are building and installing from source [following this guide](https://docs.fluentd.org/installation/install-from-source).  We have been doing so before this release without issues.  
- Ruby: `2.7.2`
- Operating system: `CentOS 7`
- Kernel version: `4.15.0-70-generic`

If you hit the problem with older fluentd version, try latest version first.  
This happens on the latest version of Fluentd.  

**Your Configuration**

```
    <source>
      @type tail
      @log_level debug
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      follow_inodes true
      <parse>
        @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>
```

**Your Error Log**

```
2021-04-12 16:00:21 -0700 [warn]: /var/log/containers/obfuscated_container_xxx_1.log not found. Continuing without tailing it.
2021-04-12 16:00:21 -0700 [warn]: stat() for /var/log/containers/obfuscated_container_xxx_1.log failed with ENOENT. Drop tail watcher for now.
2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean?  each_slice"
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:428:in `stop_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:422:in `rescue in block in start_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:416:in `block in start_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `each_value'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `start_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:359:in `refresh_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:234:in `start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:200:in `block in start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:189:in `block (2 levels) in lifecycle'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:188:in `each'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:188:in `block in lifecycle'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:175:in `each'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:175:in `lifecycle'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:199:in `start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/engine.rb:248:in `start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/engine.rb:147:in `run'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:700:in `block in run_worker'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:951:in `main_process'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:691:in `run_worker'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/command/fluentd.rb:365:in `<top (required)>'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:72:in `require'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:72:in `require'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/bin/fluentd:8:in `<top (required)>'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/bin/fluentd:23:in `load'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/bin/fluentd:23:in `<main>'
2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean?  each_slice"
  2021-04-12 16:00:21 -0700 [error]: suppressed same stacktrace
```
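The trace shows `stop_watchers` calling `each_value` on a single `TargetInfo` from the `rescue` path in `start_watchers`. The following is a minimal sketch of that failure mode, not the Fluentd source: it assumes `TargetInfo` is a `Struct` (the `each_slice` suggestion hints that the object is `Enumerable`), and the `stop_watchers` body here is a hypothetical stand-in that expects a Hash of path to `TargetInfo`.

```ruby
# Simplified stand-in for Fluent::Plugin::TailInput::TargetInfo.
# Assumption: the real class is a Struct with at least these fields.
TargetInfo = Struct.new(:path, :ino)

# Hypothetical model of a method that expects a Hash of path => TargetInfo.
def stop_watchers(targets_info)
  stopped = []
  targets_info.each_value { |info| stopped << info.path }
  stopped
end

info = TargetInfo.new("/var/log/containers/example.log", 42)

# Passing a Hash works, because Hash responds to each_value.
stop_watchers({ info.path => info })

# Passing a bare TargetInfo raises NoMethodError, the error class in the log,
# because Struct does not define each_value.
begin
  stop_watchers(info)
rescue NoMethodError => e
  puts "#{e.class}: #{e.message}"
end
```

If this model is accurate, the crash comes from the error-handling path itself rather than from the missing file, which would explain why a `warn` about ENOENT is immediately followed by a fatal `NoMethodError`.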

**Additional context**

We have hundreds of Fluentd instances running, and only a single node seems to be hitting this problem. Looking at that node, I suspect the cause is the many short-lived CronJobs that run every minute: their log files no longer exist because the containers are gone, but the symlinks still exist. Please let me know if there is more information I can provide to help troubleshoot this issue.
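To check the dangling-symlink suspicion on an affected node, a small script along these lines could list symlinks whose targets are gone. This is only a sketch: `dangling_symlinks` is a hypothetical helper name, and the glob in the usage line is the `path` from the configuration above.

```ruby
# Return every path matching the glob that is a symlink whose target
# no longer exists. File.exist? follows symlinks, so it is false for
# a dangling link, while File.symlink? inspects the link itself.
def dangling_symlinks(pattern)
  Dir.glob(pattern).select do |path|
    File.symlink?(path) && !File.exist?(path)
  end
end

# Usage on a node (path taken from the in_tail configuration):
dangling_symlinks("/var/log/containers/*.log").each do |path|
  puts "dangling: #{path} -> #{File.readlink(path)}"
end
```

A non-empty result on the failing node and an empty one on healthy nodes would support the hypothesis, although it would not by itself prove that these links trigger the crash.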

