Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.
Describe the bug
We are seeing an exception being thrown while Fluentd is starting up, which is causing Fluentd process to crash. We suspect these are caused by short-lived, often run, K8s CronJobs where the Docker logs no longer exists, but the symlinks are still there and end up causing in_tail to pick it up and throw this error. We were trying to fix the in_tail bugs discussed in #3239, which is why we are specifically trying to deploy this version.
To Reproduce
Have several CronJobs that run every 1 min on a node and only live for a very short duration.
Expected behavior
Fluentd would continue to run if the in_tail file doesn't exist and this condition is hit that causes this error or bug is fixed.
Your Environment
- Fluentd or td-agent version:
v1.12.2.
Note: We are building and installing from source following this guide. We have been doing so before this release without issues.
- Ruby:
2.7.2
- Operating system:
CentOS 7
- Kernel version:
4.15.0-70-generic
If you hit the problem with older fluentd version, try latest version first.
This happens on the latest version of Fluentd.
Your Configuration
<source>
@type tail
@log_level debug
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
tag kubernetes.*
read_from_head true
follow_inodes true
<parse>
@type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
Your Error Log
2021-04-12 16:00:21 -0700 [warn]: /var/log/containers/obfuscated_container_xxx_1.log not found. Continuing without tailing it.
2021-04-12 16:00:21 -0700 [warn]: stat() for /var/log/containers/obfuscated_container_xxx_1.log failed with ENOENT. Drop tail watcher for now.
2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean? each_slice"
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:428:in `stop_watchers'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:422:in `rescue in block in start_watchers'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:416:in `block in start_watchers'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `each_value'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `start_watchers'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:359:in `refresh_watchers'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:234:in `start'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:200:in `block in start'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:189:in `block (2 levels) in lifecycle'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:188:in `each'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:188:in `block in lifecycle'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:175:in `each'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:175:in `lifecycle'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:199:in `start'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/engine.rb:248:in `start'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/engine.rb:147:in `run'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:700:in `block in run_worker'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:951:in `main_process'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:691:in `run_worker'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/command/fluentd.rb:365:in `<top (required)>'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:72:in `require'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:72:in `require'
2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/bin/fluentd:8:in `<top (required)>'
2021-04-12 16:00:21 -0700 [error]: /usr/local/bin/fluentd:23:in `load'
2021-04-12 16:00:21 -0700 [error]: /usr/local/bin/fluentd:23:in `<main>'
2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean? each_slice"
2021-04-12 16:00:21 -0700 [error]: suppressed same stacktrace
Additional context
We have hundreds of Fluentd instances running and there is a single node that seems to be hitting this problem. While looking at the node, I suspect it is caused by many short-lived, often run (every 1 min), CronJobs where the log files don't exist because the containers are now gone but symlinks still exist. Please let me know if there is more information I can provide here to help troubleshoot this issue.
Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.
Describe the bug
We are seeing an exception being thrown while Fluentd is starting up, which is causing Fluentd process to crash. We suspect these are caused by short-lived, often run, K8s CronJobs where the Docker logs no longer exists, but the symlinks are still there and end up causing in_tail to pick it up and throw this error. We were trying to fix the in_tail bugs discussed in #3239, which is why we are specifically trying to deploy this version.
To Reproduce
Have several CronJobs that run every 1 min on a node and only live for a very short duration.
Expected behavior
Fluentd would continue to run if the in_tail file doesn't exist and this condition is hit that causes this error or bug is fixed.
Your Environment
v1.12.2.Note: We are building and installing from source following this guide. We have been doing so before this release without issues.
2.7.2CentOS 74.15.0-70-genericIf you hit the problem with older fluentd version, try latest version first.
This happens on the latest version of Fluentd.
Your Configuration
Your Error Log
Additional context
We have hundreds of Fluentd instances running and there is a single node that seems to be hitting this problem. While looking at the node, I suspect it is caused by many short-lived, often run (every 1 min), CronJobs where the log files don't exist because the containers are now gone but symlinks still exist. Please let me know if there is more information I can provide here to help troubleshoot this issue.