Describe the bug
Fluentd stops forwarding logs after error:
Unexpected error raised. Stopping the timer. title=:in_tail_refresh_watchers error_class=Errno::ENOENT error="No such file or directory @ rb_file_s_stat -
This error only startted happening when I updated to fluentd 1.12.0 and set follow_inodes true in the tail plugin.
Sometime there is also this warn:
#0 Unparsable line in pos_file:
To Reproduce
Happens randomly in our environment, happens more often on servers with more log volume.
Expected behavior
No error should happen. If an error happens the log should still be forwarded.
Your Environment
- Fluentd or td-agent version: fluentd 1.12.0
- Operating system: we use the docker image fluent/fluentd:v1.12.0-1.0 on Ubuntu 20.04.1 LTS
- Kernel version: 5.4.0-54-generic
- containerd: v1.4.3
- Kubernetes: v1.19.5
Your Configuration
<source>
@type tail
path /var/log/pods/*/*/*.log
pos_file /fluentd/log/fluent_output.log.pos
pos_file_compaction_interval 1h
follow_inodes true
read_from_head true
refresh_interval 2
tag except.kubernetes.*
path_key tailed_path
<parse>
@type regexp
expression "/^(?<time>.+) (stdout|stderr) [FP] (?<log>.+)?$/."
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
# drop empty log lines
<filter except.kubernetes.**.log>
@type grep
<exclude>
key log
pattern /^$/
</exclude>
</filter>
# get kubernetes pod infos from log path
<filter except.kubernetes.**.log>
@type parser
key_name tailed_path
reserve_time true
reserve_data true
remove_key_name_field true
hash_value_field kubernetes
<parse>
@type regexp
expression /^\/var\/log\/pods\/(?<namespace_name>.+)_(?<pod_name>.+)_.*\/(?<container_name>.*)\/.*\.log$/
</parse>
</filter>
<filter except.kubernetes.**.log>
@type record_transformer
enable_ruby true
<record>
# Add cluster name
# Line breaks may be trimmed when collecting from files. Add them back so that
# multi line logs are still in multi line after combined by detect_exceptions.
# Remove this if https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions/pull/10 is released
# strings in ruby are mutable, so make use of it
log ${ record['kubernetes']['cluster'] = "prod"; record["log"] << "\n" }
</record>
</filter>
# expose metrics in prometheus format
<source>
@type prometheus
bind 0.0.0.0
port 24231
metrics_path /metrics
</source>
<source>
@type prometheus_output_monitor
interval 10
<labels>
hostname ${hostname}
</labels>
</source>
# After the exception detection match the record reenter the processing
# from the beginning. Since except is removed from the tag its not processed
# a second time here.
<match except.kubernetes.**.log>
@type detect_exceptions
message log
remove_tag_prefix except
multiline_flush_interval 1
</match>
# Drop Logs from AMA / Spoud forwarding, to prevent Log-Loops
<filter kubernetes.**.log>
@type grep
<or>
<exclude>
key $.kubernetes.namespace_name
pattern /^ama-[a-z]$/
</exclude>
<exclude>
key log
pattern /^\s+$/
</exclude>
</or>
</filter>
# filter out spamy nginx-ingress warnings
<filter kubernetes.**.log>
@type grep
<and>
<exclude>
key $.kubernetes.namespace_name
pattern /^cop-nginx-ingress$/
</exclude>
<exclude>
key log
pattern /W0806|\]TCP2000/
</exclude>
</and>
</filter>
# tag AMA beats
<match kubernetes.**.log>
@type rewrite_tag_filter
<rule>
key log
pattern /beatType/
tag amabeat.${tag}
</rule>
<rule>
key log
pattern /.+/
tag k8s.${tag}
</rule>
</match>
<match **>
@type relabel
@label @standardLog
</match>
# send k8s.** to splunk and amabeat.** to ama endpoint
<label @standardLog>
# add tag field to use it as source in splunk
<filter **.kubernetes.**.log>
@type record_transformer
<record>
tag ${tag}
</record>
</filter>
<match k8s.**>
@type splunk_hec
port 8088
host <edited>
token <edited>
default_index idx_sep_pro
source_key tag
remove_source_key true
ssl_verify false
use_ssl true
default_host "#{ENV['HOSTNAME']}"
default_sourcetype dockerjson
time_as_integer false
<buffer>
flush_thread_count 8
flush_interval 2s
chunk_limit_size 8M
retry_max_interval 30
retry_forever true
</buffer>
</match>
<match amabeat.**>
@type http
endpoint_url <edited>
ssl_no_verify true
bulk_request true
compress_request false
<buffer>
flush_thread_count 8
flush_interval 2s
chunk_limit_size 2M
retry_max_interval 30
</buffer>
</match>
</label>
# Drop fluentd log as it's written to stdout anyway
<label @FLUENT_LOG>
<match fluent.*>
@type null
</match>
</label>
Your Error Log
This show only the last log lines.
2021-01-07 14:58:18 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:01:18 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:01:18 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:04:29 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:04:29 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:05:00 +0000 [warn]: #0 Unparsable line in pos_file: /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.lo000000000020101354 00000000003b113a
2021-01-07 15:07:42 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:07:42 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:10:54 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:10:54 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:14:06 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:14:06 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:17:16 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:17:16 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:20:28 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:20:28 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:23:39 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:23:39 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:26:49 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:26:49 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:30:11 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:in_tail_refresh_watchers error_class=Errno::ENOENT error="No such file or directory @ rb_file_s_stat - /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log"
2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin/in_tail.rb:316:in `stat'
2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin/in_tail.rb:316:in `block in expand_paths'
2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin/in_tail.rb:315:in `each'
2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin/in_tail.rb:315:in `expand_paths'
2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin/in_tail.rb:344:in `refresh_watchers'
2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/cool.io-1.7.0/lib/cool.io/loop.rb:88:in `run_once'
2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/cool.io-1.7.0/lib/cool.io/loop.rb:88:in `run'
2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2021-01-07 15:30:11 +0000 [error]: #0 Timer detached. title=:in_tail_refresh_watchers
2021-01-07 15:30:11 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:30:11 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:33:45 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
Additional context
Before updating to fluentd 1.12.0 we also got unreadable warnings like below from time to time. I think it has to with the way the kubelet rotates the containerd logs and the new log isn't readable while it's being created:
2021-01-08 07:51:49 +0000 [warn]: #0 /var/log/pods/ns-b_pod-b_9a6734a0-96e4-4005-9dc3-8d46e9e9f9ba/pod-b/89.log unreadable. It is excluded and would be examined next time.
Describe the bug
Fluentd stops forwarding logs after error:
This error only startted happening when I updated to fluentd 1.12.0 and set
follow_inodes truein the tail plugin.Sometime there is also this warn:
To Reproduce
Happens randomly in our environment, happens more often on servers with more log volume.
Expected behavior
No error should happen. If an error happens the log should still be forwarded.
Your Environment
Your Configuration
Your Error Log
This show only the last log lines.
Additional context
Before updating to fluentd 1.12.0 we also got unreadable warnings like below from time to time. I think it has to with the way the kubelet rotates the containerd logs and the new log isn't readable while it's being created: