Describe the bug
After upgrading fluentd from 0.14.24 to 1.12.1, tail plugin + my script which rotates the watched log using mv sometimes won't watch the rotated log file and send the first part of rotated logs.
To Reproduce
Watch /tmp/script.log
<source>
@type tail
path /tmp/script.log
tag my-tag
pos_file /var/log/td-agent/script.pos
<parse>
@type json
time_key time
</parse>
</source>
mv /tmp/script.log to /tmp/script.log.20210314 and write to /tmp/script.log
Expected behavior
The content in /tmp/script.log should be sent.
Your Environment
- Fluentd or td-agent version: td-agent 4.1.0 fluentd 1.12.1 (e3effa3)
- Operating system: Amazon Linux 2
- Kernel version: 4.14.192-147.314.amzn2.x86_64
Your Configuration
<source>
@type tail
path /tmp/script.log
tag my-tag
pos_file /var/log/td-agent/script.pos
<parse>
@type json
time_key time
</parse>
</source>
Your Error Log
none (No rotate detection log)
Additional context
I'm trying to upgrade fluentd from 0.14.24 to 1.12.1 and faced two problems.
My script is called by cron every day. At the start of the script, it mvs log to rotate, do something great and dump logs to a new file.
FILE_NAME = '/tmp/script.log'
def rotate_file_if_exist
if File.exists?(FILE_NAME)
new_file_name = "#{FILE_NAME}.#{Time.now.strftime('%Y%m%d')}"
FileUtils.mv(FILE_NAME, new_file_name)
end
end
fluentd is watching the file name. My config is:
<source>
@type tail
path /tmp/script.log
tag my-tag
pos_file /var/log/td-agent/script.pos
<parse>
@type json
time_key time
</parse>
</source>
Here are problems I faced:
- TailWatcher sometimes won't restart
- TailWatcher may not send the first part of rotated logs
I debugged them using mv and cp command.
TailWatcher won't restart
When I run mv, the file of the filename becomes none. Then, TailWatcher#@rotate_handler becomes nil. This stops to watch the inode.
https://github.com/fluent/fluentd/blob/v1.12.1/lib/fluent/plugin/in_tail.rb#L763
else # file is rotated and new file not found
# Clear RotateHandler to avoid duplicated file watch in same path.
@rotate_handler = nil
watcher_needs_update = true
end
if watcher_needs_update
# No need to update a watcher if stat is nil (file not present), because moving to inodes will create
# new watcher, and old watcher will be closed by stop_watcher in refresh_watchers method
if stat
target_info = TargetInfo.new(@path, stat.ino)
if @follow_inodes
# don't want to swap state because we need latest read offset in pos file even after rotate_wait
@update_watcher.call(target_info, @pe)
else
@update_watcher.call(target_info, swap_state(@pe))
end
end
The comment says "moving to inodes will create new watcher, and old watcher will be closed by stop_watcher in refresh_watchers method". However, this only happens if refresh_watchers() runs when there is no file in the path and refresh_watchers() runs later when a file is created in the path. If a file is mved and a new file is created between refresh_watchers()s, refresh_watchers() won't run start_watchers() and stop_watchers() for the path because target_paths_hash always contains the path.
TailWatcher may not send the first part of rotated logs
refresh_watchers() will create a new TailWatcher if a new file is created. refresh_watchers() runs every refresh_interval seconds, 60 seconds as default.
The problem is when a new file is created and logs are written to the file during refresh_interval, the new TimeWatcher won't send the written logs because it watch the tail of the new file which contains the written logs.
read_from_head true fixes this problem, but I'm not sure whether this is intentional fix in #3182. Even if it is intentional, it would be better to write a document about this behaviour.
Why fluentd 0.14.24 worked?
fluentd 0.14.24 creates a new TailWatcher soon even if a new file doesn't exist:
https://github.com/fluent/fluentd/blob/v0.14.24/lib/fluent/plugin/in_tail.rb#L584
if watcher_needs_update
@update_watcher.call(@path, swap_state(@pe))
else
It looks like this is the reason why old fluentd detected the rotation but new fluentd doesn't.
Adding FileUtils.touch(FILE_NAME) to my script sometimes fixes these problems. However, since mv and touch aren't atomic, StatWatcher can detect the timing that there is no file for the filename, IIUC. Even if enable_stat_watcher is false, there is rare timing TimerTrigger calls on_notify when there is no file for a file name.
Describe the bug
After upgrading fluentd from 0.14.24 to 1.12.1, tail plugin + my script which rotates the watched log using
mvsometimes won't watch the rotated log file and send the first part of rotated logs.To Reproduce
Watch
/tmp/script.logmv/tmp/script.logto/tmp/script.log.20210314and write to/tmp/script.logExpected behavior
The content in
/tmp/script.logshould be sent.Your Environment
Your Configuration
Your Error Log
none (No rotate detection log)
Additional context
I'm trying to upgrade fluentd from 0.14.24 to 1.12.1 and faced two problems.
My script is called by cron every day. At the start of the script, it
mvs log to rotate, do something great and dump logs to a new file.fluentd is watching the file name. My config is:
Here are problems I faced:
I debugged them using
mvandcpcommand.TailWatcher won't restart
When I run
mv, the file of the filename becomes none. Then,TailWatcher#@rotate_handlerbecomesnil. This stops to watch the inode.https://github.com/fluent/fluentd/blob/v1.12.1/lib/fluent/plugin/in_tail.rb#L763
The comment says "moving to inodes will create new watcher, and old watcher will be closed by stop_watcher in refresh_watchers method". However, this only happens if
refresh_watchers()runs when there is no file in the path andrefresh_watchers()runs later when a file is created in the path. If a file ismved and a new file is created betweenrefresh_watchers()s,refresh_watchers()won't runstart_watchers()andstop_watchers()for the path becausetarget_paths_hashalways contains the path.TailWatcher may not send the first part of rotated logs
refresh_watchers()will create a newTailWatcherif a new file is created.refresh_watchers()runs everyrefresh_intervalseconds, 60 seconds as default.The problem is when a new file is created and logs are written to the file during
refresh_interval, the newTimeWatcherwon't send the written logs because it watch the tail of the new file which contains the written logs.read_from_head truefixes this problem, but I'm not sure whether this is intentional fix in #3182. Even if it is intentional, it would be better to write a document about this behaviour.Why fluentd 0.14.24 worked?
fluentd 0.14.24 creates a new
TailWatchersoon even if a new file doesn't exist:https://github.com/fluent/fluentd/blob/v0.14.24/lib/fluent/plugin/in_tail.rb#L584
It looks like this is the reason why old fluentd detected the rotation but new fluentd doesn't.
Adding
FileUtils.touch(FILE_NAME)to my script sometimes fixes these problems. However, sincemvandtoucharen't atomic, StatWatcher can detect the timing that there is no file for the filename, IIUC. Even ifenable_stat_watcherisfalse, there is rare timingTimerTriggercallson_notifywhen there is no file for a file name.