Skip to content

in_tail may not send rotated logs when mv is used to rotate #3292

@aYukiSekiguchi

Description

@aYukiSekiguchi

Describe the bug
After upgrading fluentd from 0.14.24 to 1.12.1, tail plugin + my script which rotates the watched log using mv sometimes won't watch the rotated log file and send the first part of rotated logs.

To Reproduce

Watch /tmp/script.log

<source>
  @type tail
  path /tmp/script.log
  tag my-tag
  pos_file /var/log/td-agent/script.pos
  <parse>
    @type json
    time_key time
  </parse>
</source>

mv /tmp/script.log to /tmp/script.log.20210314 and write to /tmp/script.log

Expected behavior
The content in /tmp/script.log should be sent.

Your Environment

  • Fluentd or td-agent version: td-agent 4.1.0 fluentd 1.12.1 (e3effa3)
  • Operating system: Amazon Linux 2
  • Kernel version: 4.14.192-147.314.amzn2.x86_64

Your Configuration

<source>
  @type tail
  path /tmp/script.log
  tag my-tag
  pos_file /var/log/td-agent/script.pos
  <parse>
    @type json
    time_key time
  </parse>
</source>

Your Error Log

none (No rotate detection log)

Additional context

I'm trying to upgrade fluentd from 0.14.24 to 1.12.1 and faced two problems.

My script is called by cron every day. At the start of the script, it mvs log to rotate, do something great and dump logs to a new file.

FILE_NAME = '/tmp/script.log'

def rotate_file_if_exist
  if File.exists?(FILE_NAME)
    new_file_name = "#{FILE_NAME}.#{Time.now.strftime('%Y%m%d')}"
    FileUtils.mv(FILE_NAME, new_file_name)
  end
end

fluentd is watching the file name. My config is:

<source>
  @type tail
  path /tmp/script.log
  tag my-tag
  pos_file /var/log/td-agent/script.pos
  <parse>
    @type json
    time_key time
  </parse>
</source>

Here are problems I faced:

  1. TailWatcher sometimes won't restart
  2. TailWatcher may not send the first part of rotated logs

I debugged them using mv and cp command.

TailWatcher won't restart

When I run mv, the file of the filename becomes none. Then, TailWatcher#@rotate_handler becomes nil. This stops to watch the inode.

https://github.com/fluent/fluentd/blob/v1.12.1/lib/fluent/plugin/in_tail.rb#L763

          else # file is rotated and new file not found
            # Clear RotateHandler to avoid duplicated file watch in same path.
            @rotate_handler = nil
            watcher_needs_update = true
          end

          if watcher_needs_update
            # No need to update a watcher if stat is nil (file not present), because moving to inodes will create
            # new watcher, and old watcher will be closed by stop_watcher in refresh_watchers method
            if stat
              target_info = TargetInfo.new(@path, stat.ino)
              if @follow_inodes
                # don't want to swap state because we need latest read offset in pos file even after rotate_wait
                @update_watcher.call(target_info, @pe)
              else
                @update_watcher.call(target_info, swap_state(@pe))
              end
            end

The comment says "moving to inodes will create new watcher, and old watcher will be closed by stop_watcher in refresh_watchers method". However, this only happens if refresh_watchers() runs when there is no file in the path and refresh_watchers() runs later when a file is created in the path. If a file is mved and a new file is created between refresh_watchers()s, refresh_watchers() won't run start_watchers() and stop_watchers() for the path because target_paths_hash always contains the path.

TailWatcher may not send the first part of rotated logs

refresh_watchers() will create a new TailWatcher if a new file is created. refresh_watchers() runs every refresh_interval seconds, 60 seconds as default.

The problem is when a new file is created and logs are written to the file during refresh_interval, the new TimeWatcher won't send the written logs because it watch the tail of the new file which contains the written logs.

read_from_head true fixes this problem, but I'm not sure whether this is intentional fix in #3182. Even if it is intentional, it would be better to write a document about this behaviour.

Why fluentd 0.14.24 worked?

fluentd 0.14.24 creates a new TailWatcher soon even if a new file doesn't exist:
https://github.com/fluent/fluentd/blob/v0.14.24/lib/fluent/plugin/in_tail.rb#L584

          if watcher_needs_update
            @update_watcher.call(@path, swap_state(@pe))
          else

It looks like this is the reason why old fluentd detected the rotation but new fluentd doesn't.

Adding FileUtils.touch(FILE_NAME) to my script sometimes fixes these problems. However, since mv and touch aren't atomic, StatWatcher can detect the timing that there is no file for the filename, IIUC. Even if enable_stat_watcher is false, there is rare timing TimerTrigger calls on_notify when there is no file for a file name.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions