Skip to content

Add follow_inodes flag to tail files by inodes, not by names#1660

Closed
OleksiiDuzhyi wants to merge 6 commits into
fluent:masterfrom
wix-playground:master
Closed

Add follow_inodes flag to tail files by inodes, not by names#1660
OleksiiDuzhyi wants to merge 6 commits into
fluent:masterfrom
wix-playground:master

Conversation

@OleksiiDuzhyi

Copy link
Copy Markdown

Use case

We have a lot of servers with up to 300 application deployed on each server (mostly Java/Scala). Each application writes log files using log4j library. Each log file is rotated daily in the same directory, with up to 10 back up files possible. Events are fed to realtime processing system(critical to time delays up to 1 second) and to HBase. We want to replace current heavy log collector with lightweight solution.

Problem

We tried to use default tail plugin. Most basic configuration will look like this:

<source>
  @type tail
  read_from_head true
  path /var/log/*/app*.log  
  tag app.log.*
  format none
  refresh_interval 10
  pos_file /var/log/pos/app.log.pos
  rotate_wait 3600
  path_key path
  limit_recently_modified 86400
</source>

According to fluentd in_tail plugin documentation - every time file is rotated for such configuration, it will be treated by in_tail plugin as new one, and will be read from it's head, even though file inode has not changed, and in position file we have latest read position (but just for last line before rotation; during rotate_wait time position stored in memory only - https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/in_tail.rb#L554)

This doesn't suit our needs, because:

  • we will generate a lot of unneeded traffic, which will be irrelevant to realtime part
  • disk I/O increase
  • longer ingestion into HBase

One of the possible solution proposed by @repeatedly in #1613 is to avoid '*' in file pattern discovery. This also doesn't suit us, because we'd like to reliably store events with no or minimum loss because they are very crucial for business analysis. Possible data loss scenario:

  1. Fluentd goes down at 23:45 UTC
  2. Log file is rotated at 00:00:00 UTC, app.log is renamed to app.2017-08-02.log (just for example)
  3. Fluentd goes up at 00:15 UTC
  4. Tail plugin reads all data from in new app.log file
  5. Data from app.2017-08-02.log file from 23:45 - 00:00 UTC is lost. No way for us to find out at least latest offset in app.2017-08-02.log because it has been overwritten in position file.

Proposed solution

Add ability to track inodes inside fluentd in_tail plugin to enable combination of '*' in path configuration with rotation inside same directory and read_from_head set to true. In proposed solution we add follow_inodes flag (set to false by default to preserve current behavior). When follow_inodes is set to true and file rotation is detected - we do not reread file, instead we are checking if such inode is present inside pos file - if yes, we don't recreate TailWatcher. TailWatcher is closed after rotate_wait period and file marked as unwatched inside position file after limit_recently_modified time passed. Pos file is cleaned on startup as it was before, but also delete entries for non-existing files.

Please review, and let me know what do you think of such approach?

@repeatedly

Copy link
Copy Markdown
Member

Thanks for the patch!
I will check it later.

@repeatedly repeatedly added feature request *Deprecated Label* Use enhancement label in general v0.14 labels Aug 15, 2017
Comment thread lib/fluent/plugin/in_tail.rb Outdated
}
excluded = @exclude_path.map { |path| path = date.strftime(path); path.include?('*') ? Dir.glob(path) : path }.flatten.uniq
paths - excluded
excluded = @exclude_path.map {|path| path = date.strftime(path); path.include?('*') ? Dir.glob(path) : path}.flatten.uniq

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't change syntax/indent.

Comment thread lib/fluent/plugin/in_tail.rb Outdated
excluded = @exclude_path.map {|path| path = date.strftime(path); path.include?('*') ? Dir.glob(path) : path}.flatten.uniq
to_return = paths - excluded
# filter out non existing files, so in case pattern is without '*' we don't do unnecessary work
to_return = to_return.select {|path|

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(paths - excluded).to_return.select {|path| seems enough

Comment thread lib/fluent/plugin/in_tail.rb Outdated
to_return = paths - excluded
# filter out non existing files, so in case pattern is without '*' we don't do unnecessary work
to_return = to_return.select {|path|
Pathname.new(path).exist?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileTest#exist? is more lightweight because no object instantiation.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thx, didn't know about that option

Comment thread lib/fluent/plugin/in_tail.rb Outdated
to_return = to_return.select {|path|
Pathname.new(path).exist?
}
Hash[to_return.map {|path|

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long lines in Hash[] seems hard to read.
This map can be moved to above line. {}.select {}.map {}.

Comment thread lib/fluent/plugin/in_tail.rb Outdated
end

def existence_path
Hash[@tails.keys.map {|path_ino|

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With lots of files, e.g. 1000+ files, this map creates lots of object.
So avoiding map is better for the performance.

Of course, expand_paths's map is also same.

Comment thread lib/fluent/plugin/in_tail.rb Outdated
end
end

class PathInodeTuple

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check eql? and other method behaviour of struct object but can't use struct instead of class for this case?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked eql? of struct - it work as expected, will change to Struct

@repeatedly

Copy link
Copy Markdown
Member

BTW, all most tests are failed.
So the something makes tests unstable.

@OleksiiDuzhyi

Copy link
Copy Markdown
Author

I can't figure why tests are failing, maybe you could give me a hand in this?

In my test I do the following:

File.open("#{TMP_DIR}/tail.txt", "wb") {|f|
    f.puts "test1"
    f.puts "test2"
}
............
cleanup_file("#{TMP_DIR}/tail.txt")
File.open("#{TMP_DIR}/tail.txt", "wb") {|f|

cleanup_file code:

  def cleanup_file(path)
    FileUtils.rm_f(path, secure: true)
    if File.exist?(path)
      # ensure files are closed for Windows, on which deleted files
      # are still visible from filesystem
      GC.start(full_mark: true, immediate_mark: true, immediate_sweep: true)
      FileUtils.remove_entry_secure(path, true)
    end
  end

But for some reason I can't open a file for writing after I've deleted it. Even though I was able to write this file few lines above. I got this exception:

Errno::EACCES: Permission denied @ rb_sysopen - C:/projects/fluentd/test/plugin/../tmp/tail/tail.txt

I have no clue why I can't reopen file when I've deleted it. Should I remove file in a different way?

@esselius

Copy link
Copy Markdown

Will this be merged anytime soon?

@repeatedly

Copy link
Copy Markdown
Member

@esselius in_tail plugin is widely used so we want to test this patch on several environment before merging.

@OleksiiDuzhyi

Copy link
Copy Markdown
Author

We decided to move away from fluentd for log collection, also relying on inodes - is not 100% reliable, inodes could be reused which could cause issues

@repeatedly

Copy link
Copy Markdown
Member

@OleksiiDuzhyi

We decided to move away from fluentd for log collection

What the tool do you use?
And how to solve rotation issue with your tools?

@OleksiiDuzhyi

Copy link
Copy Markdown
Author

@repeatedly
We decided to go with Apache Flume. From our tests - it shows bigger memory footprint but 2-3 times less CPU usage, which was critical for us.

As input to Flume we wrote a program in C which does what I tried to implement in this PR - tracks inodes, keep state of inode, etc.

@repeatedly

Copy link
Copy Markdown
Member

@OleksiiDuzhyi Thanks! So inode issue is still remained but flume resolves CPU concern.
Hmm... we hope fluent-bit resovles such case with lower CPU and memory resource.

@OleksiiDuzhyi

Copy link
Copy Markdown
Author

@repeatedly
Memory usage of fluentd is small as in documentation - in our case for around 1 mln events/min - it was around 80 mb.

@metayd

metayd commented Apr 4, 2018

Copy link
Copy Markdown

@repeatedly Hello, I have this problem too, is there any plan to merge this patch into master?

@sanjuicgithub

Copy link
Copy Markdown

@repeatedly , is this patch still in consideration or does fluentd already has any such feature to address this issue ?

@repeatedly

repeatedly commented Apr 17, 2020

Copy link
Copy Markdown
Member

is this patch still in consideration

Current patch is incomplete because using only inode has a problem for inode reuse case.
So inode and path or something is needed for more reliable file trace.
If we can ignore such situation, we can accept this patch approach.

kenhys added a commit to kenhys/fluentd that referenced this pull request Sep 29, 2020
Based on closed PR fluent#1660 which
is posted by OleksiiDuzhyi because of incomplete test cases.
cosmo0920 pushed a commit to cosmo0920/fluentd that referenced this pull request Nov 11, 2020
Based on closed PR fluent#1660 which
is posted by OleksiiDuzhyi because of incomplete test cases.
cosmo0920 pushed a commit to cosmo0920/fluentd that referenced this pull request Nov 13, 2020
Based on closed PR fluent#1660 which
is posted by OleksiiDuzhyi because of incomplete test cases.
cosmo0920 pushed a commit to cosmo0920/fluentd that referenced this pull request Nov 17, 2020
Based on closed PR fluent#1660 which
is posted by OleksiiDuzhyi because of incomplete test cases.
cosmo0920 pushed a commit to cosmo0920/fluentd that referenced this pull request Nov 20, 2020
Based on closed PR fluent#1660 which
is posted by OleksiiDuzhyi because of incomplete test cases.
cosmo0920 pushed a commit to cosmo0920/fluentd that referenced this pull request Nov 24, 2020
Based on closed PR fluent#1660 which
is posted by OleksiiDuzhyi because of incomplete test cases.
cosmo0920 pushed a commit to cosmo0920/fluentd that referenced this pull request Nov 24, 2020
Based on closed PR fluent#1660 which
is posted by OleksiiDuzhyi because of incomplete test cases.
cosmo0920 pushed a commit that referenced this pull request Nov 25, 2020
Based on closed PR #1660 which
is posted by OleksiiDuzhyi because of incomplete test cases.

Co-authored-by: alexeyd <alexeyd@wix.com>

Signed-off-by: Hiroshi Hatake <hatake@clear-code.com>
cosmo0920 pushed a commit that referenced this pull request Nov 25, 2020
Based on closed PR #1660 which
is posted by OleksiiDuzhyi because of incomplete test cases.

Co-authored-by: alexeyd <alexeyd@wix.com>

Signed-off-by: Hiroshi Hatake <hatake@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request *Deprecated Label* Use enhancement label in general v0.14

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants