
'ignore_repeated_permission_error' tail flag does not behave as expected #3038

@pranavmarla

Describe the bug
Let's say there is a file called symlink1.log, which is a broken symlink (i.e. a symlink pointing to a log file that no longer exists). By default, if we tell Fluentd to tail symlink1.log, it keeps generating the following error every minute:
2020-06-11 18:59:34 +0000 [warn]: #0 symlink1.log unreadable. It is excluded and would be examined next time.

Since I cannot fix this broken symlink, and I do not want to see repeated 'unreadable' errors about this same broken symlink, I enable the 'ignore_repeated_permission_error' tail flag -- unfortunately, this flag does not behave as expected.

Expected behavior
Based on the name and description ("It suppress repeated permission error logs") of the flag, this is what I expect it to do:
For symlink1.log, the FIRST time an 'unreadable' error is generated for this file, it will be printed -- all subsequent 'unreadable' errors for this particular file will not be printed.
Now, let's say there is another file called symlink2.log, which is also a broken symlink. Since this is a different file, again, the FIRST time an 'unreadable' error is generated for this file, it will be printed -- all subsequent 'unreadable' errors for this particular file will not be printed.

In other words, when this flag is enabled, we should only see ONE 'unreadable' error message per file -- i.e. if we have 2 broken symlinks, we should see 2 'unreadable' error messages. This way, we don't see endless errors about the same file, but do not miss out on errors for NEW files.
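The expected semantics amount to warn-once-per-path deduplication. A minimal Python sketch of that idea follows; it is not Fluentd's code, and the names `WarnOncePerPath` and `should_warn` are hypothetical, chosen only for illustration.

```python
class WarnOncePerPath:
    """Emit an 'unreadable' warning only the first time each path fails."""

    def __init__(self):
        # Paths that have already produced a warning.
        self._warned = set()

    def should_warn(self, path):
        """Return True only for the first failure seen for this path."""
        if path in self._warned:
            return False
        self._warned.add(path)
        return True


dedupe = WarnOncePerPath()
# Two broken symlinks, each failing on two consecutive scans.
events = ["symlink1.log", "symlink2.log", "symlink1.log", "symlink2.log"]
printed = [p for p in events if dedupe.should_warn(p)]
print(printed)  # one entry per distinct broken symlink
```

Keying the suppression on the path is what keeps a new broken symlink visible while silencing repeats for the ones already reported.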

Actual, buggy behavior
Instead, when I enable this flag, this is what seems to happen:
Regardless of how many different files are causing errors, Fluentd only prints the very FIRST 'unreadable' error it gets -- it ignores all other 'unreadable' errors, even if they're being caused by a different file!
E.g. symlink1.log will only have 1 error message (which is expected), but every other file (e.g. symlink2.log) will have 0 messages! This is bad, since now I have no way of even knowing that there is a broken symlink called symlink2.log!
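For contrast, the observed symptom looks like suppression keyed on a single global state rather than on the path. The Python sketch below models only what the logs show; it is not taken from the in_tail source and may not match how the flag is actually implemented. The names `WarnOnceGlobally` and `should_warn` are hypothetical.

```python
class WarnOnceGlobally:
    """Model of the observed symptom: one warning in total, whatever the path."""

    def __init__(self):
        # A single flag shared by every path.
        self._already_warned = False

    def should_warn(self, path):
        """Return True for the very first failure only; the path is ignored."""
        if self._already_warned:
            return False
        self._already_warned = True
        return True


observed = WarnOnceGlobally()
events = ["symlink1.log", "symlink2.log", "symlink1.log", "symlink2.log"]
printed = [p for p in events if observed.should_warn(p)]
print(printed)  # symlink2.log never appears
```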

To Reproduce
First, to clarify, although the FAQ says that this error means that Fluentd does not have permission to read the file, that is not the case here. Here, Fluentd DOES have permission to tail the file -- my problem is that the file it is tailing is actually a broken symlink.
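The distinction between "no permission" and "broken symlink" can be checked outside Fluentd. The following is a small Python sketch under the assumption of a POSIX filesystem; `classify_path` is a hypothetical helper, and it says nothing about how Fluentd itself decides that a file is unreadable.

```python
import os
import tempfile


def classify_path(path):
    """Give a likely reason why a path would be reported as unreadable."""
    if os.path.islink(path) and not os.path.exists(path):
        # The link itself exists, but its target cannot be resolved.
        return "broken_symlink"
    if not os.path.lexists(path):
        return "missing"
    if not os.access(path, os.R_OK):
        return "permission_denied"
    return "readable"


# Demonstration in a throwaway directory: create a valid symlink, then
# delete its target so the link dangles.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target.log")
    link = os.path.join(tmp, "link.log")
    open(target, "w").close()
    os.symlink(target, link)
    before = classify_path(link)
    os.remove(target)
    after = classify_path(link)

print(before, after)
```

`os.path.exists` follows symlinks while `os.path.islink` does not, which is why the combination of the two identifies a dangling link.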

I noticed this when running the Fluentd Kubernetes Daemonset on a Kubernetes cluster. My cluster stores symlinks to certain system container logs in /var/log/sys -- I am using Fluentd to tail all the files in that folder.

Unfortunately, it looks like every time a new nginx-proxy container is created, a new nginx-proxy symlink is created in that folder, which means the old nginx-proxy symlink is now broken.
i.e. If you look inside /var/log/sys, you will see something like this:

nginx-proxy_277.log -> /var/lib/docker/containers/277/277-json.log
nginx-proxy_584.log -> /var/lib/docker/containers/584/584-json.log
nginx-proxy_a06.log -> /var/lib/docker/containers/a06/a06-json.log
nginx-proxy_f7b.log -> /var/lib/docker/containers/f7b/f7b-json.log
# This is the most recent file -- this is the only symlink here which is not broken!
nginx-proxy_ca2.log -> /var/lib/docker/containers/ca2/ca2-json.log

By default, every minute, Fluentd generates errors for these 4 broken symlinks:

2020-06-11 19:44:34 +0000 [warn]: #0 /var/log/sys/nginx-proxy_f7b.log unreadable. It is excluded and would be examined next time.
2020-06-11 19:44:34 +0000 [warn]: #0 /var/log/sys/nginx-proxy_277.log unreadable. It is excluded and would be examined next time.
2020-06-11 19:44:34 +0000 [warn]: #0 /var/log/sys/nginx-proxy_a06.log unreadable. It is excluded and would be examined next time.
2020-06-11 19:44:34 +0000 [warn]: #0 /var/log/sys/nginx-proxy_584.log unreadable. It is excluded and would be examined next time.

2020-06-11 19:45:34 +0000 [warn]: #0 /var/log/sys/nginx-proxy_f7b.log unreadable. It is excluded and would be examined next time.
2020-06-11 19:45:34 +0000 [warn]: #0 /var/log/sys/nginx-proxy_277.log unreadable. It is excluded and would be examined next time.
2020-06-11 19:45:34 +0000 [warn]: #0 /var/log/sys/nginx-proxy_a06.log unreadable. It is excluded and would be examined next time.
2020-06-11 19:45:34 +0000 [warn]: #0 /var/log/sys/nginx-proxy_584.log unreadable. It is excluded and would be examined next time.

...
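To check how many 'unreadable' warnings each file produces, the log lines can be tallied per path. This is a rough Python sketch that assumes the warning format shown above and paths without spaces; `count_unreadable` is a hypothetical helper, not part of Fluentd.

```python
import re
from collections import Counter

# Matches e.g. "[warn]: #0 /var/log/sys/nginx-proxy_f7b.log unreadable."
UNREADABLE = re.compile(r"\[warn\]: #\d+ (?P<path>\S+) unreadable\.")


def count_unreadable(lines):
    """Count 'unreadable' warnings per file path."""
    counts = Counter()
    for line in lines:
        match = UNREADABLE.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts


sample = [
    "2020-06-11 19:44:34 +0000 [warn]: #0 /var/log/sys/nginx-proxy_f7b.log unreadable. It is excluded and would be examined next time.",
    "2020-06-11 19:44:34 +0000 [warn]: #0 /var/log/sys/nginx-proxy_277.log unreadable. It is excluded and would be examined next time.",
    "2020-06-11 19:45:34 +0000 [warn]: #0 /var/log/sys/nginx-proxy_f7b.log unreadable. It is excluded and would be examined next time.",
]
counts = count_unreadable(sample)
for path, n in sorted(counts.items()):
    print(path, n)
```

With the flag behaving as expected, every broken symlink would end up with a count of exactly 1.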

I cannot manually fix/remove these broken symlinks and I do not want error messages about these 4 files repeated endlessly in Fluentd's logs, since that makes it very hard to read and catch other issues in Fluentd's logs.
So, when I enabled the ignore_repeated_permission_error flag, I was hoping that it would just print 4 error messages (1 per broken symlink), but no more errors after that for those same files.
Tomorrow, if there is a new 5th symlink which is also broken, then again I want the first error message for that new file to be printed, but no more errors after that for that same file.

Instead, I get just 1 error message, but nothing else after that! Thus, tomorrow if there is a new 5th symlink which is broken, I will never know about it because there will be no error message!
E.g.:

2020-06-11 19:44:34 +0000 [warn]: #0 /var/log/sys/nginx-proxy_f7b.log unreadable. It is excluded and would be examined next time.

# No more 'unreadable' error messages, for any file -- I am now blind to future issues of this type!!

Your Environment

  • Fluentd Kubernetes Daemonset version: v1.10.4-debian-forward-1.0

Your Configuration

<source>
  @type tail
  @id logs

  tag kubernetes.*

  path /var/log/sys/*.log, /var/log/containers/*_cattle-system*.log, ...
  pos_file /var/log/fluentd-container.log.pos

  <parse>
    @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
  
  path_key log_file
  read_from_head true
  pos_file_compaction_interval 24h
  rotate_wait 5s

  enable_watch_timer true
  enable_stat_watcher false

  ignore_repeated_permission_error true
</source>
