Skip to content

Duplicating Logs were occurred when td-agent is stoped. #3137

@takeo-hidaka

Description

@takeo-hidaka

Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.

Describe the bug

Duplicating Logs were occurred when td-agent is stoped.

We consider that the buffer files which is created by flush_at_shutdown true within out forward plugin has not deleted even if the td-agent is stopped.
Thus, it seems to be caused log duplicating by retransmitting when td-agent starts.

To Reproduce

  • Prepare 2 hosts; one is a sender and the other is a receiver.
  • We intended to be aggregated to the receiver the nginx access logs which the sender has.
    • The log has a unique ID that the receiver is able to check for duplicates.
    • The outbound logs toward the receiver is 1 per sec.
  • When the logs are fighting, the sender's td-agent process is stop.
  • After 5 sec from the stopping the process, we restarted the sender's td-agent process to continue aggregation.
  • When we confirm the logs whether duplicating or not on the receiver.
  • Against expectations, there are some couple of logs which have a same unique ID.

With v0.12, we never encountered data duplications above the procedure, Although updating 1.11.1 , we always encountered it.
Reloading the td-agent also indicates to cause the same behavior.

Expected behavior

I suppose the buffer file can be deleted if its contents has been already sent.
As flush_at_shutdown is true, the remaining data in the buffer file should have been already sent to the receiver before td-agent process stops.

Your Environment

  • Fluentd or td-agent version: td-agent 1.11.1
  • Operating system: Ubuntu 18.04.2 LTS (Bionic Beaver)
  • Kernel version: 4.15.0-55-generic

Your Configuration

*** for the host of sender side.

<source>
  @type tail
  @id input_tail
  <parse>
    @type nginx
  </parse>
  path /var/log/nginx/access.log
  tag td.nginx.access
</source>

<match td.*.*>
  @type copy
  @id output_copy
  <store>
    @type forward
    require_ack_response true
    transport tcp
    heartbeat_interval 5
    phi_threshold 5
    send_timeout 60
    hard_timeout 60
    <buffer>
      @type file
      path /var/log/td-agent/buffer/td
      chunk_limit_size 16m
      queue_limt_length 512
      flush_thread_count 4
      flush_interval 1s
      retry_max_times 60
      retry_max_interval 10s
      flush_at_shutdown true
    </buffer>
    <server>
       host 192.168.86.102
       port 8889
    </server>
  </store>
  <store>
    @type file
    path /var/log/td-agent/td-%Y-%m-%d/%H.log
  </store>
</match>

for the host of receiver side.

<source>
  @type forward
  bind 0.0.0.0
  port 8889
  tag wow
</source>

<match wow.**>
  @type copy
  <store>
    type file
    path /var/log/td-agent/out
    time_slice_format %Y%m%d%H
  </store>
</match>

Your Error Log

sender side : after stopping td-agent.

this buffer file contains a string of unix time "1599639024".

vagrant@ubuntu-bionic:/var/log/td-agent/buffer/td$ sudo service td-agent start
vagrant@ubuntu-bionic:/var/log/td-agent/buffer/td$ sudo service td-agent stop
vagrant@ubuntu-bionic:/var/log/td-agent/buffer/td$ for _f in `ls -1 *.log*`; do echo "=====";echo $_f; echo "";cat $_f; echo ""; done
=====
buffer.q5aedcfbd340ec9df1785594f51dd83c5.log

....
/?1599639024
curl/7.58.0
...

=====
buffer.q5aedcfbd340ec9df1785594f51dd83c5.log.meta

... 
vagrant@ubuntu-bionic:/var/log/td-agent/buffer/td$

receiver side : before stopping td-agent on the sender side.

The data with '1599639024' have been already received.

vagrant@ubuntu-bionic:/var/log/td-agent/out$ for _FILE in `ls -1 *.log`; do echo $_FILE;ls -l $_FILE;echo "";echo "tail =====";tail -n 3 $_FILE;echo "";echo "diff =====";diff <(cat $_FILE) <(cat $_FILE|uniq)  ; done
buffer.b5aedceb6aa7801a36be087ddfb6fe294.log
-rw-r--r-- 1 td-agent td-agent 2784 Sep  9 08:10 buffer.b5aedceb6aa7801a36be087ddfb6fe294.log

tail =====
2020-09-09T08:10:22+00:00    wow    {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639022","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}
2020-09-09T08:10:23+00:00    wow    {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639023","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}
2020-09-09T08:10:24+00:00    wow    {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639024","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}

diff =====
vagrant@ubuntu-bionic:/var/log/td-agent/out$

receiver side : after stopping td-agent on the sender side.

the data with '1599639024' found twice.

vagrant@ubuntu-bionic:/var/log/td-agent/out$ for _FILE in `ls -1 *.log`; do echo $_FILE;ls -l $_FILE;echo "";echo "tail =====";tail -n 3 $_FILE;echo "";echo "diff =====";diff <(cat $_FILE) <(cat $_FILE|uniq)  ; done
buffer.b5aedceb6aa7801a36be087ddfb6fe294.log
-rw-r--r-- 1 td-agent td-agent 4176 Sep  9 08:15 buffer.b5aedceb6aa7801a36be087ddfb6fe294.log

tail =====
2020-09-09T08:15:32+00:00    wow    {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639332","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}
2020-09-09T08:15:33+00:00    wow    {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639333","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}
2020-09-09T08:15:34+00:00    wow    {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639334","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}

diff =====
17d16
< 2020-09-09T08:10:24+00:00    wow    {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639024","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}
vagrant@ubuntu-bionic:/var/log/td-agent/out$

Additional context

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions