Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.
Describe the bug
Duplicating Logs were occurred when td-agent is stoped.
We consider that the buffer files which is created by flush_at_shutdown true within out forward plugin has not deleted even if the td-agent is stopped.
Thus, it seems to be caused log duplicating by retransmitting when td-agent starts.
To Reproduce
- Prepare 2 hosts; one is a sender and the other is a receiver.
- We intended to be aggregated to the receiver the nginx access logs which the sender has.
-
- The log has a unique ID that the receiver is able to check for duplicates.
-
- The outbound logs toward the receiver is 1 per sec.
- When the logs are fighting, the sender's td-agent process is stop.
- After 5 sec from the stopping the process, we restarted the sender's td-agent process to continue aggregation.
- When we confirm the logs whether duplicating or not on the receiver.
- Against expectations, there are some couple of logs which have a same unique ID.
With v0.12, we never encountered data duplications above the procedure, Although updating 1.11.1 , we always encountered it.
Reloading the td-agent also indicates to cause the same behavior.
Expected behavior
I suppose the buffer file can be deleted if its contents has been already sent.
As flush_at_shutdown is true, the remaining data in the buffer file should have been already sent to the receiver before td-agent process stops.
Your Environment
- Fluentd or td-agent version: td-agent 1.11.1
- Operating system: Ubuntu 18.04.2 LTS (Bionic Beaver)
- Kernel version: 4.15.0-55-generic
Your Configuration
*** for the host of sender side.
<source>
@type tail
@id input_tail
<parse>
@type nginx
</parse>
path /var/log/nginx/access.log
tag td.nginx.access
</source>
<match td.*.*>
@type copy
@id output_copy
<store>
@type forward
require_ack_response true
transport tcp
heartbeat_interval 5
phi_threshold 5
send_timeout 60
hard_timeout 60
<buffer>
@type file
path /var/log/td-agent/buffer/td
chunk_limit_size 16m
queue_limt_length 512
flush_thread_count 4
flush_interval 1s
retry_max_times 60
retry_max_interval 10s
flush_at_shutdown true
</buffer>
<server>
host 192.168.86.102
port 8889
</server>
</store>
<store>
@type file
path /var/log/td-agent/td-%Y-%m-%d/%H.log
</store>
</match>
for the host of receiver side.
<source>
@type forward
bind 0.0.0.0
port 8889
tag wow
</source>
<match wow.**>
@type copy
<store>
type file
path /var/log/td-agent/out
time_slice_format %Y%m%d%H
</store>
</match>
Your Error Log
sender side : after stopping td-agent.
this buffer file contains a string of unix time "1599639024".
vagrant@ubuntu-bionic:/var/log/td-agent/buffer/td$ sudo service td-agent start
vagrant@ubuntu-bionic:/var/log/td-agent/buffer/td$ sudo service td-agent stop
vagrant@ubuntu-bionic:/var/log/td-agent/buffer/td$ for _f in `ls -1 *.log*`; do echo "=====";echo $_f; echo "";cat $_f; echo ""; done
=====
buffer.q5aedcfbd340ec9df1785594f51dd83c5.log
....
/?1599639024
curl/7.58.0
...
=====
buffer.q5aedcfbd340ec9df1785594f51dd83c5.log.meta
...
vagrant@ubuntu-bionic:/var/log/td-agent/buffer/td$
receiver side : before stopping td-agent on the sender side.
The data with '1599639024' have been already received.
vagrant@ubuntu-bionic:/var/log/td-agent/out$ for _FILE in `ls -1 *.log`; do echo $_FILE;ls -l $_FILE;echo "";echo "tail =====";tail -n 3 $_FILE;echo "";echo "diff =====";diff <(cat $_FILE) <(cat $_FILE|uniq) ; done
buffer.b5aedceb6aa7801a36be087ddfb6fe294.log
-rw-r--r-- 1 td-agent td-agent 2784 Sep 9 08:10 buffer.b5aedceb6aa7801a36be087ddfb6fe294.log
tail =====
2020-09-09T08:10:22+00:00 wow {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639022","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}
2020-09-09T08:10:23+00:00 wow {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639023","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}
2020-09-09T08:10:24+00:00 wow {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639024","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}
diff =====
vagrant@ubuntu-bionic:/var/log/td-agent/out$
receiver side : after stopping td-agent on the sender side.
the data with '1599639024' found twice.
vagrant@ubuntu-bionic:/var/log/td-agent/out$ for _FILE in `ls -1 *.log`; do echo $_FILE;ls -l $_FILE;echo "";echo "tail =====";tail -n 3 $_FILE;echo "";echo "diff =====";diff <(cat $_FILE) <(cat $_FILE|uniq) ; done
buffer.b5aedceb6aa7801a36be087ddfb6fe294.log
-rw-r--r-- 1 td-agent td-agent 4176 Sep 9 08:15 buffer.b5aedceb6aa7801a36be087ddfb6fe294.log
tail =====
2020-09-09T08:15:32+00:00 wow {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639332","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}
2020-09-09T08:15:33+00:00 wow {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639333","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}
2020-09-09T08:15:34+00:00 wow {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639334","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}
diff =====
17d16
< 2020-09-09T08:10:24+00:00 wow {"remote":"127.0.0.1","host":"-","user":"-","method":"GET","path":"/?1599639024","code":"200","size":"612","referer":"-","agent":"curl/7.58.0"}
vagrant@ubuntu-bionic:/var/log/td-agent/out$
Additional context
Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.
Describe the bug
Duplicating Logs were occurred when td-agent is stoped.
We consider that the buffer files which is created by
flush_at_shutdown truewithinout forward pluginhas not deleted even if the td-agent is stopped.Thus, it seems to be caused log duplicating by retransmitting when td-agent starts.
To Reproduce
With v0.12, we never encountered data duplications above the procedure, Although updating 1.11.1 , we always encountered it.
Reloading the td-agent also indicates to cause the same behavior.
Expected behavior
I suppose the buffer file can be deleted if its contents has been already sent.
As
flush_at_shutdownis true, the remaining data in the buffer file should have been already sent to the receiver before td-agent process stops.Your Environment
Your Configuration
*** for the host of sender side.
for the host of receiver side.
Your Error Log
sender side : after stopping td-agent.
this buffer file contains a string of unix time "1599639024".
receiver side : before stopping td-agent on the sender side.
The data with '1599639024' have been already received.
receiver side : after stopping td-agent on the sender side.
the data with '1599639024' found twice.
Additional context