Kubernetes container logs - in_tail lose some of rotated logs when rotation is quite fast 

**Describe the bug**

We were doing stress test on our EFK stack. We deploy a simple openresty web server that writes access logs to stdout, and use vetega tool to send HTTP requests at a constant rate (QPS: 10000 for 180 seconds). We find that when log file is rotated quite fast, it takes a long time (around 2 mins) for in_tail to discover the new log file, and in_tail loses some of the rotated files.

**To Reproduce**

Docker logging driver config:
```
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "50",
    "compress": "true"
  },
  "storage-driver": "overlay2"
}
```
```
# echo "GET http://172.16.151.239" | vegeta attack -rate=10000 -duration=180s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         1799999, 9999.95, 9999.94
Duration      [total, attack, wait]             3m0s, 3m0s, 324.017µs
Latencies     [min, mean, 50, 90, 95, 99, max]  107.842µs, 904.156µs, 306.604µs, 1.96ms, 3.064ms, 8.051ms, 114.653ms
Bytes In      [total, mean]                     0, 0.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      204:1799999
Error Set:
```

**Expected behavior**
We expect Kibana to also receive 1799999 logs. However, it only receives 622756 entires.
![Screen Shot 2021-06-24 at 10 36 25 PM](https://user-images.githubusercontent.com/15389816/123288361-144bc580-d542-11eb-9da9-7cc51e2423d0.png)


**Your Environment**

- Fluentd or td-agent version: `v1.12.0`
- Operating system: `Debian GNU/Linux 10 (buster)`
- Kernel version: `5.4.0-48-generic`

**Your Configuration**

```
<source>
        @type tail
        @log_level debug
        @id in_tail_container_logs
        path /var/log/containers/*.log
        pos_file /var/log/fluentd-containers.log.pos
        tag kubernetes.*
        exclude_path ["/var/log/containers/*_kube-system_*.log", "/var/log/containers/*_default_*.log", "/var/log/containers/*_rook-ceph_*.log", "/var/log/contain
ers/*fluent*.log"]
        read_from_head true
        follow_inodes true
        <parse>
          @type json
          time_format "%Y-%m-%dT%H:%M:%S.%NZ"
          keep_time_key true
        </parse>
      </source>
 
      <match kubernetes.** >
          @type elasticsearch_data_stream
          data_stream_name logs-efk-container
          @log_level debug
          hosts https://elasticsearch-client.default.svc.cluster.local:9200
          user elastic
          password xxxxxx
          ssl_verify false
          reload_connections false
          reconnect_on_error true
          reload_on_failure true
          log_es_400_reason false
          request_timeout 1000s
          include_timestamp true
          <buffer>
            flush_thread_count 8
            flush_interval 1s
            retry_max_interval 5
            retry_forever true
          </buffer>
      </match>
```

**Additional context**
Docker logs
```
# wc -l *.log*
    34143 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log
   588621 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.1
   588621 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.2
   588620 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.3
  1800005 total
```
Get the last log at 22:24. It is at line number 588614 of the log file - 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.3
![Screen Shot 2021-06-24 at 10 30 52 PM](https://user-images.githubusercontent.com/15389816/123288961-989e4880-d542-11eb-8b90-f5cfb8cbb617.png)
```
# grep '2021-06-24T14:24:00.849102284Z' 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.3 -n
588614:{"log":"172.24.0.214 - - [24/Jun/2021:22:24:00 +0800] \"GET / HTTP/1.1\" 204 0 \"-\" \"Go-http-client/1.1\"\n","stream":"stdout","time":"2021-06-24T14:24:00.849102284Z"}
```
Get the first log at 22:28. It is at the first line of the log file - 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log
![Screen Shot 2021-06-24 at 10 50 45 PM](https://user-images.githubusercontent.com/15389816/123289065-b2d82680-d542-11eb-9eaa-43eed21693a6.png)

```
# grep '2021-06-24T14:25:58.573922367Z' 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log -n
1:{"log":"172.24.0.214 - - [24/Jun/2021:22:25:58 +0800] \"GET / HTTP/1.1\" 204 0 \"-\" \"Go-http-client/1.1\"\n","stream":"stdout","time":"2021-06-24T14:25:58.573922367Z"}
```
It proofs that in_tail fails to check below 2 files when log rotation happening quite fast.
```
8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.1.gz
8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.2.gz
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kubernetes container logs - in_tail lose some of rotated logs when rotation is quite fast #3434

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Kubernetes container logs - in_tail lose some of rotated logs when rotation is quite fast #3434

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions