Describe the bug
I'm running Fluentd 1.11.2 on EKS with Windows VMs. Sometimes pods get stuck in "Terminating" status for hours or even days, until I restart the VM. I checked the kubelet and Docker logs, and it looks like a container log file cannot be deleted. Once I delete the Fluentd pod, the stuck pod is deleted successfully. In my opinion, the root cause is that Fluentd holds a lock on the file.
There is a thread HERE that describes the same problem in Fluent Bit, and the last comment says it was solved in the latest version.
To Reproduce
I haven't managed to reproduce the bug on demand; it just happens once every few days.
Expected behavior
I expect a fix in Fluentd similar to the one made in Fluent Bit, so that Fluentd no longer locks the log files.
Your Environment
- Fluentd version: 1.11.2
- Operating system: Windows Server 2019
Your Configuration
<source>
@type tail
@log_level info
path /var/log/containers/*.log
exclude_path ["/var/log/containers/log-tailer-*", "/var/log/containers/geneva-logger-*"]
pos_file /var/log/fluentd-docker.pos
tag kubernetes.*
format json
time_key time
time_format %Y-%m-%dT%H:%M:%S
read_from_head true
</source>
# Don't care about fluentd logs
<match **fluentd**.log>
@type null
</match>
<filter kubernetes.var.log.containers.**>
@type kubernetes_metadata
</filter>
<match kubernetes.var.log.containers.**>
@type rewrite_tag_filter
# <rule>
# key log
# pattern .+
# tag kubernetes.prod
# </rule>
<rule>
key $['kubernetes']['namespace_name']
pattern ^(.+)$
tag kubernetes.$1
</rule>
</match>
<match kubernetes.**>
@type forward
require_ack_response true
expire_dns_cache 300
<buffer>
@type file
path /var/log/td-agent/buffer/kubernetes.{{ .Values.geneva.chartNamespace }}
chunk_limit_size 4m
queued_chunks_limit_size 4096
flush_interval 10s
flush_thread_count 8
retry_max_times 8
retry_timeout 5m
</buffer>
<server>
host {{ printf "%s.%s.svc.cluster.local" .Values.geneva.service.name .Values.geneva.chartNamespace }}
port {{ .Values.geneva.fluentd.port }}
</server>
{{- if .Values.certificate }}
transport tls
tls_cert_path C:\tmp\fluentd\secrets\fluentd-cert.pem
tls_allow_self_signed_cert true
tls_version TLSv1_2
{{- end }}
</match>
E0325 00:32:14.667270 4400 remote_runtime.go:261] RemoveContainer "cdce17ef7168b13b58d9409524324d0067f46a554979cfea0db7f6a2fcc0627d" from runtime service failed: rpc error: code = Unknown desc = failed to remove container "cdce17ef7168b13b58d9409524324d0067f46a554979cfea0db7f6a2fcc0627d": Error response from daemon: unable to remove filesystem for cdce17ef7168b13b58d9409524324d0067f46a554979cfea0db7f6a2fcc0627d: CreateFile C:\ProgramData\docker\containers\cdce17ef7168b13b58d9409524324d0067f46a554979cfea0db7f6a2fcc0627d\cdce17ef7168b13b58d9409524324d0067f46a554979cfea0db7f6a2fcc0627d-json.log: Access is denied.
Additional context
Is there any update on this problem, or a fix for it like the one in Fluent Bit?
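One setting that might be worth testing in the meantime is the `open_on_every_update` parameter of the tail input, which makes the plugin open and close the file on every update instead of keeping it open until rotation. This is only a sketch of the source block above with that one line added. It is an assumption that this reduces the time the handle is held on Windows; it has not been verified against this bug and is not a confirmed fix.

```
<source>
  @type tail
  @log_level info
  path /var/log/containers/*.log
  exclude_path ["/var/log/containers/log-tailer-*", "/var/log/containers/geneva-logger-*"]
  pos_file /var/log/fluentd-docker.pos
  tag kubernetes.*
  format json
  time_key time
  time_format %Y-%m-%dT%H:%M:%S
  read_from_head true
  # Assumption, untested for this issue: close the file after each read
  # instead of holding it open until it is rotated (default is false).
  open_on_every_update true
</source>
```

If the stuck pods still appear with this setting, that would suggest the handle is being held somewhere other than the tail watcher's long-lived open.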