Description
Relevant telegraf.conf:
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
hostname = ""
omit_hostname = false
[[outputs.influxdb]]
urls = ["http://192.168.50.10:8086"]
database = "telegraf"
[[inputs.logparser]]
files = ["C:/Program Files/Apache/logs/All-AccessLog.*"]
from_beginning = false
[inputs.logparser.grok]
patterns = ["%{COMBINED_LOG_FORMAT} %{RESPONSE_TIME}"]
measurement = "apache_access_log"

System info:
- OS: Windows 10 (version 1903)
- Telegraf: telegraf-1.12.3_windows_amd64
Steps to reproduce:
- Start a process to repeatedly request pages from the Apache server to ensure log lines keep being written to the access logs (I used: watch -n 10 wget -q -r -P wget-out https://<url>)
- Start telegraf: telegraf.exe --console --config <path to above config>
- In the influx CLI, select the telegraf database and keep executing and watching the results of this query:
select * from apache_access_log order by time desc limit 10;
Expected behavior:
New data points should keep appearing in apache_access_log, shortly after each log line appears in the Apache log.
Actual behavior:
Immediately after starting, fresh data appears in apache_access_log, but ingestion stops after a variable period of time (usually within a few minutes at most), even though new log lines keep appearing in the Apache log files.
Additional info:
I think I've reproduced the cause of this issue using the following simple Go program that uses the same code for tailing the log files as telegraf does:
package main

import (
	"log"
	"os"

	"github.com/influxdata/tail"
)

func main() {
	t, err := tail.TailFile("c:\\Program Files\\Apache\\logs\\All-AccessLog.2019-10-17", tail.Config{
		Follow:   true,
		ReOpen:   true,
		Location: &tail.SeekInfo{Offset: -200, Whence: os.SEEK_END}, // start 200 bytes from the end
		Logger:   log.New(os.Stdout, "[TailLog] ", log.LstdFlags),
		Poll:     true,
	})
	if err != nil {
		log.Fatal(err)
	}
	for line := range t.Lines {
		log.Println(line.Text)
	}
}

After some experimenting (printf-debugging, really) in the influxdata/tail code, I noticed that the tail would sooner or later die as a result of a "non-EOF error". The output of the above program was:
...
[lines from the file that is being tailed]
...
[TailLog] 2019/10/17 16:01:26 non-EOF error: %!(EXTRA *os.PathError=read c:\Program Files\Apache\logs\All-AccessLog.2019-10-17: The process cannot access the file because another process has locked a portion of the file.)
This would happen at this place in the tail.go file, in the tailFileSync func:

...
line, err := tail.readLine()

// Process `line` even if err is EOF.
if err == nil {
	...
} else if err == io.EOF {
	...
} else {
	tail.Logger.Printf("non-EOF error: ", err) // Line added by me
	// non-EOF error
	tail.Killf("Error reading %s: %s", tail.Filename, err)
	return
}

So it appears that the fact that Apache is writing additional lines to the file at the same moment the telegraf logparser wants to read new lines is causing this issue.
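A possible mitigation (my own sketch, not a confirmed fix and not part of influxdata/tail) would be to treat such lock-violation errors as transient and retry the read after a short delay, rather than killing the tail. The helper below simulates the failing readLine with a stub; the retry count and delay are illustrative, not tuned values:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errLockViolation stands in for Windows' ERROR_LOCK_VIOLATION; in real code
// this would be detected from the *os.PathError returned by file.Read.
var errLockViolation = errors.New("another process has locked a portion of the file")

// readLineRetrying wraps a readLine-style function and retries on transient
// lock-violation errors instead of tearing the whole tail down.
func readLineRetrying(readLine func() (string, error)) (string, error) {
	const maxRetries = 5
	for attempt := 0; ; attempt++ {
		line, err := readLine()
		if err == nil || !errors.Is(err, errLockViolation) || attempt >= maxRetries {
			return line, err
		}
		time.Sleep(10 * time.Millisecond) // give the writer time to release the lock
	}
}

func main() {
	// Simulate a reader that hits the lock violation twice, then succeeds.
	failures := 2
	readLine := func() (string, error) {
		if failures > 0 {
			failures--
			return "", errLockViolation
		}
		return "GET /index.html 200", nil
	}
	line, err := readLineRetrying(readLine)
	fmt.Println(line, err) // GET /index.html 200 <nil>
}
```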
(Initially I wanted to report this at https://github.com/influxdata/tail but that repository doesn't accept issues.)