Use bufio.Reader for large lines processing #23
Explain how it helps, please. Why is it better?
When a log file line is longer than 64KB (bufio.MaxScanTokenSize), bufio.Scanner stops with a "token too long" error. So this solution is better because it allows processing files with long (> 64KB) lines.
Oh man... logs with lines over 64KB long :/ What are you parsing? Could you make a benchmark of Scanner vs Reader? Just to be sure there's no noticeable speed degradation on normal-size logs.
reader_test.go
Please generate this string instead of pasting many kilobytes of text.
There's no speed difference between bufio.Scanner and bufio.Reader:
$ make bench
go test -bench .
..................................
34 total assertions
..............
48 total assertions
.........
57 total assertions
.....
62 total assertions
........................................
102 total assertions
PASS
BenchmarkParseSimpleLogRecord-4 200000 6291 ns/op
BenchmarkParseLogRecord-4 50000 30310 ns/op
BenchmarkScannerReader-4 2000000000 0.00 ns/op
BenchmarkReaderReader-4 2000000000 0.00 ns/op
ok github.com/pshevtsov/gonx 3.275s
Yup. It's not an ordinary log file but a huge TSV dump with a few such long lines.
The benchmarks look strange. First of all, a benchmark should measure a single operation (reading a line, in our case), but you read the whole file. How many lines are in this file? And the results are a little confusing... 0 ns/op?
It reads this file. But all right, I'm going to redo the test cases to read just one line.
Yes, I also found it a bit weird. Probably that's because the whole file was loaded into memory. I'm going to check.
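A sketch of a benchmark that measures one line per operation, as the review suggests. The original reader_test.go is not shown here, so makeInput, benchScanner, benchReader and sink are hypothetical names. Each b.N iteration reads exactly one line and rewinds the in-memory input when it runs out, so ns/op is a per-line figure. One plausible cause of the earlier 0.00 ns/op results is work that does not scale with b.N: if the loop body does almost nothing per iteration, the harness keeps raising N and the per-op time rounds to zero. testing.Benchmark is used only to keep the sketch runnable outside `go test`; in the package these would be BenchmarkXxx functions in a _test.go file.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
	"testing"
)

// sink keeps each result reachable so the compiler cannot discard the work.
var sink string

// makeInput returns `lines` newline-terminated lines of `size` bytes each.
func makeInput(size, lines int) []byte {
	return []byte(strings.Repeat(strings.Repeat("x", size)+"\n", lines))
}

// benchScanner reads exactly one line per iteration with bufio.Scanner.
func benchScanner(data []byte) func(*testing.B) {
	return func(b *testing.B) {
		src := bytes.NewReader(data)
		sc := bufio.NewScanner(src)
		b.ReportAllocs()
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			if !sc.Scan() {
				if err := sc.Err(); err != nil {
					b.Fatal(err)
				}
				// Input exhausted: rewind. Scanner has no Reset, so make a new one.
				src.Seek(0, io.SeekStart)
				sc = bufio.NewScanner(src)
				sc.Scan()
			}
			sink = sc.Text()
		}
	}
}

// benchReader reads exactly one line per iteration with bufio.Reader.
func benchReader(data []byte) func(*testing.B) {
	return func(b *testing.B) {
		src := bytes.NewReader(data)
		r := bufio.NewReader(src)
		b.ReportAllocs()
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			line, err := r.ReadString('\n')
			if err == io.EOF {
				// Input exhausted: rewind and reuse the same reader.
				src.Seek(0, io.SeekStart)
				r.Reset(src)
				line, err = r.ReadString('\n')
			}
			if err != nil {
				b.Fatal(err)
			}
			sink = line
		}
	}
}

func main() {
	data := makeInput(200, 1000) // 1000 short lines of 200 bytes
	s := testing.Benchmark(benchScanner(data))
	fmt.Println("Scanner:", s, s.MemString())
	r := testing.Benchmark(benchReader(data))
	fmt.Println("Reader: ", r, r.MemString())
}
```

Note that ReadString keeps the trailing "\n" while Scanner.Text drops it; for timing purposes that difference is negligible.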
Oh, I definitely need more sleep (or more ☕). I've just corrected the benchmarks:
So we get about 2 µs to read a line with …
$ go test -bench Reader -benchmem -benchtime 1m
..................................
34 total assertions
..............
48 total assertions
.........
57 total assertions
.....
62 total assertions
........................................
102 total assertions
PASS
BenchmarkScannerReader-4 50000000 2296 ns/op 4096 B/op 1 allocs/op
BenchmarkReaderReaderAppend-4 50000000 2625 ns/op 4096 B/op 1 allocs/op
BenchmarkReaderReaderBuffer-4 30000000 2818 ns/op 4208 B/op 2 allocs/op
ok github.com/pshevtsov/gonx 392.334s
Good job! Do you want to add something, or can we merge it?
Please hold on for a while. I'm working on some improvements (e.g. no loop is needed when reading short lines), so both techniques will perform the same in the most common use case, reading short lines. I'd also like to test which technique (…). I'll let you know soon.
Hey, it turns out that for reading long lines using …
Hey @satyrius, when are you going to merge this PR? I'm blocked because the application I'm currently working on can't read long lines. Thanks!
Well done! Going to master. |
Use bufio.Reader for large lines processing
This PR introduces the use of bufio.Reader (instead of bufio.Scanner) for processing large lines.