
Use bufio.Reader for large lines processing#23

Merged
satyrius merged 9 commits into satyrius:master from przmv:long_token
Nov 23, 2015

Conversation

@przmv

@przmv przmv commented Nov 15, 2015

This PR introduces the use of bufio.Reader (instead of bufio.Scanner) for processing long lines.

@satyrius
Owner

Please explain how it helps. Why is it better?

@przmv
Author

przmv commented Nov 15, 2015

When a log file line is longer than the bufio.Scanner buffer (MaxScanTokenSize = 64 * 1024), Scan fails with ErrTooLong. That means you can't process lines longer than 64 KB with bufio.Scanner.

With bufio.Reader you can check whether the buffer contains the whole line; if not, you do one more buffered read and append the next fragment.

So this solution is better because it makes it possible to process files with long (> 64 KB) lines.

@satyrius
Owner

Oh man... logs with lines over 64 KB long :/ What are you parsing?

Could you benchmark Scanner vs Reader? Just to be sure we have no noticeable speed degradation on normal-size logs.

reader_test.go Outdated
Owner


Generate this string instead of pasting a lot of kilobytes of text, please.

@przmv
Author

przmv commented Nov 15, 2015

There's no speed difference between Scanner and Reader:

$ make bench 
go test -bench .
..................................
34 total assertions

..............
48 total assertions

.........
57 total assertions

.....
62 total assertions

........................................
102 total assertions

PASS
BenchmarkParseSimpleLogRecord-4   200000          6291 ns/op
BenchmarkParseLogRecord-4          50000         30310 ns/op
BenchmarkScannerReader-4        2000000000           0.00 ns/op
BenchmarkReaderReader-4         2000000000           0.00 ns/op
ok      github.com/pshevtsov/gonx   3.275s

@przmv
Author

przmv commented Nov 15, 2015

Oh man... logs with lines over 64 KB long :/ What are you parsing?

Yup. It's not an ordinary log file but a huge TSV dump with a few such long lines.

@satyrius
Owner

The benchmark tests look strange. First of all, a bench test should measure one single operation (reading a line, in our case), but you read the whole file. How many lines are in this file? And at the end, the results are a little confusing... 0 ns/op?

@przmv
Author

przmv commented Nov 16, 2015

How many lines are in this file?

It reads this file. But all right, I'm going to redo the test cases to read just one line.

And at the end, results are a little confusing... 0 ns/op?

Yes, I also found it a bit weird. Probably that's because the whole file was loaded into memory. I'm going to check.

@przmv
Author

przmv commented Nov 16, 2015

Oh, I definitely need more sleep (or more ☕ )

I've just corrected the benchmarks:

BenchmarkScannerReader-4          500000          2040 ns/op
BenchmarkReaderReader-4           500000          2206 ns/op

@satyrius
Owner

So we get ~2.0 µs to read a line with Scanner and ~2.2 µs with Reader. 10% slower on reading.

@przmv
Author

przmv commented Nov 18, 2015

$ go test -bench Reader -benchmem -benchtime 1m
..................................
34 total assertions

..............
48 total assertions

.........
57 total assertions

.....
62 total assertions

........................................
102 total assertions

PASS
BenchmarkScannerReader-4        50000000          2296 ns/op        4096 B/op          1 allocs/op
BenchmarkReaderReaderAppend-4   50000000          2625 ns/op        4096 B/op          1 allocs/op
BenchmarkReaderReaderBuffer-4   30000000          2818 ns/op        4208 B/op          2 allocs/op
ok      github.com/pshevtsov/gonx   392.334s

@satyrius
Owner

Good job! Do you want to add something, or can we merge it?

@przmv
Author

przmv commented Nov 18, 2015

Please hold on for some time. I'm working on some improvements — e.g. no need for a loop when reading short lines — so both techniques will work the same for the most common use cases (i.e. reading short lines). Also I'd like to test which technique (append or bytes.Buffer) will work better for long lines.

I'll let you know soon.
Cheers

@przmv
Author

przmv commented Nov 20, 2015

Hey,

It turns out that for reading long lines, using bytes.Buffer is better:

BenchmarkScannerReader-4              500000          2679 ns/op        4305 B/op          5 allocs/op
BenchmarkReaderReaderAppend-4         500000          2650 ns/op        4305 B/op          5 allocs/op
BenchmarkReaderReaderBuffer-4         500000          2640 ns/op        4305 B/op          5 allocs/op
BenchmarkLongReaderReaderAppend-4        500       2588516 ns/op     5666680 B/op         29 allocs/op
BenchmarkLongReaderReaderBuffer-4       1000       2276376 ns/op     4076730 B/op         20 allocs/op

@przmv
Author

przmv commented Nov 21, 2015

Hey @satyrius when are you going to merge this PR? I have a blocking issue because of inability to read long lines in the application I'm currently working on. Thanks!

@satyrius
Owner

Well done! Going to master.

satyrius added a commit that referenced this pull request Nov 23, 2015
Use bufio.Reader for large lines processing
@satyrius satyrius merged commit 7a047c4 into satyrius:master Nov 23, 2015
@przmv przmv deleted the long_token branch December 1, 2015 16:56