
bench: fix colserde benchmarks#71892

Merged
craig[bot] merged 1 commit into cockroachdb:master from cucaroach:arrow-noclear
Oct 25, 2021

Conversation

@cucaroach
Contributor

@cucaroach cucaroach commented Oct 22, 2021

An optimization to aid the Go GC broke the Serialize and ArrowToBench
benchmarks; fix them by restoring the nil'd-out arrays using a copy.

Fixes: #71886

Release note: None

@cucaroach cucaroach requested a review from a team as a code owner October 22, 2021 21:16
@cucaroach cucaroach requested a review from yuzefovich October 22, 2021 21:16
@cockroach-teamcity
Member

This change is Reviewable

Member

@yuzefovich yuzefovich left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @yuzefovich)


pkg/col/colserde/arrowbatchconverter.go, line 326 at r1 (raw file):

		// as quickly as possible as we copy each (or simply reference each) by
		// coldata.Vecs below.
		if clearBuffer {

Now we have an additional conditional on the production path that wasn't there before. It's likely a drop in the ocean, but I still think it's better to adjust the benchmark to perform the shallow copy rather than slow down the production path, even a little. We'll need to backport the fix to 21.1 and 21.2 anyway, so making the benchmarks a bit slower on both branches is OK.
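To make the suggestion concrete, here is a minimal, hypothetical Go sketch of the benchmark-side fix being proposed. `toBatch` and the `[][]byte` layout are stand-ins for illustration, not the real colserde API: the benchmark saves a shallow copy of the slice up front and restores the nil'd-out entries before each iteration, so no `clearBuffer` branch is needed in the production loop.

```go
package main

import "fmt"

// toBatch stands in for the real conversion: it consumes data destructively,
// nil'ing out each element as it goes (as the production code does to release
// references for the GC as quickly as possible).
func toBatch(data [][]byte) int {
	n := 0
	for i := range data {
		n += len(data[i])
		data[i] = nil
	}
	return n
}

func main() {
	data := [][]byte{make([]byte, 4), make([]byte, 8)}

	// Benchmark-side fix: save a shallow copy of the slice headers once...
	saved := make([][]byte, len(data))
	copy(saved, data)

	for iter := 0; iter < 2; iter++ {
		copy(data, saved) // ...and restore the nil'd-out entries each iteration
		fmt.Println(toBatch(data))
	}
}
```

The restore is just `len(data)` slice-header copies per iteration, so it should not meaningfully distort the benchmark numbers.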

Contributor Author

@cucaroach cucaroach left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/col/colserde/arrowbatchconverter.go, line 326 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Now we have an additional conditional on the production path that wasn't there before. It's likely a drop in the ocean, but I still think it's better to adjust the benchmark to perform the shallow copy rather than slow down the production path, even a little. We'll need to backport the fix to 21.1 and 21.2 anyway, so making the benchmarks a bit slower on both branches is OK.

Why do we need to backport the fix to 21.1? Things work fine on that branch. I'm not in love with this; how about we take the data[i] = nil out completely, hoist it up to the call site, and clear the whole slice there? I doubt clearing as we go matters, and doing it once should be more efficient and cleaner. There are very few prod callers of these functions, so it shouldn't be too bad. Another alternative would be to introduce a new API that does the nil'ing out, i.e. like this change but without the parameterization and per-loop branch.
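A minimal, hypothetical Go sketch of the hoisting alternative floated here (`convert` and `clearAll` are illustrative names, not the real colserde functions): the conversion loop stops nil'ing as it goes, and each production call site instead clears the whole slice once after the call.

```go
package main

import "fmt"

// convert is a stand-in for the conversion routine, now non-destructive:
// it no longer nils out elements inside its loop.
func convert(data [][]byte) int {
	n := 0
	for i := range data {
		n += len(data[i])
	}
	return n
}

// clearAll releases all references in one pass, hoisted to the call site.
func clearAll(data [][]byte) {
	for i := range data {
		data[i] = nil
	}
}

func main() {
	data := [][]byte{make([]byte, 4), make([]byte, 8)}
	n := convert(data)
	clearAll(data) // production caller drops references once, after the call
	fmt.Println(n, data[0] == nil && data[1] == nil)
}
```

Benchmarks would then simply skip the `clearAll` call, and the production hot loop carries no extra branch.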

Member

@yuzefovich yuzefovich left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @yuzefovich)


pkg/col/colserde/arrowbatchconverter.go, line 326 at r1 (raw file):
I'm surprised that things work on 21.1 because the commit in question was backported there (#67609) - maybe the SHA you used for 21.1 was before that?

how about we take the data[i] = nil out completely, hoist it up to the call site, and clear the whole slice there? There are very few prod callers of these functions, so it shouldn't be too bad.

I think there are two production callers that care about nilling things out, so we'd have to duplicate that clearing logic in two places (or use a helper function call).

I agree that clearing things as we go might not be the cleanest approach, but we definitely want to strive for removal of references as fast as possible, so I'm still not sold on the idea of modifying the production code to make the benchmarks happy. Is it a single benchmark that crashes or several?

Contributor Author

@cucaroach cucaroach left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/col/colserde/arrowbatchconverter.go, line 326 at r1 (raw file):

we definitely want to strive for removal of references as fast as possible

I bet dollars to donuts this doesn't matter as much as you think it does. Incremental GCs have write barriers, so even if the marker sees the data in GC cycle N and we nil it out the next moment, the slice will probably be marked for re-visiting, so the GC will most likely catch the nil'ing in the same cycle. Furthermore, garbage is swept lazily, so even if the GC frees this memory, it's a coin toss whether it will be available for immediate re-use (although mutator work usually drives the sweeper, so heap-hungry apps will quickly drive the sweeper to completion).

Contributor Author

@cucaroach cucaroach left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/col/colserde/arrowbatchconverter.go, line 326 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

we definitely want to strive for removal of references as fast as possible

I bet dollars to donuts this doesn't matter as much as you think it does. Incremental GCs have write barriers, so even if the marker sees the data in GC cycle N and we nil it out the next moment, the slice will probably be marked for re-visiting, so the GC will most likely catch the nil'ing in the same cycle. Furthermore, garbage is swept lazily, so even if the GC frees this memory, it's a coin toss whether it will be available for immediate re-use (although mutator work usually drives the sweeper, so heap-hungry apps will quickly drive the sweeper to completion).

Actually, ignore that; that makes no sense. But I still don't think nil'ing out once per call versus once per loop iteration will make any difference unless the data slice can be huge. How big are these things typically?

Contributor Author

@cucaroach cucaroach left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/col/colserde/arrowbatchconverter.go, line 326 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

Actually, ignore that; that makes no sense. But I still don't think nil'ing out once per call versus once per loop iteration will make any difference unless the data slice can be huge. How big are these things typically?

Another pertinent question is what it makes the most sense to benchmark: does the benchmark version with the data copying more accurately reflect what goes on in prod?

Member

@yuzefovich yuzefovich left a comment

Reviewed 1 of 3 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @yuzefovich)


pkg/col/colserde/arrowbatchconverter.go, line 326 at r1 (raw file):

How big are these things typically?

We have two production use cases: sending the intermediate data between nodes (the outbox-inbox path) and spilling the intermediate data to disk. In both cases we try to limit the size of coldata.Batch by distsql_workmem, which is 64MiB by default, so I'm guessing that these data[i] combined could be up to the 64MiB limit too. Note that we do have compression in place, but it is done on top of the arrow format, so I don't think it saves us here (i.e. data[i] is already the decompressed stuff).

what does it make the most sense to be benchmarking, does the benchmark version with the data copying more accurately reflect what goes on in prod?

What data copying are you referring to?

I think that we mostly use micro benchmarks to find significant regressions between releases, and we typically don't care that much about the absolute numbers.

Contributor Author

@cucaroach cucaroach left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/col/colserde/arrowbatchconverter.go, line 326 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

How big are these things typically?

We have two production use cases: sending the intermediate data between nodes (the outbox-inbox path) and spilling the intermediate data to disk. In both cases we try to limit the size of coldata.Batch by distsql_workmem, which is 64MiB by default, so I'm guessing that these data[i] combined could be up to the 64MiB limit too. Note that we do have compression in place, but it is done on top of the arrow format, so I don't think it saves us here (i.e. data[i] is already the decompressed stuff).

what does it make the most sense to be benchmarking, does the benchmark version with the data copying more accurately reflect what goes on in prod?

What data copying are you referring to?

I think that we mostly use micro benchmarks to find significant regressions between releases, and we typically don't care that much about the absolute numbers.

I was referring to your version of the benchmark, which copied the data; maybe that's more representative of the real workload? I'm curious how big the slice can get: each element of the slice is an array, and I'm guessing each array is pretty big while the slice itself is small?

Member

@yuzefovich yuzefovich left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @yuzefovich)


pkg/col/colserde/arrowbatchconverter.go, line 326 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

I was referring to your version of the benchmark, which copied the data; maybe that's more representative of the real workload? I'm curious how big the slice can get: each element of the slice is an array, and I'm guessing each array is pretty big while the slice itself is small?

len(data) is equal to the width of the batch, which is equal to the number of columns touched by the query, so it's a relatively small number. However, each data[i] could point to a memory region that is MBs in size. What I was suggesting was to do a shallow copy: this would only involve copying len(data) pointers; it would not touch the large memory regions at all.

In terms of what would be more realistic, I think the shallow copy wouldn't matter in the grand scheme of things since it should be very fast.
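A tiny illustration of why the shallow copy described above is cheap (hypothetical payload sizes, not the real colserde data): copying a [][]byte only copies len(data) slice headers (pointer/length/capacity triples), while the multi-MB regions each data[i] points at are never touched; both slices keep referencing the same backing memory.

```go
package main

import "fmt"

func main() {
	big := make([]byte, 1<<20) // a 1 MiB payload, standing in for one column's data
	data := [][]byte{big, big}

	shallow := make([][]byte, len(data))
	copy(shallow, data) // copies 2 slice headers, not 2 MiB of bytes

	// Both slices reference the same underlying memory, so a write through
	// one is visible through the other.
	shallow[0][0] = 42
	fmt.Println(data[0][0], &data[0][0] == &shallow[0][0])
}
```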

An optimization to aid the Go GC broke the Serialize and ArrowToBench
benchmarks; fix them by restoring the nil'd-out arrays using a copy.

Fixes: cockroachdb#71886

Release note: None
@cucaroach cucaroach changed the title from "bench: fix arrowbatch benchmarks" to "bench: fix colserde benchmarks" Oct 25, 2021
@cucaroach cucaroach reopened this Oct 25, 2021
Contributor Author

@cucaroach cucaroach left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/col/colserde/arrowbatchconverter.go, line 326 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

len(data) is equal to the width of the batch, which is equal to the number of columns touched by the query, so it's a relatively small number. However, each data[i] could point to a memory region that is MBs in size. What I was suggesting was to do a shallow copy: this would only involve copying len(data) pointers; it would not touch the large memory regions at all.

In terms of what would be more realistic, I think the shallow copy wouldn't matter in the grand scheme of things since it should be very fast.

+1

Member

@yuzefovich yuzefovich left a comment

:lgtm:

Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach)

@cucaroach
Contributor Author

bors r+

@craig
Contributor

craig bot commented Oct 25, 2021

Build succeeded:

@craig craig bot merged commit dbf5853 into cockroachdb:master Oct 25, 2021
@yuzefovich
Member

blathers backport 21.2

Successfully merging this pull request may close these issues:

bench: arrow batch benchmarks broken