sql: move row allocation inside RowContainer by RaduBerinde · Pull Request #10534 · cockroachdb/cockroach

RaduBerinde · 2016-11-08T06:53:35Z

The majority of RowContainer.AddRow() callers make copies of the rows; most are
unoptimized (one allocation per row).

Changing RowContainer to manage row storage internally, instead of using the
slices passed by the caller. We allocate rows in chunks, which reduces the
allocations as well as the overhead of storing a slice for every row. The
callers are simplified.

This change also fixes a couple of bugs that were uncovered:

explainTraceNode exposed incorrect Columns();
insert code was appending to a row returned by a plan node via Values().

CC @nvanbenschoten for the first commit

This change is

nvb · 2016-11-08T07:54:41Z

First commit

Reviewed 2 of 3 files at r1.
Review status: 0 of 12 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.

pkg/sql/values.go, line 40 at r1 (raw file):

  // rowsPopped is used for heaps, it indicates the number of rows that were
  // "popped". These rows are still part of the underlying RowContainer.

For ease of future understanding, consider adding something about the popped rows being at the back of the RowContainer in the range [length-rowsPopped, length)

pkg/sql/values.go, line 269 at r1 (raw file):

}

// Pop implements the heap.Interface interface.

The awkwardness of this interface and its related functions always catches me off-guard

pkg/sql/values.go, line 289 at r1 (raw file):

}

// ResetLen resets the length to that of the underlying row container.

You might want to include the motivation for this. Something along the lines of: "this resets the effect that popping values had on the valuesNode length"

Comments from Reviewable

knz · 2016-11-08T09:05:37Z

Yeah this first commit is very good. Thanks for this :)

Reviewed 3 of 3 files at r1.
Review status: 0 of 12 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.

Comments from Reviewable

knz · 2016-11-08T09:23:47Z

And the 2nd one is very good too. I still see wisdom in splitting commits for different issues though, the overlap here is insufficient to justify grouping them.

Reviewed 12 of 12 files at r2.
Review status: all files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.

pkg/sql/insert.go, line 296 at r2 (raw file):

  if len(rowVals) < len(n.insertCols) {
      // It's not cool to append to the slice returned by a node; make a copy.

Can you think of a test that would reveal this bug?
perhaps this too would be best in a separate commit.

pkg/sql/row_container.go, line 99 at r2 (raw file):

  if nCols != 0 {
      c.rowsPerChunk = (targetChunkSize + nCols - 1) / nCols
      preallocChunks := (rowCapacity + c.rowsPerChunk - 1) / c.rowsPerChunk

Move preallocChunks := ... allocChunks(preallocChunks) to a conditional if c.chunks == nil { ... } in AddRow.
This is because this allocation should really be accounted for to the monitor and the constructor is too early to do so.

pkg/sql/row_container.go, line 190 at r2 (raw file):

      return nil
  }
  // This is a potential hot path, use int32 for faster division.

This comment is redundant with the one in getChunkAndPos.

pkg/sql/trace.go, line 54 at r2 (raw file):

// Internally, the explainTraceNode also returns a timestamp column which is
// used during sorting.
var traceColumnsWithTS = append(traceColumns, ResultColumn{

This seems like an unrelated bug in the implementation of EXPLAIN(TRACE). I would recommend a separate commit for this.

pkg/sql/testdata/explain, line 69 at r2 (raw file):

Level  Type                  Description             Columns                                                                                                  Ordering
0      select                                        ("Cumulative Time", Duration, "Span Pos", Operation, Event, RowIdx, Key, Value, Disposition)             +9,+"Span Pos"
1      sort                  +Timestamp,+"Span Pos"  ("Cumulative Time", Duration, "Span Pos", Operation, Event, RowIdx, Key, Value, Disposition)             +9,+"Span Pos"

See my comment above - probably best in separate commit.

Comments from Reviewable

This commit replaces the PseudoPop functionality in RowContainer with a rowsPopped variable in the higher layer (valuesNode). This is cleaner and will allow some changes in RowContainer.

explainTraceNode was exposing incorrect Columns(): they did not include the timestamp column which is used during sorting. This resulted in a RowContainer that was configured with an incorrect number of columns.

RaduBerinde · 2016-11-08T13:45:00Z

TFTRs, updated!

Review status: 0 of 12 files reviewed at latest revision, 8 unresolved discussions, some commit checks pending.

pkg/sql/insert.go, line 296 at r2 (raw file):

Previously, knz (kena) wrote…

Can you think of a test that would reveal this bug?

perhaps this too would be best in a separate commit.

It doesn't really cause any issues right now - it was revealed while I was working on the change and `RowContainer.At()` wasn't limiting the capacity of the returned slice. Now it doesn't really break anything anymore but this is the right thing to do (and more efficient). I'll split into a different commit.

pkg/sql/row_container.go, line 99 at r2 (raw file):

Previously, knz (kena) wrote…

Move preallocChunks := ... allocChunks(preallocChunks) to a conditional if c.chunks == nil { ... } in AddRow.
This is because this allocation should really be accounted for to the monitor and the constructor is too early to do so.

But even if we are doing it in the first row, we are allocating more than that single row. Should I account for the chunk allocation separately from the per-row allocation? I'm asking because I'm not entirely sure what the intended model is (note that the old code it was ignoring the `rows` slice itself)

pkg/sql/row_container.go, line 190 at r2 (raw file):

Previously, knz (kena) wrote…

This comment is redundant with the one in getChunkAndPos.

Leftover, removed.

pkg/sql/trace.go, line 54 at r2 (raw file):

Previously, knz (kena) wrote…

This seems like an unrelated bug in the implementation of EXPLAIN(TRACE). I would recommend a separate commit for this.

I am splitting it off. It's related in that it causes a panic in the new implementation which actually checks that it gets rows of the correct length.

pkg/sql/values.go, line 40 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

For ease of future understanding, consider adding something about the popped rows being at the back of the RowContainer in the range [length-rowsPopped, length)

Done.

pkg/sql/values.go, line 289 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

You might want to include the motivation for this. Something along the lines of: "this resets the effect that popping values had on the valuesNode length"

Done.

Comments from Reviewable

knz · 2016-11-08T13:51:12Z

Reviewed 3 of 12 files at r3.
Review status: 0 of 12 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.

pkg/sql/row_container.go, line 99 at r2 (raw file):

Previously, RaduBerinde wrote…

But even if we are doing it in the first row, we are allocating more than that single row. Should I account for the chunk allocation separately from the per-row allocation? I'm asking because I'm not entirely sure what the intended model is (note that the old code it was ignoring the rows slice itself)

Yes it's better to add the chunk allocation as a whole. The model is that we should capture as much as possible, so that users don't get a OOM crash without expecting it. Meanwhile we don't want to over-shoot too much either so as to not restrict client connections excessively.

Comments from Reviewable

knz · 2016-11-08T13:53:42Z

Reviewed 9 of 12 files at r3, 1 of 1 files at r4, 2 of 2 files at r5, 9 of 9 files at r6.
Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks pending.

Comments from Reviewable

RaduBerinde · 2016-11-08T14:03:25Z

Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.

pkg/sql/row_container.go, line 99 at r2 (raw file):

Previously, knz (kena) wrote…

Yes it's better to add the chunk allocation as a whole. The model is that we should capture as much as possible, so that users don't get a OOM crash without expecting it. Meanwhile we don't want to over-shoot too much either so as to not restrict client connections excessively.

Got it. In fact preallocating 1 chunk here doesn't do much, AddRow would have done the same.. But when we are given a non-zero `rowCapacity` (which is basically asking us to prealloc a certain amount), shouldn't we do it and account for it here in the constructor?

Comments from Reviewable

knz · 2016-11-08T14:14:15Z

Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.

pkg/sql/row_container.go, line 99 at r2 (raw file):

Previously, RaduBerinde wrote…

Got it. In fact preallocating 1 chunk here doesn't do much, AddRow would have done the same.. But when we are given a non-zero rowCapacity (which is basically asking us to prealloc a certain amount), shouldn't we do it and account for it here in the constructor?

Yeah that was a shortcoming of the previous implementation. The initial/original code had the accounting in the constructor too, but then I bailed out when I saw the amount of additional error checking needed to ensure the container is closed in all cases. Then I decided that omitting to account for the pre-allocated row array was the better trade-off against the performance loss of re-allocating to grow the array when the size ought to have been known in advance. Perhaps you'll think better about this than I did.

Comments from Reviewable

RaduBerinde · 2016-11-08T14:18:34Z

Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.

pkg/sql/row_container.go, line 99 at r2 (raw file):

Previously, knz (kena) wrote…

Yeah that was a shortcoming of the previous implementation. The initial/original code had the accounting in the constructor too, but then I bailed out when I saw the amount of additional error checking needed to ensure the container is closed in all cases. Then I decided that omitting to account for the pre-allocated row array was the better trade-off against the performance loss of re-allocating to grow the array when the size ought to have been known in advance. Perhaps you'll think better about this than I did.

Ah, yes, I just realized the difficulty with error checking. Yes, seems like doing it on the first `AddRow` makes most sense.

Comments from Reviewable

The majority of RowContainer.AddRow() callers make copies of the rows; most are unoptimized (one allocation per row). Changing RowContainer to manage row storage internally, instead of using the slices passed by the caller. We allocate rows in chunks, which reduces the allocations as well as the overhead of storing a slice for every row. The callers are simplified.

RaduBerinde · 2016-11-08T14:49:55Z

Review status: 11 of 12 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.

pkg/sql/row_container.go, line 99 at r2 (raw file):

Previously, RaduBerinde wrote…

Ah, yes, I just realized the difficulty with error checking. Yes, seems like doing it on the first AddRow makes most sense.

Done, PTAL.

Comments from Reviewable

knz · 2016-11-08T15:04:24Z

Reviewed 1 of 1 files at r7.
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.

pkg/sql/row_container.go, line 99 at r2 (raw file):

Previously, RaduBerinde wrote…

Done, PTAL.

Yep, looks good.

pkg/sql/row_container.go, line 185 at r7 (raw file):

      if len(c.chunks) > 0 {
          // Grow the number of chunks by a fraction.
          numChunks = 1 + len(c.chunks)/8

I think you can drop the conditional and replace this with numChunks := c.preallocChunks + len(c.chunks)/8.

Comments from Reviewable

RaduBerinde · 2016-11-08T15:13:54Z

Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.

pkg/sql/row_container.go, line 185 at r7 (raw file):

Previously, knz (kena) wrote…

I think you can drop the conditional and replace this with numChunks := c.preallocChunks + len(c.chunks)/8.

But if say we preallocate 1000 chunks and then we need more, I don't want to allocate 1125 more..

Comments from Reviewable

RaduBerinde · 2016-11-08T15:17:50Z

Verified there are no regressions in the basic benchmarks https://gist.github.com/RaduBerinde/099fe880b3c664d4b40bf789b44a7b9d

Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.

Comments from Reviewable

knz · 2016-11-08T15:19:32Z

Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.

pkg/sql/row_container.go, line 185 at r7 (raw file):

Previously, RaduBerinde wrote…

But if say we preallocate 1000 chunks and then we need more, I don't want to allocate 1125 more..

Woops yes you're right.

Comments from Reviewable

irfansharif · 2016-11-10T23:10:14Z

Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.

pkg/sql/row_container.go, line 215 at r7 (raw file):

  }
  chunk, pos := c.getChunkAndPos(i)
  return c.chunks[chunk][pos : pos+c.numCols : pos+c.numCols]

why return c.chunks[chunk][pos : pos+c.numCols : pos+c.numCols] and not return c.chunks[chunk][pos : pos+c.numCols]?

Comments from Reviewable

RaduBerinde · 2016-11-10T23:16:30Z

pkg/sql/row_container.go, line 215 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

why return c.chunks[chunk][pos : pos+c.numCols : pos+c.numCols] and not return c.chunks[chunk][pos : pos+c.numCols]?

Because if someone tries to append to a row returned by `At()`, they will overwrite other rows. That's not really something that one is supposed to do with these rows, but there was such code that I fixed as part of this PR.

Comments from Reviewable

RaduBerinde assigned knz and nvb Nov 8, 2016

RaduBerinde added 3 commits November 8, 2016 08:25

sql: clean up PseudoPop in RowContainer

5f6e679

This commit replaces the PseudoPop functionality in RowContainer with a rowsPopped variable in the higher layer (valuesNode). This is cleaner and will allow some changes in RowContainer.

sql: don't append to slice we don't own in insert

4048d32

sql: fix explainTraceNode Columns

465b7cd

explainTraceNode was exposing incorrect Columns(): they did not include the timestamp column which is used during sorting. This resulted in a RowContainer that was configured with an incorrect number of columns.

RaduBerinde force-pushed the rowcontainer-addrow branch from f4b3882 to 4d4cbb3 Compare November 8, 2016 13:39

RaduBerinde force-pushed the rowcontainer-addrow branch from 4d4cbb3 to 52b02c8 Compare November 8, 2016 14:44

RaduBerinde merged commit 2bde1cc into cockroachdb:master Nov 8, 2016

RaduBerinde deleted the rowcontainer-addrow branch November 8, 2016 16:03

irfansharif mentioned this pull request Nov 10, 2016

sql: add hash join algorithm for USING/NATURAL predicates #10618

Merged

irfansharif mentioned this pull request Nov 10, 2016

sql: added PopFirst functionality to RowContainer #10623

Merged

Conversation

RaduBerinde commented Nov 8, 2016 • edited by rjnn Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nvb commented Nov 8, 2016

Uh oh!

knz commented Nov 8, 2016

Uh oh!

knz commented Nov 8, 2016

Uh oh!

RaduBerinde commented Nov 8, 2016

Uh oh!

knz commented Nov 8, 2016

Uh oh!

knz commented Nov 8, 2016

Uh oh!

RaduBerinde commented Nov 8, 2016

Uh oh!

knz commented Nov 8, 2016

Uh oh!

RaduBerinde commented Nov 8, 2016

Uh oh!

RaduBerinde commented Nov 8, 2016

Uh oh!

knz commented Nov 8, 2016

Uh oh!

RaduBerinde commented Nov 8, 2016

Uh oh!

RaduBerinde commented Nov 8, 2016

Uh oh!

knz commented Nov 8, 2016

Uh oh!

irfansharif commented Nov 10, 2016

Uh oh!

RaduBerinde commented Nov 10, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

RaduBerinde commented Nov 8, 2016 •

edited by rjnn

Loading