Skip to content

distsqlrun: make joinreader use limited batches#38837

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
jordanlewis:joinreader-limit
Jul 16, 2019
Merged

distsqlrun: make joinreader use limited batches#38837
craig[bot] merged 2 commits intocockroachdb:masterfrom
jordanlewis:joinreader-limit

Conversation

@jordanlewis
Copy link
Copy Markdown
Member

Closes #35950.

Previously, lookup join put no limit on the number of returned rows from
each of its lookups. This could be hugely problematic. Imagine a
circumstance in which a lookup join was planned on a pair of tables that
had 1 billion match rows on the right side for every row on the left
side: in this case, the lookup join would ask the KV backend to return a
single giant buffer with all 1 billion rows.

Also, joinReader's tracing span collection was slightly broken - fix that while we're at it, to make sure we can test this change properly.

Release note (bug fix): prevent OOM conditions during lookup joins
between tables with a very large n:1 relationship.

@jordanlewis jordanlewis requested review from a team and solongordon July 11, 2019 23:50
@jordanlewis jordanlewis requested a review from a team as a code owner July 11, 2019 23:50
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, 2 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)


pkg/sql/opt/exec/execbuilder/testdata/lookup_join, line 787 at r2 (raw file):

SET tracing = on; SELECT COUNT(*) FROM (SELECT * FROM b NATURAL INNER LOOKUP JOIN a); SET tracing = off

let $lookupTableID

Huh, this is cool! I don't think I've seen using variables in logic tests before.

@RaduBerinde
Copy link
Copy Markdown
Member


pkg/sql/opt/exec/execbuilder/testdata/lookup_join, line 787 at r2 (raw file):

Previously, yuzefovich wrote…

Huh, this is cool! I don't think I've seen using variables in logic tests before.

Unfortunately the test below still only passes when this is 63 (because of the output)..

Copy link
Copy Markdown
Contributor

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Glad it was this straightforward!

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)


pkg/sql/opt/exec/execbuilder/testdata/lookup_join, line 785 at r3 (raw file):


statement ok
SET tracing = on; SELECT count(*) FROM (SELECT * FROM b NATURAL INNER LOOKUP JOIN a); SET tracing = off

I'd say might as well verify the count(*) result while you're at it.

Previously, lookup join put no limit on the number of returned rows from
each of its lookups. This could be hugely problematic. Imagine a
circumstance in which a lookup join was planned on a pair of tables that
had 1 billion match rows on the right side for every row on the left
side: in this case, the lookup join would ask the KV backend to return a
single giant buffer with all 1 billion rows.

Release note (bug fix): prevent OOM conditions during lookup joins
between tables with a very large n:1 relationship.
Copy link
Copy Markdown
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @solongordon and @yuzefovich)


pkg/sql/opt/exec/execbuilder/testdata/lookup_join, line 787 at r2 (raw file):

Previously, RaduBerinde wrote…

Unfortunately the test below still only passes when this is 63 (because of the output)..

Yeah :( But at least it'll rewrite correctly with -rewrite without having to edit the queries.


pkg/sql/opt/exec/execbuilder/testdata/lookup_join, line 785 at r3 (raw file):

Previously, solongordon (Solon) wrote…

I'd say might as well verify the count(*) result while you're at it.

Done.

@jordanlewis
Copy link
Copy Markdown
Member Author

bors r+

craig bot pushed a commit that referenced this pull request Jul 16, 2019
38837: distsqlrun: make joinreader use limited batches r=jordanlewis a=jordanlewis

Closes #35950.

Previously, lookup join put no limit on the number of returned rows from
each of its lookups. This could be hugely problematic. Imagine a
circumstance in which a lookup join was planned on a pair of tables that
had 1 billion match rows on the right side for every row on the left
side: in this case, the lookup join would ask the KV backend to return a
single giant buffer with all 1 billion rows.

Also, joinReader's tracing span collection was slightly broken - fix that while we're at it, to make sure we can test this change properly.

Release note (bug fix): prevent OOM conditions during lookup joins
between tables with a very large n:1 relationship.

38887: exec: check for unsupported by ArrowBatchConverter type r=yuzefovich a=yuzefovich

Previously, we would create an arrow batch converter without
paying attention to whether we supported the types which could
lead to a panic during actual conversion. Now we do this check
upfront so that we can fall back to DistSQL.

Release note: None

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 16, 2019

Build succeeded

@craig craig bot merged commit 5368cf0 into cockroachdb:master Jul 16, 2019
@jordanlewis jordanlewis deleted the joinreader-limit branch July 17, 2019 02:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: lookup joins should use batch limit

5 participants