Skip to content

sql: inverted index fixes#45564

Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom
jordanlewis:inverted-index-fixes
Mar 3, 2020
Merged

sql: inverted index fixes#45564
craig[bot] merged 3 commits intocockroachdb:masterfrom
jordanlewis:inverted-index-fixes

Conversation

@jordanlewis
Copy link
Copy Markdown
Member

@jordanlewis jordanlewis commented Mar 1, 2020

This is based on #45563, and split out from #45157.

Closes #45154.
Closes #32468.

Miscellaneous fixes for inverted indexes.

  1. Don't produce duplicate keys for inverted indexes.
  2. Permit rows to not emit any keys for a particular index, which is required to allow not having any inverted index entries for NULL (or empty container).
  3. Don't produce keys for NULL entries in inverted indexes

@jordanlewis jordanlewis requested a review from a team as a code owner March 1, 2020 22:12
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@jordanlewis jordanlewis force-pushed the inverted-index-fixes branch 3 times, most recently from 64bf331 to 35498f0 Compare March 2, 2020 00:59
query T kvtrace
INSERT INTO d VALUES(0, '{"a": "b"}')
----
CPut /Table/53/1/0/0 -> /TUPLE/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems strange -- where b in the value?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean the /TUPLE/ thing? I think we don't have something defined for arrays or json - it's like this with or without my patch.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't implement PrettyPrintValueEncoded for these types. I'll send a separate PR for that.

There's no reason to return duplicate keys when generating inverted
index keys for JSON. This can happen because JSON arrays can have
duplicate values. The KV system will eventually deduplicate these keys
too, but in a less efficient way.

Release note: None
This commit paves the way to having inverted indexes not return any
keys, for datums like NULL and empty arrays.

Release note: None
Previously, an inverted index on a JSON column would contain an index
entry for every row that had a SQL NULL value in the JSON column,
despite the the fact that the query planner does not use inverted
indexes to satisfy an IS NULL predicate. This commit changes the
behavior of inverted indexing to cease producing these index keys.

As a sidebar, Postgres also does not include NULL values in its inverted
indexes, nor does Elastic, so I don't think this behavior goes against
user expectation. In addition, the optimizer already did not use the
index for IS NULL predicates, so there will be no behavior changes for
users of inverted indexes.

This change has a side effect of orphaning any entries for NULL datums
that might already exist in an inverted index. These will not be cleaned
up, even if a particular row is updated to not contain NULL, until the
index is dropped or truncated. This problem is rare, and could only
conceivably be problematic for tables with many NULL values in a column
that's indexed with an inverted index, a very unusual case.

Release note: None
@jordanlewis jordanlewis force-pushed the inverted-index-fixes branch from 35498f0 to 7b09b62 Compare March 2, 2020 02:54
@jordanlewis jordanlewis changed the title Inverted index fixes sql: inverted index fixes Mar 2, 2020
@rohany
Copy link
Copy Markdown
Contributor

rohany commented Mar 2, 2020

LGTM

@jordanlewis
Copy link
Copy Markdown
Member Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 3, 2020

Build succeeded

@craig craig bot merged commit 06989f3 into cockroachdb:master Mar 3, 2020
@jordanlewis jordanlewis deleted the inverted-index-fixes branch March 4, 2020 05:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: extra inverted index keys generated for NULL IMPORT failing to create inverted indexes

3 participants