Skip to content

sql: forward index histogram on an trigram inverted index column causes internal errors #84529

@jordanlewis

Description

@jordanlewis

When the optimizer sees a histogram on a normal column of type text (now both forward indexable and inverted indexable with gin_trgm_ops) while trying to build spans for an inverted index, an error is returned due to the type mismatch of the bytes (inverted) histogram it expects and the text (forward) histogram it finds.

This is basically a manifestation of #50655, which covers the fact that histograms don't have a "type" attribute that could help distinguish inverted from forward sketches/histograms, or distinguish different types of inverted histograms from each other.

Here is a simple repro:

demo@127.0.0.1:26257/defaultdb> create table a (a) as select 'hi';
CREATE TABLE AS

demo@127.0.0.1:26257/defaultdb> analyze a; 
ANALYZE

demo@127.0.0.1:26257/defaultdb> create inverted index on a(a gin_trgm_ops) ;
CREATE INDEX

demo@127.0.0.1:26257/defaultdb> explain select * from a where a like '%ffooo%';
ERROR: unsupported comparison: string to bytes
SQLSTATE: 42804

If analyze is re-run, all is well again. I think this is because the most recent histogram for the column becomes the inverted column. This raises the question of whether an ordinary forward scan will find the wrong histogram. I wasn't able to easily discover any internal errors in this state, but I wouldn't be surprised if things were going wrong in more subtle ways.

demo@127.0.0.1:26257/defaultdb> analyze a;
ANALYZE


Time: 18ms total (execution 18ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> explain select * from a where a like '%ffooo%';
                                            info
---------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • filter
  │ estimated row count: 0
  │ filter: a LIKE '%ffooo%'
  │
  └── • index join
      │ estimated row count: 0
      │ table: a@a_pkey
      │
      └── • inverted filter
          │ inverted column: a_inverted_key
          │ num spans: 3
          │
          └── • scan
                estimated row count: 0 (<0.01% of the table; stats collected 2 seconds ago)
                table: a@a_a_idx
                spans: 3 spans
(19 rows)


Time: 1ms total (execution 1ms / network 0ms)

Jira issue: CRDB-17708

Metadata

Metadata

Assignees

Labels

C-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.T-sql-queriesSQL Queries Team

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions