-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Description
Description
We currently generate query digests using a tokenizer originally designed for MySQL. While this works for basic cases, it fails to correctly normalize many PostgreSQL-specific syntax constructs. This issue tracks the implementation of a dedicated PostgreSQL tokenizer and query digest generator, fully isolated from MySQL behavior.
The goal is to ensure PostgreSQL queries are tokenized, normalized, and parameterized accurately according to PostgreSQL grammar, producing stable and correct query digests.
Motivation
PostgreSQL syntax differs significantly from MySQL, including but not limited to:
- Type casts (
::) - Dollar-quoted strings
$$..$$ - Array literals
ARRAY[...] - Prefixed string and binary literals
- Boolean literals
- Nested block comments
/* */ - Identifier quoting rules
Using a MySQL tokenizer leads to incorrect digest grouping, false positives, and unstable query fingerprints for PostgreSQL workloads.
Scope / Requirements
Core Tokenizer Enhancements (PostgreSQL-specific)
-
Type Cast Handling
- Support PostgreSQL type cast syntax (expression::type)
- Works for literals, identifiers, and expressions
-
Array Literal Support
ARRAY[...]constructor{...}array literal syntax- Multi-dimensional and nested arrays
-
Prefixed Literal Support
- Escape string constants:
E'...' - Unicode strings:
U&'...'with optionalUESCAPE - Hex strings:
x'...' - Bit strings:
b'...',B'...' - bytea hex format (
\xDEADBEEF)
- Escape string constants:
-
Boolean Normalization
- Replace
TRUE / FALSE(case-insensitive) with parameter placeholders - Preserve logical correctness in expressions and predicates
- Replace
-
Dollar-Quoted Strings
- Support
$$...$$and$tag$...$tag$ - Normalize full literal content to a single placeholder
- Correctly match opening and closing tags
- Support
-
Identifier Quoting
- Treat double quotes (
"identifier") as PostgreSQL identifiers
- Treat double quotes (
-
Comment Handling
- Support PostgreSQL comment rules:
--comments (no trailing space required)/* ... */block comments- Nested block comments
Testing & Validation
- Add PostgreSQL-specific query digest test cases covering:
- Type casts
- Arrays
- Dollar-quoted strings
- Prefixed literals
- Nested comments
- Boolean normalization
- Utility helpers to simplify PostgreSQL digest testing
- Ensure consistent digests for semantically identical queries