Skip to content

Implement PostgreSQL-specific Query Digest Text Generator #5253

@rahim-kanji

Description

@rahim-kanji

Description

We currently generate query digests using a tokenizer originally designed for MySQL. While this works for basic cases, it fails to correctly normalize many PostgreSQL-specific syntax constructs. This issue tracks the implementation of a dedicated PostgreSQL tokenizer and query digest generator, fully isolated from MySQL behavior.

The goal is to ensure PostgreSQL queries are tokenized, normalized, and parameterized accurately according to PostgreSQL grammar, producing stable and correct query digests.

Motivation

PostgreSQL syntax differs significantly from MySQL, including but not limited to:

  • Type casts (::)
  • Dollar-quoted strings $$..$$
  • Array literals ARRAY[...]
  • Prefixed string and binary literals
  • Boolean literals
  • Nested block comments /* */
  • Identifier quoting rules

Using a MySQL tokenizer leads to incorrect digest grouping, false positives, and unstable query fingerprints for PostgreSQL workloads.

Scope / Requirements

Core Tokenizer Enhancements (PostgreSQL-specific)

  • Type Cast Handling

    • Support PostgreSQL type cast syntax (expression::type)
    • Works for literals, identifiers, and expressions
  • Array Literal Support

    • ARRAY[...] constructor
    • {...} array literal syntax
    • Multi-dimensional and nested arrays
  • Prefixed Literal Support

    • Escape string constants: E'...'
    • Unicode strings: U&'...' with optional UESCAPE
    • Hex strings: x'...'
    • Bit strings: b'...', B'...'
    • bytea hex format (\xDEADBEEF)
  • Boolean Normalization

    • Replace TRUE / FALSE (case-insensitive) with parameter placeholders
    • Preserve logical correctness in expressions and predicates
  • Dollar-Quoted Strings

    • Support $$...$$ and $tag$...$tag$
    • Normalize full literal content to a single placeholder
    • Correctly match opening and closing tags
  • Identifier Quoting

    • Treat double quotes ("identifier") as PostgreSQL identifiers
  • Comment Handling

    • Support PostgreSQL comment rules:
    • -- comments (no trailing space required)
    • /* ... */ block comments
    • Nested block comments

Testing & Validation

  • Add PostgreSQL-specific query digest test cases covering:
    • Type casts
    • Arrays
    • Dollar-quoted strings
    • Prefixed literals
    • Nested comments
    • Boolean normalization
  • Utility helpers to simplify PostgreSQL digest testing
  • Ensure consistent digests for semantically identical queries

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions