Skip to content

feat(webui): Set up ANTLR tooling for future client-side SQL parsing; Move clp-s parser generation to codegen.yaml; Remove unused client package-lock.json.#1293

Merged
hoophalab merged 24 commits into
y-scope:mainfrom
hoophalab:sqlparser
Sep 15, 2025

Conversation

@hoophalab

@hoophalab hoophalab commented Sep 9, 2025

Copy link
Copy Markdown
Contributor

Description

  1. Add the ANTLR dependency in the webui client.
  2. Use the grammar file from the 0.293 release of the Presto repository
    https://github.com/prestodb/presto/blob/release-0.293/presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
    permalink: https://github.com/prestodb/presto/blob/2f48ff63a3d6d249b4062546003126ddbbb989ee/presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
  3. Include the generated parser files directly in the codebase similar to refactor(clp-s): Add ANTLR-generated parsers to codebase to eliminate need for code dependencies at build time. #925.
  4. Add an example validate function for SQL input.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

validate detects incorrect sql queries.
clp-s-generate-parsers works the same as before.
webui-generate-parser generates the parser correctly.

Summary by CodeRabbit

  • New Features

    • Introduced SQL syntax validation in the web UI with case-insensitive parsing and clear line/column error messages.
  • Chores

    • Added a new runtime dependency to support the parser.
    • Centralized and automated parser code generation via shared tasks.
    • Updated CI workflow to use the new codegen task.
    • Excluded generated files from linting to reduce noise.
    • Updated notices to include third‑party attribution.

@hoophalab hoophalab requested a review from a team as a code owner September 9, 2025 03:50
@coderabbitai

coderabbitai Bot commented Sep 9, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Adds a SQL parser to the web UI using ANTLR4, including a new grammar, TypeScript validation module, and build tasks for code generation. Updates ESLint to ignore generated code, adds the antlr4 dependency, adjusts Taskfile includes and CI workflow to use centralized codegen tasks, and updates NOTICE.

Changes

Cohort / File(s) Summary of changes
Web UI SQL parser (grammar, validator, deps)
components/webui/client/src/sql-parser/SqlBase.g4, components/webui/client/src/sql-parser/index.ts, components/webui/client/package.json, components/webui/client/eslint.config.mjs, NOTICE
Adds ANTLR4 SQL grammar and a TypeScript validator using generated lexer/parser; adds antlr4 dependency; ESLint ignores generated dir; NOTICE credits grammar authors.
Centralized code generation tasks
taskfile.yaml, taskfiles/codegen.yaml
Moves parser generation to included codegen Taskfile; adds clp-s-generate-parsers (C++ target) and webui-generate-parser (TypeScript target) tasks with checksum guards; updates core task deps.
CI workflow update
.github/workflows/clp-s-generated-code-checks.yaml
Updates step to call task codegen:clp-s-generate-parsers namespace.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant Validator as sql-parser/index.ts
  participant CharStream as UpperCaseCharStream
  participant Lexer as SqlBaseLexer (ANTLR)
  participant Parser as SqlBaseParser (ANTLR)
  participant Listener as SyntaxErrorListener

  Caller->>Validator: validate(sqlString)
  Validator->>CharStream: wrap input (uppercasing)
  Validator->>Lexer: new Lexer(CharStream)
  Validator->>Listener: attach to Lexer and Parser
  Validator->>Parser: new Parser(TokenStream(Lexer))
  Parser->>Parser: singleStatement()
  Parser-->>Listener: reportError? (on syntax issue)
  alt syntax error
    Listener-->>Validator: throw SyntaxError(line:col msg)
    Validator-->>Caller: propagate SyntaxError
  else valid
    Parser-->>Validator: complete
    Validator-->>Caller: return (void)
  end
Loading
sequenceDiagram
  autonumber
  actor Dev/CI
  participant Task as task webui-generate-parser
  participant ANTLR as java -jar antlr.jar
  participant FS as Filesystem
  participant Utils as checksum/replace-text

  Dev/CI->>Task: invoke
  Task->>Utils: checksum validate (OUTPUT_DIR)
  Task->>FS: rm -rf generated
  Task->>ANTLR: generate TS from SqlBase.g4 (-no-listener)
  Task->>FS: prune non-*.ts
  Task->>Utils: prepend // @ts-nocheck to lexer/parser
  Task->>Utils: checksum compute
  Task-->>Dev/CI: done
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title accurately reflects the main changes described in the PR: adding ANTLR tooling and a client-side SQL parser, moving clp-s parser generation into taskfiles/codegen.yaml, and removing an unused client package-lock.json. It is specific to the changeset and clearly tied to the summaries and objectives provided. While somewhat long, it remains a single clear sentence that a reviewer can quickly scan to understand the PR’s intent.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7c990f and cd1b6b6.

📒 Files selected for processing (1)
  • taskfiles/codegen.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-06-02T18:22:24.060Z
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Applied to files:

  • taskfiles/codegen.yaml
📚 Learning: 2025-05-27T20:06:21.263Z
Learnt from: anlowee
PR: y-scope/clp#925
File: components/core/src/clp_s/search/kql/antlr_generated/KqlParser.cpp:0-0
Timestamp: 2025-05-27T20:06:21.263Z
Learning: ANTLR-generated files under `antlr_generated/` directories should not be subject to coding style checks or manual review comments, as they are auto-generated code that would be overwritten on regeneration.

Applied to files:

  • taskfiles/codegen.yaml
📚 Learning: 2025-09-10T22:13:51.907Z
Learnt from: hoophalab
PR: y-scope/clp#1293
File: taskfiles/codegen.yaml:26-33
Timestamp: 2025-09-10T22:13:51.907Z
Learning: ANTLR automatically creates the output directory specified by the -o flag, so explicit directory creation with mkdir -p is not needed before running ANTLR commands.

Applied to files:

  • taskfiles/codegen.yaml
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Applied to files:

  • taskfiles/codegen.yaml
📚 Learning: 2025-05-28T18:33:30.155Z
Learnt from: anlowee
PR: y-scope/clp#925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.

Applied to files:

  • taskfiles/codegen.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: lint-check (ubuntu-24.04)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread components/webui/package-lock.json
coderabbitai[bot]

This comment was marked as outdated.

coderabbitai[bot]

This comment was marked as outdated.

@hoophalab hoophalab requested a review from davemarco September 9, 2025 05:21
coderabbitai[bot]

This comment was marked as outdated.

Comment thread components/webui/client/package.json Outdated
"start": "vite",
"guided": "VITE_GUIDED_DEV=true npm run start"
"guided": "VITE_GUIDED_DEV=true npm run start",
"generate-parser": "cd src/SqlParser && antlr4 -Dlanguage=TypeScript -o generated -no-listener -Xexact-output-dir SqlBase.g4 && find generated -type f -and -not -name '*.ts' -exec rm {} \\; && sed -i '1s/^/\\/\\/ @ts-nocheck\\n/' generated/*.ts"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should simplify this script. I think you can just ignore the typescript in the tsconfig file instead of injecting the ignore. Also why are we deleting these other files? Why not just leave them?

@davemarco davemarco Sep 9, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also maybe we should add a short readme into the sql Parser folder reminding people to install pip3 install antlr4-tools. Since this is needed?

@davemarco davemarco Sep 9, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it may make sense to put this script in the readme, instead of package.json, since it isnt actually using the npm dependency. Its not like it will be run often

@hoophalab hoophalab Sep 9, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess "exclude": ["...."] won't work in tsconfig. Is any other config I'm not aware of?
Deleting unused files are what clps do for kql files, and it also has a similar script in the taskfile.

I guess a proper solution is to move this solution to taskfile?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the // @ts-nocheck, I think you can make a new tsconfig like tsconfig/tsconfig.generated.json and have it include "include": [
"../src/sql-parser/generated/**/*"
],

and have tsconfig/tsconfig.app.json exclude it. Then have the main tsconfig look like this

  "references": [
  {
    "path": "./tsconfig/tsconfig.app.json"
  },
  {
    "path": "./tsconfig/tsconfig.node.json"
  },
  {
    "path": "./tsconfig/tsconfig.generated.json"
  }
]
}

Comment thread components/webui/client/src/SqlParser/index.ts Outdated
Comment thread components/webui/client/src/SqlParser/index.ts
Comment thread components/webui/client/src/SqlParser/index.ts
Comment thread components/webui/client/eslint.config.mjs Outdated

@davemarco davemarco left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay setup looks great. Added my comments. I think we will need another PR after this to do individual field parsing.

Last thing is I think we need to copy the Presto NOTICE file into our repo per license. And optionally give atttribution in there.

Im not that familiar with the licensing stuff so maybe we ask kirk

@hoophalab hoophalab changed the title feat(webui): Add client-side SQL validation with ANTLR. feat(webui): Add client-side SQL parsing using ANTLR. Sep 9, 2025
@hoophalab hoophalab changed the title feat(webui): Add client-side SQL parsing using ANTLR. feat(webui): Add client-side SQL parsing using ANTLR; Move clp-s parser generation to codegen.yaml. Sep 9, 2025
coderabbitai[bot]

This comment was marked as outdated.

coderabbitai[bot]

This comment was marked as outdated.

@hoophalab hoophalab requested a review from davemarco September 9, 2025 19:43
Comment thread NOTICES Outdated
Comment thread NOTICES Outdated
Comment thread NOTICES Outdated

@davemarco davemarco left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code gen looks good to me. @kirkrodrigues can you just double check the checksum stuff in the taskfile. I dont really know well how that works, and would prefer if someone familiar with the taskfiles checked.

@hoophalab
i added comments on the ts-ignore, as well as the uppercase.

Last thing - I think we should create an issue about adding workflow in the future like clp-s

@davemarco

Copy link
Copy Markdown
Contributor

title:
feat(webui): Set up ANTLR tooling for future client-side SQL parsing; Move clp-s parser generation to codegen.yaml; Remove unused client package-lock.json.

coderabbitai[bot]

This comment was marked as resolved.

@davemarco davemarco left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest changes look good to me

@hoophalab hoophalab changed the title feat(webui): Set up ANTLR tooling for future client-side SQL parsing; Move clp-s parser generation to codegen.yaml; Remove unused client package-lock.json. feat(webui): Set up ANTLR tooling for future client-side SQL parsing; Move clp-s parser generation to codegen.yaml; Remove unused client package-lock.json. Sep 11, 2025

@junhaoliao junhaoliao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PR title is good

i reviewed the taskfiles and posted some stylistic comments; otherwise this is good to good

Comment thread taskfiles/codegen.yaml Outdated
Comment thread taskfiles/codegen.yaml Outdated
Comment thread taskfiles/codegen.yaml Outdated
Comment thread taskfiles/codegen.yaml Outdated
Comment thread taskfiles/codegen.yaml
hoophalab and others added 7 commits September 13, 2025 20:03
Co-authored-by: Junhao Liao <junhao@junhao.ca>
Co-authored-by: Junhao Liao <junhao@junhao.ca>
Co-authored-by: Junhao Liao <junhao@junhao.ca>
Co-authored-by: Junhao Liao <junhao@junhao.ca>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f008074 and b7c990f.

📒 Files selected for processing (1)
  • taskfile.yaml (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T18:33:30.155Z
Learnt from: anlowee
PR: y-scope/clp#925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.

Applied to files:

  • taskfile.yaml
🔇 Additional comments (1)
taskfile.yaml (1)

7-7: Centralizing parser generation via codegen include — verified

clp-s-generate-parsers is present in taskfiles/codegen.yaml and not marked internal (no 'internal: true' found).

Comment thread taskfile.yaml
Comment thread taskfiles/codegen.yaml Outdated
junhaoliao
junhaoliao previously approved these changes Sep 15, 2025

@junhaoliao junhaoliao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's fix one nit then this is good to go

Co-authored-by: Junhao Liao <junhao@junhao.ca>
@hoophalab hoophalab merged commit bb89bfb into y-scope:main Sep 15, 2025
26 checks passed
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
… Move clp-s parser generation to `codegen.yaml`; Remove unused client `package-lock.json`. (y-scope#1293)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants