feat: `describegpt` - major refactor #3143

jqnatividad · 2025-12-01T13:21:51Z

- made data dictionary generation "neuro-symbolic"
- more robust Polars SQL generation
- more configurable parameters for dictionary and tags generation
- more verbose JSON output


Copilot

Pull request overview

This PR refactors the describegpt command to use a "neuro-symbolic" approach for data dictionary generation. The core improvement separates code-based dictionary generation (statistics, enumerations, examples) from LLM-generated content (labels and descriptions), making the system more robust and configurable.

Key Changes:

- Introduced code-based dictionary generation that parses stats/frequency CSVs and generates structured entries
- Added configurable parameters: --num-examples and --truncate-str for controlling dictionary output
- Implemented more verbose JSON output with metadata fields (enum_threshold, num_examples, truncate_str)
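
As a rough illustration of the "symbolic" half described above, the sketch below shows how a code-based step might decide between listing all values of a field and showing a few truncated examples. It is not taken from `src/cmd/describegpt.rs`: the names `truncate_str`, `summarize_values`, and `ValueSummary` are hypothetical, and the meanings assumed here for `enum_threshold`, `num_examples`, and the truncation length (including "0 means no truncation") are assumptions, not confirmed behavior.

```rust
/// Hypothetical helper: keep at most `max_chars` characters of `s`,
/// appending "..." when something was cut.
/// Assumption: `max_chars == 0` means "do not truncate".
fn truncate_str(s: &str, max_chars: usize) -> String {
    if max_chars == 0 || s.chars().count() <= max_chars {
        return s.to_string();
    }
    // Collect by chars (not bytes) so multi-byte UTF-8 text is never split.
    let mut out: String = s.chars().take(max_chars).collect();
    out.push_str("...");
    out
}

/// Hypothetical summary of a field's values for a dictionary entry.
#[derive(Debug, PartialEq)]
enum ValueSummary {
    /// Low-cardinality field: every distinct value is listed.
    Enumeration(Vec<String>),
    /// High-cardinality field: only the first few values are shown.
    Examples(Vec<String>),
}

/// Hypothetical: `values_by_freq` is assumed to be the distinct values of a
/// field, most frequent first (e.g. as read from a frequency CSV).
fn summarize_values(
    values_by_freq: &[&str],
    cardinality: usize,
    enum_threshold: usize,
    num_examples: usize,
    truncate_at: usize,
) -> ValueSummary {
    if cardinality <= enum_threshold {
        ValueSummary::Enumeration(
            values_by_freq
                .iter()
                .map(|v| truncate_str(v, truncate_at))
                .collect(),
        )
    } else {
        ValueSummary::Examples(
            values_by_freq
                .iter()
                .take(num_examples)
                .map(|v| truncate_str(v, truncate_at))
                .collect(),
        )
    }
}
```

Under these assumptions, the LLM would only need to supply a label and description for each field, while everything above is computed deterministically from the statistics.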

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.


| File | Description |
| --- | --- |
| src/cmd/sqlp.rs | Enhanced SQL comment regex to handle whitespace-prefixed comments |
| src/cmd/describegpt.rs | Major refactor implementing neuro-symbolic dictionary generation with new parsing functions and data structures |
| resources/describegpt_defaults.toml | Simplified dictionary prompt to only request labels/descriptions, updated Polars SQL guidance |
| docs/nyc311-describegpt.md | Updated example output showing new dictionary format with additional metadata columns |
| docs/nyc311-describegpt.json | Updated JSON output structure with new field format and attribution metadata |
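
To make the `src/cmd/sqlp.rs` change more concrete, here is a minimal sketch of stripping `--` line comments from SQL, whether they start a line, follow leading whitespace, or appear mid-line. It deliberately avoids a regex and is not the expression used in the PR; the function name `strip_sql_line_comments` is hypothetical. It tracks single-quoted strings per line only, so a string literal spanning several lines is not handled.

```rust
/// Hypothetical helper: remove `--` comments from each line of `sql`.
/// Lines that are empty after stripping are dropped.
fn strip_sql_line_comments(sql: &str) -> String {
    let mut out_lines: Vec<String> = Vec::new();
    for line in sql.lines() {
        let bytes = line.as_bytes();
        let mut in_string = false; // inside a single-quoted literal?
        let mut cut = line.len(); // byte offset where the comment starts
        let mut i = 0;
        while i < bytes.len() {
            match bytes[i] {
                // A doubled quote ('') toggles twice, so state stays correct.
                b'\'' => in_string = !in_string,
                b'-' if !in_string && i + 1 < bytes.len() && bytes[i + 1] == b'-' => {
                    // `-` is ASCII, so `i` is always a valid char boundary.
                    cut = i;
                    break;
                }
                _ => {}
            }
            i += 1;
        }
        let kept = line[..cut].trim_end();
        if !kept.trim().is_empty() {
            out_lines.push(kept.to_string());
        }
    }
    out_lines.join("\n")
}
```

A query such as `SELECT a, -- inline note` would come out as `SELECT a,`, while a literal like `'--not a comment'` is left intact.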


jqnatividad added 6 commits December 1, 2025 07:25

feat: describegpt neuro-symbolic data dictionary

02370b0

refactor: describegpt Polars SQL guidance

591c249

refactor: sqlp allow inline SQL comments

5a282c0

as LLMs sometimes start comments in the middle of the line (which is actually OK for readability)

feat: describegpt add --num-examples and --truncate-str options

967a54c

- also add SQL comment prefix to Attribution when generating SQL

feat: describegpt add additional metadata when generating JSON for …

05825a1

…dictionary and tags

docs: describegpt update examples

06ac70b

jqnatividad requested a review from Copilot December 1, 2025 13:21

Copilot started reviewing on behalf of jqnatividad December 1, 2025 13:22

github-advanced-security bot found potential problems Dec 1, 2025

docs/nyc311-describegpt.json Dismissed

docs/nyc311-describegpt.json Dismissed

docs/nyc311-describegpt.json Dismissed

Copilot finished reviewing on behalf of jqnatividad December 1, 2025 13:23

Copilot AI reviewed Dec 1, 2025

src/cmd/sqlp.rs Resolved

src/cmd/describegpt.rs Resolved

src/cmd/describegpt.rs Resolved

src/cmd/describegpt.rs Outdated, resolved

src/cmd/describegpt.rs Resolved

Update src/cmd/describegpt.rs

6b4e166

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

jqnatividad merged commit 5d1eabe into master Dec 1, 2025
16 of 17 checks passed

jqnatividad deleted the 3124-describegpt-neurosymbolic-datadictionary branch December 1, 2025 15:00

jqnatividad mentioned this pull request Dec 1, 2025

describegpt: more robust, "neuro-symbolic" Data Dictionary creation #3124

Closed

BrewTestBot mentioned this pull request Dec 8, 2025

qsv 11.0.2 Homebrew/homebrew-core#257731

Merged
