Feat!: Separate SQL dialect from data source#2600
Conversation
for more information, see https://pre-commit.ci
|
Niels-b
left a comment
There was a problem hiding this comment.
Looks good. Thanks for doing these changes!
|
|
||
|
|
||
| class SqlServerSqlDialect(SqlDialect): | ||
| class SqlServerSqlDialect(SqlDialect, sqlglot_dialect="tsql"): |
There was a problem hiding this comment.
Why the name tsql? Transact-sql?
NIT: write it out fully
There was a problem hiding this comment.
tsql is the name for dialect SQL Server uses. Not a name I chose, this is what Microsoft named it and how sqlglot also refers to it. This sqlglot_dialect field ends up being passed to sqlglot in sql_utils.apply_sampling_to_sql. We basically had a bug in passing our own naming to sqlglot, but it hasn't been a problem so far as the only database where we used it is snowflake. The names happen to match there, but they don't for other variants.
Proof:
>>> import sqlglot
>>> sqlglot.parse_one("SELECT 1 FROM my_table", read="tsql")
Select(
expressions=[
Literal(this=1, is_string=False)],
from_=From(
this=Table(
this=Identifier(this=my_table, quoted=False))))
>>> sqlglot.parse_one("SELECT 1 FROM my_table", read="sqlserver")
Traceback (most recent call last):
File "<python-input-2>", line 1, in <module>
sqlglot.parse_one("SELECT 1 FROM my_table", read="sqlserver")
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mvds/soda/repos/soda-core/.venv/lib/python3.13/site-packages/sqlglot/__init__.py", line 135, in parse_one
dialect = Dialect.get_or_raise(read or dialect)
File "/home/mvds/soda/repos/soda-core/.venv/lib/python3.13/site-packages/sqlglot/dialects/dialect.py", line 1007, in get_or_raise
suggest_closest_match_and_fail("dialect", dialect_name, all_dialects)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mvds/soda/repos/soda-core/.venv/lib/python3.13/site-packages/sqlglot/helper.py", line 59, in suggest_closest_match_and_fail
raise ValueError(f"Unknown {kind} '{word}'.{similar}")
ValueError: Unknown dialect 'sqlserver'.



Description
This PR separates
SqlDialectfromDataSourceImplto facilitate testing of the SQL generation without needing access to an actual database.References to the data source within the
SqlDialecthave been replaced:sqlglotdialect is now defined at class scope and injected via a class construction argument in__init_subclass__Checklist
I verified this PR does not break soda-extensions.Warning
This PR breaks extensions due to decoupling the dialect from data source. Suggest releasing this as a minor version bump.
https://github.com/sodadata/soda-extensions/pull/279 updates soda-extensions to work with these changes