Skip to content

Report native ClickHouse data types and pupulate missing type information#478

Merged
slabko merged 5 commits intoClickHouse:unique-type-names-betafrom
slabko:unique-type-names
Apr 17, 2025
Merged

Report native ClickHouse data types and pupulate missing type information#478
slabko merged 5 commits intoClickHouse:unique-type-names-betafrom
slabko:unique-type-names

Conversation

@slabko
Copy link
Copy Markdown
Contributor

@slabko slabko commented Mar 7, 2025

MS Power BI and other applications were not able to recognize the types of uncommon SQL-world data types, such as UInt8 or Int64. It was unable to parse the data types from SQLColumns because the dataset returned by this function did not comply with the requirements. This is fixed by changing the SQLColumns dataset.

Additionally, SQLGetTypeInfo returned SQL types such as INT, SMALLINT, BIGINT, etc. Each of these types had to have two entries in the SQLGetTypeInfo result: one for signed and one for unsigned types. This duplication confused MS Power BI. The fix for this issue involved following the documentation and using database-native types such as Int32, UInt64, etc.

@slabko slabko changed the title [WIP]: Fix SQLGetTypeInfo output to match documentation Fix Numeric Data Type Loading in Power BI Mar 12, 2025
@slabko slabko marked this pull request as ready for review March 12, 2025 11:23
@mshustov mshustov requested a review from slvrtrn March 12, 2025 15:03
@mshustov
Copy link
Copy Markdown
Member

mshustov commented Mar 12, 2025

discussed offline: we need to test how the existing dashboards react to the data type mapping changes

@slabko slabko marked this pull request as draft March 12, 2025 21:32
@slabko
Copy link
Copy Markdown
Contributor Author

slabko commented Mar 12, 2025

Unfortunately, after testing, we noticed that this change breaks the MS Power BI Direct Query aggregation functions, such as sum. Power BI relies on conversion to Decimal to avoid integer overflow. However, with the new implementation of the SQLGetTypeInfo function, Power BI selects Float32 instead, which prevents overflow but results in very imprecise calculations.

slabko added 2 commits April 1, 2025 13:52
The RowBinary format is not fully supported in the ODBC driver, and
maintaining feature parity while advancing and improving our ODBCDriver2
format gets more and more difficult. For now, we stop any development of
the RowBinary format, as it has never been officially supported, and
focus solely on the more advanced ODBCDriver2 format.
Update SQLGetTypeInfo results to comply with
[SQLGetTypeInfo](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlgettypeinfo-function).
Applications must use the type names returned in the TYPE_NAME column of
the SQLGetTypeInfo result set in ALTER TABLE and CREATE TABLE statements
@slabko slabko marked this pull request as ready for review April 1, 2025 11:52
};

std::vector<TypeMappingTestEntry> types = {
{"Bool", "0", SQL_VARCHAR},
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be SQL_BIT

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bools are currently represented as strings: Bool columns are represented as a string. This is a known issue and should definitely be fixed. However, this extends the scope of this PR a bit too much, so it will be addressed in a different one.

{"Int16", "0", SQL_SMALLINT},
{"UInt16", "0", SQL_SMALLINT},
{"Int32", "0", SQL_INTEGER},
{"UInt32", "0", SQL_BIGINT},
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be SQL_INTEGER (although we used to support some Perl's weird behavior, that should no longer be a problem)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pyodbc, and probably other libraries in other languages, binds SQL_INTEGER to int32_t producing -1 for the max value of UInt32: 4294967295. So UInt32 have to be SQL_BIGINT to avoid this problem. Not sure if this is even a bug.

{"String", "0", SQL_VARCHAR},
{"FixedString(1)", "'0'", SQL_VARCHAR},
{"Date", "0", SQL_TYPE_DATE},
{"Date32", "0", SQL_VARCHAR},
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be SQL_TYPE_DATE

Copy link
Copy Markdown
Contributor Author

@slabko slabko Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, Date32, DateTime64 are missing in our Python integration tests

Copy link
Copy Markdown
Contributor Author

@slabko slabko Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The driver does not currently support Date32: Date32 in not supported. The missing test for Date32 will be added as part of the fix for this issue.

{"LowCardinality(String)", {
SQL_VARCHAR, "String", msz, 0, na, na, false, SQL_VARCHAR, na, msz }},
// TODO(slabko): Low cardinality nullable types do not detect nullability correctly
// Notice `null` is set to `false` here, but must be `true`
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create an issue in GitHub for that

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mshustov mshustov requested a review from Copilot April 14, 2025 09:55
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 4 out of 19 changed files in this pull request and generated no comments.

Files not reviewed (15)
  • driver/CMakeLists.txt: Language not supported
  • driver/api/impl/impl.cpp: Language not supported
  • driver/api/odbc.cpp: Language not supported
  • driver/api/sql_columns_resultset_mutator.cpp: Language not supported
  • driver/api/sql_columns_resultset_mutator.h: Language not supported
  • driver/environment.cpp: Language not supported
  • driver/result_set.cpp: Language not supported
  • driver/test/CMakeLists.txt: Language not supported
  • driver/test/result_set_reader.hpp: Language not supported
  • driver/test/statement_parameters_it.cpp: Language not supported
  • driver/test/utils_ut.cpp: Language not supported
  • driver/utils/sql_encoding.h: Language not supported
  • driver/utils/type_info.cpp: Language not supported
  • driver/utils/type_info.h: Language not supported
  • driver/utils/utils.h: Language not supported
Comments suppressed due to low confidence (4)

.github/workflows/macOS.yml:104

  • Removing 'ClickHouse DSN (ANSI, RBWNAT)' from the TEST_DSN_LIST may reduce overall test coverage. Please verify that this DSN configuration is no longer required in any test scenarios.
+        -DTEST_DSN_LIST="ClickHouse DSN (ANSI);ClickHouse DSN (Unicode)"

.github/workflows/macOS.yml:168

  • The removal of the DSN mapping for 'ClickHouse DSN (ANSI, RBWNAT)' should be double-checked to ensure no critical testing scenarios are lost.
    -        ClickHouse DSN (ANSI, RBWNAT) = ClickHouse ODBC Driver (ANSI)

.github/workflows/Windows.yml:81

  • Ensure that removing 'ClickHouse DSN (ANSI, RBWNAT)' from the TEST_DSN_LIST in Windows does not omit necessary test coverage for specific DSN configurations.
+        -DTEST_DSN_LIST="ClickHouse DSN (ANSI);ClickHouse DSN (Unicode)"

.github/workflows/Linux.yml:102

  • Verify that the removal of 'ClickHouse DSN (ANSI, RBWNAT)' from the TEST_DSN_LIST does not exclude important test cases for DSN connectivity on Linux.
+        -DTEST_DSN_LIST="ClickHouse DSN (ANSI);ClickHouse DSN (Unicode)"

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 15, 2025

CLA assistant check
All committers have signed the CLA.

slabko added 3 commits April 16, 2025 16:10
The driver previously bound SQL_TYPE_TIMESTMP exclusively to `DateTime`,
which meant it could not bind values with fractions of seconds.
Consequently, it was not possible to use precise timestamps in WHERE
clauses. This change updates the driver to bind SQL_TYPE_TIMESTMP to
`DateTime64`. This allows inserting values into both `DateTime` and
`DateTime64` columns and using precise values in WHERE clauses.
@slabko slabko changed the base branch from master to unique-type-names-beta April 16, 2025 18:37
@slabko slabko changed the title Fix Numeric Data Type Loading in Power BI Report native ClickHouse data types and pupulate missing type information Apr 17, 2025
@slabko slabko merged commit 96dbab6 into ClickHouse:unique-type-names-beta Apr 17, 2025
15 checks passed
@mshustov
Copy link
Copy Markdown
Member

/windsurf-review

@windsurf-bot
Copy link
Copy Markdown

windsurf-bot bot commented May 31, 2025

I ran into an unexpected issue while reviewing this PR. Please try again later.

@mshustov
Copy link
Copy Markdown
Member

mshustov commented Jun 2, 2025

/windsurf-review

@windsurf-bot
Copy link
Copy Markdown

windsurf-bot bot commented Jun 2, 2025

I ran into an unexpected issue while reviewing this PR. Please try again later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Int64 columns are loaded as binary columns in PowerBI

4 participants