[GLUTEN-7028][CH][Part-14] Refactor Case Sensitive Support for MergeTree#8346

Merged
baibaichen merged 10 commits into apache:main from baibaichen:feature/case-sensitve
Dec 27, 2024
@baibaichen baibaichen commented Dec 25, 2024

What changes were proposed in this pull request?

problem

This PR refactors how the ClickHouse backend supports writing MergeTree tables when spark.sql.caseSensitive is false.

Before this PR, we converted column names to lower case and saved them to disk. For example, given the following DDL:

CREATE TABLE IF NOT EXISTS LINEITEM_MERGETREE_CASE_SENSITIVE
(
 L_ORDERKEY      bigint,
 L_PARTKEY       bigint,
 L_SUPPKEY       bigint,
 L_LINENUMBER    bigint,
 L_QUANTITY      double,
 L_EXTENDEDPRICE double,
 L_DISCOUNT      double,
 L_TAX           double,
 L_RETURNFLAG    string,
 L_LINESTATUS    string,
 L_SHIPDATE      date,
 L_COMMITDATE    date,
 L_RECEIPTDATE   date,
 L_SHIPINSTRUCT  string,
 L_SHIPMODE      string,
 L_COMMENT       string
)
USING clickhouse
PARTITIONED BY (L_SHIPDATE)
TBLPROPERTIES (orderByKey='L_DISCOUNT')
LOCATION '$basePath/LINEITEM_MERGETREE_CASE_SENSITIVE'

Here is columns.txt before and after this PR.

Before:

columns format version: 1
16 columns:
l_orderkey Nullable(Int64)
l_partkey Nullable(Int64)
l_suppkey Nullable(Int64)
l_linenumber Nullable(Int64)
l_quantity Nullable(Float64)
l_extendedprice Nullable(Float64)
l_discount Nullable(Float64)
l_tax Nullable(Float64)
l_returnflag Nullable(String)
l_linestatus Nullable(String)
l_shipdate Nullable(Date32)
l_commitdate Nullable(Date32)
l_receiptdate Nullable(Date32)
l_shipinstruct Nullable(String)
l_shipmode Nullable(String)
l_comment Nullable(String)

After:

columns format version: 1
16 columns:
L_ORDERKEY Nullable(Int64)
L_PARTKEY Nullable(Int64)
L_SUPPKEY Nullable(Int64)
L_LINENUMBER Nullable(Int64)
L_QUANTITY Nullable(Float64)
L_EXTENDEDPRICE Nullable(Float64)
L_DISCOUNT Nullable(Float64)
L_TAX Nullable(Float64)
L_RETURNFLAG Nullable(String)
L_LINESTATUS Nullable(String)
L_SHIPDATE Nullable(Date32)
L_COMMITDATE Nullable(Date32)
L_RECEIPTDATE Nullable(Date32)
L_SHIPINSTRUCT Nullable(String)
L_SHIPMODE Nullable(String)
L_COMMENT Nullable(String)

spark.sql.caseSensitive is a configuration. Once it is set to true, subsequent operations fail unless the user forces the column names to lowercase in the SQL, even if the column names are capitalized exactly as the user declared them.
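The failure mode can be illustrated with a small sketch. This is not Gluten or Spark code: `resolve_column` is a hypothetical helper that mimics name resolution against the names stored on disk, assuming exact matching when case sensitivity is on and case-insensitive matching otherwise.

```python
def resolve_column(requested, stored_names, case_sensitive):
    """Return the stored name matching `requested`, or None if nothing matches.

    Hypothetical helper for illustration only; it is not part of Gluten or Spark.
    """
    for stored in stored_names:
        if case_sensitive:
            if stored == requested:
                return stored
        elif stored.lower() == requested.lower():
            return stored
    return None


# Names as written to columns.txt before this PR (lower-cased).
stored_before = ["l_orderkey", "l_discount", "l_shipdate"]
# Names as written after this PR (as declared in the DDL).
stored_after = ["L_ORDERKEY", "L_DISCOUNT", "L_SHIPDATE"]

# Case-insensitive mode: the declared name still resolves against the old layout.
print(resolve_column("L_ORDERKEY", stored_before, case_sensitive=False))
# Case-sensitive mode: the declared name does not match the lower-cased layout.
print(resolve_column("L_ORDERKEY", stored_before, case_sensitive=True))
# Case-sensitive mode with the new layout: the declared name matches again.
print(resolve_column("L_ORDERKEY", stored_after, case_sensitive=True))
```

Under these assumptions, only the lower-cased spelling resolves against the old on-disk layout in case-sensitive mode, which matches the behavior described above.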

change

I made two changes:

  1. We no longer convert user-defined column names. Instead, index column names (for example minmax, partition names, etc.) are normalized to the user-defined column names.
  2. Add a rename transformer, if needed, when reading from MergeTree.
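The two changes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `normalize_index_columns` and `build_rename_map` are hypothetical names, the sketch assumes declared column names are unique when compared case-insensitively, and it assumes the rename maps on-disk names to the declared names (the description does not spell out the direction).

```python
def normalize_index_columns(index_columns, declared_columns):
    """Map each index column name (in any case) to the user-declared spelling.

    Hypothetical helper; assumes declared names are unique ignoring case.
    """
    by_lower = {name.lower(): name for name in declared_columns}
    normalized = []
    for name in index_columns:
        key = name.lower()
        if key not in by_lower:
            raise KeyError(f"unknown column: {name}")
        normalized.append(by_lower[key])
    return normalized


def build_rename_map(stored_columns, declared_columns):
    """Return {stored_name: declared_name} for names that differ only by case.

    An empty result means no rename step is needed when reading.
    """
    by_lower = {name.lower(): name for name in declared_columns}
    return {
        stored: by_lower[stored.lower()]
        for stored in stored_columns
        if stored.lower() in by_lower and by_lower[stored.lower()] != stored
    }


declared = ["L_ORDERKEY", "L_DISCOUNT", "L_SHIPDATE"]

# Change 1: an index key given in lower case is normalized to the declared name.
print(normalize_index_columns(["l_discount"], declared))

# Change 2: parts written with lower-cased names need a rename on read,
# while parts written with the declared names do not.
print(build_rename_map(["l_orderkey", "l_discount", "l_shipdate"], declared))
print(build_rename_map(declared, declared))
```

Returning an empty map when the names already match mirrors the "if needed" condition: in this sketch the rename step would only be added for parts whose stored names differ from the declared ones.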

(Fixes: #7028)

How was this patch tested?

Re-enabled the previously ignored UTs.

@github-actions github-actions bot added CORE works for Gluten Core CLICKHOUSE labels Dec 25, 2024
@github-actions

#7028

@baibaichen baibaichen force-pushed the feature/case-sensitve branch from 2771287 to 7809a3c Compare December 25, 2024 10:36
@github-actions

Run Gluten Clickhouse CI on x86

@baibaichen baibaichen force-pushed the feature/case-sensitve branch from 7809a3c to 8cd914b Compare December 26, 2024 10:51
@github-actions

Run Gluten Clickhouse CI on x86

@taiyang-li taiyang-li left a comment

LGTM

@baibaichen baibaichen merged commit 8a5d7fb into apache:main Dec 27, 2024
@baibaichen baibaichen deleted the feature/case-sensitve branch December 27, 2024 07:59
baibaichen added a commit to baibaichen/gluten that referenced this pull request Dec 27, 2024
baibaichen added a commit to baibaichen/gluten that referenced this pull request Dec 28, 2024
baibaichen pushed a commit that referenced this pull request Dec 30, 2024
* [Fix UT] spark.databricks.delta.stats.skipping -> false

* [Fix UT] Bucket table not support

* [Fix UT] 'test cache mergetree data no partition columns' already fixed by (#8346)

* [UT] open ignore test

* [MINOR REFACTOR] Pass by const reference instead of pass by value

* [MINOR REFACTOR] validatedPartitionID

* [Fix Bug] decode part name

* clang 19 fix

Labels

CLICKHOUSE CORE works for Gluten Core

Development

Successfully merging this pull request may close these issues.

[CH] Fully Support writing parquet and mergetree in spark 3.5.x with delta protocol

2 participants