lightning: specify collation when parquet value to string datum#38391
ti-chi-bot merged 3 commits into pingcap:master
/run-integration-br-test |
|
/cc @lance6716 @D3Hunter |
         v = ts.Format(utcTimeLayout)
     }
-    d.SetString(v, "")
+    d.SetString(v, "utf8mb4_bin")
There are several places where string encoding needs to be considered: one is the string data in the parquet file, the other is the string variables in the memory of the Lightning process, which are read by the parquet reader. Since a Go string is conventionally assumed to be UTF-8 encoded, I think this PR is OK. But I'm not sure whether a parquet file can use another encoding for string data, in which case the go-parquet reader would wrongly cast it to a Go string without any encoding or decoding step.
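The concern above can be made concrete with a small sketch. Converting bytes to a Go string never validates or transcodes them, so a reader that casts raw column bytes directly would pass non-UTF-8 data through unchanged. The helper name `toUTF8String` and the sample byte values are hypothetical and only illustrate the check; this is not code from Lightning or from the parquet reader.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// toUTF8String is a hypothetical helper: it converts raw bytes (for example a
// byte-array value read from a parquet column) to a Go string, but only if
// the bytes are valid UTF-8. The conversion string(raw) on its own performs
// no validation and no transcoding.
func toUTF8String(raw []byte) (string, error) {
	if !utf8.Valid(raw) {
		return "", fmt.Errorf("value is not valid UTF-8: % x", raw)
	}
	return string(raw), nil
}

func main() {
	utf8Bytes := []byte("caf\u00e9")           // "café" encoded as UTF-8
	latin1Bytes := []byte{'c', 'a', 'f', 0xE9} // "café" encoded as Latin-1

	for _, raw := range [][]byte{utf8Bytes, latin1Bytes} {
		s, err := toUTF8String(raw)
		fmt.Printf("%q %v\n", s, err)
	}
}
```

A check like this would only detect data that is not valid UTF-8; it cannot tell which other encoding the file used.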
|
/merge |
|
This pull request has been accepted and is ready to merge. Commit hash: 62d9688 |
|
/merge |
|
In response to a cherrypick label: new pull request created: #38487. |
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
|
In response to a cherrypick label: new pull request created: #38488. |
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
|
In response to a cherrypick label: new pull request created: #38489. |
TiDB MergeCI notify: 🔴 Bad News! [1] CI still failing after this PR merged.
|
What problem does this PR solve?
Issue Number: close #38351
Problem Summary:
What is changed and how it works?
For the parquet parser, when setting a value into a string datum, use the "utf8mb4_bin" collation instead of an empty collation. With a valid collation, the string conversion logic no longer reports errors, which improves performance.