Storage: Fix DataTypePtr is not shared as expected#9939

Merged
ti-chi-bot[bot] merged 7 commits intopingcap:masterfrom
JaySon-Huang:fix_shared_data_type
Mar 6, 2025
Conversation

@JaySon-Huang
Contributor

@JaySon-Huang JaySon-Huang commented Mar 4, 2025

What problem does this PR solve?

Issue Number: close #9947

Problem Summary:

There are about 13,000 tables with 60 enum columns each, which adds up to 780,000 DataTypeEnum instances. They take about 4GB of memory.
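As a rough back-of-the-envelope check of the numbers above, the sketch below works out the instance count and the average memory per instance. It assumes "4GB" means 4 GiB and that the memory is spread evenly across instances; the constant names are made up for illustration.

```cpp
#include <cstdint>

// Figures taken from the problem summary.
constexpr std::uint64_t kTables = 13000;
constexpr std::uint64_t kEnumColumnsPerTable = 60;

// One DataTypeEnum instance per enum column when nothing is shared.
constexpr std::uint64_t kInstances = kTables * kEnumColumnsPerTable;

// Assumption: "about 4GB" is read as 4 GiB.
constexpr std::uint64_t kTotalBytes = 4ULL * 1024 * 1024 * 1024;

// Average footprint per instance (integer division, so rounded down).
constexpr std::uint64_t kBytesPerInstance = kTotalBytes / kInstances;

static_assert(kInstances == 780000, "13,000 tables x 60 enum columns");
```

Under these assumptions each unshared instance costs roughly 5.5 KB on average, which is why sharing identical definitions pays off.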

What is changed and how it works?

before_share_enum.svg.zip
after_share_enum.svg.zip

Storage: Fix DataTypePtr is not shared as expected
* Introduce a class `DataTypePtrCache` and manage the shared cache of `DataTypePtr` instances.
* Introduce `DataTypeFactory::getOrSet(const ASTPtr & ast)`, which looks up the cache using the data type name built from `ast->range.first` and `ast->range.second`.
logging: Lower the logging level of "updateTableColumnInfo" to debug, because it can produce a large volume of logs when restarting TiFlash.
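The sharing idea can be sketched as a small get-or-set cache keyed by the type definition text. This is a minimal illustration, not the code in this PR: `ToyDataType`, `ToyDataTypePtr` and `ToyDataTypePtrCache` are hypothetical stand-ins, and the real `DataTypePtrCache` may differ in keying, locking and eviction.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an immutable data type object such as DataTypeEnum.
struct ToyDataType
{
    explicit ToyDataType(std::string name_) : name(std::move(name_)) {}
    const std::string name;
};
using ToyDataTypePtr = std::shared_ptr<const ToyDataType>;

// Hypothetical cache: one shared instance per distinct type definition text.
class ToyDataTypePtrCache
{
public:
    // Return the cached instance for `type_name`, creating it on first use.
    ToyDataTypePtr getOrSet(const std::string & type_name)
    {
        std::lock_guard<std::mutex> lock(mutex);
        auto it = cache.find(type_name);
        if (it != cache.end())
            return it->second;
        ToyDataTypePtr ptr = std::make_shared<ToyDataType>(type_name);
        cache.emplace(type_name, ptr);
        return ptr;
    }

    // Number of distinct type definitions currently cached.
    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return cache.size();
    }

private:
    mutable std::mutex mutex;
    std::unordered_map<std::string, ToyDataTypePtr> cache;
};
```

With a cache like this, two columns declared with the same definition text receive the same pointer, so the number of live instances tracks the number of distinct definitions rather than the number of columns.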

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

Create 36,000 tables with a schema like:

CREATE TABLE IF NOT EXISTS enum_tbl_000 (
  field1 VARCHAR(255), field2 VARCHAR(255), field3 VARCHAR(255),
  enum1 ENUM('T', 'F'), enum2 ENUM('T', 'F'), enum3 ENUM('T', 'F'), enum4 ENUM('T', 'F'),
  enum5 ENUM('T', 'F'), enum6 ENUM('T', 'F'), enum7 ENUM('T', 'F'), enum8 ENUM('T', 'F'),
  enum9 ENUM('T', 'F'), enum10 ENUM('T', 'F'), enum11 ENUM('T', 'F'), enum12 ENUM('T', 'F'),
  enum13 ENUM('T', 'F'), enum14 ENUM('T', 'F'), enum15 ENUM('T', 'F'), enum16 ENUM('T', 'F'),
  enum17 ENUM('T', 'F'), enum18 ENUM('T', 'F'), enum19 ENUM('T', 'F'),
  enum20 ENUM('pending', 'running', 'finished', 'failed'),
  enum21 ENUM('DEFAULT', 'ALL', 'PREDICATE', 'LIST'),
  enum22 ENUM('NONE', 'READ', 'INTEND', 'WRITE')
) DEFAULT CHARSET=utf8mb4;
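One way to produce the statements for this manual test is a small generator such as the sketch below. The helper name `makeCreateTableSQL` and the five-digit zero-padded table suffix are assumptions made for illustration; the actual script used for the test is not shown in this PR.

```cpp
#include <cstdio>
#include <string>

// Build one CREATE TABLE statement with the same columns as the schema above.
// Assumption: tables are named enum_tbl_<index>, zero-padded to five digits.
std::string makeCreateTableSQL(int index)
{
    char name[32];
    std::snprintf(name, sizeof(name), "enum_tbl_%05d", index);

    std::string sql = "CREATE TABLE IF NOT EXISTS ";
    sql += name;
    sql += " (field1 VARCHAR(255), field2 VARCHAR(255), field3 VARCHAR(255)";
    // enum1 .. enum19 share the same two-value definition.
    for (int i = 1; i <= 19; ++i)
        sql += ", enum" + std::to_string(i) + " ENUM('T', 'F')";
    sql += ", enum20 ENUM('pending', 'running', 'finished', 'failed')";
    sql += ", enum21 ENUM('DEFAULT', 'ALL', 'PREDICATE', 'LIST')";
    sql += ", enum22 ENUM('NONE', 'READ', 'INTEND', 'WRITE')";
    sql += ") DEFAULT CHARSET=utf8mb4;";
    return sql;
}
```

Calling the helper for indexes 0 through 35,999 and feeding the output to a MySQL-compatible client would create the 36,000 tables.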

Check the memory consumption without and with this fix. From 13:40 to 13:45, without this fix, TiFlash consumes 15GB of memory. From 13:45 to 13:55, with this fix, TiFlash consumes 8GB of memory.
image

  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fix the issue that TiFlash may consume lots of memory when there are many `ENUM` columns on TiDB

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 4, 2025
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
@JaySon-Huang JaySon-Huang force-pushed the fix_shared_data_type branch from f183ef0 to dcf5f32 Compare March 5, 2025 13:30
@JaySon-Huang JaySon-Huang changed the title [WIP] Storage: Fix DataTypePtr is not shared as expected Storage: Fix DataTypePtr is not shared as expected Mar 5, 2025
@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Mar 5, 2025
Signed-off-by: JaySon-Huang <tshent@qq.com>
@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Mar 6, 2025
Member

@CalvinNeo CalvinNeo left a comment

lgtm

@ti-chi-bot
Contributor

ti-chi-bot bot commented Mar 6, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JinheLin, Lloyd-Pottiger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [JinheLin,Lloyd-Pottiger]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 6, 2025
@ti-chi-bot
Contributor

ti-chi-bot bot commented Mar 6, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-03-06 02:37:58.64147587 +0000 UTC m=+495591.770395612: ☑️ agreed by Lloyd-Pottiger.
  • 2025-03-06 07:59:26.07805223 +0000 UTC m=+514879.206971988: ☑️ agreed by JinheLin.

@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Mar 6, 2025
@ti-chi-bot ti-chi-bot bot merged commit 24d3106 into pingcap:master Mar 6, 2025
5 checks passed
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #9952.

@JaySon-Huang JaySon-Huang deleted the fix_shared_data_type branch March 6, 2025 09:04
@ti-chi-bot ti-chi-bot bot removed the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Mar 27, 2025
ti-chi-bot bot added a commit that referenced this pull request Mar 28, 2025
close #9947

Signed-off-by: JaySon-Huang <tshent@qq.com>

Co-authored-by: JaySon-Huang <tshent@qq.com>
Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
JaySon-Huang added a commit to ti-chi-bot/tiflash that referenced this pull request Mar 31, 2025
…ngcap#9952)

close pingcap#9947

Signed-off-by: JaySon-Huang <tshent@qq.com>

Co-authored-by: JaySon-Huang <tshent@qq.com>
Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
@ti-chi-bot ti-chi-bot bot added needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. labels Jul 14, 2025
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #10305.
But this PR has conflicts, please resolve them!

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Jul 14, 2025
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #10306.
But this PR has conflicts, please resolve them!

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Jul 14, 2025
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit that referenced this pull request Jul 15, 2025
close #9947

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: JaySon-Huang <tshent@qq.com>

Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: JaySon-Huang <tshent@qq.com>
ti-chi-bot bot pushed a commit that referenced this pull request Jul 16, 2025
close #9947

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: JaySon-Huang <tshent@qq.com>

Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: JaySon-Huang <tshent@qq.com>
Labels

approved lgtm needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

TiFlash consume much memory than expected with large number of ENUM columns

5 participants