Skip to content

Eliminate the table lock between reading,writing and DDL operators for TiFlash#1736

Merged
JaySon-Huang merged 34 commits intopingcap:masterfrom
JaySon-Huang:eliminate_table_lock
Apr 30, 2021
Merged

Eliminate the table lock between reading,writing and DDL operators for TiFlash#1736
JaySon-Huang merged 34 commits intopingcap:masterfrom
JaySon-Huang:eliminate_table_lock

Conversation

@JaySon-Huang
Copy link
Contributor

@JaySon-Huang JaySon-Huang commented Apr 7, 2021

What problem does this PR solve?

Issue Number: close #1162

Problem Summary: The table lock is held for the lifetime of the coprocessor reading. Reading, writing, DDL operations may block each other.
We need to take care of not writing codes that lead to deadlocks in some extreme situations. This PR refactor the storage lock model and make it less lock conflict between reading, writing, and DDL operations.

What is changed and how it works?

Proposal: https://docs.google.com/document/d/1LDo3MLU5H3SranepCiDIskbsAU7tCG3dB5Jml_dD7Es/edit#

What's Changed:

  1. Port the latest RWLock for IStorage from Clickhouse. "phase-fair rwlocks have been shown to provide more predictable timings compared to task-fair rwlocks for certain workloads (for relatively low share of writes, <25%). Brandenburg B, Anderson J, (2010)"
  2. Refactor the lock model for IStorage to eliminate the lock between reading, writing, and DDL operations.
  3. Acquire table lock by QueryID/threadName instead of function name
  4. Tombstone the database after receiving a "drop" DDL from TiDB. Physically drop the databases / tables after gc safe point.
  5. Remove some outdated tests under tests/mutable-test/txn_schema (all those tests are ported into tests/delta-merge-test/raft/schema)

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch: release-5.0

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • N/A

Release note

  • Eliminate the locks between reading and DDL jobs to reduce jitters

@JaySon-Huang JaySon-Huang self-assigned this Apr 7, 2021
@JaySon-Huang
Copy link
Contributor Author

/run-all-tests

@JaySon-Huang JaySon-Huang force-pushed the eliminate_table_lock branch from 93e8eb0 to 18ff92b Compare April 7, 2021 13:52
@JaySon-Huang
Copy link
Contributor Author

/run-all-tests

1 similar comment
@JaySon-Huang
Copy link
Contributor Author

/run-all-tests

@JaySon-Huang JaySon-Huang force-pushed the eliminate_table_lock branch from 14ca60e to be77b03 Compare April 16, 2021 07:11
@JaySon-Huang JaySon-Huang changed the title [WIP] Eliminate the table lock between reading,writing and DDL operators for TiFlash Eliminate the table lock between reading,writing and DDL operators for TiFlash Apr 16, 2021
@JaySon-Huang
Copy link
Contributor Author

@zanmato1984 @flowbehappy PTAL

@JaySon-Huang JaySon-Huang added needs-cherry-pick-release-5.0 PR which needs to be cherry-picked to release-5.0 type/enhancement The issue or PR belongs to an enhancement. labels Apr 20, 2021
@flowbehappy flowbehappy self-requested a review April 23, 2021 07:51
Copy link
Contributor

@flowbehappy flowbehappy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Apr 23, 2021
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
@zanmato1984
Copy link
Contributor

Other than the latest comment of mine, this change LGTM.

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Apr 30, 2021
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Apr 30, 2021
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
@JaySon-Huang
Copy link
Contributor Author

/run-all-tests

@ti-srebot
Copy link
Collaborator

cherry pick to release-5.0 in PR #1858

JaySon-Huang added a commit to JaySon-Huang/tiflash that referenced this pull request May 23, 2021
…r TiFlash (pingcap#1736)

1. Port the latest `RWLock` for `IStorage` from Clickhouse. "phase-fair rwlocks have been shown to provide more predictable timings compared to task-fair rwlocks for certain workloads (for relatively low share of writes, <25%). Brandenburg B, Anderson J, (2010)"
2. Refactor the lock model for `IStorage` to eliminate the lock between reading, writing, and DDL operations.
3. Acquire table lock by QueryID/threadName instead of function name
3. Tombstone the database after receiving a "drop" DDL from TiDB. Physically drop the databases / tables after gc safe point.
3. Remove some outdated tests under `tests/mutable-test/txn_schema` (all those tests are ported into `tests/delta-merge-test/raft/schema`)

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>

 Conflicts:
	dbms/src/Databases/test/gtest_database.cpp
	dbms/src/Storages/GCManager.cpp
	dbms/src/Storages/Transaction/ApplySnapshot.cpp
	dbms/src/Storages/Transaction/PartitionStreams.cpp
zanmato1984 pushed a commit that referenced this pull request May 23, 2021
…r TiFlash (for POC) (#1973)

* Remove ReplicatedMergeTree family (#1805)

* Eliminate the table lock between reading,writing and DDL operators for TiFlash (#1736)

1. Port the latest `RWLock` for `IStorage` from Clickhouse. "phase-fair rwlocks have been shown to provide more predictable timings compared to task-fair rwlocks for certain workloads (for relatively low share of writes, <25%). Brandenburg B, Anderson J, (2010)"
2. Refactor the lock model for `IStorage` to eliminate the lock between reading, writing, and DDL operations.
3. Acquire table lock by QueryID/threadName instead of function name
3. Tombstone the database after receiving a "drop" DDL from TiDB. Physically drop the databases / tables after gc safe point.
3. Remove some outdated tests under `tests/mutable-test/txn_schema` (all those tests are ported into `tests/delta-merge-test/raft/schema`)

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>

 Conflicts:
	dbms/src/Databases/test/gtest_database.cpp
	dbms/src/Storages/GCManager.cpp
	dbms/src/Storages/Transaction/ApplySnapshot.cpp
	dbms/src/Storages/Transaction/PartitionStreams.cpp

* Fix bug for remote reading using "double lock" strategy (#1926)

 Conflicts:
	dbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp

* Fix the bug that some decoding is not protected by alter_lock

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>

 Conflicts:
	dbms/src/Storages/Transaction/ApplySnapshot.cpp
	dbms/src/Storages/Transaction/PartitionStreams.cpp
JaySon-Huang pushed a commit that referenced this pull request May 26, 2021
…r TiFlash (#1736) (#1858)

* Eliminate the table lock between reading,writing and DDL operators for TiFlash (#1736)

1. Port the latest `RWLock` for `IStorage` from Clickhouse. "phase-fair rwlocks have been shown to provide more predictable timings compared to task-fair rwlocks for certain workloads (for relatively low share of writes, <25%). Brandenburg B, Anderson J, (2010)"
2. Refactor the lock model for `IStorage` to eliminate the lock between reading, writing, and DDL operations.
3. Acquire table lock by QueryID/threadName instead of function name
4. Tombstone the database after receiving a "drop" DDL from TiDB. Physically drop the databases / tables after gc safe point.
5. Remove some outdated tests under `tests/mutable-test/txn_schema` (all those tests are ported into `tests/delta-merge-test/raft/schema`)

* Mark FIXME for conflict codes
Conflicts:
  dbms/src/Flash/Coprocessor/DAGQueryBlockInterpreter.cpp

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
JaySon-Huang pushed a commit to JaySon-Huang/tiflash that referenced this pull request Jun 8, 2021
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Co-authored-by: Flowyi <flowbehappy@gmail.com>

cherry pick pingcap#1821 to release-5.0 (pingcap#1827)

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Co-authored-by: xufei <xufei@pingcap.com>

Remove ReplicatedMergeTree family (pingcap#1805) (pingcap#1826)

cherry pick pingcap#1835 to release-5.0 (pingcap#1840)

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Co-authored-by: xufei <xufei@pingcap.com>

Use file path as cache key (pingcap#1852) (pingcap#1862)

Lazily initializing DeltaMergeStore  (pingcap#1423) (pingcap#1751) (pingcap#1868)

Add background gc check thread for DeltaTree storage (pingcap#1742) (pingcap#1828)

* cherry pick pingcap#1742 to release-5.0

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

* fix conflict

Co-authored-by: lidezhu <47731263+lidezhu@users.noreply.github.com>

Fix calculate stable property (pingcap#1839) (pingcap#1878)

* cherry pick pingcap#1839 to release-5.0

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

* fix compile and cherry pick a small fix

Co-authored-by: lidezhu <47731263+lidezhu@users.noreply.github.com>

Fix client-c mistake setting current_ts to 0 when resolving lock for async commit (pingcap#1856) (pingcap#1870)

support constant folding in dbg coprocessor (pingcap#1881) (pingcap#1884)

Optimize & Reduce time cost about ci test (pingcap#1849) (pingcap#1894)

* reduce time cost about ci test by parallel
* add `-DNO_WERROR=ON` to cmake config for release-darwin build
* Fix tidb_ghpr_tics_test fail (pingcap#1895)

Signed-off-by: Zhigao Tong <tongzhigao@pingcap.com>

Fix panic with feature `compaction filter` in release-5.0 (pingcap#1891)

Init store in background task. (pingcap#1843,pingcap#1896) (pingcap#1874)

Push down filter on timestamp type column to storage layer (pingcap#1875) (pingcap#1906)

* cherry pick pingcap#1875 to release-5.0

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

* remove irrevalant code

Co-authored-by: lidezhu <47731263+lidezhu@users.noreply.github.com>

Fix ExchangeSender: remove duplicated write stream operation (pingcap#1379) (pingcap#1883)

cherry pick pingcap#1917 to release-5.0 (pingcap#1923)

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Co-authored-by: Fu Zhe <fuzhe1989@gmail.com>

Pre-handle SSTFiles to DTFiles when applying snapshots (pingcap#1439) (pingcap#1867)

* Add `raft.snapshot.method` in configuration file to control the method of applying snapshots
  - "block" decode snapshot data as a block
  - "file1" decode snapshot data as DTFiles (directory mode) [**By default**]
* Add a stream `SSTFilesToBlockInputStream` to decode snapshot data into blocks
* Add a stream `BoundedSSTFilesToBlockInputStream` to bound the blocks read from SSTFilesToBlockInputStream by column `_tidb_rowid`
* Add a stream `SSTFilesToDTFilesOutputStream` that accept `BoundedSSTFilesToBlockInputStream` as input stream to write blocks into DTFiles
* Make `STORAGE_FORMAT_CURRENT` to be `STORAGE_FORMAT_V2` by default to support ingest DTFile into DT engine
* Fix the bug that the block decoded from SSTFile is not sorted by primary key and version (pingcap#1888)
* Fix panic with feature compaction filter with DTFile
* Fix ingest write amplification after snapshot apply optimization (pingcap#1913)

Co-authored-by: Zhigao Tong <tongzhigao@pingcap.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>

Cast int/real as real (pingcap#1911) (pingcap#1928)

Fix the problem that old dmfile is not removed atomically (pingcap#1918) (pingcap#1925)

Eliminate the table lock between reading,writing and DDL operators for TiFlash (pingcap#1736) (pingcap#1858)

* Eliminate the table lock between reading,writing and DDL operators for TiFlash (pingcap#1736)

1. Port the latest `RWLock` for `IStorage` from Clickhouse. "phase-fair rwlocks have been shown to provide more predictable timings compared to task-fair rwlocks for certain workloads (for relatively low share of writes, <25%). Brandenburg B, Anderson J, (2010)"
2. Refactor the lock model for `IStorage` to eliminate the lock between reading, writing, and DDL operations.
3. Acquire table lock by QueryID/threadName instead of function name
4. Tombstone the database after receiving a "drop" DDL from TiDB. Physically drop the databases / tables after gc safe point.
5. Remove some outdated tests under `tests/mutable-test/txn_schema` (all those tests are ported into `tests/delta-merge-test/raft/schema`)

* Mark FIXME for conflict codes
Conflicts:
  dbms/src/Flash/Coprocessor/DAGQueryBlockInterpreter.cpp

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>

cached tics code incase of clone same code repeatedly (pingcap#1994) (pingcap#2001)

Add row & bytes threshold config for CompactLog (pingcap#1997) (pingcap#2005)

* cherry pick pingcap#1997 to release-5.0

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
Co-authored-by: Zhigao Tong <tongzhigao@pingcap.com>

Fix the bug that incomplete write batches are not truncated (pingcap#1934) (pingcap#2003)

Revert "Push down filter on timestamp type column to storage layer (pingcap#1875) (pingcap#1906)" (pingcap#2010)

This reverts commit 630b612.

Revert Lazily Init Store (pingcap#2011)

* Revert "Init store in background task. (pingcap#1843,pingcap#1896) (pingcap#1874)"

This reverts commit 0882461.

Conflicts:
	dbms/src/Storages/StorageDeltaMerge.cpp

* format code

* Revert "Lazily initializing DeltaMergeStore  (pingcap#1423) (pingcap#1751) (pingcap#1868)"

This reverts commit bbce050.

Conflicts:
	dbms/src/Storages/DeltaMerge/tests/gtest_dm_storage_delta_merge.cpp
	dbms/src/Storages/StorageDeltaMerge.cpp
	dbms/src/Storages/StorageDeltaMerge.h

Co-authored-by: JinheLin <linjinhe@pingcap.com>
Co-authored-by: Flowyi <flowbehappy@gmail.com>

Revert "background gc thread" (pingcap#2015)

* Revert "Fix calculate stable property (pingcap#1839) (pingcap#1878)"

This reverts commit 6a9eb58.

* Revert "Add background gc check thread for DeltaTree storage (pingcap#1742) (pingcap#1828)"

This reverts commit cbbbd09.

* remove code

* format code

* format code

* fix test compile

* fix comment

Fix the potential concurrency problem when clone the shared delta index (pingcap#2030) (pingcap#2033)

* cherry pick pingcap#2030 to release-5.0

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

* Update dbms/src/Storages/DeltaMerge/DeltaIndex.h

Co-authored-by: JaySon <jayson.hjs@gmail.com>

Co-authored-by: lidezhu <47731263+lidezhu@users.noreply.github.com>
Co-authored-by: JaySon <jayson.hjs@gmail.com>

Fix bug for unexpected deleted ingest file (pingcap#2047) (pingcap#2048)

* Only enable GC for DTFiles after they get applied to all segments.
* Refine some loggings

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>

Fix cast int as time (pingcap#1788) (pingcap#1893)

cherry pick pingcap#1903 to release-5.0 (pingcap#1915)

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Fix ingesting file may add ref page to deleted page (pingcap#2054) (pingcap#2055)

1. Put the ingest file id into `storage_pool.data` before ingesting them into segments
2. Add ref pages to the ingest files for each segment
3. Delete the original page id after all

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>

Fix compile error on Mac Clang 12.0.5 (pingcap#2058) (pingcap#2070)

cherry pick pingcap#2017 pingcap#2020 pingcap#2051 to release-5.0 (pingcap#2074)

Revert "cherry pick pingcap#2017 pingcap#2020 pingcap#2051 to release-5.0 (pingcap#2074)"

This reverts commit 8e0712c.

Revert "Fix compile error on Mac Clang 12.0.5 (pingcap#2058) (pingcap#2070)"

This reverts commit 3c31b92.

Revert "Fix ingesting file may add ref page to deleted page (pingcap#2054) (pingcap#2055)"

This reverts commit c773d61.

Revert "Fix bug for unexpected deleted ingest file (pingcap#2047) (pingcap#2048)"

This reverts commit b329618.

Revert "Pre-handle SSTFiles to DTFiles when applying snapshots (pingcap#1439) (pingcap#1867)"

This reverts commit c947fd6.

 Conflicts:
	dbms/src/Common/FailPoint.cpp
	dbms/src/Storages/DeltaMerge/File/DMFileBlockOutputStream.h
	dbms/src/Storages/DeltaMerge/File/DMFileWriter.h
	dbms/src/Storages/DeltaMerge/ReorganizeBlockInputStream.h
	dbms/src/Storages/DeltaMerge/SSTFilesToBlockInputStream.cpp
	dbms/src/Storages/DeltaMerge/SSTFilesToBlockInputStream.h
	dbms/src/Storages/DeltaMerge/SSTFilesToDTFilesOutputStream.cpp
	dbms/src/Storages/DeltaMerge/Segment.cpp
	dbms/src/Storages/DeltaMerge/StableValueSpace.cpp
	dbms/src/Storages/DeltaMerge/tests/gtest_dm_file.cpp
	dbms/src/Storages/DeltaMerge/tests/gtest_dm_storage_delta_merge.cpp
	dbms/src/Storages/StorageDeltaMerge.cpp
	dbms/src/Storages/StorageDeltaMerge.h
	dbms/src/Storages/Transaction/ApplySnapshot.cpp
	dbms/src/Storages/Transaction/PartitionStreams.cpp
	tests/delta-merge-test/raft/schema/drop_on_restart.test

Fix comment

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
jiangyuzhao pushed a commit to JaySon-Huang/tiflash that referenced this pull request Jun 21, 2021
…r TiFlash (pingcap#1736)

1. Port the latest `RWLock` for `IStorage` from Clickhouse. "phase-fair rwlocks have been shown to provide more predictable timings compared to task-fair rwlocks for certain workloads (for relatively low share of writes, <25%). Brandenburg B, Anderson J, (2010)"
2. Refactor the lock model for `IStorage` to eliminate the lock between reading, writing, and DDL operations.
3. Acquire table lock by QueryID/threadName instead of function name
3. Tombstone the database after receiving a "drop" DDL from TiDB. Physically drop the databases / tables after gc safe point.
3. Remove some outdated tests under `tests/mutable-test/txn_schema` (all those tests are ported into `tests/delta-merge-test/raft/schema`)

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>

 Conflicts:
	dbms/src/Databases/test/gtest_database.cpp
	dbms/src/Storages/GCManager.cpp
	dbms/src/Storages/Transaction/ApplySnapshot.cpp
	dbms/src/Storages/Transaction/PartitionStreams.cpp
@JaySon-Huang JaySon-Huang mentioned this pull request May 11, 2022
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-cherry-pick-release-5.0 PR which needs to be cherry-picked to release-5.0 status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Eliminate the table lock between reading, writing and DDL operators

4 participants