Skip to content

Fix bug for remote reading using "double lock" strategy#1926

Merged
ti-srebot merged 8 commits intopingcap:masterfrom
JaySon-Huang:fix_table_lock_remote
May 19, 2021
Merged

Fix bug for remote reading using "double lock" strategy#1926
ti-srebot merged 8 commits intopingcap:masterfrom
JaySon-Huang:fix_table_lock_remote

Conversation

@JaySon-Huang
Copy link
Contributor

@JaySon-Huang JaySon-Huang commented May 18, 2021

Signed-off-by: JaySon-Huang jayson.hjs@gmail.com

What problem does this PR solve?

Fix some bugs brought by #1736 :

  • We should also protect building remote read requests by table_structure_lock.
  • If there is any error thrown in SchemaSyncService, we should let it try to do gc with the same gc_safe_point again.

What is changed and how it works?

  • Protect building remote read requests and null_stream_if_empty by table_structure_lock
  • Don't update SchemaSyncService::last_gc_safe_point if there is any error while doing GC so that we can try with the same gc_safe_point later.
  • Add DBGInvoke gc_schemas([gc_safe_point]) for debugging schema gc
  • Add some failpoints for mocking slow reading, slow writing, slow DDL operations

Related changes

  • Need to cherry-pick to the release branch: 5.0

Check List

Tests

Side effects

Release note

  • No release note

…ame safe point when there is any error

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
@JaySon-Huang JaySon-Huang added the type/bugfix This PR fixes a bug. label May 18, 2021
@JaySon-Huang JaySon-Huang requested a review from zanmato1984 May 18, 2021 12:34
@JaySon-Huang JaySon-Huang self-assigned this May 18, 2021
@JaySon-Huang
Copy link
Contributor Author

/run-all-tests

JaySon-Huang and others added 2 commits May 19, 2021 19:02
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
}
else
{
// Don't update last_gc_safe_point and retry later
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that the next round GC will be more aggressive? Because the "current" timestamp is even further from the "last GC safe point"?

Copy link
Contributor Author

@JaySon-Huang JaySon-Huang May 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will be more aggressive if the gc_safe_point succeeds.
In my simple manual test, the gc_safe_point in PD didn't get update unless I insert lots of rows. So I think we should allow running GC with the same gc_safe_point to reclaim disk space if it does not succeed.

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Copy link
Contributor

@zanmato1984 zanmato1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label May 19, 2021
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
@JaySon-Huang
Copy link
Contributor Author

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label May 19, 2021
@ti-srebot
Copy link
Collaborator

/run-all-tests

@ti-srebot
Copy link
Collaborator

@JaySon-Huang merge failed.

@JaySon-Huang
Copy link
Contributor Author

/merge

@ti-srebot
Copy link
Collaborator

/run-all-tests

@ti-srebot ti-srebot merged commit 11ca3d7 into pingcap:master May 19, 2021
@JaySon-Huang JaySon-Huang deleted the fix_table_lock_remote branch May 19, 2021 15:58
JaySon-Huang added a commit to JaySon-Huang/tiflash that referenced this pull request May 23, 2021
 Conflicts:
	dbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp
zanmato1984 pushed a commit that referenced this pull request May 23, 2021
…r TiFlash (for POC) (#1973)

* Remove ReplicatedMergeTree family (#1805)

* Eliminate the table lock between reading,writing and DDL operators for TiFlash (#1736)

1. Port the latest `RWLock` for `IStorage` from Clickhouse. "phase-fair rwlocks have been shown to provide more predictable timings compared to task-fair rwlocks for certain workloads (for relatively low share of writes, <25%). Brandenburg B, Anderson J, (2010)"
2. Refactor the lock model for `IStorage` to eliminate the lock between reading, writing, and DDL operations.
3. Acquire table lock by QueryID/threadName instead of function name
3. Tombstone the database after receiving a "drop" DDL from TiDB. Physically drop the databases / tables after gc safe point.
3. Remove some outdated tests under `tests/mutable-test/txn_schema` (all those tests are ported into `tests/delta-merge-test/raft/schema`)

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>

 Conflicts:
	dbms/src/Databases/test/gtest_database.cpp
	dbms/src/Storages/GCManager.cpp
	dbms/src/Storages/Transaction/ApplySnapshot.cpp
	dbms/src/Storages/Transaction/PartitionStreams.cpp

* Fix bug for remote reading using "double lock" strategy (#1926)

 Conflicts:
	dbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp

* Fix the bug that some decoding is not protected by alter_lock

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>

 Conflicts:
	dbms/src/Storages/Transaction/ApplySnapshot.cpp
	dbms/src/Storages/Transaction/PartitionStreams.cpp
jiangyuzhao pushed a commit to JaySon-Huang/tiflash that referenced this pull request Jun 21, 2021
 Conflicts:
	dbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp
@JaySon-Huang
Copy link
Contributor Author

Have been merged into #1858 to release-5.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status/can-merge Indicates a PR has been approved by a committer. status/LGT1 Indicates that a PR has LGTM 1. type/bugfix This PR fixes a bug.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants