gc: add resolve locks interface for tidb gc_worker#945
gc: add resolve locks interface for tidb gc_worker#945MyonKeminta merged 8 commits intotikv:masterfrom
Conversation
ccb3015 to
abb612c
Compare
tikv/gc.go
Outdated
| lockResolver := &BaseLockResolver{store: s} | ||
| handler := func(ctx context.Context, r kv.KeyRange) (rangetask.TaskStat, error) { | ||
| return s.resolveLocksForRange(ctx, safePoint, r.StartKey, r.EndKey) | ||
| return ResolveLocksForRange(ctx, "GC", lockResolver, safePoint, r.StartKey, r.EndKey) |
There was a problem hiding this comment.
If the second parameter is named uuid, it doesn't make sense to pass "GC" to it. As there isn't a GCWorker here, consider generating one.
Actually I think we don't need to keep uuid anyway. For example, we can call it identifier, and pass it "gc-client-go-api" here, and pass "gcworker-{uuid}" from TiDB.
tikv/gc.go
Outdated
| } | ||
| } | ||
|
|
||
| func (l *BaseLockResolver) ResolveLocks(ctx context.Context, locks []*txnlock.Lock, loc *locate.KeyLocation) (*locate.KeyLocation, error) { |
There was a problem hiding this comment.
- The name
BaseLockResolverlooks confused with those stuff inlock_resovler.go. Consider distinguish them by naming thisBaseGCLockResolver. (and btw what doesBasemean here?) - The function
ResolveLockslooks also confused with those methods inlock_resolver.go, which is used by transactions that meet stale locks.BatchResolveLocks, which is used for GC, has some differences withResolveLocks. Here I suggest you to consider still using the nameBatchResolveLocks, or more clearly, using somethig likeResolveLocksForGC.
There was a problem hiding this comment.
For TiDB GC Worker. it use BaseLockResolver to scan and resolve locks.
For TiDB Advacner(log backup). it use BaseLockResolver to scan locks and override ResolveLocks to handle locks with a small region range and check txn status one by one.(which is not BatchResolveLocks, force delete locks before safepoint)
So, BaseGCLockResolver is ok for me. and BatchResolveLocks seems not clear.
tikv/gc.go
Outdated
|
|
||
| func (l *BaseLockResolver) ScanLocks(ctx context.Context, key []byte, maxVersion uint64) ([]*txnlock.Lock, *locate.KeyLocation, error) { | ||
| // every time ScanLocks need a new backoffer | ||
| l.bo = NewGcResolveLockMaxBackoffer(ctx) |
There was a problem hiding this comment.
How about providing the backoff from the caller?
I'm afraid it might easily cause misusing to making it a state of the lock resolver.
There was a problem hiding this comment.
the origin logic is ScanLocks and ResolveLocks must use backoff and must share one backoff, and gc worker test using failpoint to control the backoff.
So I think backoff is an internal logic for GCWorker and the test doesn't need it.
If we public it, then it should be caller's responsibility to call SetBackoffer and GetBackoffer when using ScanLocks and ResolveLocks for LockResolver? and caller may not know they need share the same backoff. Do you think it's still ok?
tikv/gc.go
Outdated
| continue | ||
| } | ||
| if len(locks) < gcScanLockLimit { | ||
| if len(locks) < GCScanLockLimit { |
There was a problem hiding this comment.
If this function is going to be used by TiDB, please also see if TiDB has its own settings of the scan limit. If so, consider making it customizable.
There was a problem hiding this comment.
TiDB has a mergeLockScanner which use gcScanLockLimit to scan locks.
There was a problem hiding this comment.
They both use the same setting 128. I think we can unified here
Signed-off-by: 3pointer <luancheng@pingcap.com>
c6121f8 to
e8dd618
Compare
Signed-off-by: 3pointer <luancheng@pingcap.com>
…ient-go into public_resolve_locks
tikv/gc.go
Outdated
| locks, loc, err := s.scanLocksInRegionWithStartKey(bo, key, safePoint, gcScanLockLimit) | ||
| // create new backoffer for every scan and resolve locks | ||
| bo := createBackoffFn(ctx) | ||
| locks, loc, err := resolver.ScanLocks(bo, key, maxVersion, scanLimit) |
There was a problem hiding this comment.
TiDB has this test code:
if w.testingKnobs.scanLocks != nil {
locks = append(locks, w.testingKnobs.scanLocks(key, loc.Region.GetID(), tryResolveLocksTS)...)
}This is used to inject some simulated locks for tests. Did you notice it, and do you have any idea how to support that kind of tests after the change?
This
There was a problem hiding this comment.
Yes,In TiDB test a new test struct will override ScanLock method to generate expected locks.
tikv/gc.go
Outdated
| } | ||
|
|
||
| func (l *BaseLockResolver) ResolveLocks(bo *Backoffer, locks []*txnlock.Lock, loc *locate.KeyLocation) (*locate.KeyLocation, error) { | ||
| return batchResolveLocksInARegion(bo, l.GetStore(), locks, loc) |
There was a problem hiding this comment.
The function batchResolveLocksInARegion only resolves locks in the single region specified by loc. If error happens, it may reload the region info, but still only handles the first region containing the key locks[0]. If locks locates in more than one region, the remaining will be ignored. This works in the GC logic (or the scanlocks-resolvelocks for range logic) because it guarantees locks passed to this function is sorted, and the returned KeyLocation indicates the next key to scan the next batch, so no lock will be missed. However if you public this function to be accessible everywhere, it's very weird that it doesn't promise to completely handle the given locks and returns a KeyLocation, and is very very very likely to cause misusing. Please consider adjust the implementation to suite the interface's semantics or any other way to avoid the problem.
2fdaed9 to
77caf52
Compare
Signed-off-by: 3pointer <luancheng@pingcap.com>
77caf52 to
b2f7321
Compare
|
/retest |
| if resolvedLocation == nil { | ||
| continue | ||
| } |
There was a problem hiding this comment.
Previously the backoffer is created outside the loop and recreated whenever the code in loop is successfully. Now you made the backoffer recreated every time in the loop, and as a result when this continue is executed, the backoffer is not counted as expected. Please consider keeping the old logic about backoff.
There was a problem hiding this comment.
fixed. good catch.
| // | ||
| // regionLocation should return if resolve locks succeed. if regionLocation return nil, | ||
| // which means not all locks are resolved in someway. the caller should retry scan locks. | ||
| ResolveLocksInOneRegion(bo *Backoffer, locks []*txnlock.Lock, regionLocation *locate.KeyLocation) (*locate.KeyLocation, error) |
There was a problem hiding this comment.
Please emphasize in comments that locks is assumed to be sorted.
Signed-off-by: 3pointer <luancheng@pingcap.com>
091467b to
fe522b5
Compare
* gc: add GCResolver inteface for resolve locks Signed-off-by: 3pointer <luancheng@pingcap.com> * adapt scanlimit Signed-off-by: 3pointer <luancheng@pingcap.com> * rename GCLockResolver to RegionLockResolver Signed-off-by: 3pointer <luancheng@pingcap.com> * update Signed-off-by: 3pointer <luancheng@pingcap.com> * address comments Signed-off-by: 3pointer <luancheng@pingcap.com> --------- Signed-off-by: 3pointer <luancheng@pingcap.com>
* gc: add GCResolver inteface for resolve locks Signed-off-by: 3pointer <luancheng@pingcap.com> * adapt scanlimit Signed-off-by: 3pointer <luancheng@pingcap.com> * rename GCLockResolver to RegionLockResolver Signed-off-by: 3pointer <luancheng@pingcap.com> * update Signed-off-by: 3pointer <luancheng@pingcap.com> * address comments Signed-off-by: 3pointer <luancheng@pingcap.com> --------- Signed-off-by: 3pointer <luancheng@pingcap.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com> Co-authored-by: cfzjywxk <lsswxrxr@163.com> Co-authored-by: disksing <i@disksing.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: zzm <zhouzemin@pingcap.com> Co-authored-by: husharp <jinhao.hu@pingcap.com> Co-authored-by: you06 <you1474600@gmail.com> Co-authored-by: buffer <doufuxiaowangzi@gmail.com> Co-authored-by: 3pointer <qdlc2010@gmail.com> Co-authored-by: buffer <1045931706@qq.com> Co-authored-by: husharp <ihusharp@gmail.com> Co-authored-by: crazycs520 <crazycs520@gmail.com> Co-authored-by: Smilencer <smityz@qq.com> Co-authored-by: ShuNing <nolouch@gmail.com> Co-authored-by: zyguan <zhongyangguan@gmail.com> Co-authored-by: Jack Yu <jackysp@gmail.com> Co-authored-by: Weizhen Wang <wangweizhen@pingcap.com> Co-authored-by: lucasliang <nkcs_lykx@hotmail.com> Co-authored-by: healthwaite <148101100+healthwaite@users.noreply.github.com> Co-authored-by: xufei <xufeixw@mail.ustc.edu.cn> Co-authored-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ekexium <eke@fastmail.com> Co-authored-by: 山岚 <36239017+YuJuncen@users.noreply.github.com> Co-authored-by: glorv <glorvs@163.com> Co-authored-by: Yongbo Jiang <cabinfeveroier@gmail.com> resolve locks interface for tidb gc_worker (#945) fix some issues of replica selector (#910) (#942) fix some issues of replica selector (#910) fix issue of configure kv timeout not work when disable batch client (#980) fix batch-client wait too long and add some metrics (#973) fix batch-client wait too long and add some metrics (#973)" (#984) fix data race at the aggressiveLockingDirty (#913) fix MinSafeTS might be set to MaxUint64 permanently (#994) fix: fix invalid nil pointer when trying to record Store.SlownessStat. (#1017) Fix batch client batchSendLoop panic (#1021) fix request source tag unset (#1025) Fix comment of `SuspendTime` (#1057)
This PR is partial of pingcap/tidb#45904.
It tried to abstract a new inteface of gc lock resolver for GCWorker and log backup.