Skip to content

Add time and thread_id for snapshot to check stale snapshots#2229

Merged
ti-srebot merged 12 commits intopingcap:masterfrom
JaySon-Huang:stale_ps_snapshot_logging
Jun 23, 2021
Merged

Add time and thread_id for snapshot to check stale snapshots#2229
ti-srebot merged 12 commits intopingcap:masterfrom
JaySon-Huang:stale_ps_snapshot_logging

Conversation

@JaySon-Huang
Copy link
Contributor

@JaySon-Huang JaySon-Huang commented Jun 21, 2021

What problem does this PR solve?

Issue Number: related to #2199

Problem Summary: Suspicious that some tasks hold snapshot for a long time, prevent PageStorage from GCing old PageFiles

What is changed and how it works?

Add create_time and thread_id when creating snapshots. After PageStorage::gc, log the oldest snapshot living time and its thread_id.
We can trace to thread_id to know whether the snapshot is created for reading / background task.
When the create_time and thread_id and the period of logging files, we can know exactly which task holds the snapshot for a long time.

Related changes

Check List

Tests

  • Manual test (add detailed scripts or steps below)
    • Deploy a TiFlash with these changes to check the logging

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM

Release note

  • No release note

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
This reverts commit 0a8060ae9eaa48765ca0e449f8b78eef3c98f9d9.
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
@JaySon-Huang JaySon-Huang self-assigned this Jun 21, 2021
@JaySon-Huang JaySon-Huang requested a review from flowbehappy June 21, 2021 14:03
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
@@ -0,0 +1,77 @@
#include <Common/CurrentMetrics.h>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split these function from SegmentReadTaskPool.h into SegmentReadTaskPool.cpp because if we include CurrentMetrics.h in SegmentReadTaskPool.h, it causes link error.
Split these functions def into cpp file and only include CurrentMetrics.h in the cpp file to fix that link error.

JaySon-Huang and others added 3 commits June 22, 2021 13:11
Revert "Revert "Add metrics for creating snapshot""

This reverts commit fca59a7.
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
@JaySon-Huang
Copy link
Contributor Author

/run-all-tests

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
@JaySon-Huang
Copy link
Contributor Author

@flowbehappy PTAL again

And also the changes in TiDB side: pingcap/tidb#25678 /cc @lidezhu

@flowbehappy flowbehappy self-requested a review June 23, 2021 02:39
Copy link
Contributor

@flowbehappy flowbehappy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Jun 23, 2021
@JaySon-Huang JaySon-Huang added needs-cherry-pick-release-4.0 PR which needs to be cherry-picked to release-4.0 needs-cherry-pick-release-5.0 PR which needs to be cherry-picked to release-5.0 needs-cherry-pick-release-5.1 PR which needs to be cherry-picked to release-5.1 labels Jun 23, 2021
@JaySon-Huang
Copy link
Contributor Author

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Jun 23, 2021
@ti-srebot
Copy link
Collaborator

/run-all-tests

@ti-srebot ti-srebot merged commit 8f8b729 into pingcap:master Jun 23, 2021
@JaySon-Huang JaySon-Huang deleted the stale_ps_snapshot_logging branch June 23, 2021 02:56
@ti-srebot
Copy link
Collaborator

cherry pick to release-4.0 in PR #2238

@ti-srebot
Copy link
Collaborator

cherry pick to release-5.0 in PR #2239

@ti-srebot
Copy link
Collaborator

cherry pick to release-5.1 in PR #2240

JaySon-Huang added a commit that referenced this pull request Jun 23, 2021
…2239)

* cherry pick #2229 to release-5.0
* Add snapshot metrics for compact delta

Co-authored-by: JaySon-Huang <jayson.hjs@gmail.com>
JaySon-Huang added a commit to JaySon-Huang/tiflash that referenced this pull request Jun 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-cherry-pick-release-4.0 PR which needs to be cherry-picked to release-4.0 needs-cherry-pick-release-5.0 PR which needs to be cherry-picked to release-5.0 needs-cherry-pick-release-5.1 PR which needs to be cherry-picked to release-5.1 status/can-merge Indicates a PR has been approved by a committer. status/LGT1 Indicates that a PR has LGTM 1.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants