Skip to content

import: relax memory check constraints#18248

Merged
ti-chi-bot[bot] merged 5 commits intotikv:masterfrom
3pointer:aggressive_memory_check
Feb 26, 2025
Merged

import: relax memory check constraints#18248
ti-chi-bot[bot] merged 5 commits intotikv:masterfrom
3pointer:aggressive_memory_check

Conversation

@3pointer
Copy link
Contributor

@3pointer 3pointer commented Feb 24, 2025

What is changed and how it works?

Issue Number: ref #18124

What's Changed:
The previously (#18192) implemented memory check was too conservative, leading to unnecessary rejections even when there was still available memory.

For example, on an 8GB node, the memory_usage_limit was calculated as 6GB, and applying the 0.9 threshold effectively resulted in a 5.4GB limit, which is only 67.5% of the total memory. This led to premature rejection of import requests, even when the system still had usable memory.

import: make memory check more aggressive.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • [] Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

None

Signed-off-by: 3pointer <luancheng@pingcap.com>
Signed-off-by: 3pointer <luancheng@pingcap.com>
Signed-off-by: 3pointer <luancheng@pingcap.com>
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Feb 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 24, 2025
@3pointer 3pointer force-pushed the aggressive_memory_check branch from 8884a89 to a9b21c9 Compare February 24, 2025 10:17
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/needs-linked-issue labels Feb 24, 2025
@3pointer 3pointer marked this pull request as ready for review February 24, 2025 10:19
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 24, 2025
Signed-off-by: 3pointer <luancheng@pingcap.com>
Copy link
Contributor

@Tristan1900 Tristan1900 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change! just a couple comments from my side

SysQuota::memory_limit_in_bytes()
})();

if mem_limit == 0 || mem_limit < usage {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems mem_limit could be -1 if unlimited when using cgroup. maybe can have better logging for that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mem_limit is u64, so I think it cannot be -1

Ok(()) => (),
Err(e) => {
// in case of immediate retry from client side
tokio::time::sleep(Duration::from_secs(1)).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we adjust the backoff strategy on the client side instead of injecting delay on the server side?
holding a bunch of calls in memory for 1 sec could potentially increase memory pressure,
also sleep for 1s possibly will not solve the burst issue but only delay the burst for 1s, if we want to smooth out we need to add a jitter to the delay. I think jitter can already be configured at the backoff strategy on client side.

Copy link
Contributor Author

@3pointer 3pointer Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually client side has a backoff strategy. So I'm not sure that's the root cause. anyway I'll test it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anyway, add a jitter is more solid, I'll add and test it

@v01dstar
Copy link
Member

I think the PR's title contradicts what this PR is actually doing. "Relax memory check constraints" may be a better name.

// Reject ONLY if BOTH:
// - Available memory is below REJECT_SERVE_MEMORY_USAGE
// - Memory usage ratio is 90%+
if mem_limit - usage < REJECT_SERVE_MEMORY_USAGE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I feel the following version is more readable.

    let free_memory = mem_limit - usage;
    let min_required_memory = std::cmp::min(
        REJECT_SERVE_MEMORY_USAGE,
        ((1.0 - HIGH_IMPORT_MEMORY_WATER_RATIO) * mem_limit as f64) as u64
    );

    if free_memory < min_required_memory {
        let usage_ratio = usage as f64 / mem_limit as f64;
        return Err(Error::ResourceNotEnough(format!(
            "Memory usage too high, usage: {} bytes, mem limit {} bytes",
            usage,
            mem_limit
        )));
    }

Signed-off-by: 3pointer <luancheng@pingcap.com>
Copy link
Contributor

@YuJuncen YuJuncen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not pretty sure whether delay in the server is a good idea... Other LGTM

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Feb 26, 2025
@3pointer 3pointer changed the title import: make memory check more aggressive. import: relax memory check constraints Feb 26, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Feb 26, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Leavrth, YuJuncen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Feb 26, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Feb 26, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-02-26 07:05:37.412773494 +0000 UTC m=+425885.365931745: ☑️ agreed by YuJuncen.
  • 2025-02-26 09:04:46.01383 +0000 UTC m=+433033.966988264: ☑️ agreed by Leavrth.

@ti-chi-bot ti-chi-bot bot merged commit 71aecc2 into tikv:master Feb 26, 2025
8 checks passed
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Feb 26, 2025
3pointer added a commit to 3pointer/tikv that referenced this pull request Mar 20, 2025
@YuJuncen YuJuncen added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Dec 9, 2025
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this pull request Dec 9, 2025
ref tikv#18124

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #19189.
But this PR has conflicts, please resolve them!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants