Improve memory reuse efficiency and reduce page faults if using two-level hash tables#80245

Merged
Algunenano merged 4 commits into ClickHouse:master from jiebinn:updateJemalloc
May 27, 2025

Conversation

@jiebinn
Contributor

@jiebinn jiebinn commented May 15, 2025

This patch changes the default lg_extent_max_active_fit from 6 to 8, which enhances reuse of hot dirty memory when using two-level hash tables.

Performance issue analysis:
We identified a performance issue, particularly high page faults, in many of the 43 ClickBench queries with ClickHouse on a 2x240 vCPU system. For a deeper investigation, consider Q35: it exhibited an __handle_mm_fault hotspot of around 40% of cycles on the GNR 2x240 vCPU platform. Using perf events, we discovered that the high page faults stem from MADV_DONTNEED.
In Query 35, there are 256 memory reallocations (sub-hashtables) from 4 KB to 16 KB in each arena. bpftrace data shows that jemalloc recycles and coalesces many of these 256 16 KB memory blocks into a larger one when ClickHouse frees them. Subsequently, the large memory space in the dirty ecache cannot be reused for the next 16 KB request, because the maximum allowed extent size before splitting is 64 * 16 KB. When a new memory request is made, jemalloc locates an existing extent larger than the requested size in the dirty ecache and splits it to fit the requested size, maximizing memory reuse. However, there is a bound on the ratio between the existing extent size and the requested extent size, with a maximum of 64, to minimize memory fragmentation. Consequently, jemalloc falls back to the retained ecache (already MADV_DONTNEED'd and lacking physical pages) or to mmap, resulting in page faults and high RSS.

What does the jemalloc patch do:
The 256 pieces of 16 KB dirty memory coalesce into a memory block larger than 64 * 16 KB, preventing reuse when a new 16 KB request arrives. To maximize dirty ecache reuse, we increase the maximum ratio of the existing extent size to the requested extent size from 64 to 256.

Ref:
jemalloc/jemalloc#2842

Result:
We have tested this patch with ClickBench Q35 on a system with 2 x 240 vCPUs. The results show significant performance gains (opt/base):

| Q35 | Queries per second | VmRSS | Total page faults (50 runs) | Cycles | Instructions | IPC |
| --- | --- | --- | --- | --- | --- | --- |
| Opt/Base | 1.961 | 0.546 | 0.29 | 0.43 | 0.857 | 1.993 |

The geometric mean of all 43 queries shows more than a 10% performance improvement.


Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Improve memory reuse efficiency and reduce page faults when using two-level hash tables.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@jiebinn jiebinn changed the title Improve memory reuse efficiency and reduce page faults Improve memory reuse efficiency and reduce page faults by updating jemalloc May 15, 2025
@rschu1ze
Member

Does this PR update to a development version of jemalloc?

@Algunenano Algunenano added the can be tested Allows running workflows for external contributors label May 15, 2025
@clickhouse-gh
Contributor

clickhouse-gh bot commented May 15, 2025

Workflow [PR], commit [d7378af]

@clickhouse-gh clickhouse-gh bot added pr-performance Pull request with some performance improvements submodule changed At least one submodule changed in this PR. labels May 15, 2025
@jiebinn
Contributor Author

jiebinn commented May 16, 2025

Does this PR update to a development version of jemalloc?

Hi @rschu1ze,

This PR will update to a dev version of jemalloc. There is a significant performance improvement when queries use two-level hash tables for aggregation. Each thread will have 256 sub-hash tables for each two-level hash table. For example, if the initial size of a sub-hash table is 4K (256 cells * 16 bytes/cell) and it needs to reallocate from 4K to 16K (1024 cells * 16 bytes/cell) based on historical cache data, there will be 256 reallocation requests (one for each sub-hash table) from 4K to 16K sent to jemalloc.

After the aggregations, many of the 256 16 KB memory pieces merge into much larger blocks. However, jemalloc will not choose, split, and reuse these large blocks if their size is significantly larger than the subsequent requests. Consequently, new requests have to allocate from already-freed memory (MADV_DONTNEED) or through new mmap, rather than reusing the hot dirty memory cache. This can cause 3X page faults, 2X VmRSS, and a 50% reduction in QPS.

To avoid these issues, if we don't update the jemalloc version, we might need to work around by passing variables in settings to jemalloc or changing the management of memory blocks in two-level hash tables. However, both approaches may introduce other concerns.

By the way, do we have any tests if we decide to update the submodule version? I also noticed that the latest jemalloc version ([jemalloc @ 41a859e]) in ClickHouse is on the dev branch.

@rschu1ze
Member

@jiebinn First of all, thanks for the PR.

This PR will update to a dev version of jemalloc.

That's unfortunately a problem for ClickHouse, especially for something so fundamental as memory management. The jemalloc homepage says (not surprisingly) "dev: The dev branch tracks current development, and at any given time may not be production-ready.". Unfortunately, they seem to make stable releases only rarely.

However, this is only my opinion; I can't tell how reliable dev versions of jemalloc really are.
Let me tag @antonio2368 for a second opinion (he is on a longer leave right now, so might only post in 1-2 months).

By the way, do we have any tests if we decide to update the submodule version? I also noticed that the latest jemalloc version ([jemalloc @ 41a859e]) in ClickHouse is on the dev branch, which is fine as jemalloc is a mature and stable allocator.

This is the exact version which ClickHouse is using:

41a859ef 2022-07-02 14:44:46 -0400 Jasmin Parent (HEAD) Remove duplicated words in documentation

It is only ten (or so) commits ahead of tag 5.3.0 (from 2022-05-06 11:28:25). I am actually not sure if this was a coincidence or if it means "dev versions are okay and we merely don't track dev HEAD closely".

@jiebinn
Contributor Author

jiebinn commented May 16, 2025

@rschu1ze, thank you for the quick response. I will inquire with Jemalloc about their plans for releasing a new stable version soon. If they don't have one scheduled, would you consider temporarily cherry-picking the performance commit?

@rschu1ze
Member

rschu1ze commented May 16, 2025

@jiebinn Yes, but only if it has no dependencies on previous commits on the development branch and if it is straightforward. We should avoid using ClickHouse as a canary for finding bugs in jemalloc's dev branch.

A stable jemalloc release would definitely be the preferred route.

@Algunenano
Member

I think using dev branches is ok, but it depends on the project policy. If their dev branch is considered stable and final (think of absl, for example), then there is no issue. If their dev branch is a candidate which they will cut at some point and then fix bugs before release, then it's probably not ok. In general (in my experience) jemalloc is usually rock solid, so I expect it to be the first case.

OTOH, running any branch in CI to compare and detect issues is ok. Based on the findings and results we can decide whether the risk is worth it or not.

@jiebinn
Contributor Author

jiebinn commented May 16, 2025

@rschu1ze, I agree that if we cherry-pick one commit temporarily, the patch should be clear, simple, and of good quality.

@Algunenano, I believe jemalloc is quite solid. I agree that we can decide whether to update to a dev branch based on our findings and CI or other test results.

I'll first ask jemalloc if there are any plans to release a new stable version soon, as that would be the best solution. If there are no plans, we might consider cherry-picking this commit (jemalloc/jemalloc#2842), using the dev branch, or implementing a workaround in ClickHouse.

@rschu1ze
Member

Okay, tnx.

@jiebinn To let our CI test your PR, please fix the build issues ... once all builds are green, functional/perf/end-to-end tests will start.

But by the looks of things, it seems the build problems are in jemalloc itself. To fix, feel free to replace the official jemalloc submodule (.gitmodules in the ClickHouse repository) with a forked one which fixes the build.

Also, to fix the build, please open a build log (e.g. the one I linked above ^^), then grep for cmake . This will find something like this:

Run command: [   cmake --debug-trycompile -DCMAKE_VERBOSE_MAKEFILE=1 -LA -DCMAKE_BUILD_TYPE=None  -DENABLE_THINLTO=0 -DSANITIZE=          -DENABLE_CHECK_HEAVY_BUILDS=1 -DENABLE_CLICKHOUSE_SELF_EXTRACTING=1 -DCMAKE_C_COMPILER=clang-19 -DCMAKE_CXX_COMPILER=clang++-19 -DCOMPILER_CACHE=sccache -DENABLE_BUILD_PROFILING=1 -DENABLE_TESTS=0 -DENABLE_UTILS=0 -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse]

which you can run locally to reproduce (just remove flags like -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY and -DENABLE_CLICKHOUSE_SELF_EXTRACTING)

@jiebinn jiebinn force-pushed the updateJemalloc branch 2 times, most recently from 2ce9e0b to 739e5b5 Compare May 21, 2025 07:09
@jiebinn
Contributor Author

jiebinn commented May 21, 2025

Hi @rschu1ze and @Algunenano, we have a safer and more convenient method to address the high page fault issue when queries use two-level hash tables in ClickHouse. Previously, the default lg_extent_max_active_fit was 6, giving a maximum ratio of 64 between the size of a dirty merged memory extent and the size of the requested allocation. Since two-level hash tables in ClickHouse send 256 reallocation requests, the coalesced large memory block usually exceeds the maximum ratio of 64 and cannot be reused. To better fit this scenario in ClickHouse, we can increase the ratio to 256.
I have tested this PR with ClickBench on my local 2x240 vCPU system. The Q35 query showed almost a 2X performance gain, and page faults were reduced to 30% of the baseline. Other performance metrics are similar to the table above.

@jiebinn jiebinn changed the title Improve memory reuse efficiency and reduce page faults by updating jemalloc Improve memory reuse efficiency and reduce page faults if using two-level hash tables May 21, 2025
@Algunenano
Member

@jiebinn it seems you removed the update of the submodule in one of the last force pushes, so the PR is not doing anything except changing JEMALLOC_CONFIG_MALLOC_CONF

@jiebinn
Contributor Author

jiebinn commented May 22, 2025

@jiebinn it seems you removed the update of the submodule in one of the last force pushes, so the PR is not doing anything except changing JEMALLOC_CONFIG_MALLOC_CONF

Yes. This PR only changes JEMALLOC_CONFIG_MALLOC_CONF. It offers a low-risk and convenient method to resolve the high page fault issue resulting from the two-level hash tables.

@alexey-milovidov
Member

@rschu1ze, it's ok to use the development branch.

Our policy - ignore everything that library developers say about their releases and apply ClickHouse CI to it.

@Algunenano
Member

Yes. This PR only changes JEMALLOC_CONFIG_MALLOC_CONF. It offers a low-risk and convenient method to resolve the high page fault issue resulting from the two-level hash tables.

I don't quite understand sorry. Does this mean there will be 2 PRs? One (this one) with just changes to JEMALLOC_CONFIG_MALLOC_CONF, and another one updating the jemalloc commit? If that's the case, do you mind updating the PR description, please?

 # MADV_DONTNEED. See
 # https://github.com/ClickHouse/ClickHouse/issues/11121 for motivation.
-set (JEMALLOC_CONFIG_MALLOC_CONF "percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:0,dirty_decay_ms:5000,prof:true,prof_active:false,background_thread:true")
+set (JEMALLOC_CONFIG_MALLOC_CONF "percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:0,dirty_decay_ms:5000,prof:true,prof_active:false,background_thread:true,lg_extent_max_active_fit:8")
Member

@Algunenano Algunenano May 26, 2025


Please consider leaving a comment explaining the change to lg_extent_max_active_fit. Also, do we need it for non-Linux?

Contributor Author


Thanks. I will add a comment to explain the reason. This optimization is commonly used for both Linux and non-Linux systems. However, I have not tested it on non-Linux systems due to the lack of access to such environments.

Contributor Author


@Algunenano Maybe we should consider applying the change to both Linux and non-Linux systems. I will check the CI result.

Member


We don't test on non-Linux systems, but I'd apply it for consistency

@Algunenano Algunenano self-assigned this May 26, 2025
@jiebinn
Contributor Author

jiebinn commented May 26, 2025


Hi @Algunenano, we only need to keep this PR changing lg_extent_max_active_fit to fix the high page fault issue; we don't need to update jemalloc. I will update the PR description.

This patch helps to set lg_extent_max_active_fit to 8, which
will help jemalloc to reuse existing dirty extents more
efficiently when using two-level hash tables (256 sub hashtables
and reallocations). We have tested this patch with Clickbench Q35
on a system with 2 x 240 vCPUs. The results show significant
performance gains (opt/base):

- QPS: 1.96x
- VmRSS: 54.6%
- Page faults: 29%
- Cycles: 43%
- Instructions: 85.7%
- IPC: 1.99x

The geometric mean of all 43 queries shows more than a 10%
performance improvement.

Refs: jemalloc/jemalloc#2842

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
Reviewed-by: Tianyou Li <tianyou.li@intel.com>
Reviewed-by: Wangyang Guo <wangyang.guo@intel.com>
Reviewed-by: Zhiguo Zhou <zhiguo.zhou@intel.com>
@Algunenano
Member

@jiebinn I'm trying to reproduce the improvements in the description but I see pretty much no change:

Result:
We have tested this patch with Clickbench Q35 on a system with 2 x 240 vCPUs. The results show significant performance gains (opt/base):


| Q35 | Queries per second | VmRSS | Total page faults (50 runs) | Cycles | Instructions | IPC |
| --- | --- | --- | --- | --- | --- | --- |
| Opt/Base | 1.961 | 0.546 | 0.29 | 0.43 | 0.857 | 1.993 |
The geometric mean of all 43 queries shows more than a 10% performance improvement.

Testing with an AMD Ryzen 9 7950X3D 16-Core Processor (with hyperthreading enabled, so 32 vCPUs):

$ clickhouse benchmark --port 49000 --query "SELECT ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3, COUNT(*) AS c FROM hits GROUP BY ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3 ORDER BY c DESC LIMIT 10;" -i 50
  • Master: localhost:49000, queries: 50, QPS: 4.925, RPS: 492470445.132, MiB/s: 1878.626, result RPS: 49.248, result MiB/s: 0.002.
  • Changes: localhost:49000, queries: 50, QPS: 4.951, RPS: 495046115.319, MiB/s: 1888.451, result RPS: 49.506, result MiB/s: 0.002.
$ clickhouse benchmark --port 49000 --query "SELECT ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3, COUNT(*) AS c FROM hits GROUP BY ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3 ORDER BY c DESC LIMIT 10 settings max_threads=64;" -i 50

Master: localhost:49000, queries: 50, QPS: 1.695, RPS: 169451896.802, MiB/s: 646.408, result RPS: 16.946, result MiB/s: 0.001.
Changes: localhost:49000, queries: 50, QPS: 1.554, RPS: 155365551.705, MiB/s: 592.673, result RPS: 15.537, result MiB/s: 0.001.

I'm waiting for the perf report to run again, but if you could explain how/what you are measuring, that'd be great. Does this problem only appear when running with a large number of threads (2x240 vCPUs)?

I see this in the build:

$ ack JEMALLOC_CONFIG_MALLOC_CONF
CMakeCache.txt
816:JEMALLOC_CONFIG_MALLOC_CONF_OVERRIDE:STRING=

contrib/jemalloc-cmake/include_linux_x86_64/jemalloc/internal/jemalloc_internal_defs.h
409:#define JEMALLOC_CONFIG_MALLOC_CONF "percpu_arena:percpu,oversize_threshold:0,muzzy_decay_ms:0,dirty_decay_ms:5000,prof:true,prof_active:false,background_thread:true,lg_extent_max_active_fit:8"

So I'm assuming it's applied correctly.

@jiebinn
Contributor Author

jiebinn commented May 26, 2025


Hi @Algunenano, the PR is not related to the system's core number. If each sub-hashtable in the two-level hashtables falls within the size range of jemalloc-defined large extents (16KB to SC_LARGE_MAXCLASS), the PR would improve performance, especially when the sub-hashtable size exceeds 16KB but isn't excessively large. The closer the size is to 16KB, the greater the performance improvement, up to 2X QPS performance improvement at 16KB. If the allocation size exceeds the upper limit, jemalloc will use a different huge extent code path.
If Q35 with hits is running on a 32 vCPU system, the allocation size of sub-hashtables in the two-level hashtables is about 15X larger than on a 480 vCPU system. If the size exceeds the jemalloc-defined large extent size, or if the size is much larger than the lower limit of 16KB, the performance improvement is not significant. I wonder if more filters would help reduce the table size during the aggregation stage, which might help you reproduce the performance result.
We have also written a micro benchmark to reproduce the same scenario.

@jiebinn
Contributor Author

jiebinn commented May 26, 2025

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <jemalloc/jemalloc.h>

#define ARENA_COUNT 1
#define ALLOC_COUNT 256
#define INITIAL_SIZE (4 * 1024)
#define RESIZE_SIZE (16 * 1024)
#define ITERATIONS 1000

int main() {
    void **ptrs;
    unsigned *arena_ids;
    clock_t start, end;
    double total_time = 0.0;

    const char *version;
    size_t sz = sizeof(version);
    mallctl("version", &version, &sz, NULL, 0);
    printf("Using jemalloc version: %s\n", version);

    ptrs = malloc(ARENA_COUNT * ALLOC_COUNT * sizeof(void*));
    arena_ids = malloc(ARENA_COUNT * sizeof(unsigned));
    if (!ptrs || !arena_ids) {
        fprintf(stderr, "Failed to allocate pointer arrays\n");
        return 1;
    }

    for (unsigned i = 0; i < ARENA_COUNT; i++) {
        unsigned arena_id;
        size_t sz = sizeof(unsigned);
        if (mallctl("arenas.create", &arena_id, &sz, NULL, 0) != 0) {
            fprintf(stderr, "Failed to create arena %u\n", i);
            return 1;
        }
        arena_ids[i] = arena_id;
    }

    for (int iter = 0; iter < ITERATIONS; iter++) {
        //printf("Starting iteration %d...\n", iter+1);
        start = clock();

        for (unsigned i = 0; i < ARENA_COUNT; i++) {
            for (unsigned j = 0; j < ALLOC_COUNT; j++) {
                unsigned idx = i * ALLOC_COUNT + j;
                int flags = MALLOCX_ARENA(arena_ids[i]);
                ptrs[idx] = mallocx(INITIAL_SIZE, flags);
                if (!ptrs[idx]) {
                    fprintf(stderr, "Memory allocation failed\n");
                    return 1;
                }
                memset(ptrs[idx], 1, INITIAL_SIZE);
            }
        }

        for (unsigned i = 0; i < ARENA_COUNT; i++) {
            for (unsigned j = 0; j < ALLOC_COUNT; j++) {
                unsigned idx = i * ALLOC_COUNT + j;
                int flags = MALLOCX_ARENA(arena_ids[i]);
                void *new_ptr = rallocx(ptrs[idx], RESIZE_SIZE, flags);
                if (!new_ptr) {
                    fprintf(stderr, "Memory resize failed\n");
                    return 1;
                }
                ptrs[idx] = new_ptr;
                memset((char*)ptrs[idx] + INITIAL_SIZE, 2, RESIZE_SIZE - INITIAL_SIZE);
            }
        }

        for (unsigned i = 0; i < ARENA_COUNT; i++) {
            for (unsigned j = 0; j < ALLOC_COUNT; j++) {
                unsigned idx = i * ALLOC_COUNT + j;
                dallocx(ptrs[idx], 0);
            }
        }

        end = clock();
        double time_taken = ((double)(end - start)) / CLOCKS_PER_SEC;
        total_time += time_taken;
        //printf("Iteration %d took %.6f seconds\n", iter+1, time_taken);
    }

    printf("Total time: %.6f seconds. Average time per loop: %.6f seconds\n", total_time, total_time / ITERATIONS);

    free(ptrs);
    free(arena_ids);

    return 0;
}

@Algunenano
Member

No noticeable changes in the performance tests; I only see it in the microbenchmarks. Still, it seems ok to include and to analyze in larger perf tests as part of the release.

@Algunenano Algunenano enabled auto-merge May 27, 2025 09:34
@Algunenano Algunenano added this pull request to the merge queue May 27, 2025
Merged via the queue into ClickHouse:master with commit 9fb0fb1 May 27, 2025
116 of 122 checks passed
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label May 27, 2025
BiteTheDDDDt pushed a commit to apache/doris that referenced this pull request Oct 20, 2025
…s to change jemalloc conf (#57076)

To reduce the page fault like clickhouse do the work

Related PR:
ClickHouse/ClickHouse#80245