Skip to content

Add instrumentation using XRay#77153

Closed
htuall wants to merge 121 commits intoClickHouse:masterfrom
htuall:xray_experiments
Closed

Add instrumentation using XRay#77153
htuall wants to merge 121 commits intoClickHouse:masterfrom
htuall:xray_experiments

Conversation

@htuall
Copy link
Copy Markdown
Contributor

@htuall htuall commented Mar 5, 2025

Changelog category (leave one):

  • Experimental Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Add instrumentation using XRay to parse xray_instr_map section generated by xray and map entry addresses to function names. Resolves #74249

Details

Created parse_elf.cpp designated to parse xray_instr_map section generated by xray to then map entry addresses to function names. Currently not finished, needs further work, submitting as a checkpoint to track progress. Current problem: function address that I get from xray_instr_map does not match to the expected address of the function, which I also get from elf-binary. Possible reasons: 1) incorrect parsing of the section; 2) some relocations in elf-binary.

Resolves #74249

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 5, 2025

CLA assistant check
All committers have signed the CLA.

@nikitamikhaylov nikitamikhaylov added the can be tested Allows running workflows for external contributors label Mar 5, 2025
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 5, 2025

Workflow [PR], commit [85a2f29]

@clickhouse-gh clickhouse-gh bot added the pr-experimental Experimental Feature label Mar 5, 2025
@pamarcos pamarcos self-assigned this Mar 14, 2025
@pamarcos
Copy link
Copy Markdown
Member

Glad you're already working on this @htuall 🎉! Please next time ping me directly or even assign the PR to me so that I'm aware. Somehow this went off under my radar till now 👀

Have you checked how the llvm-xray tool is doing this same thing? I assumed we could get the mapping the same way and our part would be to find the proper function addresses based on non-mangled function names, which is what I think readSymbolTable is doing, right? We'll probably need to demangle those, or alternatively mangle the input symbols to be able to match them. The latter seems less work, since you only need to do that for those symbols we'll be interested in, rather than converting all symbols.

Also, I'm afraid I can't yet understand Russian comments that well even after @nikitamikhaylov's efforts to teach me the basics. So, please use English whenever possible so that I don't have to be going back and forth translating 🙏

htuall added 2 commits April 9, 2025 21:18
…h custom handler for xray + started working on adding SYSTEM-statements to add/remove instrumentation(wip)
@htuall htuall force-pushed the xray_experiments branch from 607ef5b to 085f7ae Compare April 9, 2025 21:41
@pamarcos pamarcos self-requested a review April 10, 2025 10:04
Copy link
Copy Markdown
Member

@pamarcos pamarcos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're going in the right direction :)
I merged master against your branch and also removed the unnecessary binary artifacts that I think you added by mistake.

Please, make sure that ClickHouse is compiling correctly. Also some test even if very basic would be nice. Even just to ensure that the new system statement works.

…atements using it + attempted to add system.instrumentation table(not sure if it's coorrect
@htuall htuall force-pushed the xray_experiments branch from 642f177 to 175386a Compare April 20, 2025 14:26
htuall and others added 9 commits April 20, 2025 16:16
Let's build with ENABLE_XRAY=1 only for the Debug build on AMD
for now. This is just to ensure on the CI that it builds
correctly.

There are some USE_XRAY missing in other files, but it is left
on purpose as an excercise to the reader :)
@clickhouse-gh clickhouse-gh bot added manual approve Manual approve required to run CI submodule changed At least one submodule changed in this PR. labels Apr 21, 2025
@pamarcos pamarcos removed the manual approve Manual approve required to run CI label Apr 22, 2025
@clickhouse-gh clickhouse-gh bot added the manual approve Manual approve required to run CI label Apr 22, 2025
@pamarcos pamarcos added the pr-performance Pull request with some performance improvements label Oct 15, 2025
@pamarcos pamarcos removed the manual approve Manual approve required to run CI label Oct 15, 2025
@clickhouse-gh clickhouse-gh bot removed the pr-performance Pull request with some performance improvements label Oct 15, 2025
@clickhouse-gh clickhouse-gh bot added the manual approve Manual approve required to run CI label Oct 17, 2025
@pamarcos
Copy link
Copy Markdown
Member

Creating a Chrome trace format from systen.xray_instrumentation_profiling_log:

Ask to profile and sleep a bit for every query entry to have some data:

SYSTEM INSTRUMENT ADD `QueryMetricLog::startQuery` PROFILE
SYSTEM INSTRUMENT ADD `QueryMetricLog::startQuery` SLEEP ENTRY 0.5

Send 1000 parallel queries using 24 parallel jobs:

yes "clickhouse client --query 'select 1'" | head -n 1000 | parallel -j24

The chrome_trace.sql query:

SELECT
    format(
        '{{"traceEvents": [{}]}}',
        arrayStringConcat(
            groupArray(
                concat(
                    -- Begin Event
                    format(
                        '\n{{"name": "{}", "cat": "{}", "ph": "B", "ts": {}, "pid": 1, "tid": {}, "args": {{"query_id": "{}"}}}}',
                        name,
                        'clickhouse',
                        toString(event_time_microseconds),
                        toString(tid),
                        query_id
                    ),
                    ',',
                    -- End Event
                    format(
                        '\n{{"name": "{}", "cat": "{}", "ph": "E", "ts": {}, "pid": 1, "tid": {}, "args": {{"query_id": "{}"}}}}',
                        name,
                        'clickhouse',
                        toString(event_time_microseconds + duration_microseconds),
                        toString(tid),
                        query_id
                    )
                )
            ),
            ','
        )
    ) AS trace_json
FROM system.xray_instrumentation_profiling_log
WHERE event_date >= today();
clickhouse client --query "$(cat chrome_trace.sql)" > trace.json

trace.json

We can see in Perfetto

image

@pamarcos
Copy link
Copy Markdown
Member

pamarcos commented Oct 20, 2025

Comparing the binary size:

Random PR:

This PR:

For x86_64, there's a ~4.5% of binary size increase. For aarch64, there's a ~7% binary size increase.

@pamarcos
Copy link
Copy Markdown
Member

It seems we cannot use XRay and any other sanitizer at the same time as per https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=77153&sha=23c6ed27502189ca2df435a42c653cb5ed18896c&name_0=PR

Even though I haven't found on the internet any evidence of them being incompatible, both sanitizers and XRay seem to use the same underlying allocator and runtime infrastructure from sanitizer_common, which is why you get "multiple definition" errors for symbols like __sanitizer::internal_allocator().

@pamarcos pamarcos mentioned this pull request Oct 29, 2025
1 task
@pamarcos
Copy link
Copy Markdown
Member

I'm closing this PR in favor of a new one where I'm the owner and I can freely change the description, changelog and everything without impersonating @htuall. Having to approve the workflow for every commit was also bad.

I'm keeping all the commits from Alina and will provide credit to her 💪

#89173

@pamarcos pamarcos closed this Oct 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors manual approve Manual approve required to run CI pr-experimental Experimental Feature submodule changed At least one submodule changed in this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Instrumentation with LLVM XRay

4 participants