Fix memory leak in tracetools::get_symbol()#43

Merged
christophebedard merged 1 commit into rolling from
christophebedard/fix-memory-leak-in-tracetools-get-symbol
Feb 28, 2023

Conversation

@christophebedard
Member

@christophebedard christophebedard commented Feb 14, 2023

Fixes #34

tracetools::get_symbol() is used to get a demangled symbol from a function pointer or std::function object. This is used to associate a human-readable string to a callback in the trace data. To do symbol demangling, it calls abi::__cxa_demangle(), which returns a pointer that must be freed by the caller. This was never freed. The best solution would be to use a limited size buffer on the stack and never allocate. However, while abi::__cxa_demangle can accept a pre-allocated buffer, it cannot be on the stack, since abi::__cxa_demangle expands it using realloc if it is not long enough.

Since the pointer to the demangled symbol must be freed, the problem is that tracetools::get_symbol() was returning a pointer to a literal when something failed (e.g., demangling, or mapping from function pointer to shared object), so we couldn't always free() the return value. There's no real use to returning anything other than nullptr when something fails.
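
The function-pointer-to-shared-object mapping mentioned above is typically done with dladdr(); a hypothetical sketch of that path under the new convention (the name and body are illustrative, not the actual tracetools internals):

```cpp
#include <dlfcn.h>   // dladdr(); on some platforms this requires linking libdl
#include <cstdlib>
#include <cstring>

// Map a function pointer back to a symbol name in its shared object.
// On failure, return nullptr (not a string literal), so the caller can
// unconditionally free() any non-null result.
char * symbol_from_funcptr(const void * funcptr)
{
  Dl_info info;
  if (0 == dladdr(funcptr, &info) || nullptr == info.dli_sname) {
    return nullptr;
  }
  return strdup(info.dli_sname);  // heap copy: caller must free()
}
```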

Changes:

  • Change tracetools::get_symbol() to return char * instead of const char *, and to return nullptr when something fails
    • Do the same for the underlying/internal functions.
    • Document that the return value should be freed by the caller.
  • Change get_symbol_funcptr(void * funcptr) to get_symbol_funcptr(const void * funcptr), since that's more appropriate
  • Remove the #ifdef TRACETOOLS_LTTNG_ENABLED conditions in tracetools::detail::{demangle_symbol,get_symbol_funcptr}, since there's no real need to do anything different if TRACETOOLS_LTTNG_ENABLED is defined. These functions shouldn't be called in that case anyway.
  • Update corresponding test (test_tracetools's test_utils)

This PR breaks ABI and requires the following downstream changes:

Downstream PR:

Signed-off-by: Christophe Bedard christophe.bedard@apex.ai

@christophebedard christophebedard self-assigned this Feb 14, 2023
@christophebedard christophebedard marked this pull request as draft February 14, 2023 01:44
@christophebedard
Member Author

christophebedard commented Feb 14, 2023

I'm marking this as a draft because I would like to get feedback before updating downstream packages.

This PR requires the following change in rclcpp, everywhere TRACEPOINT(rclcpp_callback_register, ...) is called:

+ char * symbol = tracetools::get_symbol(callback_);
  TRACEPOINT(
    rclcpp_callback_register,
    static_cast<const void *>(&callback_),
-   tracetools::get_symbol(callback_));
+   symbol);
+ std::free(symbol);

Of course, we can surround this with #ifndef TRACETOOLS_DISABLED/#endif to avoid calling tracetools::get_symbol() and allocating memory if the instrumentation is completely removed. However, if the instrumentation is included, then, even if the tracepoint is disabled (i.e., even if the data isn't actually collected by the tracer), memory will be allocated.

To solve this, we can replace LTTng's tracepoint(...) with:

if (tracepoint_enabled(name)) {
  // Compute arguments here
  auto args = ...;
  // Then trigger tracepoint manually
  do_tracepoint(name, args);
}

See lttng-ust(3). Then we can first check if the tracepoint is enabled before calling tracetools::get_symbol()/allocating, triggering the tracepoint manually, and freeing the memory. However, this would mean having to create the corresponding macros to go alongside the current TRACEPOINT(...) macro, i.e., TRACEPOINT_ENABLED(...) and DO_TRACEPOINT(...). This complicates tracetools a bit, but it allows for more flexibility for things like this.
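
A self-contained sketch of that check-then-trace pattern, with stand-ins for the LTTng macros and for tracetools::get_symbol() (everything here is illustrative; the real tracepoint_enabled()/do_tracepoint() macros take a provider and event name, see lttng-ust(3)):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Stand-in for LTTng's tracepoint_enabled(); in real code the tracer
// toggles this per-tracepoint state at runtime
static bool g_tracepoint_enabled = false;

// Stand-in for tracetools::get_symbol(): returns a heap-allocated string
static char * get_symbol_stub()
{
  return strdup("void (Node::*)(std_msgs::msg::String)");
}

// Returns true if the tracepoint fired. The symbol is only computed (and
// the allocation only happens) when the tracepoint is actually enabled.
bool emit_callback_register(const void * callback)
{
  if (!g_tracepoint_enabled) {
    return false;  // no allocation at all when tracing is off
  }
  char * symbol = get_symbol_stub();
  // Stand-in for do_tracepoint(rclcpp_callback_register, ...)
  std::printf("rclcpp_callback_register: %p %s\n", callback, symbol);
  std::free(symbol);  // freed in the same scope: no leak
  return true;
}
```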

@christophebedard
Member Author

cc @iluetkeb @mjcarroll

@iluetkeb
Contributor

Good catch!

The current approach basically passes on the obligation to free the memory to the caller of get_symbol. This is the most straightforward approach, but I'm wondering whether we can come up with something to avoid it. You have thought about the pre-allocated buffer (but we would also have to free that somehow).

btw, one of the other thoughts I had is whether we could output metadata throughout execution at regular intervals. This would then require us to store the metadata anyhow. I'm not saying we need that, but it's maybe something to consider in conjunction.

@christophebedard
Member Author

christophebedard commented Feb 14, 2023

The current approach basically passes on the obligation to free the memory to the caller of get_symbol. This is the most straightforward approach, but I'm wondering whether we can come up with something to avoid it. You have thought about the pre-allocated buffer (but we would also have to free that somehow).

demangle_symbol() could have its own buffer (on the stack), and:

  1. Call abi::__cxa_demangle, let it allocate
  2. Copy the result into its buffer, up to some max length
    • I could come up with a reasonable value; in any case this isn't really critical
  3. Free the string returned by abi::__cxa_demangle
  4. Return a const char * to the buffer
    • So tracetools::get_symbol() and friends would return a const char *

This wouldn't really work if you called tracetools::get_symbol() more than once before the tracer can read the first string, but we might not need to call it consecutively anyway, and we could document this. This makes it easier for the caller, but this might be too weird of a behaviour. As you said, the simpler approach is to let the caller free the memory.

@christophebedard
Member Author

christophebedard commented Feb 14, 2023

btw, one of the other thoughts I had is whether we could output metadata throughout execution at regular intervals. This would then require us to store the metadata anyhow. I'm not saying we need that, but it's maybe something to consider in conjunction.

Yeah, we should keep these things separate for now. I opened an issue for this and mentioned that as a potential solution: #44.

@christophebedard christophebedard force-pushed the christophebedard/fix-memory-leak-in-tracetools-get-symbol branch from 72fd8b9 to 17c4bde Compare February 14, 2023 19:44
@christophebedard
Member Author

However, this would mean having to create the corresponding macros to go alongside the current TRACEPOINT(...) macro, i.e., TRACEPOINT_ENABLED(...) and DO_TRACEPOINT(...). This complicates tracetools a bit, but it allows for more flexibility for things like this.

Note that this improvement in tracetools (and downstream changes in rclcpp) could be made in separate PRs.

@christophebedard christophebedard force-pushed the christophebedard/fix-memory-leak-in-tracetools-get-symbol branch from 17c4bde to de1e1c7 Compare February 22, 2023 18:27
@christophebedard christophebedard marked this pull request as ready for review February 22, 2023 18:27
Signed-off-by: Christophe Bedard <christophe.bedard@apex.ai>
@christophebedard christophebedard force-pushed the christophebedard/fix-memory-leak-in-tracetools-get-symbol branch from de1e1c7 to 51b2d85 Compare February 23, 2023 03:54
@christophebedard
Member Author

christophebedard commented Feb 23, 2023

Testing --packages-above tracetools test_tracetools rclcpp:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

Includes:

@christophebedard
Member Author

@mjcarroll friendly ping

@christophebedard christophebedard merged commit faf6c27 into rolling Feb 28, 2023
@christophebedard christophebedard deleted the christophebedard/fix-memory-leak-in-tracetools-get-symbol branch February 28, 2023 22:39
@cottsay
Member

cottsay commented Mar 1, 2023

I think this caused a fair number of regressions. This repository's CI shows breakage, and my builds on RHEL 8 are now failing as well.

Stuff like:

libtracetools.so: undefined reference to `dladdr'
collect2: error: ld returned 1 exit status

@Crola1702 FYI

@christophebedard
Member Author

christophebedard commented Mar 1, 2023

I think this caused a fair number of regressions. This repository's CI shows breakage, and my builds on RHEL 8 are now failing as well.

Stuff like:

libtracetools.so: undefined reference to `dladdr'
collect2: error: ld returned 1 exit status

Are there any non-RHEL failures?

Note that this broke ABI, so some Rpr failures are expected. The dladdr error you included above isn't an ABI-break failure, though; it does make sense to me, especially given that we didn't test this change on RHEL.

Also, don't mind the local GitHub actions. They're broken.

@christophebedard
Member Author

christophebedard commented Mar 1, 2023

Looking at CI, it seems to be RHEL-only (at least regarding this PR). #48 fixes that.
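
For context (not necessarily the exact contents of that fix): an undefined reference to dladdr is the classic symptom of not linking the library that provides it, and the usual CMake remedy is something along these lines:

```cmake
# CMAKE_DL_LIBS expands to the dl library where one is needed (e.g., older
# glibc) and to nothing on platforms where dladdr() lives in libc
target_link_libraries(tracetools PRIVATE ${CMAKE_DL_LIBS})
```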


Development

Successfully merging this pull request may close these issues.

Memory leak with tracetools::get_symbol()