Skip to content

Conversation

@dalehamel
Copy link
Contributor

@dalehamel dalehamel commented Oct 4, 2025

Fix for #399

  • Fixed cfunc CME traversal
    • both check_method_entry and the logic calling it were previously completely broken, and could have caused infinite loops if we weren't only using them to check cfunc frames which are (i think always?) local, so this wasn't noticed.
  • Update coredump tests
    • Fixed 3.3.0 coredump, which was truncated and missing the memory region with the strings for the root frame causing the root frame (<main>) to be elided
  • Created a more pathological case for different cases of frame names, using https://github.com/ivoanjo/backtracie/blob/main/spec/unit/interesting_backtrace_helper.rb
    • Note that it does not exactly match the output of stackprof yet, for more complex cases like evals. But for the majority of the frames it does match. We can work towards improving this, but as best as I can tell i'm already duplicating the rb_profile_frames and rb_profile_frame_full_label logic that stackprof uses.
  • Added table tests for qualified name and full label calculation
  • Fixed coredump action to avoid truncated coredumps

Depends on rbspy/rbspy-testdata#8 which updates the coredumps

I'll squash and tidy up some commits once this has been reviewed.

byteorder = "1.4.3"
rbspy-testdata = "0.2.4"
rbspy-testdata = { git = "https://github.com/dalehamel/rbspy-testdata.git", branch = "update-coredumps" }
rstest = "0.24"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to facilitate table testing of the label calculations

@@ -0,0 +1,207 @@
# frozen_string_literal: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used with permission from @ivoanjo #399 (comment)

@@ -0,0 +1,26 @@
require_relative './interesting_backtrace_helper'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some preamble to test different cases, then load the test file from backtracie

- name: Create core dump
shell: sudo bash {0}
run: |
echo 0x7f > /proc/${{ steps.ruby-script.outputs.pid }}/coredump_filter
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that 0xff also seems to work, and it might also be a good idea to do gcore -a

It drove me crazy trying to figure out why my string reads would work on a local process, but fail in the coredump - the memory region the string was in was not in the coredump lol.

Github uses 0x33 by default for coredump filter.

let cfps = get_cfps(thread.cfp as usize, stack_base(&thread) as usize, source)?;
for cfp in cfps.iter() {
if cfp.iseq as usize == 0 {
let cfunc = ruby_frame_cfunc(&cfp, source)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I flipped the conditional around here and got rid of the frame flag filtering to match what ruby's own backtrace function is doing.

// I don't know what the -1 is about. Also note that the stack_size is *not* in bytes! stack is a
// VALUE*, and so stack_size is in units of sizeof(VALUE).
// We must skip two dummy frames:
// https://github.com/ruby/ruby/blob/5445e0435260b449decf2ac16f9d09bae3cafe72/vm_backtrace.c#L477-L485%C2%A
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that i think the other -1 is occurring implicitly in our calculation of how many CFPs to collect, hence why this is -1 and not -2 here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any particular feedback on this, but It's great to see this "I don't know what the -1 is about" comment finally being removed and replaced with an explanation :)


macro_rules! get_classpath(
() => (
fn rb_class_real<T>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See variable.c for this and the other helper functions.

//if iseq_struct.body == std::ptr::null_mut() {
// return Err(format_err!("iseq body is null"));
//}
let cme = locate_method_entry(&cfp.ep, source)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we mimic vm_backtrace in checking if the CFP contains a CME, and if so, prefer that.

The iseq body passed in is only used if we do NOT have a CME containing imemo_ment.

rstring.basic.klass as usize,
source
).context("couldn't get ruby string from iseq body")?;
).context(format!("couldn't get ruby path string array from iseq body: {:X}", frame_flag)).unwrap_or(("FAILED".to_string(), "FAILED".to_string()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the unwrap_or, we may frequently drop frames if the paths are invalid. I think it is better to indicate the failure instead.

}

fn get_cfunc_name<T: ProcessMemory>(
fn ruby_frame_cfunc<T>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if a frame is a cfunc via the flags

Ok((frame_flag & VM_FRAME_MAGIC_MASK) == VM_FRAME_MAGIC_CFUNC)
}

pub fn qualified_method_name(class_path: &str, method_name: &str, singleton: bool) -> String {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be a better place for this function, for now we may as well put it here since cfuncs can use it too.

return Ok(me);
}
unsafe {
ep = ep.offset(0) as *mut usize;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking the method entry was previously completely broken. This would never iterate, and it could have actually gotten stuck in an infinite loop, but we were only using it on local method entries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we've been lucky with this code. Much of this was based on an old-even-then gdb script and involved a lot of feeling around in the dark until it did something useful. And even after it was working, a lot of it was (and still is) mysterious and inscrutable to anybody who's not familiar with cruby internals. Thanks for taking the time to fix and expand it 👍

if env_flags & VM_ENV_FLAG_LOCAL == 0 {
//println!("ALREADY LOCAL {:X}", env_me_cref);
}
while env_flags & VM_ENV_FLAG_LOCAL != 0 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the conditional should have been checking the flags, not the specval.

Ok(me)
}

fn check_method_entry<T: ProcessMemory>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was also pretty broken. I rewrote it to mimic the logic in ruby's vm_insnhelper.c

vec![
StackFrame {
name: "sleep [c function]".to_string(),
name: "Kernel#sleep [c function]".to_string(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my changes we can get the cfunc class name in 3.3.0+

absolute_path: Some("unknown".to_string()),
lineno: Some(192),
},
StackFrame {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The coredump was invalid and this frame was dropped entirely. I redid the coredump and added the missing frame.

]
}

fn real_complex_trace_with_classes_3_4_5() -> Vec<StackFrame> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A backtrace using the helper inspired by Ivo's script.

There are a couple of cases where the frames don't match stackprof however, and I cannot for the life of me figure out why.

In particular, the "" frames have a frame flag of 0x7777... which is the "eval" type. There is no CME, the EP is local, and i don't see how we can possible get a class from them. I wonder if this is an issue with how we are reading the structs in rbspy or something, as stackprof should basically be running the same algorithm now :/

But it handles a bunch of other stuff correctly, such as anonymous modules, singletons, etc. So all in all it is worth it to keep for now, we can hopefully improve the fidelity even more later.

#[test]
fn test_get_ruby_stack_trace_3_3_1() {
let source = coredump_3_3_0();
let vm_addr = 0x7f7ff21f1868;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These and others needed to be updated since I replaced the coredump with a non-truncated on.

assert_eq!(real_stack_trace_3_3_0(), stack_trace.unwrap().trace);
}

#[rstest]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added rstest here for table tests on these label calculation functions.

The logic of the functions themselves should match their counterparts in cruby.

In case we find cases where they don't, we can fix it and add regression tests here based on the input combos.

There is no need to update the test, just add new cases to the macro with the inputs to use and expected output.

let imemo: rb_method_entry_struct = source.copy_struct(cme).context(cme)?;
let method_type = source.copy(imemo.def as usize, 1).context(imemo.def as usize)?;
// https://github.com/ruby/ruby/blob/v3_3_0/internal/imemo.h#L21
let method_type = method_type[0] & 0xf;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note 0xf is because it is a bitfield of 4 bits, so we read 1 byte and then mask out the upper bits.

@dalehamel
Copy link
Contributor Author

Freebsd failure seems to be because of bumping the rbspy testdata crate, not sure why that is.

@dalehamel dalehamel marked this pull request as ready for review October 8, 2025 14:49
@dalehamel
Copy link
Contributor Author

Please take a look when you get a chance @acj

@dalehamel
Copy link
Contributor Author

Returned rbspy-testdata to versioned crate, and rebased after #402 merged, removed references to removed get_ruby_string_limit

Copy link
Member

@acj acj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Really appreciate the effort to fix and clean up the various helpers, and also the detailed comments along the way. Go ahead and clean up the commits when you have time, and I'll do some local testing to hopefully shake loose any edge cases. Then I'm good with merging.


module ModuleMethod
def call_instance()
InstanceMethod.new.work()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: indentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 70c01cf

end

loop do
work_main()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too

return Ok(me);
}
unsafe {
ep = ep.offset(0) as *mut usize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we've been lucky with this code. Much of this was based on an old-even-then gdb script and involved a lot of feeling around in the dark until it did something useful. And even after it was working, a lot of it was (and still is) mysterious and inscrutable to anybody who's not familiar with cruby internals. Thanks for taking the time to fix and expand it 👍

@dalehamel
Copy link
Contributor Author

This looks great. Really appreciate the effort to fix and clean up the various helpers, and also the detailed comments along the way. Go ahead and clean up the commits when you have time, and I'll do some local testing to hopefully shake loose any edge cases. Then I'm good with merging.

Hey @acj hope you are well, i'm back from vacation and i've addressed your style nits and squashed a few commits. LMK if you found anything that needs to be addressed.


const RUBY_QNIL: usize = 0x04;

//let mut ep = ep.clone() as *mut usize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I missed this line in the earlier review. I'll clean it up after merging - just making a note here so that I remember.

@acj acj merged commit 491d61b into rbspy:main Oct 22, 2025
9 checks passed
@acj
Copy link
Member

acj commented Oct 22, 2025

Thanks! Released in 0.38.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants