Add PerfTools::DumpHeap#30

Open
HertzDevil wants to merge 5 commits into main from feature/dump-heap
@HertzDevil
Collaborator

Resolves #5.

This is a very primitive binary dump and, as mentioned in the issue, requires compiler support to emit the appropriate type info at build time so that other tools can comprehend those dumps (CRYSTAL_DUMP_TYPE_ID=1 will let you visually identify some allocations in a hex editor at the moment).

Apart from these custom formats, maybe we could try to produce industry or community standard memory dumps in the future...?

@HertzDevil HertzDevil added the enhancement New feature or request label Jun 29, 2025
Contributor

@ysbaddaden ysbaddaden left a comment

That's pretty nice! I've been pondering about this for a while.

I see just a couple issues:

  1. We should stop the world while we dump the heap so that it's MT-compatible (that one's easy).

  2. We can't allocate anything, but stdlib will happily allocate anywhere... Maybe we could introduce Crystal::System.read(fd, slice) and .write(fd, slice) methods that would directly read from and write to a system fd or handle?

@HertzDevil
Collaborator Author

> 2. We can't allocate anything, but stdlib will happily allocate anywhere

Do you mean that GC.lock_write + sync = true + stop-the-world are still insufficient because there could still be fiber switches while dumping?

Contributor

@ysbaddaden ysbaddaden left a comment

Thank you, it looks great to me for the initial step 🙇

@HertzDevil
Collaborator Author

It is possible to eliminate GC allocations from the IO itself like this: (Win32)

class RawIO < IO
  def initialize(@handle : LibC::HANDLE)
  end

  def read(slice : Bytes)
    # Reading must go through ReadFile, not WriteFile
    LibC.ReadFile(@handle, slice, slice.size, out read_count, nil)
    read_count.to_i32
  end

  def write(slice : Bytes) : Nil
    LibC.WriteFile(@handle, slice, slice.size, nil, nil)
  end
end

macro open_raw_file(path, mode = "r", &block)
  {% write = mode == "w" ? true : mode == "r" ? false : mode.raise "Unknown file mode: #{mode.id}" %}

  # if `path.to_utf16` uses a fixed-size stack buffer then
  # this macro could even be turned into a regular method
  %handle = ::LibC.CreateFileW(
    {{ path.to_utf16 }},
    {% if write %} ::LibC::FILE_GENERIC_WRITE {% else %} ::LibC::FILE_GENERIC_READ {% end %},
    ::LibC::DEFAULT_SHARE_MODE,
    nil,
    {% if write %} ::LibC::CREATE_ALWAYS {% else %} ::LibC::OPEN_EXISTING {% end %},
    ::LibC::FILE_FLAG_BACKUP_SEMANTICS,
    nil,
  )

  if %handle == ::LibC::INVALID_HANDLE_VALUE
    ::raise(::RuntimeError.new("CreateFileW"))
  end

  begin
    %raw_io_buf = uninitialized ::ReferenceStorage(::RawIO)
    {{ block.args[0] }} = ::RawIO.unsafe_construct(pointerof(%raw_io_buf), %handle)
    {{ block.body }}
  ensure
    ::LibC.CloseHandle(%handle)
  end
end

open_raw_file("heap.bin", "w") do |f|
  PerfTools::DumpHeap.full(f)
end

straight-shoota pushed a commit to crystal-lang/crystal that referenced this pull request Jul 28, 2025
If the environment variable `CRYSTAL_DUMP_TYPE_INFO` is set, at build time the compiler will emit a bunch of type information to a JSON file at that path. The JSON looks something like:

```json
{
    "types": [
        {
            "name": "Regex",
            "id": 46,
            "min_subtype_id": 46,
            "supertype_id": 188,
            "has_inner_pointers": true,
            "size": 8,
            "align": 8,
            "instance_size": 56,
            "instance_align": 8,
            "instance_vars": [
                {
                    "name": "@re",
                    "type_name": "Pointer(LibPCRE2::Code)",
                    "offset": 8,
                    "size": 8
                },
                {
                    "name": "@jit",
                    "type_name": "Bool",
                    "offset": 16,
                    "size": 1
                },
                {
                    "name": "@source",
                    "type_name": "String",
                    "offset": 24,
                    "size": 8
                },
                {
                    "name": "@match_data",
                    "type_name": "Crystal::ThreadLocalValue(Pointer(LibPCRE2::MatchData))",
                    "offset": 32,
                    "size": 16
                },
                {
                    "name": "@options",
                    "type_name": "Regex::Options",
                    "offset": 48,
                    "size": 8
                }
            ]
        }
    ]
}
```

At the moment, this is intended to be an internal tool that supplements the similarly named `CRYSTAL_DUMP_TYPE_ID` environment variable. I originally made this to generate human-readable reports from [GC heap dumps](crystal-lang/perf-tools#30), but there are probably other good uses like enhancing the debugger support scripts.
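
For illustration only (this is not part of the referenced commit), a heap-dump reader could load that JSON into an id-to-type lookup table roughly like this. The names `IvarInfo`, `TypeInfo`, `TypeDump` and `load_types` are hypothetical, the field names are taken from the sample above, and treating `instance_size` and `instance_vars` as optional is an assumption, since only one type is shown.

```crystal
require "json"

# Hypothetical reader types; field names follow the sample JSON above.
# Keys that are not declared here are ignored by JSON::Serializable.
struct IvarInfo
  include JSON::Serializable

  getter name : String
  getter type_name : String
  getter offset : Int32
  getter size : Int32
end

struct TypeInfo
  include JSON::Serializable

  getter name : String
  getter id : Int32
  # Assumed optional: other entries may not carry these keys.
  getter instance_size : Int32?
  getter instance_vars : Array(IvarInfo) = [] of IvarInfo
end

struct TypeDump
  include JSON::Serializable

  getter types : Array(TypeInfo)
end

# Builds a type id => type info table from the JSON text.
def load_types(json : String) : Hash(Int32, TypeInfo)
  TypeDump.from_json(json).types.to_h { |type| {type.id, type} }
end

# Trimmed-down copy of the sample above, inlined to keep the sketch self-contained.
sample = <<-JSON
  {"types": [{"name": "Regex", "id": 46, "instance_size": 56,
    "instance_vars": [{"name": "@re", "type_name": "Pointer(LibPCRE2::Code)", "offset": 8, "size": 8}]}]}
  JSON

types = load_types(sample)
regex = types[46]
puts "#{regex.name}: #{regex.instance_size} bytes"
regex.instance_vars.each do |ivar|
  puts "  +#{ivar.offset} #{ivar.name} : #{ivar.type_name}"
end
```

With such a table, a tool that finds a type ID in a dumped allocation can look up the type name and the offsets of its instance variables.
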
Comment on lines +25 to +26
Thread.stop_world
GC.lock_write
Contributor

I think the order can deadlock: a stopped thread might hold the read or write lock, and we won't be able to lock.

We probably don't need to lock the GC anyway: the current thread stopped the world, and we own the process.

Suggested change
- Thread.stop_world
- GC.lock_write
+ Thread.stop_world

fn.call(obj, bytes.to_u64!)
end, pointerof(block))
ensure
GC.unlock_write
Contributor

Suggested change
- GC.unlock_write

@ysbaddaden
Contributor

ysbaddaden commented Mar 5, 2026

The RawIO is interesting, but it only works for file descriptors, and it still involves the event loop, and thus the fiber scheduler, which can raise, etc.

We only need to write Bytes and UInt64 and if we limit to file descriptors, then we could directly write to it (only need to be careful with O_NONBLOCK on UNIX) and we already handle that in crystal/tracing and crystal/system/print_error, so we could consolidate implementations 🤔
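
A minimal sketch of what such a direct write could look like on UNIX targets, assuming a hypothetical `RawFD` module (not an existing stdlib API, and not the code in crystal/tracing or crystal/system/print_error). It calls `write(2)` through `LibC.write`, returns a `Bool` instead of raising, and simply retries on `EAGAIN`, which is only reasonable under the assumption that the world is stopped and nothing else can run anyway.

```crystal
# Hypothetical helper module; UNIX only, not an existing stdlib API.
module RawFD
  # Writes all of *slice* to *fd* with write(2), bypassing IO and the event loop.
  # Returns false on an unrecoverable error instead of raising.
  def self.write(fd : Int32, slice : Bytes) : Bool
    until slice.empty?
      ret = LibC.write(fd, slice, slice.size)
      if ret < 0
        # EINTR: interrupted by a signal, retry.
        # EAGAIN: the fd is O_NONBLOCK and not ready; busy-retry (assumption:
        # the world is stopped, so there is nothing to yield to).
        next if Errno.value.in?(Errno::EINTR, Errno::EAGAIN)
        return false
      end
      # Partial write: advance past the bytes already written.
      slice += ret
    end
    true
  end

  # Writes *value* in native byte order by viewing the stack copy as bytes.
  def self.write(fd : Int32, value : UInt64) : Bool
    write(fd, Bytes.new(pointerof(value).as(UInt8*), sizeof(UInt64)))
  end
end

# Usage: write a literal to stderr (fd 2).
RawFD.write(2, "heap dump sketch\n".to_slice)
```

Both overloads work on stack values and an existing slice only, which is the property the dump needs. A shared implementation would still have to decide how to report errors and whether to poll instead of spinning on `EAGAIN`.
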

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Dumping the entire dynamic heap

3 participants