Releases: NVlabs/NVBit
NVBit-1.7.7.3
NVBit-1.7.7.2
Fixed
- Removed the incorrect assertion on the number of mref address operands that can be present in an instruction. InstrType::MAX_NUM_MREF_PER_INSTR is removed as well.
NVBit-1.7.7.1
Hotfix for CUDA_ERROR_INVALID_SOURCE when using channel.hpp
Changed
- Removed the device-side assert() from channel.hpp. If your tool runs with CUDA 13.1 or newer, you can add it back, and you can also use printf() in inject_func.cu. Using them with an older CUDA version results in CUDA_ERROR_INVALID_SOURCE.
NVBit-1.7.7
Changed
- Updated CUDA headers to CUDA 13.1.
- Removed kernel execution serialization in mem_trace tool and stopped using ASYNC_COPY_STREAM in channel.hpp by default. mem_trace tool now uses new NVBit APIs to load and launch CUDA functions used in the tool.
Added
- Added tmem address parsing.
- Added printf and assert support back (requires a CUDA 13.1 or newer toolkit and driver).
- Added nvbit_load_tool_module() for loading a module that contains CUDA functions used by a tool (e.g., flush_channel() used in mem_trace.so was loaded implicitly). This avoids potential tool deadlocks.
- Added nvbit_find_function_by_name() for getting CUfunction from a loaded tool module.
- Added nvbit_launch_kernel() for launching a CUfunction.
Fixed
- Fixed hangs in mem_trace if a context does not launch any kernel.
NVBit-1.7.6
Changed
- Updated CUDA headers to CUDA 13.0
Added
- Added SM_110 support
- Added nvbit_dump_cubin() for tools to inspect a function's cubin file (Note: on Hopper and newer GPUs, line info can still be retrieved by disassembling the dumped cubin with nvdisasm when nvbit_get_line_info() does not work).
Fixed
- Fixed a bug related to warpsync.collective instruction.
- Fixed an issue causing nvbit_get_line_info() to fail (Note: direct support for Hopper and newer GPUs is not available; use manual disassembly of cubins instead).
NVBit-1.7.5
Announcement
We are working to enhance NVBit development and gain insights into its user base to better estimate the additional resources needed. Please take a moment to fill out this survey: https://forms.cloud.microsoft/r/zd1Kx3g8iQ and share it with any NVBit users you know. Your input is greatly appreciated—thank you!
Changelog
Fixed
- Fixed CALL.REL.NOINC handling (#142)
- Fixed a patch function argument passing issue.
- Fixed a race condition in multithreaded CUDA programs. NVBit serializes all kernel launches.
- Stopped CUDA event callbacks for any CUDA APIs used inside NVBit tools.
- Fixed nvbit_tool_init() so that it is called once for each context.
- Fixed NVBit to present the same code as nvdisasm (#149)
- Fixed SASS string decoding issue (#148)
Changed
- Used a new way of getting related functions.
- Updated CUDA headers to CUDA 12.9
NVBit-1.7.4
Added
- Added SM_120 support
Changed
- nvbit_get_kernel_argument_sizes(), nvbit_get_func_addr(), and nvbit_get_func_config() now require CUcontext as an input.
Fixed
- Fixed the issue which prevents per context tool initialization (#140).
NVBit-1.7.3
- Fixed the multi-context issue in #137
- Fixed a related function discovery crash.
- Fixed nvbit_read/write_(u)reg() functions, which might not work in certain conditions.
- Updated read_write_regs and record_reg_vals tools to avoid deadlocks.
NVBit-1.7.2
- [API change] nvbit_set_at_launch(CUcontext ctx, CUfunction func, uint64_t param_val, CUstream custream = nullptr, uint64_t launch_handle = 0) now accepts a parameter value instead of a pointer to the parameter. The newly added custream and launch_handle are provided and used during nvbit_at_graph_node_launch() to help set the parameter for a CUDA graph kernel node.
- Improved cubin compatibility
- Fixed SASS instruction parsing
- Improved CUDA graph support
- [experimental] Changed mem_trace to support CUDA graph.
- Fixed related function detection for the function pointer case.
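For the nvbit_set_at_launch() change in this release, a call site would pass the value itself rather than its address. The following is a minimal sketch, not NVBit code: the CUcontext/CUfunction/CUstream aliases and the nvbit_set_at_launch() body are stand-ins so the snippet compiles without CUDA or NVBit headers, and set_launch_id() is a hypothetical helper. Only the function signature is taken from the note above.

```cpp
#include <cstdint>

// Stand-in types so the sketch compiles without CUDA or NVBit headers.
// In a real tool these come from the CUDA driver API and nvbit.h.
using CUcontext = void*;
using CUfunction = void*;
using CUstream = void*;

// Records what the stub below received, purely for inspection.
struct LaunchParam {
    uint64_t value;
    uint64_t launch_handle;
};
static LaunchParam g_last{0, 0};

// Stub with the signature quoted in the release note. The real function
// is provided by NVBit; this body is an assumption for illustration only.
void nvbit_set_at_launch(CUcontext ctx, CUfunction func, uint64_t param_val,
                         CUstream custream = nullptr,
                         uint64_t launch_handle = 0) {
    (void)ctx;
    (void)func;
    (void)custream;
    g_last = {param_val, launch_handle};
}

// Hypothetical helper: forwards a per-launch value to the stub by value.
uint64_t set_launch_id(CUcontext ctx, CUfunction func, uint64_t launch_id) {
    nvbit_set_at_launch(ctx, func, launch_id);  // the value, not &launch_id
    return g_last.value;
}
```

In a real tool, custream and launch_handle would be the values provided during nvbit_at_graph_node_launch(); they are left at their defaults here.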
NVBit-1.7.1
- Improved CUDA program compatibility
- Fixed related function discovery on SM80 (close #129).
- Updated license headers.