Releases: NVlabs/NVBit

NVBit-1.7.7.3

26 Feb 17:40
73092be

Fixed

  • Removed the remaining incorrect assertion on how many mref address operands can be present in an instruction.

NVBit-1.7.7.2

25 Feb 19:52
73092be

Fixed

  • Removed the incorrect assertion on how many mref address operands can be present in an instruction. InstrType::MAX_NUM_MREF_PER_INSTR is removed too.

NVBit-1.7.7.1

06 Feb 17:03
73092be

Hotfix for CUDA_ERROR_INVALID_SOURCE when using channel.hpp

Changed

  • Removed the device-side assert() from channel.hpp. If your tool runs with CUDA 13.1 or newer, you can add it back. You can also use printf() in inject_func.cu if you are running your tool with CUDA 13.1. Otherwise, you will see CUDA_ERROR_INVALID_SOURCE.
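
The CUDA 13.1 requirement can be checked before re-enabling device-side assert() or printf(). The following is a minimal sketch and not part of NVBit: the helper and constant names are hypothetical, and it assumes CUDA's usual integer version encoding (1000 * major + 10 * minor, so 13.1 is 13010), as used by CUDA_VERSION in cuda.h and reported by cuDriverGetVersion().

```cpp
#include <cassert>

// Assumption: CUDA encodes versions as 1000 * major + 10 * minor,
// so CUDA 13.1 corresponds to 13010.
constexpr int kMinDeviceAssertVersion = 13010;

// Hypothetical helper (not an NVBit API): returns true only when both the
// toolkit the tool was built with and the installed driver are at least
// CUDA 13.1, the minimum stated in these release notes.
constexpr bool device_assert_supported(int toolkit_version, int driver_version) {
    return toolkit_version >= kMinDeviceAssertVersion &&
           driver_version >= kMinDeviceAssertVersion;
}
```

In a real tool, the first argument would typically be CUDA_VERSION at compile time and the second the value written by cuDriverGetVersion() at run time. If the check fails, leaving assert() and printf() out of device code avoids CUDA_ERROR_INVALID_SOURCE.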

NVBit-1.7.7

05 Feb 19:55
73092be

Changed

  • Updated CUDA headers to CUDA 13.1.
  • Removed kernel execution serialization in the mem_trace tool and stopped using ASYNC_COPY_STREAM in channel.hpp by default. The mem_trace tool now uses the new NVBit APIs to load and launch the CUDA functions used in the tool.

Added

  • Added tmem address parsing.
  • Added printf and assert support back (requires a CUDA 13.1 or newer toolkit and driver).
  • Added nvbit_load_tool_module() for loading a module that contains CUDA functions used by a tool (e.g., flush_channel() used in mem_trace.so was previously loaded implicitly). Loading the module explicitly avoids potential tool deadlocks.
  • Added nvbit_find_function_by_name() for getting CUfunction from a loaded tool module.
  • Added nvbit_launch_kernel() for launching a CUfunction.

Fixed

  • Fixed hangs in mem_trace if a context does not launch any kernel.

NVBit-1.7.6

23 Sep 01:23
73092be

Changed

  • Updated CUDA headers to CUDA 13.0

Added

  • Added SM_110 support
  • Added nvbit_dump_cubin() for tools to inspect a function's cubin file (Note: on Hopper and newer GPUs, line info can still be retrieved by disassembling the dumped cubin with nvdisasm when nvbit_get_line_info() does not work).

Fixed

  • Fixed a bug related to the warpsync.collective instruction.
  • Fixed an issue causing nvbit_get_line_info() to fail (Note: direct support for Hopper and newer GPUs is not available; use manual disassembly of cubins instead).

NVBit-1.7.5

02 Jun 16:00
e4fbe34

Announcement

We are working to enhance NVBit development and gain insights into its user base to better estimate the additional resources needed. Please take a moment to fill out this survey: https://forms.cloud.microsoft/r/zd1Kx3g8iQ and share it with any NVBit users you know. Your input is greatly appreciated—thank you!

Changelog

Fixed

  1. Fixed CALL.REL.NOINC handling (#142)
  2. Fixed a patch function argument passing issue.
  3. Fixed a race condition in multithreaded CUDA programs. NVBit now serializes all kernel launches.
  4. Stopped CUDA event callbacks for any CUDA APIs used inside NVBit tools.
  5. Fixed nvbit_tool_init() so that it is called once for each context.
  6. Fixed NVBit to present the same code as nvdisasm (#149)
  7. Fixed SASS string decoding issue (#148)

Changed

  1. Used a new way of getting related functions.
  2. Updated CUDA headers to CUDA 12.9

NVBit-1.7.4

11 Feb 20:25
ff94852

Added

  • Added SM_120 support

Changed

  • nvbit_get_kernel_argument_sizes(), nvbit_get_func_addr(), and nvbit_get_func_config() now require CUcontext as an input.

Fixed

  • Fixed an issue that prevented per-context tool initialization (#140).

NVBit-1.7.3

21 Jan 16:28
ff94852

  1. Fixed the multi-context issue in #137
  2. Fixed a related function discovery crash.
  3. Fixed the nvbit_read/write_(u)reg() functions, which might not work under certain conditions.
  4. Updated read_write_regs and record_reg_vals tools to avoid deadlocks.

NVBit-1.7.2

13 Dec 15:35
ff94852

  1. [API change] nvbit_set_at_launch(CUcontext ctx, CUfunction func, uint64_t param_val, CUstream custream = nullptr, uint64_t launch_handle = 0) now accepts the parameter value instead of a pointer to the parameter. The newly added custream and launch_handle arguments are provided and used during nvbit_at_graph_node_launch() to help set the parameter for a CUDA graph kernel node.
  2. Improved cubin compatibility
  3. Fixed SASS instruction parsing
  4. Improved CUDA graph support
  5. [experimental] Changed mem_trace to support CUDA graph.
  6. Fixed related function detection for the function pointer case.

NVBit-1.7.1

29 Aug 15:15
ff94852

  1. Improved CUDA program compatibility
  2. Fixed related function discovery on SM80 (close #129).
  3. Updated license headers.