Skip to content

[lldb] Use llvm::APInt for VariantMember Discriminants#188487

Open
Walnut356 wants to merge 2 commits into
llvm:mainfrom
Walnut356:128bit-discr
Open

[lldb] Use llvm::APInt for VariantMember Discriminants#188487
Walnut356 wants to merge 2 commits into
llvm:mainfrom
Walnut356:128bit-discr

Conversation

@Walnut356

@Walnut356 Walnut356 commented Mar 25, 2026

Copy link
Copy Markdown

resolves #177812

It currently uses a uint32_t even though Rust can output 64- and 128-bit discriminants. This became a more obvious issue when Rust started niche-optimizing based on the capacity of strings (which are NoHighBit values, guaranteed to be less than u64::MAX / 2), rather than the NonNull heap pointer. With this optimization, the None variant of strings is 9223372036854775808, which LLDB truncates to 0. This means whenever the capacity is 0 (e.g. Some(String::new()), LLDB will mistakenly read it as the None variant. Additionally, when trying to determine the None variant, lldb will read the discr from memory correctly (9223372036854775808), and the visualizer scripts will compare that to the value in the variant name ($variant$0), see that it doesn't match, and incorrectly decides that it must not be the None variant.

This patch simply swaps the uint32_t with an llvm::APInt.

Sample program:

#[repr(u128)]
enum BigDiscr {
    Value(u64),
    None = 0x16151413121110090807060504030201,
}

fn main() {
    let big_discr_none = BigDiscr::None;
    let big_disr_some = BigDiscr::Value(31);

    let some_string = Some(String::from("asdf"));
    let some_empty_string = Some(String::new());
    let none_string: Option<String> = Option::None;

    println!("Hello world!"); // break
}

New output without visualizers:

(sample::BigDiscr) big_discr_none = {
  $variants$ = {
    $variant$0 = {
      $discr$ = 29352461300415899028694309177919734273
      value = (__0 = 0)
    }
    $variant$29352461300415899028694309177919734273 = ($discr$ = 29352461300415899028694309177919734273, value = sample::BigDiscr::None:128 @ 0x000000710a7ff7d0)
  }
}
(sample::BigDiscr) big_disr_some = {
  $variants$ = {
    $variant$0 = {
      $discr$ = 0
      value = (__0 = 31)
    }
    $variant$29352461300415899028694309177919734273 = ($discr$ = 0, value = sample::BigDiscr::None:128 @ 0x000000710a7ff7f0)
  }
}
(core::option::Option<alloc::string::String>) some_string = {
  $variants$ = {
    $variant$9223372036854775808 = ($discr$ = 4, value = core::option::Option<alloc::string::String>::None<alloc::string::String>:64 @ 0x000000710a7ff750)
    $variant$ = {
      value = {
        __0 = {
          vec = {...}
        }
      }
    }
  }
}
(core::option::Option<alloc::string::String>) some_empty_string = {
  $variants$ = {
    $variant$9223372036854775808 = ($discr$ = 0, value = core::option::Option<alloc::string::String>::None<alloc::string::String>:64 @ 0x000000710a7ff780)
    $variant$ = {
      value = {
        __0 = {
          vec = {...}
        }
      }
    }
  }
}
(core::option::Option<alloc::string::String>) none_string = {
  $variants$ = {
    $variant$9223372036854775808 = ($discr$ = 9223372036854775808, value = core::option::Option<alloc::string::String>::None<alloc::string::String>:64 @ 0x000000710a7ff7b8)
    $variant$ = {
      value = {
        __0 = {
          vec = {...}
        }
      }
    }
  }
}

New output with Rust's current visualizers:

# this one is incorrect because the visualizer scripts are wrong, not because of LLDB.
# the scripts use `GetValue()`, which returns a 64 bit number. They can be modified
# on Rust's end to use `GetData()` instead. The rest work as expected
(sample::BigDiscr) big_discr_none = Value(0) {
  0 = 0
}
(sample::BigDiscr) big_disr_some = Value(31) {
  0 = 31
}
(core::option::Option<alloc::string::String>) some_string = Some("asdf") {
  0 = "asdf" {
    [0] = 'a'
    [1] = 's'
    [2] = 'd'
    [3] = 'f'
  }
}
(core::option::Option<alloc::string::String>) some_empty_string = Some("") {
  0 = ""
}

@Walnut356 Walnut356 requested a review from JDevlieghere as a code owner March 25, 2026 13:55
@github-actions

Copy link
Copy Markdown

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the lldb label Mar 25, 2026
@llvmbot

llvmbot commented Mar 25, 2026

Copy link
Copy Markdown
Member

@llvm/pr-subscribers-lldb

Author: Walnut (Walnut356)

Changes

resolves #177812

It currently uses a uint32_t even though Rust can output 64- and 128-bit discriminants. This became a more obvious issue when Rust started niche-optimizing based on the capacity of strings (which are NoHighBit values, guaranteed to be less than u64::MAX / 2), rather than the NonNull heap pointer. With this optimization, the None variant of strings is 9223372036854775808, which LLDB truncates to 0. This means whenever the capacity is 0 (e.g. Some(String::new()), LLDB will mistakenly read it as the None variant. Additionally, when trying to determine the None variant, lldb will read the discr from memory correctly (9223372036854775808), and the visualizer scripts will compare that to the value in the variant name ($variant$0), see that it doesn't match, and incorrectly decides that it must not be the None variant.

This patch simply swaps the uint32_t with an llvm::APInt.

Sample program:

#[repr(u128)]
enum BigDiscr {
    Value(u64),
    None = 0x16151413121110090807060504030201,
}

fn main() {
    let big_discr_none = BigDiscr::None;
    let big_disr_some = BigDiscr::Value(31);

    let some_string = Some(String::from("asdf"));
    let some_empty_string = Some(String::new());
    let none_string: Option&lt;String&gt; = Option::None;

    println!("Hello world!"); // break
}

New output without visualizers:

(sample::BigDiscr) big_discr_none = {
  $variants$ = {
    $variant$0 = {
      $discr$ = 29352461300415899028694309177919734273
      value = (__0 = 0)
    }
    $variant$29352461300415899028694309177919734273 = ($discr$ = 29352461300415899028694309177919734273, value = sample::BigDiscr::None:128 @ 0x000000710a7ff7d0)
  }
}
(sample::BigDiscr) big_disr_some = {
  $variants$ = {
    $variant$0 = {
      $discr$ = 0
      value = (__0 = 31)
    }
    $variant$29352461300415899028694309177919734273 = ($discr$ = 0, value = sample::BigDiscr::None:128 @ 0x000000710a7ff7f0)
  }
}
(core::option::Option&lt;alloc::string::String&gt;) some_string = {
  $variants$ = {
    $variant$9223372036854775808 = ($discr$ = 4, value = core::option::Option&lt;alloc::string::String&gt;::None&lt;alloc::string::String&gt;:64 @ 0x000000710a7ff750)
    $variant$ = {
      value = {
        __0 = {
          vec = {...}
        }
      }
    }
  }
}
(core::option::Option&lt;alloc::string::String&gt;) some_empty_string = {
  $variants$ = {
    $variant$9223372036854775808 = ($discr$ = 0, value = core::option::Option&lt;alloc::string::String&gt;::None&lt;alloc::string::String&gt;:64 @ 0x000000710a7ff780)
    $variant$ = {
      value = {
        __0 = {
          vec = {...}
        }
      }
    }
  }
}
(core::option::Option&lt;alloc::string::String&gt;) none_string = {
  $variants$ = {
    $variant$9223372036854775808 = ($discr$ = 9223372036854775808, value = core::option::Option&lt;alloc::string::String&gt;::None&lt;alloc::string::String&gt;:64 @ 0x000000710a7ff7b8)
    $variant$ = {
      value = {
        __0 = {
          vec = {...}
        }
      }
    }
  }
}

New output with Rust's current visualizers:

# this one is incorrect because the visualizer scripts are wrong, not because of LLDB.
# the scripts use `GetValue()`, which returns a 64 bit number. They can be modified
# on Rust's end to use `GetData()` instead. The rest work as expected
(sample::BigDiscr) big_discr_none = Value(0) {
  0 = 0
}
(sample::BigDiscr) big_disr_some = Value(31) {
  0 = 31
}
(core::option::Option&lt;alloc::string::String&gt;) some_string = Some("asdf") {
  0 = "asdf" {
    [0] = 'a'
    [1] = 's'
    [2] = 'd'
    [3] = 'f'
  }
}
(core::option::Option&lt;alloc::string::String&gt;) some_empty_string = Some("") {
  0 = ""
}

Full diff: https://github.com/llvm/llvm-project/pull/188487.diff

1 Files Affected:

  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp (+35-6)
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
index cb33fc21bfba9..72b1fcaa65174 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
@@ -13,6 +13,7 @@
 #include "DWARFDebugInfo.h"
 #include "DWARFDeclContext.h"
 #include "DWARFDefines.h"
+#include "DWARFFormValue.h"
 #include "SymbolFileDWARF.h"
 #include "SymbolFileDWARFDebugMap.h"
 #include "SymbolFileDWARFDwo.h"
@@ -2568,7 +2569,7 @@ struct VariantMember {
   explicit VariantMember(DWARFDIE &die, ModuleSP module_sp);
   bool IsDefault() const;
 
-  std::optional<uint32_t> discr_value;
+  std::optional<llvm::APInt> discr_value;
   DWARFFormValue type_ref;
   ConstString variant_name;
   uint32_t byte_offset;
@@ -2596,8 +2597,31 @@ bool VariantMember::IsDefault() const { return !discr_value; }
 
 VariantMember::VariantMember(DWARFDIE &die, lldb::ModuleSP module_sp) {
   assert(die.Tag() == llvm::dwarf::DW_TAG_variant);
-  this->discr_value =
-      die.GetAttributeValueAsOptionalUnsigned(DW_AT_discr_value);
+
+  DWARFFormValue discr_form;
+  die.GetDIE()->GetAttributeValue(die.GetCU(), DW_AT_discr_value, discr_form);
+
+  // rust can output 128-bit discrs (e.g. NonNull<u128>) as `DW_FORM_block1`.
+  // there is a `data16`, but the DIE function treats it as a block anyway.
+  // Handling is included for it just in case rust's output changes to the
+  // `data16` version
+  dw_form_t form = discr_form.Form();
+  if ((form == DW_FORM_block1 && discr_form.Unsigned() == 16) ||
+      form == DW_FORM_data16) {
+    const uint8_t *block_data = discr_form.BlockData();
+    uint64_t len = discr_form.Unsigned();
+    llvm::ArrayRef<uint64_t> data =
+        llvm::ArrayRef(reinterpret_cast<const uint64_t *>(block_data), 2);
+    this->discr_value = llvm::APInt(sizeof(uint64_t) * 2 * 8, data);
+  } else {
+    auto result = die.GetAttributeValueAsOptionalUnsigned(DW_AT_discr_value);
+    if (result.has_value()) {
+      this->discr_value =
+          llvm::APInt(sizeof(uint64_t) * 8, result.value(), false);
+    } else {
+      this->discr_value = std::nullopt;
+    };
+  }
 
   for (auto child_die : die.children()) {
     switch (child_die.Tag()) {
@@ -3834,9 +3858,14 @@ void DWARFASTParserClang::ParseRustVariantPart(
 
     m_ast.CompleteTagDeclarationDefinition(field_type);
 
-    auto name = has_discriminant
-                    ? llvm::formatv("$variant${0}", member.discr_value.value())
-                    : std::string("$variant$");
+    auto name = std::string("$variant$");
+    if (has_discriminant) {
+      // u128::MAX = 340282366920938463463374607431768211455 which is 39 digits
+      // long + 1 for null term
+      llvm::SmallString<40> discr_str{};
+      member.discr_value.value().toStringUnsigned(discr_str);
+      name.append(discr_str.c_str());
+    }
 
     auto variant_decl = m_ast.AddFieldToRecordType(
         inner_holder, llvm::StringRef(name), field_type, 0);

@github-actions

Copy link
Copy Markdown

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Developer Policy and LLVM Discourse for more information.

@github-actions

github-actions Bot commented Mar 25, 2026

Copy link
Copy Markdown

🐧 Linux x64 Test Results

  • 33398 tests passed
  • 529 tests skipped

✅ The build succeeded and all tests passed.

@Walnut356

Copy link
Copy Markdown
Author

@JDevlieghere ping

Comment on lines +2604 to +2607
// rust can output 128-bit discrs (e.g. NonNull<u128>) as `DW_FORM_block1`.
// there is a `data16`, but the DIE function treats it as a block anyway.
// Handling is included for it just in case rust's output changes to the
// `data16` version

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// rust can output 128-bit discrs (e.g. NonNull<u128>) as `DW_FORM_block1`.
// there is a `data16`, but the DIE function treats it as a block anyway.
// Handling is included for it just in case rust's output changes to the
// `data16` version
// Rust can output 128-bit discrs (e.g. NonNull<u128>) as `DW_FORM_block1`.
// There is a `data16`, but the DIE function treats it as a block anyway.
// Handling is included for it just in case rust's output changes to the
// `data16` version.

auto name = std::string("$variant$");
if (has_discriminant) {
// u128::MAX = 340282366920938463463374607431768211455 which is 39 digits
// long + 1 for null term

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// long + 1 for null term
// long + 1 for null terminator.


llvm::ArrayRef<uint64_t> data =
llvm::ArrayRef(reinterpret_cast<const uint64_t *>(block_data), 2);
this->discr_value = llvm::APInt(sizeof(uint64_t) * 2 * 8, data);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe use 16 here, to match the 16 on line 2609 and DW_FORM_data16.

Suggested change
this->discr_value = llvm::APInt(sizeof(uint64_t) * 2 * 8, data);
this->discr_value = llvm::APInt(sizeof(uint64_t) * 16, data);

Comment on lines +2617 to +2618
auto result = die.GetAttributeValueAsOptionalUnsigned(DW_AT_discr_value);
if (result.has_value()) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto result = die.GetAttributeValueAsOptionalUnsigned(DW_AT_discr_value);
if (result.has_value()) {
if (auto result = die.GetAttributeValueAsOptionalUnsigned(DW_AT_discr_value)) {

auto result = die.GetAttributeValueAsOptionalUnsigned(DW_AT_discr_value);
if (result.has_value()) {
this->discr_value =
llvm::APInt(sizeof(uint64_t) * 8, result.value(), false);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
llvm::APInt(sizeof(uint64_t) * 8, result.value(), false);
llvm::APInt(sizeof(uint64_t) * 8, result.value(), /*implicitTrunc=*/false);

if (has_discriminant) {
// u128::MAX = 340282366920938463463374607431768211455 which is 39 digits
// long + 1 for null term
llvm::SmallString<40> discr_str{};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
llvm::SmallString<40> discr_str{};
llvm::SmallString<40> discr_str;

const uint8_t *block_data = discr_form.BlockData();

llvm::ArrayRef<uint64_t> data =
llvm::ArrayRef(reinterpret_cast<const uint64_t *>(block_data), 2);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes the host and DWARF byte order match. I think you want something like:

    DataExtractor extractor(block_data, 16, die.GetCU()->GetByteOrder(),
                             die.GetCU()->GetAddressByteSize());
    lldb::offset_t offset = 0;
    uint64_t lo = extractor.GetU64(&offset);
    uint64_t hi = extractor.GetU64(&offset);
    uint64_t words[] = {lo, hi};
    this->discr_value = llvm::APInt(128, words);

@JDevlieghere

Copy link
Copy Markdown
Member

Would it be possible to add a test for this? Potentially by checking in a obj2yaml'd binary created by Rust?

@Walnut356

Walnut356 commented Apr 12, 2026

Copy link
Copy Markdown
Author

I'm not super familiar with LLDB's testing infrastructure

Coincidentally, i also figured out LLDB will straight up crash trying to handle 128-bit C-style enums e.g.:

#[repr(u128]
enum LargeDiscr {
	A = 0x15141312111009080706050403020100,
	B = 1,
}

I'll open an issue about it since it's a whole separate thing.

Would it be possible to add a test for this? Potentially by checking in a obj2yaml'd binary created by Rust?

I saw the tests in lldb\unittests\SymbolFile\DWARF\DWARFASTParserClangTests.cpp, but after fiddling with it for a few hours, I don't think I'm knowledgeable enough to write a test that actually compiles and tests what we'd want.

FWIW it doesn't look like any other part of the Rust VariantPart/VariantMember is tested either, and all we're really doing is swapping out an int type for another int type with a larger bit width.

@Nerixyz

Nerixyz commented Apr 14, 2026

Copy link
Copy Markdown
Contributor

I'm not super familiar with LLDB's testing infrastructure

Tried hacking around to get a small binary:

#![no_std]
#![no_main]

#[allow(unused)]
#[repr(u128)]
enum BigDiscr {
    Value(u64),
    None = 0x16151413121110090807060504030201,
}

fn use_it(_: &BigDiscr) {}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[no_mangle]
extern "C" fn main() {
    let none = BigDiscr::None;
    let some = BigDiscr::Value(31);

    use_it(&none);
    use_it(&some);
}

Compiled with rustc main.rs -g -C panic=abort -C link-arg=-nostdlib on x86_64 Linux yields a 5.8K object file and 12K obj2yaml'd file.

@Walnut356

Copy link
Copy Markdown
Author

Compiled with rustc main.rs -g -C panic=abort -C link-arg=-nostdlib on x86_64 Linux yields a 5.8K object file and 12K obj2yaml'd file.

Ah yeah, the smallest I can get even with that is ~30k. I don't have ready access to Linux at the moment, and cross-compiling without --emit=obj gives me linker errors that I don't really want to deal with atm.

In terms of testing logic, I straight up don't know enough about DWARF to test with the kind of elegance that the other tests have. It looks like they're running individual tags through specific functions? I can hack something together with SymbolFile::FindTypes and inspect the type from there, but it won't be pretty.

That lack of DWARF knowledge also means I'm not sure what (if anything) i can manually rip out of the YAML and still have the test function. Would it be fine to just pop the whole file into the inputs folder with the other yaml files? I've had non-minimized files rejected before so I'd like to be sure.

Also, something I noticed while fiddling with the test data, is there any reason why TypeList advertises a FindTypes function despite the implementation being commented out?

@Nerixyz

Nerixyz commented Apr 14, 2026

Copy link
Copy Markdown
Contributor

In terms of testing logic, I straight up don't know enough about DWARF to test with the kind of elegance that the other tests have. It looks like they're running individual tags through specific functions? I can hack something together with SymbolFile::FindTypes and inspect the type from there, but it won't be pretty.

You don't have to write a unit test for this (I think). Either a shell test (https://github.com/llvm/llvm-project/tree/main/lldb/test/Shell/SymbolFile/DWARF) or API test (https://github.com/llvm/llvm-project/tree/main/lldb/test/API; no DWARF specific directory afaik) should suffice. Usually you can start with a rather large YAML file and remove things you don't need. There are already some shell tests that use yaml2obj.

@Walnut356

Copy link
Copy Markdown
Author

You don't have to write a unit test for this (I think). Either a shell test (main/lldb/test/Shell/SymbolFile/DWARF) or API test (main/lldb/test/API; no DWARF specific directory afaik) should suffice.

I might be dumb but i have even less idea of how i would implement that. I cannot currently generate a complete x86_64-unknown-linux-gnu executable on my end (i don't have the build environment to properly link it and it's a pain in the ass to fix that atm). Unless i'm misunderstanding, a shell or API test would require something that LLDB can run?

Usually you can start with a rather large YAML file and remove things you don't need.

That's sorta my point. I don't know what I do or don't need because i have an incredibly limited understanding of how ELF and DWARF work.

@DavidSpickett DavidSpickett changed the title Use llvm::APInt for VariantMember Discriminants [lldb] Use llvm::APInt for VariantMember Discriminants Apr 14, 2026
JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request May 9, 2026
…ulacrum

[Debug Info] Gracefully handle invalid `String`/`Vec`

Somewhat related to rust-lang#150392.

Currently the handling can throw an exception, which we should absolutely not do. It causes issues with debugger adapters (e.g. CodeLLDB will hang forever. Trying to stop the debugger via vscode's interface causes a CodeLLDB to leak memory constantly until RAM is depleted and the OS starts killing processes). The exception has been replaced with a printed error message and a placeholder value.

Additionally, if a String/Vec is in an "invalid" state due to niche optimization (`capacity >= (1 << 63)`, common with `Option<String>`/`Option<Vec<T>>`), the pointer and length values will be meaningless, but are not guaranteed to be 0'd. The debugger will happily proceed as if they are useful values, and often do things like \<try to read multiple GB of data from the debugee\>.

I added simple checks to ensure that the capacity and length are within bounds, and that the pointer is non-null. If any check fails, the string/vec just acts as if it's empty.

Eventually this problem will be solved on LLDB's end via llvm/llvm-project#188487 or similar, but preventing issues on our end in the short term will help a lot.
@Walnut356

Walnut356 commented May 11, 2026

Copy link
Copy Markdown
Author

@JDevlieghere this should be good to go

jhpratt added a commit to jhpratt/rust that referenced this pull request May 22, 2026
…ulacrum

[Debug Info] Gracefully handle invalid `String`/`Vec`

Somewhat related to rust-lang#150392.

Currently the handling can throw an exception, which we should absolutely not do. It causes issues with debugger adapters (e.g. CodeLLDB will hang forever. Trying to stop the debugger via vscode's interface causes a CodeLLDB to leak memory constantly until RAM is depleted and the OS starts killing processes). The exception has been replaced with a printed error message and a placeholder value.

Additionally, if a String/Vec is in an "invalid" state due to niche optimization (`capacity >= (1 << 63)`, common with `Option<String>`/`Option<Vec<T>>`), the pointer and length values will be meaningless, but are not guaranteed to be 0'd. The debugger will happily proceed as if they are useful values, and often do things like \<try to read multiple GB of data from the debugee\>.

I added simple checks to ensure that the capacity and length are within bounds, and that the pointer is non-null. If any check fails, the string/vec just acts as if it's empty.

Eventually this problem will be solved on LLDB's end via llvm/llvm-project#188487 or similar, but preventing issues on our end in the short term will help a lot.

---

try-job: aarch64-apple
rust-bors Bot pushed a commit to rust-lang/rust that referenced this pull request May 22, 2026
[Debug Info] Gracefully handle invalid `String`/`Vec`



Somewhat related to #150392. 

Currently the handling can throw an exception, which we should absolutely not do. It causes issues with debugger adapters (e.g. CodeLLDB will hang forever. Trying to stop the debugger via vscode's interface causes a CodeLLDB to leak memory constantly until RAM is depleted and the OS starts killing processes). The exception has been replaced with a printed error message and a placeholder value.

Additionally, if a String/Vec is in an "invalid" state due to niche optimization (`capacity >= (1 << 63)`, common with `Option<String>`/`Option<Vec<T>>`), the pointer and length values will be meaningless, but are not guaranteed to be 0'd. The debugger will happily proceed as if they are useful values, and often do things like \<try to read multiple GB of data from the debugee\>. 

I added simple checks to ensure that the capacity and length are within bounds, and that the pointer is non-null. If any check fails, the string/vec just acts as if it's empty.

Eventually this problem will be solved on LLDB's end via llvm/llvm-project#188487 or similar, but preventing issues on our end in the short term will help a lot.

---

try-job: aarch64-apple
JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request May 22, 2026
…ulacrum

[Debug Info] Gracefully handle invalid `String`/`Vec`

Somewhat related to rust-lang#150392.

Currently the handling can throw an exception, which we should absolutely not do. It causes issues with debugger adapters (e.g. CodeLLDB will hang forever. Trying to stop the debugger via vscode's interface causes a CodeLLDB to leak memory constantly until RAM is depleted and the OS starts killing processes). The exception has been replaced with a printed error message and a placeholder value.

Additionally, if a String/Vec is in an "invalid" state due to niche optimization (`capacity >= (1 << 63)`, common with `Option<String>`/`Option<Vec<T>>`), the pointer and length values will be meaningless, but are not guaranteed to be 0'd. The debugger will happily proceed as if they are useful values, and often do things like \<try to read multiple GB of data from the debugee\>.

I added simple checks to ensure that the capacity and length are within bounds, and that the pointer is non-null. If any check fails, the string/vec just acts as if it's empty.

Eventually this problem will be solved on LLDB's end via llvm/llvm-project#188487 or similar, but preventing issues on our end in the short term will help a lot.

---

try-job: aarch64-apple
rust-bors Bot pushed a commit to rust-lang/rust that referenced this pull request May 22, 2026
[Debug Info] Gracefully handle invalid `String`/`Vec`



Somewhat related to #150392. 

Currently the handling can throw an exception, which we should absolutely not do. It causes issues with debugger adapters (e.g. CodeLLDB will hang forever. Trying to stop the debugger via vscode's interface causes a CodeLLDB to leak memory constantly until RAM is depleted and the OS starts killing processes). The exception has been replaced with a printed error message and a placeholder value.

Additionally, if a String/Vec is in an "invalid" state due to niche optimization (`capacity >= (1 << 63)`, common with `Option<String>`/`Option<Vec<T>>`), the pointer and length values will be meaningless, but are not guaranteed to be 0'd. The debugger will happily proceed as if they are useful values, and often do things like \<try to read multiple GB of data from the debugee\>. 

I added simple checks to ensure that the capacity and length are within bounds, and that the pointer is non-null. If any check fails, the string/vec just acts as if it's empty.

Eventually this problem will be solved on LLDB's end via llvm/llvm-project#188487 or similar, but preventing issues on our end in the short term will help a lot.

---

try-job: aarch64-apple
JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request May 22, 2026
…ulacrum

[Debug Info] Gracefully handle invalid `String`/`Vec`

Somewhat related to rust-lang#150392.

Currently the handling can throw an exception, which we should absolutely not do. It causes issues with debugger adapters (e.g. CodeLLDB will hang forever. Trying to stop the debugger via vscode's interface causes a CodeLLDB to leak memory constantly until RAM is depleted and the OS starts killing processes). The exception has been replaced with a printed error message and a placeholder value.

Additionally, if a String/Vec is in an "invalid" state due to niche optimization (`capacity >= (1 << 63)`, common with `Option<String>`/`Option<Vec<T>>`), the pointer and length values will be meaningless, but are not guaranteed to be 0'd. The debugger will happily proceed as if they are useful values, and often do things like \<try to read multiple GB of data from the debugee\>.

I added simple checks to ensure that the capacity and length are within bounds, and that the pointer is non-null. If any check fails, the string/vec just acts as if it's empty.

Eventually this problem will be solved on LLDB's end via llvm/llvm-project#188487 or similar, but preventing issues on our end in the short term will help a lot.

---

try-job: aarch64-apple
jollaitbot pushed a commit to sailfishos-mirror/rust that referenced this pull request May 23, 2026
Rollup merge of #155509 - Walnut356:string_crash, r=Mark-Simulacrum

[Debug Info] Gracefully handle invalid `String`/`Vec`

Somewhat related to rust-lang/rust#150392.

Currently the handling can throw an exception, which we should absolutely not do. It causes issues with debugger adapters (e.g. CodeLLDB will hang forever. Trying to stop the debugger via vscode's interface causes a CodeLLDB to leak memory constantly until RAM is depleted and the OS starts killing processes). The exception has been replaced with a printed error message and a placeholder value.

Additionally, if a String/Vec is in an "invalid" state due to niche optimization (`capacity >= (1 << 63)`, common with `Option<String>`/`Option<Vec<T>>`), the pointer and length values will be meaningless, but are not guaranteed to be 0'd. The debugger will happily proceed as if they are useful values, and often do things like \<try to read multiple GB of data from the debugee\>.

I added simple checks to ensure that the capacity and length are within bounds, and that the pointer is non-null. If any check fails, the string/vec just acts as if it's empty.

Eventually this problem will be solved on LLDB's end via llvm/llvm-project#188487 or similar, but preventing issues on our end in the short term will help a lot.

---

try-job: aarch64-apple
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[LLDB] DWARFASTParserClang handles DW_AT_variant incorrectly

4 participants