Skip to content

[cDAC] Add data descriptor field type annotations#126163

Merged
max-charlamb merged 4 commits intomainfrom
dev/max-charlamb-/runtime-cdac-typing
Apr 3, 2026
Merged

[cDAC] Add data descriptor field type annotations#126163
max-charlamb merged 4 commits intomainfrom
dev/max-charlamb-/runtime-cdac-typing

Conversation

@max-charlamb
Copy link
Copy Markdown
Member

@max-charlamb max-charlamb commented Mar 26, 2026

Summary

Add a type system to the cDAC data descriptor infrastructure. Field types in CDAC_TYPE_FIELD that were previously specified as C comments (/*uint32*/) are now expressed as preprocessor defines (T_UINT32, T_POINTER, TYPE(GCHandle), etc.).

Changes

Native-side type defines (wrappeddatadescriptor.inc):

  • Primitive types: T_UINT8, T_UINT16, T_UINT32, T_UINT64, T_INT8...T_INT64, T_NUINT, T_NINT, T_POINTER, T_BOOL
  • Struct types: TYPE(name) for types declared with CDAC_TYPE_BEGIN in the same descriptor
  • Cross-descriptor types: EXTERN_TYPE(name) (not validated locally)
  • Array types: T_ARRAY(type) (expands to pointer in the blob)
  • In debug/checked builds, these expand to type name strings in the blob. In release builds, they expand to nothing.

Compile-time validation (cdactypevalidation.inc):

  • In debug builds, validates that every TYPE(name) reference matches a CDAC_TYPE_BEGIN declaration via static_assert.
  • Uses two passes: first collects all declared types, then validates references. Handles forward references.

Descriptor updates

  • VM datadescriptor.inc: All ~200 fields and ~50 globals converted to type defines
  • GC datadescriptor.inc: All fields and globals converted, with EXTERN_TYPE(GCAllocContext) for the one cross-descriptor reference

Note

This content was generated with the assistance of GitHub Copilot.

Copilot AI review requested due to automatic review settings March 26, 2026 18:59
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@max-charlamb max-charlamb force-pushed the dev/max-charlamb-/runtime-cdac-typing branch from 6666d12 to 4d008c4 Compare March 30, 2026 17:12
@github-actions

This comment has been minimized.

@max-charlamb max-charlamb force-pushed the dev/max-charlamb-/runtime-cdac-typing branch from 4d008c4 to 0ae4fe8 Compare March 30, 2026 18:17
@github-actions

This comment has been minimized.

@max-charlamb max-charlamb force-pushed the dev/max-charlamb-/runtime-cdac-typing branch from 0ae4fe8 to e259293 Compare March 30, 2026 20:21
@github-actions

This comment has been minimized.

@max-charlamb max-charlamb force-pushed the dev/max-charlamb-/runtime-cdac-typing branch from e259293 to 62dcc09 Compare March 30, 2026 20:32
@github-actions

This comment has been minimized.

@max-charlamb max-charlamb force-pushed the dev/max-charlamb-/runtime-cdac-typing branch from 62dcc09 to d62d46b Compare March 30, 2026 22:44
@github-actions

This comment has been minimized.

@max-charlamb max-charlamb force-pushed the dev/max-charlamb-/runtime-cdac-typing branch from d62d46b to 3cfcc06 Compare March 31, 2026 02:17
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@max-charlamb max-charlamb force-pushed the dev/max-charlamb-/runtime-cdac-typing branch from 02ddfa0 to 2146118 Compare March 31, 2026 14:15
@github-actions

This comment has been minimized.

@max-charlamb max-charlamb force-pushed the dev/max-charlamb-/runtime-cdac-typing branch from 2146118 to 26a151e Compare March 31, 2026 16:26
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@max-charlamb max-charlamb force-pushed the dev/max-charlamb-/runtime-cdac-typing branch from cc78c3f to 11747ec Compare April 2, 2026 15:26
@github-actions

This comment has been minimized.

@max-charlamb max-charlamb force-pushed the dev/max-charlamb-/runtime-cdac-typing branch from 11747ec to e0cafde Compare April 2, 2026 17:34
@github-actions

This comment has been minimized.

Max Charlamb and others added 3 commits April 2, 2026 22:22
Remove documentation for CDAC_DATADESCRIPTOR_INC override mechanism that was
never implemented in wrappeddatadescriptor.inc. The include always uses
'datadescriptor.inc' resolved via build-configured include directories.

Fix comment typo in datadescriptor.cpp: T(GCHandle) -> TYPE(GCHandle).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use individual #ifndef guards for each type annotation define, with
a single #ifdef _DEBUG / #else at the top level. This allows callers
to override individual type defines without affecting others, matching
the pattern used by the CDAC macros below.

Update EXTERN_TYPE documentation to clarify it can be used for
well-known types that don't have a CDAC_TYPE_BEGIN in the current
descriptor, not just cross-descriptor types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the dev/max-charlamb-/runtime-cdac-typing branch from e0cafde to e70025e Compare April 3, 2026 02:22
…e docs

- Change SoftwareExceptionFrame, DebuggerEval, ResumableFrame, and
  FaultingExceptionFrame TargetContext fields from T_POINTER to
  EXTERN_TYPE(Context)
- Update EXTERN_TYPE docs in cdactypevalidation.inc and README.md to
  mention well-known types alongside cross-descriptor types

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the dev/max-charlamb-/runtime-cdac-typing branch from e70025e to c979818 Compare April 3, 2026 02:22
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 3, 2026

Note

This review was generated by Copilot.

🤖 Copilot Code Review — PR #126163

Holistic Assessment

Motivation: The PR replaces unstructured comment-based type annotations (/*uint32*/, /*pointer*/) in cDAC data descriptors with formal preprocessor-based type defines (T_UINT32, T_POINTER, TYPE(name), EXTERN_TYPE(name)). This is well-justified — comment-based types were invisible to the compiler, providing no validation and being prone to staleness. Making types machine-readable and compile-time-validated is a clear improvement.

Approach: The approach is sound and consistent with the existing macro-heavy data descriptor infrastructure. The type defines are introduced in wrappeddatadescriptor.inc with #ifndef guards that allow override (used by cdactypevalidation.inc). The tag-struct + static_assert validation technique is clever and effective. The TYPE vs EXTERN_TYPE distinction for validated vs. cross-descriptor references is well-designed. The string splitting fix in ContractDescriptorSourceFileEmitter.cs is a reasonable preemptive fix for MSVC limits.

Summary: ✅ LGTM. The PR is a well-executed infrastructure improvement. All type conversions are consistent with the original comment annotations. The compile-time validation logic is correct. No blocking issues were found. One behavioral observation and one minor suggestion are noted below.


Detailed Findings

✅ Type Conversion Correctness — All 600+ field/global entries correctly converted

Verified that comment-based types were correctly mapped to macro defines:

  • /*uint32*/T_UINT32, /*pointer*/T_POINTER, /*nuint*/T_NUINT, /*int32*/T_INT32, /*uint8*/T_UINT8, etc.
  • Inline struct types like /*GCHandle*/TYPE(GCHandle), /*oom_history*/TYPE(OomHistory).
  • All 20 TYPE(X) references in the VM descriptor match corresponding CDAC_TYPE_BEGIN(X) entries.
  • Both GC EXTERN_TYPE(GCAllocContext) and VM EXTERN_TYPE(Context) references are semantically correct — GCAllocContext is defined in the VM descriptor (not GC), and Context is a platform-specific type without a CDAC_TYPE_BEGIN.

✅ Compile-Time Validation — cdactypevalidation.inc is correct

The two-pass approach works correctly:

  • Pass 1: CDAC_TYPE_BEGIN(name) declares struct cdac_type_tag_<name> { char dummy; };
  • Pass 2: TYPE(name) expands to struct cdac_type_tag_<name>, and static_assert(sizeof(...) > 0) fails at compile time if the tag struct was never declared.
  • Primitive types (T_UINT8, etc.) and EXTERN_TYPE correctly expand to char during validation, always passing sizeof.
  • T_ARRAY(inner) passes through to its inner type, correctly validating nested TYPE() references while ignoring primitive arrays.
  • All validation overrides are properly #undef'd after the two passes, allowing the real defines from wrappeddatadescriptor.inc to be established via the #ifndef guards.

✅ Macro Hygiene — wrappeddatadescriptor.inc defines are well-structured

  • Each type define uses an independent #ifndef guard, allowing selective override (requested during PR review, correctly implemented).
  • The defines persist across multiple wrappeddatadescriptor.inc inclusions (they are NOT #undef'd in the wrapper's cleanup section), which is correct since CDAC macros (CDAC_TYPE_FIELD, etc.) are #undef'd and redefined per pass, but type defines remain constant.
  • T_ARRAY(name) correctly expands to pointer in the production defines (arrays are accessed via pointers).

CDAC_STRINGIFY_TYPE — Correct double-indirection stringification

The CDAC_STRINGIFY_IMPL/CDAC_STRINGIFY pair correctly handles macro expansion before stringification. Direct #membertyname would stringify the macro name (e.g., "T_UINT32") rather than its expansion ("uint32"). The double-indirection ensures the type define is expanded first.

✅ String Splitting in ContractDescriptorSourceFileEmitter.cs

  • The MSVC 2048-byte string literal limit (error C2026) is correctly handled by splitting into adjacent string literals at 2000-character boundaries.
  • The escape sequence boundary check (escaped[chunkEnd - 1] == '\\') is safe because the only escaping performed is "\" — no double-backslash sequences can appear, so any trailing \ is guaranteed to be part of a \" pair.
  • No infinite loop risk: MaxChunkSize is 2000, so chunkEnd starts at offset + 2000 minimum. The backup reduces it by at most 1, ensuring progress every iteration.

ResumableFrame.TargetContextPtr correctly uses T_POINTER

The commit message mentions changing ResumableFrame's context field to EXTERN_TYPE(Context), but ResumableFrame.TargetContextPtr is a pointer to a Context (not an embedded Context), so T_POINTER is the correct type. SoftwareExceptionFrame.TargetContext, DebuggerEval.TargetContext, and FaultingExceptionFrame.TargetContext are embedded Context structs, correctly using EXTERN_TYPE(Context).

💡 Release Build Behavioral Change for Global Type Names — Intentional but worth noting

Previously, CDAC_GLOBAL type names were plain identifiers (e.g., uint32) that were stringified directly with #tyname, producing the type name in the blob in all build configurations. Now, CDAC_STRINGIFY_TYPE emits "" in release builds. This means release build blobs no longer contain global type names (e.g., "uint32" becomes "").

This is explicitly documented in the code comment ("to keep the blob compact") and appears intentional. The cDAC reader (ContractDescriptorTarget.cs) does not appear to make runtime decisions based on these type names, and field type names were already always empty in the old code (since they were in comments). This change makes the behavior consistent across fields and globals. No action needed, but worth being aware of if diagnostic tooling is later extended to use global type names.

💡 No Unit Tests for String Splitting — Pre-existing gap

The SetJsonDescriptor string splitting logic in ContractDescriptorSourceFileEmitter.cs has no dedicated unit tests. This is a pre-existing gap (the entire cdac-build-tool project has no test project). The algorithm is simple and correct, but future changes could benefit from test coverage. This is a follow-up consideration, not a blocker.


Models contributing to this review: Claude Opus 4.6 (primary). GPT-5.4 sub-agent did not complete within the timeout window.

Generated by Code Review for issue #126163 ·

@max-charlamb max-charlamb merged commit ee1b5b5 into main Apr 3, 2026
110 of 113 checks passed
@max-charlamb max-charlamb deleted the dev/max-charlamb-/runtime-cdac-typing branch April 3, 2026 15:03
radekdoulik pushed a commit to radekdoulik/runtime that referenced this pull request Apr 9, 2026
## Summary

Add a type system to the cDAC data descriptor infrastructure. Field
types in `CDAC_TYPE_FIELD` that were previously specified as C comments
(`/*uint32*/`) are now expressed as preprocessor defines (`T_UINT32`,
`T_POINTER`, `TYPE(GCHandle)`, etc.).

### Changes

**Native-side type defines** (`wrappeddatadescriptor.inc`):
- Primitive types: `T_UINT8`, `T_UINT16`, `T_UINT32`, `T_UINT64`,
`T_INT8`...`T_INT64`, `T_NUINT`, `T_NINT`, `T_POINTER`, `T_BOOL`
- Struct types: `TYPE(name)` for types declared with `CDAC_TYPE_BEGIN`
in the same descriptor
- Cross-descriptor types: `EXTERN_TYPE(name)` (not validated locally)
- Array types: `T_ARRAY(type)` (expands to `pointer` in the blob)
- In debug/checked builds, these expand to type name strings in the
blob. In release builds, they expand to nothing.

**Compile-time validation** (`cdactypevalidation.inc`):
- In debug builds, validates that every `TYPE(name)` reference matches a
`CDAC_TYPE_BEGIN` declaration via `static_assert`.
- Uses two passes: first collects all declared types, then validates
references. Handles forward references.

### Descriptor updates
- VM `datadescriptor.inc`: All ~200 fields and ~50 globals converted to
type defines
- GC `datadescriptor.inc`: All fields and globals converted, with
`EXTERN_TYPE(GCAllocContext)` for the one cross-descriptor reference

> [!NOTE]
> This content was generated with the assistance of GitHub Copilot.

---------

Co-authored-by: Max Charlamb <maxcharlamb@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
max-charlamb added a commit that referenced this pull request Apr 10, 2026
## Summary

Add typed field extension methods for the cDAC managed reader and
convert all `IData<T>` classes to use them. This builds on the native
data descriptor type system (#126163) to add managed-side type
validation.

### Changes

**`TargetFieldExtensions.cs`** — New extension methods on `Target`:
- `ReadField<T>` — reads a primitive field with `Debug.Assert` type
validation
- `ReadPointerField` / `ReadPointerFieldOrNull` — reads pointer fields
(required/optional)
- `ReadNUIntField` — reads native unsigned integer fields
- `ReadCodePointerField` — reads code pointer fields
- `ReadDataField<T>` / `ReadDataFieldPointer<T>` — reads
inline/pointer-to Data struct fields
- `ReadFieldOrDefault<T>` — reads optional primitive fields with default
value

When the data descriptor includes type information (debug/checked
builds), these methods assert that the declared field type is compatible
with the C# read type.

**All ~130 `IData<T>` classes converted** to use the new extension
methods, replacing direct `target.Read<T>(address +
(ulong)type.Fields[...].Offset)` calls.

### Bugs found and fixed

The type system caught several pre-existing issues:
1. **DynamicStaticsInfo.GCStatics/NonGCStatics** — descriptor said
`uint32`, native type is `TADDR` (pointer)
2. **EEClass.FieldDescList** — used `Read<ulong>` for a pointer field
(broken on 32-bit)
3. **ExceptionClause.ClassToken** — read `TypeHandle` (nuint) as `uint`
4. **SimpleComCallWrapper.RefCount** — property was `ulong`, native is
`LONGLONG` (signed)
5. **NativeCodeVersionNode.NativeCode** — descriptor was `T_POINTER`,
actually `CodePointer`
6. **MethodTable.EEClassOrCanonMT** — descriptor was `T_NUINT`, managed
reads as pointer

> [!NOTE]
> This content was generated with the assistance of GitHub Copilot.

---------

Co-authored-by: Max Charlamb <maxcharlamb@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants