
Fix lerp overload ambiguity with std::lerp under C++20 #1985

Merged
crcrpar merged 1 commit into NVIDIA:master from xwang233:fix/lerp-ambiguity-cpp20
Mar 10, 2026

Conversation

@xwang233 (Contributor) commented Mar 9, 2026

Summary

  • PyTorch pytorch/pytorch#176659 upgraded the default C++ standard from C++17 to C++20 for extensions built via torch.utils.cpp_extension
  • Under C++20, std::lerp from <cmath> becomes visible alongside the custom lerp(float, float, float) defined in multi_tensor_distopt_adam_kernel.cu
  • When template arguments involve c10::BFloat16 (implicitly convertible to float), the compiler finds two equally-valid overload candidates and fails with "more than one instance of overloaded function lerp matches the argument list"
  • Rename the custom lerp to _lerp to eliminate the ambiguity

Test plan

  • Verify apex builds successfully against PyTorch nightly (which now uses C++20 by default)
  • Verify apex still builds against older PyTorch versions using C++17

🤖 Generated with Claude Code

PyTorch commit ad56ff73b751 ("[2/12] Upgrade cpp_extension and
cpp_builder to C++20", pytorch/pytorch#176659) changed the default
C++ standard from C++17 to C++20 for extensions built via
torch.utils.cpp_extension. Under C++20, std::lerp from <cmath> is
visible alongside the custom lerp(float,float,float) defined in this
file. When the third argument is c10::BFloat16 (implicitly convertible
to float), the compiler finds two equally-valid overload candidates
and fails with "more than one instance of overloaded function matches".

Rename the custom lerp to _lerp to eliminate the ambiguity.

Signed-off-by: Xiao Wang <24860335+xwang233@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI left a comment


Pull request overview

Resolves a C++20 build failure in the CUDA multi-tensor distributed Adam optimizer by avoiding overload ambiguity between a local lerp helper and std::lerp introduced via <cmath>.

Changes:

  • Rename the local device lerp helper to _lerp.
  • Update all call sites in DistAdamFunctor and DistAdamCapturableFunctor to use _lerp.
  • Add an explanatory comment about the C++20 std::lerp ambiguity.


Comment on lines 29 to 32:

-// (1-t)*x + t*y
-__device__ __forceinline__ float lerp(float t, float x, float y) {
+// Note: Named _lerp to avoid ambiguity with std::lerp under C++20.
+__device__ __forceinline__ float _lerp(float t, float x, float y) {
 // See https://developer.nvidia.com/blog/lerp-faster-cuda/

Copilot AI Mar 9, 2026


The helper is renamed to _lerp, but identifiers beginning with an underscore at global scope are reserved in C++ and can conflict with compiler/libc++ internals. Please rename to a non-reserved internal name (e.g., fast_lerp, apex_lerp, or lerp_) or place it in a namespace/anonymous namespace and call it with a qualified name to avoid std::lerp ambiguity without using a reserved identifier.

Collaborator

ah having this in an anonymous namespace sounds good

@xwang233 (Contributor, Author) commented Mar 9, 2026

internal pipeline 45725282 looks good so far

@crcrpar (Collaborator) left a comment

checked this can be compiled with a week-old PyTorch successfully

@crcrpar crcrpar merged commit f199212 into NVIDIA:master Mar 10, 2026
5 checks passed