Skip to content

[VPlan] Embed widening semantics of recipes (NFCI)#195385

Closed
artagnon wants to merge 2 commits into
llvm:mainfrom
artagnon:vplan-wideninginfo
Closed

[VPlan] Embed widening semantics of recipes (NFCI)#195385
artagnon wants to merge 2 commits into
llvm:mainfrom
artagnon:vplan-wideninginfo

Conversation

@artagnon

@artagnon artagnon commented May 1, 2026

Copy link
Copy Markdown
Contributor

The goal is to make widening decisions in a principled fashion, simply by analyzing the Plan, and this patch is the first step towards the goal. By embedding widening semantics in recipes, we make explicit data that is otherwise non-trivial to infer from the recipe.

  1. Mutability versus immutability is orthogonal to keeping widening information in the recipe that would otherwise have to be inferred by reading the recipe's code. The goal is to eventually move to mutable widening information, although we should keep it immutable for now in view of the current state of the codebase.
  2. We do have to have to some un-narrowable recipes hard-coded: otherwise, what would the leaf cases of any narrowing analysis be? We'd just end up narrowing everything and producing the scalar loop again. It would be valuable to keep track of which recipes are narrowable, and embed this information directly in the recipe (Narrow bit).
  3. Currently, the unroller can only handle certain recipes, and we would have to extend it to handle more one by one. It would hence be valuable to keep track of which recipes the unroller supports, and embed the information directly into the recipe (ReplicatePart bit).
  4. Currently, Wide and ReplicatePart are exclusive, but there is no reason we cannot eventually decide to either widen or unroll a recipe via analysis. Again, we have to have to keep track of this information (Wide ^ ReplicatePart), and it would be good to ensure that the invariant is maintained until we break it. Ultimately, we want to unify usesFirst(Lane|Part)Only.
  5. We currently have only one ScalarToVector instruction (Broadcast), but several VectorToScalars (from VPInstruction and VPReductionRecipe). It would be useful to keep track of this information directly in the recipe (Agnostic bit), so we know what the IsScalar of VPTransformState::get needs to be. At the moment, VPTransformState introduces broadcasts and extracts just-in-time, and these are not visible in the VPlan, and not amenable to optimization: one of the goals of this line of work is to get rid of it completely. It does not make sense to replicate Agnostic recipes (!(Agnostic && ReplicatePart)), and it would be good to maintain this invariant.
  6. We could put the enum in VPInstruction, but we will at least be missing out information on VPPredInstPHI, VPReduction, VPBlend, and VPWiden(Call|Intrinsic|Load|Store), albeit unnecessarily.
  7. A planned addition is a VoidResult bit to faithfully capture the fact that stores and intrinsics like assumes do not produce results. This information would make couldProduce(Narrow|Wide)Result more faithful, would help us make replication decisions along with mayHaveSideEffects, and we would eliminate special cases. Currently, a recipe's defined values does not fully capture this, as Replicates are SingleDefs.
  8. We only have one case of MultiDefs (VPInterleave), and it needs a lot of special handling anyway, so we wouldn't gain much by modeling this.

@llvmorg-github-actions

llvmorg-github-actions Bot commented May 1, 2026

Copy link
Copy Markdown

@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Ramkumar Ramachandra (artagnon)

Changes

We currently make widening decisions in an ad-hoc fashion, and have helpers that unnecessarily do recursive reasoning over and over: isSingleScalar, onlyFirstLaneUsed, and onlyScalarValuesUsed. The goal is to make widening decisions in a principled fashion, simply by analyzing the Plan, and this patch is the first step towards the goal. By embedding widening decisions in recipes, which can be refined iteratively, we can eliminate the hard-coded wide-equivalents of VPInstruction, including VPWidenRecipe, VPWidenGEPRecipe, and VPWidenCastRecipe. The design we have picked would also allow us to eliminate VPReplicateRecipe.

This patch is intentionally non-functional, but it should have enough information to see what the follow-ups would be.


Patch is 40.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/195385.diff

7 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+100-53)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+13-11)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+58-68)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp (+2-5)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUtils.cpp (+34-17)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUtils.h (+6)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index e912751525fc7..e74726d90d690 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7848,7 +7848,7 @@ static SmallVector<Instruction *> preparePlanForEpilogueVectorLoop(
                 [](const VPUser *U) {
                   return isa<VPScalarIVStepsRecipe>(U) ||
                          isa<VPDerivedIVRecipe>(U) ||
-                         cast<VPRecipeBase>(U)->isScalarCast() ||
+                         cast<VPInstruction>(U)->isScalarCast() ||
                          cast<VPInstruction>(U)->getOpcode() ==
                              Instruction::Add;
                 }) &&
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 592d83af2295f..0d3755ff4d1fa 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -409,6 +409,9 @@ class LLVM_ABI_FOR_TEST VPRecipeBase
   /// Subclass identifier (for isa/dyn_cast).
   const unsigned char SubclassID;
 
+  /// Two bits of widening information, which takes values in VPWideningTy.
+  unsigned char WideningInfo : 2;
+
   /// Each VPRecipe belongs to a single VPBasicBlock.
   VPBasicBlock *Parent = nullptr;
 
@@ -466,9 +469,21 @@ class LLVM_ABI_FOR_TEST VPRecipeBase
     VPLastPHISC = VPReductionPHISC,
   };
 
-  VPRecipeBase(const unsigned char SC, ArrayRef<VPValue *> Operands,
+  /// An enumeration for keeping track of whether the widening status of the
+  /// recipe.
+  /// Narrow indicates that the recipe necessarily produces a narrow scalar.
+  /// PossiblyNarrow indicates that the recipe is wide, although it could be
+  /// narrowed to produce a scalar if profitable.
+  /// Wide indicates that the recipes necessarily produces a wide result.
+  /// ReplicatePart indicates that the recipe needs to be replicated, producing
+  /// scalars for each part, which the unroller handles.
+  using VPWideningTy = enum { Narrow, PossiblyNarrow, Wide, ReplicatePart };
+
+  VPRecipeBase(const unsigned char SC, unsigned char WideInfo,
+               ArrayRef<VPValue *> Operands,
                DebugLoc DL = DebugLoc::getUnknown())
-      : VPDef(), VPUser(Operands), SubclassID(SC), DL(DL) {}
+      : VPDef(), VPUser(Operands), SubclassID(SC), WideningInfo(WideInfo),
+        DL(DL) {}
 
   ~VPRecipeBase() override = default;
 
@@ -552,8 +567,16 @@ class LLVM_ABI_FOR_TEST VPRecipeBase
   /// Returns the debug location of the recipe.
   DebugLoc getDebugLoc() const { return DL; }
 
-  /// Return true if the recipe is a scalar cast.
-  bool isScalarCast() const;
+  /// Methods for querying and setting WideningInfo.
+  bool isNarrow() const { return WideningInfo == Narrow; }
+  void markNarrow() { WideningInfo = Narrow; }
+  bool maybeNarrow() const {
+    return is_contained({Narrow, PossiblyNarrow}, WideningInfo);
+  }
+  void markPossiblyNarrow() { WideningInfo = PossiblyNarrow; }
+  bool isWide() const { return WideningInfo == Wide; }
+  bool isReplicatePart() const { return WideningInfo == ReplicatePart; }
+  void markReplicatePart() { WideningInfo = ReplicatePart; }
 
   /// Set the recipe's debug location to \p NewDL.
   void setDebugLoc(DebugLoc NewDL) { DL = NewDL; }
@@ -604,13 +627,15 @@ class LLVM_ABI_FOR_TEST VPRecipeBase
 /// Note that VPRecipeBase must be inherited from before VPValue.
 class VPSingleDefRecipe : public VPRecipeBase, public VPRecipeValue {
 public:
-  VPSingleDefRecipe(const unsigned char SC, ArrayRef<VPValue *> Operands,
+  VPSingleDefRecipe(const unsigned char SC, unsigned char WideInfo,
+                    ArrayRef<VPValue *> Operands,
                     DebugLoc DL = DebugLoc::getUnknown())
-      : VPRecipeBase(SC, Operands, DL), VPRecipeValue(this) {}
+      : VPRecipeBase(SC, WideInfo, Operands, DL), VPRecipeValue(this) {}
 
-  VPSingleDefRecipe(const unsigned char SC, ArrayRef<VPValue *> Operands,
-                    Value *UV, DebugLoc DL = DebugLoc::getUnknown())
-      : VPRecipeBase(SC, Operands, DL), VPRecipeValue(this, UV) {}
+  VPSingleDefRecipe(const unsigned char SC, unsigned char WideInfo,
+                    ArrayRef<VPValue *> Operands, Value *UV,
+                    DebugLoc DL = DebugLoc::getUnknown())
+      : VPRecipeBase(SC, WideInfo, Operands, DL), VPRecipeValue(this, UV) {}
 
   static inline bool classof(const VPRecipeBase *R) {
     switch (R->getVPRecipeID()) {
@@ -1107,10 +1132,10 @@ static_assert(sizeof(VPIRFlags) <= 3, "VPIRFlags should not grow");
 /// A pure-virtual common base class for recipes defining a single VPValue and
 /// using IR flags.
 struct VPRecipeWithIRFlags : public VPSingleDefRecipe, public VPIRFlags {
-  VPRecipeWithIRFlags(const unsigned char SC, ArrayRef<VPValue *> Operands,
-                      const VPIRFlags &Flags,
+  VPRecipeWithIRFlags(const unsigned char SC, unsigned char WideInfo,
+                      ArrayRef<VPValue *> Operands, const VPIRFlags &Flags,
                       DebugLoc DL = DebugLoc::getUnknown())
-      : VPSingleDefRecipe(SC, Operands, DL), VPIRFlags(Flags) {}
+      : VPSingleDefRecipe(SC, WideInfo, Operands, DL), VPIRFlags(Flags) {}
 
   static inline bool classof(const VPRecipeBase *R) {
     return R->getVPRecipeID() == VPRecipeBase::VPBlendSC ||
@@ -1483,12 +1508,17 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags,
 
   /// Returns true if this VPInstruction produces a scalar value from a vector,
   /// e.g. by performing a reduction or extracting a lane.
+  /// TODO: Remove in favor of WideningInfo.
   bool isVectorToScalar() const;
 
   /// Returns true if this VPInstruction's operands are single scalars and the
   /// result is also a single scalar.
+  /// TODO: Remove in favor of WideningInfo.
   bool isSingleScalar() const;
 
+  /// Returns true if this VPInstruction is a scalar cast.
+  bool isScalarCast() const { return Instruction::isCast(getOpcode()); }
+
   /// Returns the symbolic name assigned to the VPInstruction.
   StringRef getName() const { return Name; }
 
@@ -1524,8 +1554,6 @@ class VPInstructionWithType : public VPInstruction {
   static inline bool classof(const VPRecipeBase *R) {
     // VPInstructionWithType are VPInstructions with specific opcodes requiring
     // type information.
-    if (R->isScalarCast())
-      return true;
     auto *VPI = dyn_cast<VPInstruction>(R);
     if (!VPI)
       return false;
@@ -1536,7 +1564,7 @@ class VPInstructionWithType : public VPInstruction {
     case Instruction::Load:
       return true;
     default:
-      return false;
+      return VPI->isScalarCast();
     }
   }
 
@@ -1684,7 +1712,8 @@ class VPIRInstruction : public VPRecipeBase {
   /// VPIRInstruction::create() should be used to create VPIRInstructions, as
   /// subclasses may need to be created, e.g. VPIRPhi.
   VPIRInstruction(Instruction &I)
-      : VPRecipeBase(VPRecipeBase::VPIRInstructionSC, {}), I(I) {}
+      : VPRecipeBase(VPRecipeBase::VPIRInstructionSC, VPRecipeBase::Narrow, {}),
+        I(I) {}
 
 public:
   ~VPIRInstruction() override = default;
@@ -1780,7 +1809,8 @@ class LLVM_ABI_FOR_TEST VPWidenRecipe : public VPRecipeWithIRFlags,
   VPWidenRecipe(Instruction &I, ArrayRef<VPValue *> Operands,
                 const VPIRFlags &Flags = {}, const VPIRMetadata &Metadata = {},
                 DebugLoc DL = {})
-      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenSC, Operands, Flags, DL),
+      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenSC, VPRecipeBase::Wide,
+                            Operands, Flags, DL),
         VPIRMetadata(Metadata), Opcode(I.getOpcode()) {
     setUnderlyingValue(&I);
   }
@@ -1788,7 +1818,8 @@ class LLVM_ABI_FOR_TEST VPWidenRecipe : public VPRecipeWithIRFlags,
   VPWidenRecipe(unsigned Opcode, ArrayRef<VPValue *> Operands,
                 const VPIRFlags &Flags = {}, const VPIRMetadata &Metadata = {},
                 DebugLoc DL = {})
-      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenSC, Operands, Flags, DL),
+      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenSC, VPRecipeBase::Wide,
+                            Operands, Flags, DL),
         VPIRMetadata(Metadata), Opcode(Opcode) {}
 
   ~VPWidenRecipe() override = default;
@@ -1841,7 +1872,8 @@ class VPWidenCastRecipe : public VPRecipeWithIRFlags, public VPIRMetadata {
                     CastInst *CI = nullptr, const VPIRFlags &Flags = {},
                     const VPIRMetadata &Metadata = {},
                     DebugLoc DL = DebugLoc::getUnknown())
-      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenCastSC, Op, Flags, DL),
+      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenCastSC, VPRecipeBase::Wide, Op,
+                            Flags, DL),
         VPIRMetadata(Metadata), Opcode(Opcode), ResultTy(ResultTy) {
     assert(flagsValidForOpcode(Opcode) &&
            "Set flags not supported for the provided opcode");
@@ -1903,8 +1935,8 @@ class VPWidenIntrinsicRecipe : public VPRecipeWithIRFlags, public VPIRMetadata {
                          const VPIRFlags &Flags = {},
                          const VPIRMetadata &MD = {},
                          DebugLoc DL = DebugLoc::getUnknown())
-      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenIntrinsicSC, CallArguments,
-                            Flags, DL),
+      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenIntrinsicSC,
+                            VPRecipeBase::Wide, CallArguments, Flags, DL),
         VPIRMetadata(MD), VectorIntrinsicID(VectorIntrinsicID), ResultTy(Ty),
         MayReadFromMemory(CI.mayReadFromMemory()),
         MayWriteToMemory(CI.mayWriteToMemory()),
@@ -1917,8 +1949,8 @@ class VPWidenIntrinsicRecipe : public VPRecipeWithIRFlags, public VPIRMetadata {
                          const VPIRFlags &Flags = {},
                          const VPIRMetadata &Metadata = {},
                          DebugLoc DL = DebugLoc::getUnknown())
-      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenIntrinsicSC, CallArguments,
-                            Flags, DL),
+      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenIntrinsicSC,
+                            VPRecipeBase::Wide, CallArguments, Flags, DL),
         VPIRMetadata(Metadata), VectorIntrinsicID(VectorIntrinsicID),
         ResultTy(Ty) {
     LLVMContext &Ctx = Ty->getContext();
@@ -1992,8 +2024,8 @@ class LLVM_ABI_FOR_TEST VPWidenCallRecipe : public VPRecipeWithIRFlags,
                     ArrayRef<VPValue *> CallArguments,
                     const VPIRFlags &Flags = {},
                     const VPIRMetadata &Metadata = {}, DebugLoc DL = {})
-      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenCallSC, CallArguments, Flags,
-                            DL),
+      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenCallSC, VPRecipeBase::Wide,
+                            CallArguments, Flags, DL),
         VPIRMetadata(Metadata), Variant(Variant) {
     setUnderlyingValue(UV);
     assert(
@@ -2044,7 +2076,8 @@ class VPHistogramRecipe : public VPRecipeBase {
 public:
   VPHistogramRecipe(unsigned Opcode, ArrayRef<VPValue *> Operands,
                     DebugLoc DL = DebugLoc::getUnknown())
-      : VPRecipeBase(VPRecipeBase::VPHistogramSC, Operands, DL),
+      : VPRecipeBase(VPRecipeBase::VPHistogramSC, VPRecipeBase::Wide, Operands,
+                     DL),
         Opcode(Opcode) {}
 
   ~VPHistogramRecipe() override = default;
@@ -2094,7 +2127,8 @@ class LLVM_ABI_FOR_TEST VPWidenGEPRecipe : public VPRecipeWithIRFlags {
   VPWidenGEPRecipe(GetElementPtrInst *GEP, ArrayRef<VPValue *> Operands,
                    const VPIRFlags &Flags = {},
                    DebugLoc DL = DebugLoc::getUnknown())
-      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenGEPSC, Operands, Flags, DL),
+      : VPRecipeWithIRFlags(VPRecipeBase::VPWidenGEPSC, VPRecipeBase::Wide,
+                            Operands, Flags, DL),
         SourceElementTy(GEP->getSourceElementType()) {
     setUnderlyingValue(GEP);
     SmallVector<std::pair<unsigned, MDNode *>> Metadata;
@@ -2153,8 +2187,8 @@ class VPVectorEndPointerRecipe : public VPRecipeWithIRFlags {
 public:
   VPVectorEndPointerRecipe(VPValue *Ptr, VPValue *VF, Type *SourceElementTy,
                            int64_t Stride, GEPNoWrapFlags GEPFlags, DebugLoc DL)
-      : VPRecipeWithIRFlags(VPRecipeBase::VPVectorEndPointerSC, {Ptr, VF},
-                            GEPFlags, DL),
+      : VPRecipeWithIRFlags(VPRecipeBase::VPVectorEndPointerSC,
+                            VPRecipeBase::Narrow, {Ptr, VF}, GEPFlags, DL),
         SourceElementTy(SourceElementTy), Stride(Stride) {
     assert(Stride < 0 && "Stride must be negative");
   }
@@ -2222,7 +2256,8 @@ class VPVectorPointerRecipe : public VPRecipeWithIRFlags {
 public:
   VPVectorPointerRecipe(VPValue *Ptr, Type *SourceElementTy,
                         GEPNoWrapFlags GEPFlags, DebugLoc DL)
-      : VPRecipeWithIRFlags(VPRecipeBase::VPVectorPointerSC, Ptr, GEPFlags, DL),
+      : VPRecipeWithIRFlags(VPRecipeBase::VPVectorPointerSC,
+                            VPRecipeBase::Narrow, Ptr, GEPFlags, DL),
         SourceElementTy(SourceElementTy) {}
 
   VP_CLASSOF_IMPL(VPRecipeBase::VPVectorPointerSC)
@@ -2296,7 +2331,8 @@ class LLVM_ABI_FOR_TEST VPHeaderPHIRecipe : public VPSingleDefRecipe,
 protected:
   VPHeaderPHIRecipe(unsigned char VPRecipeID, Instruction *UnderlyingInstr,
                     VPValue *Start, DebugLoc DL = DebugLoc::getUnknown())
-      : VPSingleDefRecipe(VPRecipeID, Start, UnderlyingInstr, DL) {}
+      : VPSingleDefRecipe(VPRecipeID, VPRecipeBase::Wide, Start,
+                          UnderlyingInstr, DL) {}
 
   const VPRecipeBase *getAsRecipe() const override { return this; }
 
@@ -2585,7 +2621,8 @@ class LLVM_ABI_FOR_TEST VPWidenPHIRecipe : public VPSingleDefRecipe,
   /// debug location \p DL and \p Name.
   VPWidenPHIRecipe(ArrayRef<VPValue *> IncomingValues,
                    DebugLoc DL = DebugLoc::getUnknown(), const Twine &Name = "")
-      : VPSingleDefRecipe(VPRecipeBase::VPWidenPHISC, IncomingValues, DL),
+      : VPSingleDefRecipe(VPRecipeBase::VPWidenPHISC, VPRecipeBase::Wide,
+                          IncomingValues, DL),
         Name(Name.str()) {}
 
   VPWidenPHIRecipe *clone() override {
@@ -2780,7 +2817,8 @@ class LLVM_ABI_FOR_TEST VPBlendRecipe : public VPRecipeWithIRFlags {
   /// all other incoming values are merged into it.
   VPBlendRecipe(PHINode *Phi, ArrayRef<VPValue *> Operands,
                 const VPIRFlags &Flags, DebugLoc DL)
-      : VPRecipeWithIRFlags(VPRecipeBase::VPBlendSC, Operands, Flags, DL) {
+      : VPRecipeWithIRFlags(VPRecipeBase::VPBlendSC, VPRecipeBase::Wide,
+                            Operands, Flags, DL) {
     assert(Operands.size() >= 2 && "Expected at least two operands!");
     setUnderlyingValue(Phi);
   }
@@ -2861,8 +2899,8 @@ class LLVM_ABI_FOR_TEST VPInterleaveBase : public VPRecipeBase,
                    ArrayRef<VPValue *> Operands,
                    ArrayRef<VPValue *> StoredValues, VPValue *Mask,
                    bool NeedsMaskForGaps, const VPIRMetadata &MD, DebugLoc DL)
-      : VPRecipeBase(SC, Operands, DL), VPIRMetadata(MD), IG(IG),
-        NeedsMaskForGaps(NeedsMaskForGaps) {
+      : VPRecipeBase(SC, VPRecipeBase::Wide, Operands, DL), VPIRMetadata(MD),
+        IG(IG), NeedsMaskForGaps(NeedsMaskForGaps) {
     // TODO: extend the masked interleaved-group support to reversed access.
     assert((!Mask || !IG->isReverse()) &&
            "Reversed masked interleave-group not supported.");
@@ -3046,8 +3084,8 @@ class LLVM_ABI_FOR_TEST VPReductionRecipe : public VPRecipeWithIRFlags {
                     FastMathFlags FMFs, Instruction *I,
                     ArrayRef<VPValue *> Operands, VPValue *CondOp,
                     ReductionStyle Style, DebugLoc DL)
-      : VPRecipeWithIRFlags(SC, Operands, FMFs, DL), RdxKind(RdxKind),
-        Style(Style) {
+      : VPRecipeWithIRFlags(SC, VPRecipeBase::Wide, Operands, FMFs, DL),
+        RdxKind(RdxKind), Style(Style) {
     if (CondOp) {
       IsConditional = true;
       addOperand(CondOp);
@@ -3188,9 +3226,6 @@ class LLVM_ABI_FOR_TEST VPReductionEVLRecipe : public VPReductionRecipe {
 /// a single scalar, only one copy will be generated.
 class LLVM_ABI_FOR_TEST VPReplicateRecipe : public VPRecipeWithIRFlags,
                                             public VPIRMetadata {
-  /// Indicator if only a single replica per lane is needed.
-  bool IsSingleScalar;
-
   /// Indicator if the replicas are also predicated.
   bool IsPredicated;
 
@@ -3199,9 +3234,11 @@ class LLVM_ABI_FOR_TEST VPReplicateRecipe : public VPRecipeWithIRFlags,
                     bool IsSingleScalar, VPValue *Mask = nullptr,
                     const VPIRFlags &Flags = {}, VPIRMetadata Metadata = {},
                     DebugLoc DL = DebugLoc::getUnknown())
-      : VPRecipeWithIRFlags(VPRecipeBase::VPReplicateSC, Operands, Flags, DL),
-        VPIRMetadata(Metadata), IsSingleScalar(IsSingleScalar),
-        IsPredicated(Mask) {
+      : VPRecipeWithIRFlags(VPRecipeBase::VPReplicateSC,
+                            IsSingleScalar ? VPRecipeBase::Narrow
+                                           : VPRecipeBase::ReplicatePart,
+                            Operands, Flags, DL),
+        VPIRMetadata(Metadata), IsPredicated(Mask) {
     setUnderlyingValue(I);
     if (Mask)
       addOperand(Mask);
@@ -3211,7 +3248,7 @@ class LLVM_ABI_FOR_TEST VPReplicateRecipe : public VPRecipeWithIRFlags,
 
   VPReplicateRecipe *clone() override {
     auto *Copy = new VPReplicateRecipe(
-        getUnderlyingInstr(), operands(), IsSingleScalar,
+        getUnderlyingInstr(), operands(), isNarrow(),
         isPredicated() ? getMask() : nullptr, *this, *this, getDebugLoc());
     Copy->transferFlags(*this);
     return Copy;
@@ -3228,7 +3265,8 @@ class LLVM_ABI_FOR_TEST VPReplicateRecipe : public VPRecipeWithIRFlags,
   InstructionCost computeCost(ElementCount VF,
                               VPCostContext &Ctx) const override;
 
-  bool isSingleScalar() const { return IsSingleScalar; }
+  /// TODO: Remove.
+  bool isSingleScalar() const { return isNarrow(); }
 
   bool isPredicated() const { return IsPredicated; }
 
@@ -3271,7 +3309,8 @@ class LLVM_ABI_FOR_TEST VPReplicateRecipe : public VPRecipeWithIRFlags,
 class LLVM_ABI_FOR_TEST VPBranchOnMaskRecipe : public VPRecipeBase {
 public:
   VPBranchOnMaskRecipe(VPValue *BlockInMask, DebugLoc DL)
-      : VPRecipeBase(VPRecipeBase::VPBranchOnMaskSC, {BlockInMask}, DL) {}
+      : VPRecipeBase(VPRecipeBase::VPBranchOnMaskSC, VPRecipeBase::Narrow,
+                     {BlockInMask}, DL) {}
 
   VPBranchOnMaskRecipe *clone() override {
     return new VPBranchOnMaskRecipe(getOperand(0), getDebugLoc());
@@ -3459,8 +3498,10 @@ class LLVM_ABI_FOR_TEST VPPredInstPHIRecipe : public VPSingleDefRecipe {
 public:
   /// Construct a VPPredInstPHIRecipe given \p PredInst whose value needs a phi
   /// nodes after merging back from a Branch-on-Mask.
+  /// FIXME: The WideInfo should be narrow.
   VPPredInstPHIRecipe(VPValue *PredV, DebugLoc DL)
-      : VPSingleDefRecipe(VPRecipeBase::VPPredInstPHISC, PredV, DL) {}
+      : VPSingleDefRecipe(VPRecipeBase::VPPredInstPHISC, VPRecipeBase::Wide,
+                          PredV, DL) {}
   ~VPPredInstPHIRecipe() override = default;
 
   VPPredInstPHIRecipe *clone() override {
@@ -3523,7 +3564,8 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
                       std::initializer_list<VPValue *> Operands,
                       bool Consecutive, const VPIRMetadata &Metadata,
                       DebugLoc DL)
-      : VPRecipeBase(SC, Operands, DL), VPIRMetadata(Metadata), Ingredient(I),
+      : VPRecipeBase(SC, VPRecipeBase::Wide, Operands, DL),
+        VPIRMetadata(Metadata), Ingredient(I),
         Alignment(getLoadStoreAlignment(&I)), Consecutive(Consecutive) {}
 
 public:
@@ -3753,7 +3795,9 @@ class VPExpandSCEVRecipe : public VPSingleDefRecipe {
 
 public:
   VPExpandSCEVRecipe(const SCEV *Expr)
-      : VPSingleDefRecipe(VPRecipeBase::VPExpandSCEVSC, {}), Expr(Expr) {}
+      : VPSingleDefRecipe(VPRecipeBase::VPExpandSCEVSC, VPRecipeBase::Narrow,
+                          {}),
+        Expr(Expr) {}
 
   ~VPExpandSCEVRecipe() override = default;
 
@@ -3862,7 +3906,8 @@ class VPWidenCanonicalIVRecipe : public VPSingleDefRecipe,
                                  public VPUnrollPartAccessor<1> {
 public:
   V...
[truncated]

@github-actions

github-actions Bot commented May 1, 2026

Copy link
Copy Markdown

🐧 Linux x64 Test Results

  • 194646 tests passed
  • 5167 tests skipped

✅ The build succeeded and all tests passed.

@github-actions

github-actions Bot commented May 1, 2026

Copy link
Copy Markdown

🪟 Windows x64 Test Results

  • 134146 tests passed
  • 3222 tests skipped

✅ The build succeeded and all tests passed.

@artagnon artagnon changed the title [VPlan] Embed widening decisions in recipes (NFCI) [VPlan] Embed widening decisions in recipes May 2, 2026
@artagnon artagnon force-pushed the vplan-wideninginfo branch from 347e5cb to fb1bcb5 Compare May 2, 2026 11:00
@artagnon artagnon changed the title [VPlan] Embed widening decisions in recipes [VPlan] Embed widening decisions in recipes (NFCI) May 2, 2026
@artagnon artagnon force-pushed the vplan-wideninginfo branch from fb1bcb5 to f0d6d76 Compare May 2, 2026 13:17
@artagnon artagnon force-pushed the vplan-wideninginfo branch 5 times, most recently from fdb7742 to 58cacd9 Compare May 3, 2026 14:34

@fhahn fhahn left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the transition towards folding various recipes into VPInstruction & co to reduce unnecessary duplication, we need a way to express the widening decision directly inside VPInstruction, which currently is the main recipe that does not directly imply a widening decisions. There are a few others that produce a single-scalar/scalar-for-all lanes like ScalarIVSteps, but VPInstruction is the main outlier, handling all three widening options somewhat implicitly.

It would be good to start focusing on VPInstruction and trying to subsume obvious duplicated recipes. We should be able to get there incrementally, by starting with tracking the widening decision in VPInstruction. Existing #129712 and #141429 could serve as starting point, by tracking single-scalar first.

This could then be easily extended to encode the full range with an enum as proposed here, coupled with moving cases from VPReplicateRecipe over to VPInstruction.

I think it would still be good to tie the widening decision immutably to a recipe instance (somewhat fixing the current outlier VPInstruction). Almost all current recipes already encode this, so I am not sure we should add a field to track the state that is already explicit for most recipes as first step.

At least initially, I think it would also be good to keep the distinction between the widening encoded in a recipe, and changes done via analysis + transform (e.g. if we can determine that a recipe producing a vector is only used as scalar, we should convert it to a single-scalar recipe)

@artagnon

artagnon commented May 3, 2026

Copy link
Copy Markdown
Contributor Author

There are several special-cases that I have unified here, and recipes other than VPInstruction already have ambiguous semantics: there can be a Widen that can never be narrowed like Widen(Call|Intrinsic) due to Function-type mismatch, a Widen that can be narrowed like Widen(GEP|Cast), a Narrow (what we call "single-scalar" in some places), a ReplicatePart that is illegal to narrow like a side-effecting one, a ReplicatePart that is legal to narrow like a BinOp or Cast, VectorToScalar that produce scalars from a vector, and ScalarToVector that produce a vector from a scalar. Of course, we could match the recipe and the opcode to find out which of these classes the instruction belongs to, but I thought it would be good to embed the information in VPRecipeBase to clean up some code and avoid recursive reasoning.

I agree that we should try to consolidate recipes into VPInstruction: I also proposed #193510. I'm not sure about an immutable widening decision though -- the issue is that recipes can go from a Widen/ReplicatePart decision into a Narrow decision at any point: this depends both on legality and profitability, which do change over transforms. Currently, we widen recipes upfront (using the legacy costing), creating hard-coded wide recipes, and we see these pretty early in the VPlan: isn't it wasteful to extract the data from one kind of recipe just to create another kind of recipe to replace it?

I found out that "single-scalar" is ill-defined, which is the reason I separated out concerns in the WideningTy bits. The reason I'm proposing this prior to the consolidation is so that we get the terminology right upfront, and avoid falling into the same pattern of ill-defined terms and hacks to work around it. For example, in the new vputils::isSingleScalar, it is clear that the isAgnostic case should be removed, and the helper should be renamed to vputils::couldProduceNarrowResultByNarrowing (which is what it really is). I've improved usesScalarsOnly in this patch, and usesFirstLaneOnly can also be improved in a follow-up to avoid recursive reasoning, to check against WideningInfo.

Yes, this has the downside that a marking a hard-coded wide recipe Narrow would not change anything, and the goal is not to make it work, but rather remove the hard-coded wide recipes like Widen(GEP|Cast) altogether, and unify the execute in the wide-narrow-unified recipe to respect WideningInfo.

I'm not a fan of creating dead code/paths either, but I would really like to see the enum land, so I can make forward progress? I tried out a lot of experiments (see the closed PRs linked in this PR) to discover these properties, and I'm confident that I've captured all the widening semantics now?

I did try to show enough of an upside in the current PR though: many casts and TypeSwitches would get cleaned up when simply reasoning about the widening semantics of a recipe -- in this PR, vputils::isSingleScalar got simplified, and as a follow up, narrowToSingleScalars can be simplified with optimization benefits?

A planned follow-up is to directly be able to query VPValue to check its widening information: roughly speaking, a live-in will not require a broadcast if all its users are narrow'able recipes. Another planned follow-up is to extend narrowToSingleScalars to narrow all narrow'able recipes -- an example of such a recipe that cannot be absorbed into VPInstruction is VPPredInstPHI. I could move the enum to VPInstruction, and do a TypeSwitch over recipes, querying their opcodes for this WideningInfo, but seems wasteful and would create unnecessary duplication and confusion? Even if the recipes' WideningInfo is fixed and immutable, I think it's valuable to have it embedded in the recipe, as this information is not present elsewhere, and it is up to contributors to inspect each recipe and figure out the widening semantics? The main simplifications would come in the recipes' execute, which would not use hacks like onlyFirstLaneUsed or isSingleScalar to re-analyze the VPlan, and determine the IsScalar boolean of State.get or State.set: they would simply use the WideningInfo?

I think the alternative of having virtual functions is unworkable for the moment, as VPInstruction isn't the only recipe that mutates the WideningInfo.

On the analysis/transform separation, how would we convert any recipe to a "single-scalar" to commit the result of the analysis? Do you propose to create a "single-scalar" versions of recipes like Blends, PredInstPHI, and Reductions? Would you duplicate their execute members, inserting true for State.(get|set) in one place, and false in another? If you ask me, the only difference the widening decision makes to a recipe is this boolean in the execute?

p.s. I see that your VPI-consolidation patches are quite old -- I think we can go ahead with the scalar cast patch (I'll try to review it), but I really think we should try to review this PR before the widen cast.

artagnon added 2 commits May 4, 2026 08:44
We currently make widening decisions in an ad-hoc fashion, and have
helpers that unnecessarily do recursive reasoning over and over:
isSingleScalar, onlyFirstLaneUsed, and onlyScalarValuesUsed. The goal is
to make widening decisions in a principled fashion, simply by analyzing
the Plan, and this patch is the first step towards the goal. By
embedding widening decisions in recipes, which can be refined
iteratively, we can eliminate the hard-coded wide-equivalents of
VPInstruction, including VPWidenRecipe, VPWidenGEPRecipe, and
VPWidenCastRecipe. The design we have picked would also allow us to
eliminate VPReplicateRecipe.

This patch is intentionally non-functional, but it should have enough
information to see what the follow-ups would be.
@artagnon artagnon force-pushed the vplan-wideninginfo branch from 36f36f8 to fb27825 Compare May 4, 2026 07:49
@artagnon

artagnon commented May 4, 2026

Copy link
Copy Markdown
Contributor Author

I have made it immutable for now -- WideningInfo is now query'able information holding the widening semantics of different recipes, and this data is not present anywhere else at the moment. We can discuss the mutability aspect separately.

@artagnon artagnon changed the title [VPlan] Embed widening decisions in recipes (NFCI) [VPlan] Embed widening semantics of recipes within them (NFCI) May 4, 2026
@artagnon artagnon changed the title [VPlan] Embed widening semantics of recipes within them (NFCI) [VPlan] Embed widening semantics of recipes (NFCI) May 4, 2026
@artagnon

artagnon commented May 4, 2026

Copy link
Copy Markdown
Contributor Author

I had some more time to think about this. I'll write in bullets to avoid writing the same kind of long prose as my previous post:

  1. Mutability versus immutability is orthogonal to keeping widening information in the recipe that would otherwise have to be inferred by reading the recipe's code.
  2. We do have to have to some un-narrowable recipes hard-coded. Otherwise, what would the leaf cases of any analysis be? We'd just end up narrowing everything and producing the scalar loop again! It would be valuable to keep track of which recipes are narrowable, and embed this information directly in the recipe (Narrow bit).
  3. Currently, the unroller can only handle certain recipes, and we would have to extend it to handle more one by one. It would hence be valuable to keep track of which recipes the unroller supports, and embed the information directly into the recipe (ReplicatePart bit).
  4. Currently, Wide and ReplicatePart are exclusive, but there is no reason we cannot eventually decide to either widen or unroll a recipe via analysis. Again, we have to have to keep track of this information (Wide ^ ReplicatePart), and it would be good to ensure that the invariant is maintained until we break it. Ultimately, we want to unify usesFirst(Lane|Part)Only.
  5. We currently have only one ScalarToVector instruction (Broadcast), but several VectorToScalars (from VPInstruction and VPReductionRecipe). It would be useful to keep track of this information directly in the recipe (Agnostic bit), so we know what the IsScalar of VPTransformState::get needs to be. At the moment, VPTransformState introduces broadcasts and extracts just-in-time, and these are not visible in the VPlan, and not amenable to optimization: one of the goals of this line of work is to get rid of it completely. It does not make sense to replicate Agnostic recipes (!(Agnostic && ReplicatePart)), and it would be good to maintain this invariant.
  6. We could put the enum in VPInstruction like you suggested, but I think we will at least be missing out information on VPPredInstPHI, VPReduction, VPBlend, and VPWiden(Call|Intrinsic|Load|Store), albeit unnecessarily.
  7. Finally, I think we should eventually move to mutable widening information, although I agree that we should keep it immutable in view of the current state of the codebase.
  8. I'm planning to add a VoidResult bit to faithfully capture the fact that stores and intrinsics like assumes do not produce results. This information would make couldProduce(Narrow|Wide)Result more faithful, would help us make replication decisions along with mayHaveSideEffects, and we would eliminate all the special cases. Currently, a recipe's defined values does not fully capture this, as Replicates are SingleDefs.

@artagnon

Copy link
Copy Markdown
Contributor Author

Better idea: #196181.

@artagnon artagnon closed this May 25, 2026
@artagnon artagnon deleted the vplan-wideninginfo branch May 25, 2026 12:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants