Skip to content

[LV] Use VPReductionRecipe for partial reductions#147513

Merged
SamTebbs33 merged 5 commits into
llvm:mainfrom
SamTebbs33:pred-as-reduction-bundle-recipe
Nov 26, 2025
Merged

[LV] Use VPReductionRecipe for partial reductions#147513
SamTebbs33 merged 5 commits into
llvm:mainfrom
SamTebbs33:pred-as-reduction-bundle-recipe

Conversation

@SamTebbs33

@SamTebbs33 SamTebbs33 commented Jul 8, 2025

Copy link
Copy Markdown
Contributor

Partial reductions can easily be represented by the VPReductionRecipe class by setting their scale factor to something greater than 1. This PR merges the two together and gives VPReductionRecipe a VFScaleFactor so that it can choose to generate the partial reduction intrinsic at execute time.

Stacked PRs:

  1. [LV] Create in-loop sub reductions #147026
  2. [LV] Bundle sub reductions into VPExpressionRecipe #147255
  3. [LV] Keep duplicate recipes in VPExpressionRecipe #156976
  4. [LV] Add ExtNegatedMulAccReduction expression type #160154
  5. [LV] Bundle partial reductions inside VPExpressionRecipe #147302
  6. [LV] Bundle (partial) reductions with a mul of a constant #162503
  7. -> [LV] Use VPReductionRecipe for partial reductions #147513

Replaces #146073 .

@llvmbot

llvmbot commented Jul 8, 2025

Copy link
Copy Markdown
Member

@llvm/pr-subscribers-vectorizers

Author: Sam Tebbs (SamTebbs33)

Changes

Partial reductions can easily be represented by the VPReductionRecipe class by setting their scale factor to something greater than 1. This PR merges the two together and gives VPReductionRecipe a VFScaleFactor so that it can choose to generate the partial reduction intrinsic at execute time.

Depends on #144281 .
Replaces #146073 .


Patch is 170.15 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/147513.diff

11 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+20-14)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+45-90)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+8-7)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+34-114)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+28-7)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-epilogue.ll (+83-163)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-neon.ll (+243-483)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll (+1-1)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanTest.cpp (+4-4)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 0adff8d957e98..570fcf2f71bde 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7045,7 +7045,8 @@ static bool planContainsAdditionalSimplifications(VPlan &Plan,
       }
       // The VPlan-based cost model is more accurate for partial reduction and
       // comparing against the legacy cost isn't desirable.
-      if (isa<VPPartialReductionRecipe>(&R))
+      if (auto *VPR = dyn_cast<VPReductionRecipe>(&R);
+          VPR && VPR->isPartialReduction())
         return true;
 
       /// If a VPlan transform folded a recipe to one producing a single-scalar,
@@ -8307,11 +8308,14 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(VPSingleDefRecipe *R,
              Phi->getIncomingValueForBlock(OrigLoop->getLoopPreheader()));
 
       // If the PHI is used by a partial reduction, set the scale factor.
-      unsigned ScaleFactor =
-          getScalingForReduction(RdxDesc.getLoopExitInstr()).value_or(1);
-      PhiRecipe = new VPReductionPHIRecipe(
-          Phi, RdxDesc, *StartV, CM.isInLoopReduction(Phi),
-          CM.useOrderedReductions(RdxDesc), ScaleFactor);
+      bool UseInLoopReduction = CM.isInLoopReduction(Phi);
+      bool UseOrderedReductions = CM.useOrderedReductions(RdxDesc);
+      auto ScaleFactor =
+          (UseOrderedReductions || UseInLoopReduction)
+              ? 0
+              : getScalingForReduction(RdxDesc.getLoopExitInstr()).value_or(1);
+      PhiRecipe = new VPReductionPHIRecipe(Phi, RdxDesc, *StartV,
+                                           UseOrderedReductions, ScaleFactor);
     } else {
       // TODO: Currently fixed-order recurrences are modeled as chains of
       // first-order recurrences. If there are no users of the intermediate
@@ -8375,7 +8379,8 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction,
   VPValue *Accumulator = Operands[1];
   VPRecipeBase *BinOpRecipe = BinOp->getDefiningRecipe();
   if (isa<VPReductionPHIRecipe>(BinOpRecipe) ||
-      isa<VPPartialReductionRecipe>(BinOpRecipe))
+      (isa<VPReductionRecipe>(BinOpRecipe) &&
+       cast<VPReductionRecipe>(BinOpRecipe)->isPartialReduction()))
     std::swap(BinOp, Accumulator);
 
   unsigned ReductionOpcode = Reduction->getOpcode();
@@ -8396,12 +8401,10 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction,
            "Expected an ADD or SUB operation for predicated partial "
            "reductions (because the neutral element in the mask is zero)!");
     Cond = getBlockInMask(Builder.getInsertBlock());
-    VPValue *Zero =
-        Plan.getOrAddLiveIn(ConstantInt::get(Reduction->getType(), 0));
-    BinOp = Builder.createSelect(Cond, BinOp, Zero, Reduction->getDebugLoc());
   }
-  return new VPPartialReductionRecipe(ReductionOpcode, Accumulator, BinOp, Cond,
-                                      ScaleFactor, Reduction);
+
+  return new VPReductionRecipe(RecurKind::Add, FastMathFlags(), Reduction,
+                               Accumulator, BinOp, Cond, false, ScaleFactor);
 }
 
 void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
@@ -9189,9 +9192,11 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
       FastMathFlags FMFs = isa<FPMathOperator>(CurrentLinkI)
                                ? RdxDesc.getFastMathFlags()
                                : FastMathFlags();
+      bool UseOrderedReductions = CM.useOrderedReductions(RdxDesc);
+      unsigned VFScaleFactor = !UseOrderedReductions;
       auto *RedRecipe = new VPReductionRecipe(
           Kind, FMFs, CurrentLinkI, PreviousLink, VecOp, CondOp,
-          CM.useOrderedReductions(RdxDesc), CurrentLinkI->getDebugLoc());
+          UseOrderedReductions, VFScaleFactor, CurrentLinkI->getDebugLoc());
       // Append the recipe to the end of the VPBasicBlock because we need to
       // ensure that it comes after all of it's inputs, including CondOp.
       // Delete CurrentLink as it will be invalid if its operand is replaced
@@ -9225,8 +9230,9 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
     // Don't output selects for partial reductions because they have an output
     // with fewer lanes than the VF. So the operands of the select would have
     // different numbers of lanes. Partial reductions mask the input instead.
+    auto *RR = dyn_cast<VPReductionRecipe>(OrigExitingVPV->getDefiningRecipe());
     if (!PhiR->isInLoop() && CM.foldTailByMasking() &&
-        !isa<VPPartialReductionRecipe>(OrigExitingVPV->getDefiningRecipe())) {
+        (!RR || !RR->isPartialReduction())) {
       VPValue *Cond = RecipeBuilder.getBlockInMask(PhiR->getParent());
       std::optional<FastMathFlags> FMFs =
           PhiTy->isFloatingPointTy()
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 30f3566332d79..a3cc5f335e82e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -552,7 +552,6 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
     case VPRecipeBase::VPWidenIntOrFpInductionSC:
     case VPRecipeBase::VPWidenPointerInductionSC:
     case VPRecipeBase::VPReductionPHISC:
-    case VPRecipeBase::VPPartialReductionSC:
       return true;
     case VPRecipeBase::VPBranchOnMaskSC:
     case VPRecipeBase::VPInterleaveSC:
@@ -2194,26 +2193,29 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   /// Descriptor for the reduction.
   const RecurrenceDescriptor &RdxDesc;
 
-  /// The phi is part of an in-loop reduction.
-  bool IsInLoop;
-
   /// The phi is part of an ordered reduction. Requires IsInLoop to be true.
   bool IsOrdered;
 
-  /// When expanding the reduction PHI, the plan's VF element count is divided
-  /// by this factor to form the reduction phi's VF.
-  unsigned VFScaleFactor = 1;
+  /// The scaling factor, relative to the VF, that this recipe's output is
+  /// divided by.
+  /// For outer-loop reductions this is equal to 1.
+  /// For in-loop reductions this is equal to 0, to specify that this is equal
+  /// to the VF (which may not be known yet). For partial-reductions this is
+  /// equal to another scalar value.
+  unsigned VFScaleFactor;
 
 public:
   /// Create a new VPReductionPHIRecipe for the reduction \p Phi described by \p
   /// RdxDesc.
   VPReductionPHIRecipe(PHINode *Phi, const RecurrenceDescriptor &RdxDesc,
-                       VPValue &Start, bool IsInLoop = false,
-                       bool IsOrdered = false, unsigned VFScaleFactor = 1)
+                       VPValue &Start, bool IsOrdered = false,
+                       unsigned VFScaleFactor = 1)
       : VPHeaderPHIRecipe(VPDef::VPReductionPHISC, Phi, &Start),
-        RdxDesc(RdxDesc), IsInLoop(IsInLoop), IsOrdered(IsOrdered),
-        VFScaleFactor(VFScaleFactor) {
-    assert((!IsOrdered || IsInLoop) && "IsOrdered requires IsInLoop");
+        RdxDesc(RdxDesc), IsOrdered(IsOrdered), VFScaleFactor(VFScaleFactor) {
+    assert((!IsOrdered || isInLoop()) &&
+           "IsOrdered requires the reduction to be in-loop");
+    assert(((!isInLoop() && !IsOrdered) || isInLoop()) &&
+           "Invalid VFScaleFactor");
   }
 
   ~VPReductionPHIRecipe() override = default;
@@ -2221,7 +2223,7 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   VPReductionPHIRecipe *clone() override {
     auto *R = new VPReductionPHIRecipe(
         dyn_cast_or_null<PHINode>(getUnderlyingValue()), RdxDesc,
-        *getOperand(0), IsInLoop, IsOrdered, VFScaleFactor);
+        *getOperand(0), IsOrdered, VFScaleFactor);
     R->addOperand(getBackedgeValue());
     return R;
   }
@@ -2247,8 +2249,11 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   /// Returns true, if the phi is part of an ordered reduction.
   bool isOrdered() const { return IsOrdered; }
 
-  /// Returns true, if the phi is part of an in-loop reduction.
-  bool isInLoop() const { return IsInLoop; }
+  /// Returns true if the phi is part of an in-loop reduction.
+  bool isInLoop() const { return VFScaleFactor == 0; }
+
+  /// Returns true if the reduction outputs a vector with a scaled down VF.
+  bool isPartialReduction() const { return VFScaleFactor > 1; }
 
   /// Returns true if the recipe only uses the first lane of operand \p Op.
   bool onlyFirstLaneUsed(const VPValue *Op) const override {
@@ -2421,23 +2426,32 @@ class VPInterleaveRecipe : public VPRecipeBase {
   Instruction *getInsertPos() const { return IG->getInsertPos(); }
 };
 
-/// A recipe to represent inloop reduction operations, performing a reduction on
-/// a vector operand into a scalar value, and adding the result to a chain.
-/// The Operands are {ChainOp, VecOp, [Condition]}.
+/// A recipe to represent inloop, ordered or partial reduction operations. It
+/// performs a reduction on a vector operand into a scalar (vector in the case
+/// of a partial reduction) value, and adds the result to a chain. The Operands
+/// are {ChainOp, VecOp, [Condition]}.
 class VPReductionRecipe : public VPRecipeWithIRFlags {
   /// The recurrence kind for the reduction in question.
   RecurKind RdxKind;
   bool IsOrdered;
   /// Whether the reduction is conditional.
   bool IsConditional = false;
+  /// The scaling factor, relative to the VF, that this recipe's output is
+  /// divided by.
+  /// For outer-loop reductions this is equal to 1.
+  /// For in-loop reductions this is equal to 0, to specify that this is equal
+  /// to the VF (which may not be known yet).
+  /// For partial-reductions this is equal to another scalar value.
+  unsigned VFScaleFactor;
 
 protected:
   VPReductionRecipe(const unsigned char SC, RecurKind RdxKind,
                     FastMathFlags FMFs, Instruction *I,
                     ArrayRef<VPValue *> Operands, VPValue *CondOp,
-                    bool IsOrdered, DebugLoc DL)
+                    bool IsOrdered, unsigned VFScaleFactor, DebugLoc DL)
       : VPRecipeWithIRFlags(SC, Operands, FMFs, DL), RdxKind(RdxKind),
-        IsOrdered(IsOrdered) {
+        IsOrdered(IsOrdered), VFScaleFactor(VFScaleFactor) {
+    assert((!IsOrdered || VFScaleFactor == 0) && "Invalid scale factor");
     if (CondOp) {
       IsConditional = true;
       addOperand(CondOp);
@@ -2448,30 +2462,29 @@ class VPReductionRecipe : public VPRecipeWithIRFlags {
 public:
   VPReductionRecipe(RecurKind RdxKind, FastMathFlags FMFs, Instruction *I,
                     VPValue *ChainOp, VPValue *VecOp, VPValue *CondOp,
-                    bool IsOrdered, DebugLoc DL = {})
+                    bool IsOrdered, unsigned VFScaleFactor, DebugLoc DL = {})
       : VPReductionRecipe(VPDef::VPReductionSC, RdxKind, FMFs, I,
                           ArrayRef<VPValue *>({ChainOp, VecOp}), CondOp,
-                          IsOrdered, DL) {}
+                          IsOrdered, VFScaleFactor, DL) {}
 
   VPReductionRecipe(const RecurKind RdxKind, FastMathFlags FMFs,
                     VPValue *ChainOp, VPValue *VecOp, VPValue *CondOp,
-                    bool IsOrdered, DebugLoc DL = {})
+                    bool IsOrdered, unsigned VFScaleFactor, DebugLoc DL = {})
       : VPReductionRecipe(VPDef::VPReductionSC, RdxKind, FMFs, nullptr,
                           ArrayRef<VPValue *>({ChainOp, VecOp}), CondOp,
-                          IsOrdered, DL) {}
+                          IsOrdered, VFScaleFactor, DL) {}
 
   ~VPReductionRecipe() override = default;
 
   VPReductionRecipe *clone() override {
-    return new VPReductionRecipe(RdxKind, getFastMathFlags(),
-                                 getUnderlyingInstr(), getChainOp(), getVecOp(),
-                                 getCondOp(), IsOrdered, getDebugLoc());
+    return new VPReductionRecipe(
+        RdxKind, getFastMathFlags(), getUnderlyingInstr(), getChainOp(),
+        getVecOp(), getCondOp(), IsOrdered, VFScaleFactor, getDebugLoc());
   }
 
   static inline bool classof(const VPRecipeBase *R) {
     return R->getVPDefID() == VPRecipeBase::VPReductionSC ||
-           R->getVPDefID() == VPRecipeBase::VPReductionEVLSC ||
-           R->getVPDefID() == VPRecipeBase::VPPartialReductionSC;
+           R->getVPDefID() == VPRecipeBase::VPReductionEVLSC;
   }
 
   static inline bool classof(const VPUser *U) {
@@ -2498,6 +2511,8 @@ class VPReductionRecipe : public VPRecipeWithIRFlags {
   bool isOrdered() const { return IsOrdered; };
   /// Return true if the in-loop reduction is conditional.
   bool isConditional() const { return IsConditional; };
+  /// Returns true if the reduction outputs a vector with a scaled down VF.
+  bool isPartialReduction() const { return VFScaleFactor > 1; }
   /// The VPValue of the scalar Chain being accumulated.
   VPValue *getChainOp() const { return getOperand(0); }
   /// The VPValue of the vector value to be reduced.
@@ -2506,68 +2521,8 @@ class VPReductionRecipe : public VPRecipeWithIRFlags {
   VPValue *getCondOp() const {
     return isConditional() ? getOperand(getNumOperands() - 1) : nullptr;
   }
-};
-
-/// A recipe for forming partial reductions. In the loop, an accumulator and
-/// vector operand are added together and passed to the next iteration as the
-/// next accumulator. After the loop body, the accumulator is reduced to a
-/// scalar value.
-class VPPartialReductionRecipe : public VPReductionRecipe {
-  unsigned Opcode;
-
-  /// The divisor by which the VF of this recipe's output should be divided
-  /// during execution.
-  unsigned VFScaleFactor;
-
-public:
-  VPPartialReductionRecipe(Instruction *ReductionInst, VPValue *Op0,
-                           VPValue *Op1, VPValue *Cond, unsigned VFScaleFactor)
-      : VPPartialReductionRecipe(ReductionInst->getOpcode(), Op0, Op1, Cond,
-                                 VFScaleFactor, ReductionInst) {}
-  VPPartialReductionRecipe(unsigned Opcode, VPValue *Op0, VPValue *Op1,
-                           VPValue *Cond, unsigned ScaleFactor,
-                           Instruction *ReductionInst = nullptr)
-      : VPReductionRecipe(VPDef::VPPartialReductionSC, RecurKind::Add,
-                          FastMathFlags(), ReductionInst,
-                          ArrayRef<VPValue *>({Op0, Op1}), Cond, false, {}),
-        Opcode(Opcode), VFScaleFactor(ScaleFactor) {
-    [[maybe_unused]] auto *AccumulatorRecipe =
-        getChainOp()->getDefiningRecipe();
-    // When cloning as part of a VPExpressionRecipe, the chain op could have
-    // been removed from the plan and so doesn't have a defining recipe.
-    assert((!AccumulatorRecipe ||
-            isa<VPReductionPHIRecipe>(AccumulatorRecipe) ||
-            isa<VPPartialReductionRecipe>(AccumulatorRecipe)) &&
-           "Unexpected operand order for partial reduction recipe");
-  }
-  ~VPPartialReductionRecipe() override = default;
-
-  VPPartialReductionRecipe *clone() override {
-    return new VPPartialReductionRecipe(Opcode, getOperand(0), getOperand(1),
-                                        getCondOp(), VFScaleFactor,
-                                        getUnderlyingInstr());
-  }
-
-  VP_CLASSOF_IMPL(VPDef::VPPartialReductionSC)
-
-  /// Generate the reduction in the loop.
-  void execute(VPTransformState &State) override;
-
-  /// Return the cost of this VPPartialReductionRecipe.
-  InstructionCost computeCost(ElementCount VF,
-                              VPCostContext &Ctx) const override;
-
-  /// Get the binary op's opcode.
-  unsigned getOpcode() const { return Opcode; }
-
   /// Get the factor that the VF of this recipe's output should be scaled by.
   unsigned getVFScaleFactor() const { return VFScaleFactor; }
-
-#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-  /// Print the recipe.
-  void print(raw_ostream &O, const Twine &Indent,
-             VPSlotTracker &SlotTracker) const override;
-#endif
 };
 
 /// A recipe to represent inloop reduction operations with vector-predication
@@ -2583,7 +2538,7 @@ class VPReductionEVLRecipe : public VPReductionRecipe {
             R.getFastMathFlags(),
             cast_or_null<Instruction>(R.getUnderlyingValue()),
             ArrayRef<VPValue *>({R.getChainOp(), R.getVecOp(), &EVL}), CondOp,
-            R.isOrdered(), DL) {}
+            R.isOrdered(), 0, DL) {}
 
   ~VPReductionEVLRecipe() override = default;
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 92db9674ef42b..6aa3750897309 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -282,10 +282,10 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
               [](const auto *R) { return R->getScalarType(); })
           .Case<VPReductionRecipe, VPPredInstPHIRecipe, VPWidenPHIRecipe,
                 VPScalarIVStepsRecipe, VPWidenGEPRecipe, VPVectorPointerRecipe,
-                VPVectorEndPointerRecipe, VPWidenCanonicalIVRecipe,
-                VPPartialReductionRecipe>([this](const VPRecipeBase *R) {
-            return inferScalarType(R->getOperand(0));
-          })
+                VPVectorEndPointerRecipe, VPWidenCanonicalIVRecipe>(
+              [this](const VPRecipeBase *R) {
+                return inferScalarType(R->getOperand(0));
+              })
           // VPInstructionWithType must be handled before VPInstruction.
           .Case<VPInstructionWithType, VPWidenIntrinsicRecipe,
                 VPWidenCastRecipe>(
@@ -396,7 +396,7 @@ bool VPDominatorTree::properlyDominates(const VPRecipeBase *A,
 static unsigned getVFScaleFactor(VPRecipeBase *R) {
   if (auto *RR = dyn_cast<VPReductionPHIRecipe>(R))
     return RR->getVFScaleFactor();
-  if (auto *RR = dyn_cast<VPPartialReductionRecipe>(R))
+  if (auto *RR = dyn_cast<VPReductionRecipe>(R))
     return RR->getVFScaleFactor();
   assert(
       (!isa<VPInstruction>(R) || cast<VPInstruction>(R)->getOpcode() !=
@@ -566,8 +566,9 @@ SmallVector<VPRegisterUsage, 8> llvm::calculateRegisterUsageForPlan(
         } else {
           // The output from scaled phis and scaled reductions actually has
           // fewer lanes than the VF.
-          unsigned ScaleFactor = getVFScaleFactor(R);
-          ElementCount VF = VFs[J].divideCoefficientBy(ScaleFactor);
+          ElementCount VF = VFs[J];
+          if (unsigned ScaleFactor = getVFScaleFactor(R))
+            VF = VF.divideCoefficientBy(ScaleFactor);
           LLVM_DEBUG(if (VF != VFs[J]) {
             dbgs() << "LV(REG): Scaled down VF from " << VFs[J] << " to " << VF
                    << " for " << *R << "\n";
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index b485fa3d86e40..9dc3fb1b97447 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -164,7 +164,6 @@ bool VPRecipeBase::mayHaveSideEffects() const {
     return cast<VPWidenIntrinsicRecipe>(this)->mayHaveSideEffects();
   case VPBlendSC:
   case VPReductionEVLSC:
-  case VPPartialReductionSC:
   case VPReductionSC:
   case VPScalarIVStepsSC:
   case VPVectorPointerSC:
@@ -294,104 +293,6 @@ bool VPRecipeBase::isScalarCast() const {
   return VPI && Instruction::isCast(VPI->getOpcode());
 }
 
-InstructionCost
-VPPartialReductionRecipe::computeCost(ElementCount VF,
-                                      VPCostContext &Ctx) const {
-  std::optional<unsigned> Opcode;
-  VPValue *Op = getOperand(0);
-  VPRecipeBase *OpR = Op->getDefiningRecipe();
-
-  // If the partial reduction is predicated, a select will be operand 0
-  using namespace llvm::VPlanPatternMatch;
-  if (match(getOperand(1), m_Select(m_VPValue(), m_VPValue(Op), m_VPValue()))) {
-    OpR = Op->getDefiningRecipe();
-  }
-
-  Type *InputTypeA = nullptr, *InputTypeB = nullptr;
-  TTI::PartialReductionExtendKind ExtAType = TTI::PR_None,
-                                  ExtBType = TTI::PR_None;
-
-  auto GetExtendKind = [](VPRecipeBase *R) {
-    if (!R)
-      return TTI::PR_None;
-    auto *WidenCastR = dyn_cast<VPWidenCastRecipe>(R);
-    if (!WidenCastR)
-      return TTI::PR_None;
-    if (WidenCastR->getOpcode() == Instruction::CastOps::ZExt)
-      return TTI::PR_ZeroExtend;
-    if (WidenCastR->getOpcode() == Instruction::CastOps::SExt)
-      return TTI::PR_SignExtend;
-    return TTI::PR_None;
-  };
-
-  // Pick out opcode, type/ext information and use sub side effects from a widen
-  // recipe....
[truncated]

@llvmbot

llvmbot commented Jul 8, 2025

Copy link
Copy Markdown
Member

@llvm/pr-subscribers-llvm-transforms

Author: Sam Tebbs (SamTebbs33)

Changes

Partial reductions can easily be represented by the VPReductionRecipe class by setting their scale factor to something greater than 1. This PR merges the two together and gives VPReductionRecipe a VFScaleFactor so that it can choose to generate the partial reduction intrinsic at execute time.

Depends on #144281 .
Replaces #146073 .


Patch is 170.15 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/147513.diff

11 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+20-14)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+45-90)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+8-7)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+34-114)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+28-7)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-epilogue.ll (+83-163)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-neon.ll (+243-483)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll (+1-1)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanTest.cpp (+4-4)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 0adff8d957e98..570fcf2f71bde 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7045,7 +7045,8 @@ static bool planContainsAdditionalSimplifications(VPlan &Plan,
       }
       // The VPlan-based cost model is more accurate for partial reduction and
       // comparing against the legacy cost isn't desirable.
-      if (isa<VPPartialReductionRecipe>(&R))
+      if (auto *VPR = dyn_cast<VPReductionRecipe>(&R);
+          VPR && VPR->isPartialReduction())
         return true;
 
       /// If a VPlan transform folded a recipe to one producing a single-scalar,
@@ -8307,11 +8308,14 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(VPSingleDefRecipe *R,
              Phi->getIncomingValueForBlock(OrigLoop->getLoopPreheader()));
 
       // If the PHI is used by a partial reduction, set the scale factor.
-      unsigned ScaleFactor =
-          getScalingForReduction(RdxDesc.getLoopExitInstr()).value_or(1);
-      PhiRecipe = new VPReductionPHIRecipe(
-          Phi, RdxDesc, *StartV, CM.isInLoopReduction(Phi),
-          CM.useOrderedReductions(RdxDesc), ScaleFactor);
+      bool UseInLoopReduction = CM.isInLoopReduction(Phi);
+      bool UseOrderedReductions = CM.useOrderedReductions(RdxDesc);
+      auto ScaleFactor =
+          (UseOrderedReductions || UseInLoopReduction)
+              ? 0
+              : getScalingForReduction(RdxDesc.getLoopExitInstr()).value_or(1);
+      PhiRecipe = new VPReductionPHIRecipe(Phi, RdxDesc, *StartV,
+                                           UseOrderedReductions, ScaleFactor);
     } else {
       // TODO: Currently fixed-order recurrences are modeled as chains of
       // first-order recurrences. If there are no users of the intermediate
@@ -8375,7 +8379,8 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction,
   VPValue *Accumulator = Operands[1];
   VPRecipeBase *BinOpRecipe = BinOp->getDefiningRecipe();
   if (isa<VPReductionPHIRecipe>(BinOpRecipe) ||
-      isa<VPPartialReductionRecipe>(BinOpRecipe))
+      (isa<VPReductionRecipe>(BinOpRecipe) &&
+       cast<VPReductionRecipe>(BinOpRecipe)->isPartialReduction()))
     std::swap(BinOp, Accumulator);
 
   unsigned ReductionOpcode = Reduction->getOpcode();
@@ -8396,12 +8401,10 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction,
            "Expected an ADD or SUB operation for predicated partial "
            "reductions (because the neutral element in the mask is zero)!");
     Cond = getBlockInMask(Builder.getInsertBlock());
-    VPValue *Zero =
-        Plan.getOrAddLiveIn(ConstantInt::get(Reduction->getType(), 0));
-    BinOp = Builder.createSelect(Cond, BinOp, Zero, Reduction->getDebugLoc());
   }
-  return new VPPartialReductionRecipe(ReductionOpcode, Accumulator, BinOp, Cond,
-                                      ScaleFactor, Reduction);
+
+  return new VPReductionRecipe(RecurKind::Add, FastMathFlags(), Reduction,
+                               Accumulator, BinOp, Cond, false, ScaleFactor);
 }
 
 void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
@@ -9189,9 +9192,11 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
       FastMathFlags FMFs = isa<FPMathOperator>(CurrentLinkI)
                                ? RdxDesc.getFastMathFlags()
                                : FastMathFlags();
+      bool UseOrderedReductions = CM.useOrderedReductions(RdxDesc);
+      unsigned VFScaleFactor = !UseOrderedReductions;
       auto *RedRecipe = new VPReductionRecipe(
           Kind, FMFs, CurrentLinkI, PreviousLink, VecOp, CondOp,
-          CM.useOrderedReductions(RdxDesc), CurrentLinkI->getDebugLoc());
+          UseOrderedReductions, VFScaleFactor, CurrentLinkI->getDebugLoc());
       // Append the recipe to the end of the VPBasicBlock because we need to
       // ensure that it comes after all of it's inputs, including CondOp.
       // Delete CurrentLink as it will be invalid if its operand is replaced
@@ -9225,8 +9230,9 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
     // Don't output selects for partial reductions because they have an output
     // with fewer lanes than the VF. So the operands of the select would have
     // different numbers of lanes. Partial reductions mask the input instead.
+    auto *RR = dyn_cast<VPReductionRecipe>(OrigExitingVPV->getDefiningRecipe());
     if (!PhiR->isInLoop() && CM.foldTailByMasking() &&
-        !isa<VPPartialReductionRecipe>(OrigExitingVPV->getDefiningRecipe())) {
+        (!RR || !RR->isPartialReduction())) {
       VPValue *Cond = RecipeBuilder.getBlockInMask(PhiR->getParent());
       std::optional<FastMathFlags> FMFs =
           PhiTy->isFloatingPointTy()
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 30f3566332d79..a3cc5f335e82e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -552,7 +552,6 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
     case VPRecipeBase::VPWidenIntOrFpInductionSC:
     case VPRecipeBase::VPWidenPointerInductionSC:
     case VPRecipeBase::VPReductionPHISC:
-    case VPRecipeBase::VPPartialReductionSC:
       return true;
     case VPRecipeBase::VPBranchOnMaskSC:
     case VPRecipeBase::VPInterleaveSC:
@@ -2194,26 +2193,29 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   /// Descriptor for the reduction.
   const RecurrenceDescriptor &RdxDesc;
 
-  /// The phi is part of an in-loop reduction.
-  bool IsInLoop;
-
   /// The phi is part of an ordered reduction. Requires IsInLoop to be true.
   bool IsOrdered;
 
-  /// When expanding the reduction PHI, the plan's VF element count is divided
-  /// by this factor to form the reduction phi's VF.
-  unsigned VFScaleFactor = 1;
+  /// The scaling factor, relative to the VF, that this recipe's output is
+  /// divided by.
+  /// For outer-loop reductions this is equal to 1.
+  /// For in-loop reductions this is equal to 0, to specify that this is equal
+  /// to the VF (which may not be known yet). For partial-reductions this is
+  /// equal to another scalar value.
+  unsigned VFScaleFactor;
 
 public:
   /// Create a new VPReductionPHIRecipe for the reduction \p Phi described by \p
   /// RdxDesc.
   VPReductionPHIRecipe(PHINode *Phi, const RecurrenceDescriptor &RdxDesc,
-                       VPValue &Start, bool IsInLoop = false,
-                       bool IsOrdered = false, unsigned VFScaleFactor = 1)
+                       VPValue &Start, bool IsOrdered = false,
+                       unsigned VFScaleFactor = 1)
       : VPHeaderPHIRecipe(VPDef::VPReductionPHISC, Phi, &Start),
-        RdxDesc(RdxDesc), IsInLoop(IsInLoop), IsOrdered(IsOrdered),
-        VFScaleFactor(VFScaleFactor) {
-    assert((!IsOrdered || IsInLoop) && "IsOrdered requires IsInLoop");
+        RdxDesc(RdxDesc), IsOrdered(IsOrdered), VFScaleFactor(VFScaleFactor) {
+    assert((!IsOrdered || isInLoop()) &&
+           "IsOrdered requires the reduction to be in-loop");
+    assert(((!isInLoop() && !IsOrdered) || isInLoop()) &&
+           "Invalid VFScaleFactor");
   }
 
   ~VPReductionPHIRecipe() override = default;
@@ -2221,7 +2223,7 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   VPReductionPHIRecipe *clone() override {
     auto *R = new VPReductionPHIRecipe(
         dyn_cast_or_null<PHINode>(getUnderlyingValue()), RdxDesc,
-        *getOperand(0), IsInLoop, IsOrdered, VFScaleFactor);
+        *getOperand(0), IsOrdered, VFScaleFactor);
     R->addOperand(getBackedgeValue());
     return R;
   }
@@ -2247,8 +2249,11 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   /// Returns true, if the phi is part of an ordered reduction.
   bool isOrdered() const { return IsOrdered; }
 
-  /// Returns true, if the phi is part of an in-loop reduction.
-  bool isInLoop() const { return IsInLoop; }
+  /// Returns true if the phi is part of an in-loop reduction.
+  bool isInLoop() const { return VFScaleFactor == 0; }
+
+  /// Returns true if the reduction outputs a vector with a scaled down VF.
+  bool isPartialReduction() const { return VFScaleFactor > 1; }
 
   /// Returns true if the recipe only uses the first lane of operand \p Op.
   bool onlyFirstLaneUsed(const VPValue *Op) const override {
@@ -2421,23 +2426,32 @@ class VPInterleaveRecipe : public VPRecipeBase {
   Instruction *getInsertPos() const { return IG->getInsertPos(); }
 };
 
-/// A recipe to represent inloop reduction operations, performing a reduction on
-/// a vector operand into a scalar value, and adding the result to a chain.
-/// The Operands are {ChainOp, VecOp, [Condition]}.
+/// A recipe to represent inloop, ordered or partial reduction operations. It
+/// performs a reduction on a vector operand into a scalar (vector in the case
+/// of a partial reduction) value, and adds the result to a chain. The Operands
+/// are {ChainOp, VecOp, [Condition]}.
 class VPReductionRecipe : public VPRecipeWithIRFlags {
   /// The recurrence kind for the reduction in question.
   RecurKind RdxKind;
   bool IsOrdered;
   /// Whether the reduction is conditional.
   bool IsConditional = false;
+  /// The scaling factor, relative to the VF, that this recipe's output is
+  /// divided by.
+  /// For outer-loop reductions this is equal to 1.
+  /// For in-loop reductions this is equal to 0, to specify that this is equal
+  /// to the VF (which may not be known yet).
+  /// For partial-reductions this is equal to another scalar value.
+  unsigned VFScaleFactor;
 
 protected:
   VPReductionRecipe(const unsigned char SC, RecurKind RdxKind,
                     FastMathFlags FMFs, Instruction *I,
                     ArrayRef<VPValue *> Operands, VPValue *CondOp,
-                    bool IsOrdered, DebugLoc DL)
+                    bool IsOrdered, unsigned VFScaleFactor, DebugLoc DL)
       : VPRecipeWithIRFlags(SC, Operands, FMFs, DL), RdxKind(RdxKind),
-        IsOrdered(IsOrdered) {
+        IsOrdered(IsOrdered), VFScaleFactor(VFScaleFactor) {
+    assert((!IsOrdered || VFScaleFactor == 0) && "Invalid scale factor");
     if (CondOp) {
       IsConditional = true;
       addOperand(CondOp);
@@ -2448,30 +2462,29 @@ class VPReductionRecipe : public VPRecipeWithIRFlags {
 public:
   VPReductionRecipe(RecurKind RdxKind, FastMathFlags FMFs, Instruction *I,
                     VPValue *ChainOp, VPValue *VecOp, VPValue *CondOp,
-                    bool IsOrdered, DebugLoc DL = {})
+                    bool IsOrdered, unsigned VFScaleFactor, DebugLoc DL = {})
       : VPReductionRecipe(VPDef::VPReductionSC, RdxKind, FMFs, I,
                           ArrayRef<VPValue *>({ChainOp, VecOp}), CondOp,
-                          IsOrdered, DL) {}
+                          IsOrdered, VFScaleFactor, DL) {}
 
   VPReductionRecipe(const RecurKind RdxKind, FastMathFlags FMFs,
                     VPValue *ChainOp, VPValue *VecOp, VPValue *CondOp,
-                    bool IsOrdered, DebugLoc DL = {})
+                    bool IsOrdered, unsigned VFScaleFactor, DebugLoc DL = {})
       : VPReductionRecipe(VPDef::VPReductionSC, RdxKind, FMFs, nullptr,
                           ArrayRef<VPValue *>({ChainOp, VecOp}), CondOp,
-                          IsOrdered, DL) {}
+                          IsOrdered, VFScaleFactor, DL) {}
 
   ~VPReductionRecipe() override = default;
 
   VPReductionRecipe *clone() override {
-    return new VPReductionRecipe(RdxKind, getFastMathFlags(),
-                                 getUnderlyingInstr(), getChainOp(), getVecOp(),
-                                 getCondOp(), IsOrdered, getDebugLoc());
+    return new VPReductionRecipe(
+        RdxKind, getFastMathFlags(), getUnderlyingInstr(), getChainOp(),
+        getVecOp(), getCondOp(), IsOrdered, VFScaleFactor, getDebugLoc());
   }
 
   static inline bool classof(const VPRecipeBase *R) {
     return R->getVPDefID() == VPRecipeBase::VPReductionSC ||
-           R->getVPDefID() == VPRecipeBase::VPReductionEVLSC ||
-           R->getVPDefID() == VPRecipeBase::VPPartialReductionSC;
+           R->getVPDefID() == VPRecipeBase::VPReductionEVLSC;
   }
 
   static inline bool classof(const VPUser *U) {
@@ -2498,6 +2511,8 @@ class VPReductionRecipe : public VPRecipeWithIRFlags {
   bool isOrdered() const { return IsOrdered; };
   /// Return true if the in-loop reduction is conditional.
   bool isConditional() const { return IsConditional; };
+  /// Returns true if the reduction outputs a vector with a scaled down VF.
+  bool isPartialReduction() const { return VFScaleFactor > 1; }
   /// The VPValue of the scalar Chain being accumulated.
   VPValue *getChainOp() const { return getOperand(0); }
   /// The VPValue of the vector value to be reduced.
@@ -2506,68 +2521,8 @@ class VPReductionRecipe : public VPRecipeWithIRFlags {
   VPValue *getCondOp() const {
     return isConditional() ? getOperand(getNumOperands() - 1) : nullptr;
   }
-};
-
-/// A recipe for forming partial reductions. In the loop, an accumulator and
-/// vector operand are added together and passed to the next iteration as the
-/// next accumulator. After the loop body, the accumulator is reduced to a
-/// scalar value.
-class VPPartialReductionRecipe : public VPReductionRecipe {
-  unsigned Opcode;
-
-  /// The divisor by which the VF of this recipe's output should be divided
-  /// during execution.
-  unsigned VFScaleFactor;
-
-public:
-  VPPartialReductionRecipe(Instruction *ReductionInst, VPValue *Op0,
-                           VPValue *Op1, VPValue *Cond, unsigned VFScaleFactor)
-      : VPPartialReductionRecipe(ReductionInst->getOpcode(), Op0, Op1, Cond,
-                                 VFScaleFactor, ReductionInst) {}
-  VPPartialReductionRecipe(unsigned Opcode, VPValue *Op0, VPValue *Op1,
-                           VPValue *Cond, unsigned ScaleFactor,
-                           Instruction *ReductionInst = nullptr)
-      : VPReductionRecipe(VPDef::VPPartialReductionSC, RecurKind::Add,
-                          FastMathFlags(), ReductionInst,
-                          ArrayRef<VPValue *>({Op0, Op1}), Cond, false, {}),
-        Opcode(Opcode), VFScaleFactor(ScaleFactor) {
-    [[maybe_unused]] auto *AccumulatorRecipe =
-        getChainOp()->getDefiningRecipe();
-    // When cloning as part of a VPExpressionRecipe, the chain op could have
-    // been removed from the plan and so doesn't have a defining recipe.
-    assert((!AccumulatorRecipe ||
-            isa<VPReductionPHIRecipe>(AccumulatorRecipe) ||
-            isa<VPPartialReductionRecipe>(AccumulatorRecipe)) &&
-           "Unexpected operand order for partial reduction recipe");
-  }
-  ~VPPartialReductionRecipe() override = default;
-
-  VPPartialReductionRecipe *clone() override {
-    return new VPPartialReductionRecipe(Opcode, getOperand(0), getOperand(1),
-                                        getCondOp(), VFScaleFactor,
-                                        getUnderlyingInstr());
-  }
-
-  VP_CLASSOF_IMPL(VPDef::VPPartialReductionSC)
-
-  /// Generate the reduction in the loop.
-  void execute(VPTransformState &State) override;
-
-  /// Return the cost of this VPPartialReductionRecipe.
-  InstructionCost computeCost(ElementCount VF,
-                              VPCostContext &Ctx) const override;
-
-  /// Get the binary op's opcode.
-  unsigned getOpcode() const { return Opcode; }
-
   /// Get the factor that the VF of this recipe's output should be scaled by.
   unsigned getVFScaleFactor() const { return VFScaleFactor; }
-
-#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-  /// Print the recipe.
-  void print(raw_ostream &O, const Twine &Indent,
-             VPSlotTracker &SlotTracker) const override;
-#endif
 };
 
 /// A recipe to represent inloop reduction operations with vector-predication
@@ -2583,7 +2538,7 @@ class VPReductionEVLRecipe : public VPReductionRecipe {
             R.getFastMathFlags(),
             cast_or_null<Instruction>(R.getUnderlyingValue()),
             ArrayRef<VPValue *>({R.getChainOp(), R.getVecOp(), &EVL}), CondOp,
-            R.isOrdered(), DL) {}
+            R.isOrdered(), 0, DL) {}
 
   ~VPReductionEVLRecipe() override = default;
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 92db9674ef42b..6aa3750897309 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -282,10 +282,10 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
               [](const auto *R) { return R->getScalarType(); })
           .Case<VPReductionRecipe, VPPredInstPHIRecipe, VPWidenPHIRecipe,
                 VPScalarIVStepsRecipe, VPWidenGEPRecipe, VPVectorPointerRecipe,
-                VPVectorEndPointerRecipe, VPWidenCanonicalIVRecipe,
-                VPPartialReductionRecipe>([this](const VPRecipeBase *R) {
-            return inferScalarType(R->getOperand(0));
-          })
+                VPVectorEndPointerRecipe, VPWidenCanonicalIVRecipe>(
+              [this](const VPRecipeBase *R) {
+                return inferScalarType(R->getOperand(0));
+              })
           // VPInstructionWithType must be handled before VPInstruction.
           .Case<VPInstructionWithType, VPWidenIntrinsicRecipe,
                 VPWidenCastRecipe>(
@@ -396,7 +396,7 @@ bool VPDominatorTree::properlyDominates(const VPRecipeBase *A,
 static unsigned getVFScaleFactor(VPRecipeBase *R) {
   if (auto *RR = dyn_cast<VPReductionPHIRecipe>(R))
     return RR->getVFScaleFactor();
-  if (auto *RR = dyn_cast<VPPartialReductionRecipe>(R))
+  if (auto *RR = dyn_cast<VPReductionRecipe>(R))
     return RR->getVFScaleFactor();
   assert(
       (!isa<VPInstruction>(R) || cast<VPInstruction>(R)->getOpcode() !=
@@ -566,8 +566,9 @@ SmallVector<VPRegisterUsage, 8> llvm::calculateRegisterUsageForPlan(
         } else {
           // The output from scaled phis and scaled reductions actually has
           // fewer lanes than the VF.
-          unsigned ScaleFactor = getVFScaleFactor(R);
-          ElementCount VF = VFs[J].divideCoefficientBy(ScaleFactor);
+          ElementCount VF = VFs[J];
+          if (unsigned ScaleFactor = getVFScaleFactor(R))
+            VF = VF.divideCoefficientBy(ScaleFactor);
           LLVM_DEBUG(if (VF != VFs[J]) {
             dbgs() << "LV(REG): Scaled down VF from " << VFs[J] << " to " << VF
                    << " for " << *R << "\n";
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index b485fa3d86e40..9dc3fb1b97447 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -164,7 +164,6 @@ bool VPRecipeBase::mayHaveSideEffects() const {
     return cast<VPWidenIntrinsicRecipe>(this)->mayHaveSideEffects();
   case VPBlendSC:
   case VPReductionEVLSC:
-  case VPPartialReductionSC:
   case VPReductionSC:
   case VPScalarIVStepsSC:
   case VPVectorPointerSC:
@@ -294,104 +293,6 @@ bool VPRecipeBase::isScalarCast() const {
   return VPI && Instruction::isCast(VPI->getOpcode());
 }
 
-InstructionCost
-VPPartialReductionRecipe::computeCost(ElementCount VF,
-                                      VPCostContext &Ctx) const {
-  std::optional<unsigned> Opcode;
-  VPValue *Op = getOperand(0);
-  VPRecipeBase *OpR = Op->getDefiningRecipe();
-
-  // If the partial reduction is predicated, a select will be operand 0
-  using namespace llvm::VPlanPatternMatch;
-  if (match(getOperand(1), m_Select(m_VPValue(), m_VPValue(Op), m_VPValue()))) {
-    OpR = Op->getDefiningRecipe();
-  }
-
-  Type *InputTypeA = nullptr, *InputTypeB = nullptr;
-  TTI::PartialReductionExtendKind ExtAType = TTI::PR_None,
-                                  ExtBType = TTI::PR_None;
-
-  auto GetExtendKind = [](VPRecipeBase *R) {
-    if (!R)
-      return TTI::PR_None;
-    auto *WidenCastR = dyn_cast<VPWidenCastRecipe>(R);
-    if (!WidenCastR)
-      return TTI::PR_None;
-    if (WidenCastR->getOpcode() == Instruction::CastOps::ZExt)
-      return TTI::PR_ZeroExtend;
-    if (WidenCastR->getOpcode() == Instruction::CastOps::SExt)
-      return TTI::PR_SignExtend;
-    return TTI::PR_None;
-  };
-
-  // Pick out opcode, type/ext information and use sub side effects from a widen
-  // recipe....
[truncated]

@NickGuy-Arm NickGuy-Arm left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fhahn fhahn left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the latest version of this depend on #147026?

@SamTebbs33

Copy link
Copy Markdown
Contributor Author

does the latest version of this depend on #147026?

it does, so that chained partial reductions with subs will be supported straight away.

SamTebbs33 added a commit that referenced this pull request Aug 12, 2025
This PR allows the loop vectorizer to handle in-loop sub reductions by
forming a normal in-loop add reduction with a negated input.

Stacked PRs:
1. -> #147026
2. #147255
3. #147302
4. #147513
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 12, 2025
This PR allows the loop vectorizer to handle in-loop sub reductions by
forming a normal in-loop add reduction with a negated input.

Stacked PRs:
1. -> llvm/llvm-project#147026
2. llvm/llvm-project#147255
3. llvm/llvm-project#147302
4. llvm/llvm-project#147513
SamTebbs33 added a commit that referenced this pull request Sep 1, 2025
This PR bundles sub reductions into the VPExpressionRecipe class and
adjusts the cost functions to take the negation into account.

Stacked PRs:
1. #147026
2. -> #147255
3. #147302
4. #147513
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 1, 2025
This PR bundles sub reductions into the VPExpressionRecipe class and
adjusts the cost functions to take the negation into account.

Stacked PRs:
1. llvm/llvm-project#147026
2. -> llvm/llvm-project#147255
3. llvm/llvm-project#147302
4. llvm/llvm-project#147513
@SamTebbs33

Copy link
Copy Markdown
Contributor Author

Ping.

@SamTebbs33 SamTebbs33 force-pushed the users/SamTebbs33/expression-recipe-pred branch from 7263435 to d634251 Compare October 8, 2025 09:28
@SamTebbs33 SamTebbs33 force-pushed the pred-as-reduction-bundle-recipe branch from 0a4dd84 to 9130747 Compare October 9, 2025 10:45
@SamTebbs33 SamTebbs33 changed the base branch from users/SamTebbs33/expression-recipe-pred to users/SamTebbs33/bundle-constant-mul October 9, 2025 10:45

@gbossu gbossu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just drive-by questions

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
class LLVM_ABI_FOR_TEST VPReductionRecipe : public VPRecipeWithIRFlags {
/// The recurrence kind for the reduction in question.
RecurKind RdxKind;
bool IsOrdered;
/// Whether the reduction is conditional.
bool IsConditional = false;
/// The scaling factor, relative to the VF, that this recipe's output is
/// divided by.
/// For outer-loop reductions this is equal to 1.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm getting a bit confused. In which case do we get a VPReductionRecipe for "outer-loop" reductions? I thought it was only used when the reduction result was computed inside the loop (sometimes partially). For "outer-loop" reductions, don't we "widen everything" and compute the reduction result in the middle block?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right actually! I've removed that part of the comment.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated

/// Get the binary op's opcode.
unsigned getOpcode() const { return Opcode; }

/// Get the factor that the VF of this recipe's output should be scaled by.
unsigned getVFScaleFactor() const { return VFScaleFactor; }

@gbossu gbossu Oct 20, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this interface is now a bit misleading. We might return 0, which actually means the reduction is "scaled" by whatever the loop VF is, not 0.

Maybe something like below would be clearer:

/// Get the factor that the VF of this recipe's output should be scaled by,
/// if it represents a partial reduction.
std::optional<unsigned> getPartialVFScaleFactor() const {
  return isPartial() ? VFScaleFactor : std::nullopt;
}

This way, it forces users of getPartialVFScaleFactor() to question what the result (or lack thereof) means. What do you think?

Edit: getPartialReductionScaleFactor() might be a clearer name.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that, thanks. I'd prefer to keep the current name though.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
/// For in-loop reductions this is equal to 0, to specify that this is equal
/// to the VF (which may not be known yet). For partial-reductions this is
/// equal to another scalar value.
unsigned VFScaleFactor;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to overload the meaning of this variable, and keep IsInLoop semantically separate from the scale factor for partial reductions.

I wonder if a variant with empty types would help here, something like:

struct RdxOrderedInLoop {};
struct RdxInLoop {};
struct RdxNormal {};
struct RdxPartial {
  unsigned VFScaleFactor;
};
std::variant<RdxOrderedInLoop, RdxInLoop, RdxNormal, RdxPartial> RdxStyle;

That way we can remove the separate boolean flags and asserts that they are consistent, and we don't need to overload the meaning of the scale factor.

(I checked that this has been used elsewhere in the codebase; there's a couple places so far, but LockFileManager.h has the best example).

@fhahn any objections?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for the suggestion!

SamTebbs33 added a commit that referenced this pull request Oct 23, 2025
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:
1. #147026
2. #147255
3. #156976
4. #160154
5. -> #147302
6. #162503
7. #147513
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 23, 2025
dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
This PR bundles partial reductions inside the VPExpressionRecipe class.

Stacked PRs:
1. llvm#147026
2. llvm#147255
3. llvm#156976
4. llvm#160154
5. -> llvm#147302
6. llvm#162503
7. llvm#147513
@SamTebbs33 SamTebbs33 force-pushed the pred-as-reduction-bundle-recipe branch 2 times, most recently from ac146a7 to 0fe3c78 Compare November 4, 2025 16:55

@huntergr-arm huntergr-arm left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -410,8 +410,8 @@ define i32 @dotp_predicated(i64 %N, ptr %a, ptr %b) {
; CHECK-NEXT: [[TMP178:%.*]] = sext <16 x i8> [[TMP177]] to <16 x i32>
; CHECK-NEXT: [[TMP97:%.*]] = sext <16 x i8> [[TMP159]] to <16 x i32>
; CHECK-NEXT: [[TMP179:%.*]] = mul nsw <16 x i32> [[TMP178]], [[TMP97]]

@fhahn fhahn Nov 18, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s a bit difficult to see with the rest changes, but I assume this the change is NFC and the test changes are just stray updates?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are stray updates, indeed, so I've reverted these unnecessary changes. Thanks!

@SamTebbs33 SamTebbs33 disabled auto-merge November 18, 2025 15:14
PhiRecipe = new VPReductionPHIRecipe(
Phi, RdxDesc.getRecurrenceKind(), *StartV, CM.isInLoopReduction(Phi),
CM.useOrderedReductions(RdxDesc), ScaleFactor);
RdxStyle Style(RdxNormal{});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
RdxStyle Style(RdxNormal{});
RdxStyle Style(RdxNormal());

I think generally () is used to call constructors instead of {}, not sure if there's a benefit of using {}?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a struct, so I think we have to use {}. My compiler complains at me when I use ():

warning: parentheses were disambiguated as a function declaration
note: replace parentheses with braces to declare a variable

and

error: assignment of function ‘llvm::RdxStyle llvm::Style(RdxNormal (*)())’

Comment on lines -315 to -321
if (isConditional()) {
CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
auto *VecTy = Ctx.Types.inferScalarType(Op);
auto *CondTy = Ctx.Types.inferScalarType(getCondOp());
CondCost = Ctx.TTI.getCmpSelInstrCost(Instruction::Select, VecTy, CondTy,
Pred, Ctx.CostKind);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be moved to VPReductionRecipe::computeCost. It seems like we may be missing test coverage for this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right, that compute cost function isn't considering the predicate. I moved it over locally and that regressed the codegen for the mve-reductions.ll tests. I think this change should be done separately since it's outside the scope of the PR. Are you okay with that?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that’s surprising, are we forming partial reductions for those cases? If not than it would be good to check first why the behavior changed, because that would be somewhat unexpected. I think we also have the predicate cost on main, but maybe the difference would be that after moving we may already have and pass the correct vector types?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For regular reductions there would be a [VP]select in the plan to represent the Condition, but from what I can see for in-loop/ordered reductions the condition is just not represented in the VPReductionRecipe::computeCost function so adding it would introduce changes for MVE. I agree this is best fixed in a separate PR.

For the purpose of this PR though, we don't want to miss out on the condition for the partial reduction case. Can you move the CmpSelInstrCost into the block where it calls getPartialReductionCost, so that we don't regress for partial reductions either?
(I'm not sure if we have a proper test for this at the moment, if not can you add one?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that’s surprising, are we forming partial reductions for those cases? If not than it would be good to check first why the behavior changed, because that would be somewhat unexpected. I think we also have the predicate cost on main, but maybe the difference would be that after moving we may already have and pass the correct vector types?

When I first tried moving it over, I applied the cost to all cases in VPReductionRecipe::computeCost, which is what introduced the regressions in mve-reductions.ll. If I just apply the extra cost to the partial reduction case then nothing changes so we should be good to go with that approach and fix up the cost model for other conditional reductions later.

@github-actions

github-actions Bot commented Nov 18, 2025

Copy link
Copy Markdown

🐧 Linux x64 Test Results

  • 186603 tests passed
  • 4888 tests skipped

Comment thread llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Outdated
Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
// This reduction is in-loop.
struct RdxInLoop {};
// This reduction isn't partial, ordered or in-loop.
struct RdxNormal {};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather see RdxNormal and RdxPartial merged into one RdxUnordered, because:

  • a partial reduction is an unordered reduction with a scale factor > 1.
  • a normal reduction is an unordered reduction with a scale factor of 1.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't a normal reduction an unordered reduction with a scale factor of VF? Which is actually why it gets a separate variant, given that VF can be unknown at compile time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I confused "normal" and "inloop". When do we actually get a "full unordered out-of-loop" reduction? Isn't that supposed to be a ComputeReductionResult recipe instead?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ComputeReductionResult should be the final node in the middle block for a normal reduction (or more specialized recipes for AnyOf or Find'X' reductions). But there needs to be a header phi and an update operation inside the loop as well, we just don't reduce across lanes within the vector body.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do the whole reduction out of the loop, there needs to be an update operation inside the loop. For a simple example of an add reduction, we would have something like this for a 'normal' reduction:

%header = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %update, %vector.body ]
...
%update = add <4 x i32> %header, %new.data
...
middle.block
%result = call i32 @llvm.vector.reduce.add(<4 x i32> %update)

So there's an update in the loop, but we don't perform any operation across vector lanes until we exit the vector body.

For an in-loop reduction, we would have a scalar phi doing something like this:

%header = phi i32 [ 0, %vector.ph ], [ %update, %vector.body ]
...
%in.loop.reduction = call i32 @llvm.vector.reduce.add(<4 x i32> %new.data)
%update = add i32 %header, %in.loop.reduction
...
;; No need for extra operations, as you already have the result

Partial reductions are like normal reductions in that they accumulate into a vector in the loop, but the VF of the phi and the VF of the new data don't have to be the same. So there can be some cross-lane reductions, e.g. when going from <8 x i32> to <4 x i32> after the inputs were extended from <8 x i16>.

Ordered reductions are inner loop reductions that produce a scalar, but we must emit an update operation that does not reassociate the update calculations.

(I think I've got that right...)

@gbossu gbossu Nov 20, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did spend some time double checking because this contradicted my understanding of "normal" reductions.

The corresponding recipe for the %update will just be a WIDEN of the scalar add, not a WidenReduction recipe. I also checked the code, and I think the only place where we create a RdxNormal reduction is alongside a ScaleFactor > 1. So AFAIU, RdxNormal can be renamed to RdxPartial. Which is a form of inloop reduction where the elements are reduced to VF / ScaleFactor in the loop, and to a scalar in the middle block.

For other styles,I do agree with your comment, thanks for writing all the details :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a normal reduction, ScaleFactor == 1, so there's no change in VF.

AIUI, @paulwalker-arm 's long-term plan was to unify 'normal', 'partial', and 'in-loop' reductions to always use a partial.reduction intrinsic for the update, since the intrinsic can handle all 3 cases. It's only the ordered case that's different. If the VF is the same then a later pass could exchange it for a normal widened operation for targets which aren't partial-reduction aware.

@gbossu gbossu Nov 21, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that plan makes sense, even if it will require to have a partial.reduce.xxxx intrinsic for every type of reduction being supported. But if it harmonises the codegen for reductions, that's probably worth it.

Still, my 2 cents: for now I think RdxNormal (Edit: now called RdxUnordered) is a confusing name, because it's really only used for partial accumulation within the loop. It would be clearer to rename it to RdxPartial and enforce ScaleFactor>1 in its constructor. We can easily rename that ReductionStyle when it actually gains new capabilities.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC VPReductionRecipe was introduced because there were no suitable intrinsics at that time. The variants that map directly to an intrinsic should not need a dedicated recipe at all down the line, as we can just generate the appropriate intrinsic directly

Comment on lines +8192 to +8200
RdxStyle Style(RdxNormal{});
if (UseInLoopReduction) {
if (UseOrderedReductions)
Style = RdxOrderedInLoop{};
else
Style = RdxInLoop{};
} else if (ScaleFactor > 1) {
Style = RdxPartial{/*VFScaleFactor=*/ScaleFactor};
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe create a convenience function for this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I'd like to have namespaced it in ReductionStyle but that's just an alias for the std::variant.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vouch for that as well. Maybe ReductionStyle::get(UseInLoopReduction, UseOrderedReductions, ScaleFactor)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't add extra methods to a std::variant, but maybe a static getReductionStyle helper function?

Comment on lines +8193 to +8197
if (UseInLoopReduction) {
if (UseOrderedReductions)
Style = RdxOrderedInLoop{};
else
Style = RdxInLoop{};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this can be simplified to:

  Style = RdxOrderedInLoop{};
else if (UseInLoopReductions)
  Style = RdxInLoop{};
else ..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
Comment on lines +2425 to +2426
auto *Partial = std::get_if<RdxPartial>(&Style);
return Partial ? std::make_optional(Partial->VFScaleFactor) : std::nullopt;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this function be changed so that calling getVFScaleFactor always returns a value. This would mean it can only ever be called if for unordered reductions. Especially when partial- and normal reductions are merged, this would remove the need for .value_or(1) I see in various places.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I've made 1 the default case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I think I suggested this here due to ScaleFactor having a confusing definition: #147513 (comment)

Now that we have a variant for the reduction style, we might be able to do better.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
// Possible variants of a reduction.

// This reduction is ordered and in-loop.
struct RdxOrderedInLoop {};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
struct RdxOrderedInLoop {};
struct RdxOrdered {};

because ordered implies in-loop.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines -315 to -321
if (isConditional()) {
CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
auto *VecTy = Ctx.Types.inferScalarType(Op);
auto *CondTy = Ctx.Types.inferScalarType(getCondOp());
CondCost = Ctx.TTI.getCmpSelInstrCost(Instruction::Select, VecTy, CondTy,
Pred, Ctx.CostKind);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For regular reductions there would be a [VP]select in the plan to represent the Condition, but from what I can see for in-loop/ordered reductions the condition is just not represented in the VPReductionRecipe::computeCost function so adding it would introduce changes for MVE. I agree this is best fixed in a separate PR.

For the purpose of this PR though, we don't want to miss out on the condition for the partial reduction case. Can you move the CmpSelInstrCost into the block where it calls getPartialReductionCost, so that we don't regress for partial reductions either?
(I'm not sure if we have a proper test for this at the moment, if not can you add one?)

PrevInChain->getType(), Intrinsic::vector_partial_reduce_add,
{PrevInChain, NewVecOp}, nullptr, "partial.reduce");
PrevInChain = NewRed;
NextInChain = NewRed;
} else {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe add an assert that this is an in-loop reduction?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
Comment on lines +8192 to +8200
RdxStyle Style(RdxNormal{});
if (UseInLoopReduction) {
if (UseOrderedReductions)
Style = RdxOrderedInLoop{};
else
Style = RdxInLoop{};
} else if (ScaleFactor > 1) {
Style = RdxPartial{/*VFScaleFactor=*/ScaleFactor};
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vouch for that as well. Maybe ReductionStyle::get(UseInLoopReduction, UseOrderedReductions, ScaleFactor)?

Comment thread llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Comment thread llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Outdated
Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
// This reduction is in-loop.
struct RdxInLoop {};
// This reduction isn't partial, ordered or in-loop.
struct RdxNormal {};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't a normal reduction an unordered reduction with a scale factor of VF? Which is actually why it gets a separate variant, given that VF can be unknown at compile time.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
Comment on lines +2425 to +2426
auto *Partial = std::get_if<RdxPartial>(&Style);
return Partial ? std::make_optional(Partial->VFScaleFactor) : std::nullopt;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I think I suggested this here due to ScaleFactor having a confusing definition: #147513 (comment)

Now that we have a variant for the reduction style, we might be able to do better.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
/// When expanding the reduction PHI, the plan's VF element count is divided
/// by this factor to form the reduction phi's VF.
unsigned VFScaleFactor = 1;
RdxStyle Style;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nicer!

@SamTebbs33 SamTebbs33 force-pushed the pred-as-reduction-bundle-recipe branch from d37f98c to e675a11 Compare November 20, 2025 16:49
InstructionCost CondCost = 0;
if (isConditional()) {
CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
auto *CondTy = Ctx.Types.inferScalarType(getCondOp());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be:

auto *CondTy = cast<VectorType>(toVectorTy(Ctx.Types.inferScalartype(getCondOp()), VF));

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think that's correct, done.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
struct RdxInLoop {};
// This reduction is normal with a >= 1 factor that the vector length is
// scaled down by.
struct RdxNormal {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think RdxUnordered is a better name.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@sdesmalen-arm sdesmalen-arm left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits addressed

Comment on lines +8822 to +8823
bool UseOrderedReductions = PhiR->isOrdered();
ReductionStyle Style = getReductionStyle(true, UseOrderedReductions, 1);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
bool UseOrderedReductions = PhiR->isOrdered();
ReductionStyle Style = getReductionStyle(true, UseOrderedReductions, 1);
ReductionStyle Style = getReductionStyle(true, PhiR->isOrdered(), 1);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
Comment on lines +2414 to +2415
// This reduction is normal with a >= 1 factor that the vector length is
// scaled down by.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This reduction is normal with a >= 1 factor that the vector length is
// scaled down by.
// This reduction is unordered with the partial result scaled down by some factor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
// This reduction is normal with a >= 1 factor that the vector length is
// scaled down by.
struct RdxUnordered {
// The factor by which the output is scaled down from the VF.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: superfluous comment, given the comment above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
@@ -2918,7 +2892,7 @@ class LLVM_ABI_FOR_TEST VPReductionEVLRecipe : public VPReductionRecipe {
R.getFastMathFlags(),
cast_or_null<Instruction>(R.getUnderlyingValue()),
ArrayRef<VPValue *>({R.getChainOp(), R.getVecOp(), &EVL}), CondOp,
R.isOrdered(), DL) {}
getReductionStyle(true, R.isOrdered(), 1), DL) {}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
getReductionStyle(true, R.isOrdered(), 1), DL) {}
getReductionStyle(/*InLoop=*/true, R.isOrdered(), 1), DL) {}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Partial reductions can easily be represented by the VPReductionRecipe
class by setting their scale factor to something greater than 1. This PR
merges the two together and gives VPReductionRecipe a VFScaleFactor so
that it can choose to generate the partial reduction intrinsic at
execute time.

Depends on llvm#144281
@SamTebbs33 SamTebbs33 force-pushed the pred-as-reduction-bundle-recipe branch from 373f118 to 573e002 Compare November 26, 2025 10:25

@fhahn fhahn left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

Comment on lines +7026 to +7027
if (auto *VPR = dyn_cast<VPReductionRecipe>(&R);
VPR && VPR->isPartialReduction())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (auto *VPR = dyn_cast<VPReductionRecipe>(&R);
VPR && VPR->isPartialReduction())
if (auto *VPR = dyn_cast<VPReductionRecipe>(&R))
if (VPR->isPartialReduction())

assigning as part of the if doesn't seem to save lines and I think we generally avoid it in the vectorizer code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
Comment on lines +2408 to +2417
static ReductionStyle getReductionStyle(bool InLoop, bool Ordered,
unsigned ScaleFactor) {
assert((!Ordered || InLoop) && "Ordered implies in-loop");
if (Ordered)
return RdxOrdered{};
if (InLoop)
return RdxInLoop{};
return RdxUnordered{/*VFScaleFactor=*/ScaleFactor};
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is only used in LoopVectorize.cpp, can you move it there?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.h Outdated
Comment on lines +2395 to +2406
// Possible variants of a reduction.

// This reduction is ordered and in-loop.
struct RdxOrdered {};
// This reduction is in-loop.
struct RdxInLoop {};
// This reduction is unordered with the partial result scaled down by some
// factor.
struct RdxUnordered {
unsigned VFScaleFactor;
};
using ReductionStyle = std::variant<RdxOrdered, RdxInLoop, RdxUnordered>;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to use /// here so this gets included in the docs properly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in VPlan.h as well, in the VPReductionEVLRecipe constructor.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, if left in the header it should probably be inline instead of static

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Just realised I responded to the wrong comment as well 😆

@SamTebbs33 SamTebbs33 merged commit 071d1fb into llvm:main Nov 26, 2025
10 checks passed
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request Nov 26, 2025
Partial reductions can easily be represented by the VPReductionRecipe
class by setting their scale factor to something greater than 1. This PR
merges the two together and gives VPReductionRecipe a VFScaleFactor so
that it can choose to generate the partial reduction intrinsic at
execute time.

Stacked PRs:
1. llvm/llvm-project#147026
2. llvm/llvm-project#147255
3. llvm/llvm-project#156976
4. llvm/llvm-project#160154
5. llvm/llvm-project#147302
6. llvm/llvm-project#162503
7. -> llvm/llvm-project#147513

Replaces llvm/llvm-project#146073 .
GeneraluseAI pushed a commit to GeneraluseAI/llvm-project that referenced this pull request Nov 27, 2025
Partial reductions can easily be represented by the VPReductionRecipe
class by setting their scale factor to something greater than 1. This PR
merges the two together and gives VPReductionRecipe a VFScaleFactor so
that it can choose to generate the partial reduction intrinsic at
execute time.

Stacked PRs:
1. llvm#147026
2. llvm#147255
3. llvm#156976
4. llvm#160154
5. llvm#147302
6. llvm#162503
7. -> llvm#147513

Replaces llvm#146073 .
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants