
[AArch64] Add intra-block CSINC optimization to AArch64ConditionOptimizer#173734

Merged
nasherm merged 4 commits into llvm:main from
hussam-alhassan:aarch64-intrablock-csinc
Jan 8, 2026

Conversation

@hussam-alhassan (Contributor) commented Dec 27, 2025

This patch extends the AArch64ConditionOptimizer pass to handle CSINC instructions within a single basic block, complementing the existing cross-block branch optimization.

The optimization finds two CMP+CSINC pairs comparing the same register with immediates differing by 1, and adjusts one comparison to enable CSE to eliminate the redundant CMP instruction.

Example transformation:

  cmp  w8, #10
  csinc w9, w0, w1, gt     ; w9 = (w8 > 10) ? w0 : w1+1
  cmp  w8, #9              ; Removed by CSE after adjustment
  csinc w10, w0, w1, gt    ; w10 = (w8 > 9) ? w0 : w1+1

After optimization:

  cmp  w8, #10
  csinc w9, w0, w1, gt     ; w9 = (w8 > 10) ? w0 : w1+1
  csinc w10, w0, w1, ge    ; w10 = (w8 >= 10) ? w0 : w1+1

The existing cross-block logic has also been extracted into its own method.

Any feedback on code quality and better practices is highly welcome.
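The rewrite relies on the integer identity `x > 9 <=> x >= 10`. A minimal Python sketch (not part of the patch; the register values are arbitrary placeholders) modeling CSINC's `cond ? a : b+1` semantics can confirm the two sequences select identical values:

```python
def csinc(cond, a, b):
    # Models AArch64 CSINC: result = cond ? a : b + 1
    return a if cond else b + 1

# Arbitrary stand-ins for w0/w1; w8 sweeps a range around the immediates.
w0, w1 = 7, 100
for w8 in range(-20, 40):
    before = csinc(w8 > 9, w0, w1)    # cmp w8, #9  ; csinc ..., gt
    after = csinc(w8 >= 10, w0, w1)   # cmp w8, #10 ; csinc ..., ge
    assert before == after, (w8, before, after)
```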

…izer

This patch extends the AArch64ConditionOptimizer pass to handle CSINC
instructions within a single basic block, complementing the existing
cross-block branch optimization.

The optimization finds two CMP+CSINC pairs comparing the same register
with immediates differing by 1, and adjusts one comparison to enable
CSE to eliminate the redundant CMP instruction.

Example transformation:
  cmp  w8, #10
  csinc w9, w0, w1, gt     ; w9 = (w8 > 10) ? w0 : w1+1
  cmp  w8, #9              ; Removed by CSE after adjustment
  csinc w10, w0, w1, gt    ; w10 = (w8 > 9) ? w0 : w1+1

After optimization:
  cmp  w8, #10
  csinc w9, w0, w1, gt     ; w9 = (w8 > 10) ? w0 : w1+1
  csinc w10, w0, w1, ge    ; w10 = (w8 >= 10) ? w0 : w1+1

The existing cross-block logic has also been extracted into its own method.
@github-actions

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot (Member) commented Dec 27, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Hussam A. (hussam-alhassan)

Changes

This patch extends the AArch64ConditionOptimizer pass to handle CSINC instructions within a single basic block, complementing the existing cross-block branch optimization.

The optimization finds two CMP+CSINC pairs comparing the same register with immediates differing by 1, and adjusts one comparison to enable CSE to eliminate the redundant CMP instruction.

Example transformation:
  cmp  w8, #10
  csinc w9, w0, w1, gt     ; w9 = (w8 > 10) ? w0 : w1+1
  cmp  w8, #9              ; Removed by CSE after adjustment
  csinc w10, w0, w1, gt    ; w10 = (w8 > 9) ? w0 : w1+1

After optimization:
  cmp  w8, #10
  csinc w9, w0, w1, gt     ; w9 = (w8 > 10) ? w0 : w1+1
  csinc w10, w0, w1, ge    ; w10 = (w8 >= 10) ? w0 : w1+1

The existing cross-block logic has also been extracted into its own method.

Any feedback on code quality and better practices is highly welcome.


Patch is 35.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/173734.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp (+290-110)
  • (modified) llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll (+174-70)
diff --git a/llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp
index 4c9f8c2723493..fef5f5a84d937 100644
--- a/llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ConditionOptimizer.cpp
@@ -6,15 +6,17 @@
 //
 //===----------------------------------------------------------------------===//
 //
-// This pass tries to make consecutive compares of values use same operands to
-// allow CSE pass to remove duplicated instructions.  For this it analyzes
-// branches and adjusts comparisons with immediate values by converting:
-//  * GE -> GT
-//  * GT -> GE
-//  * LT -> LE
-//  * LE -> LT
-// and adjusting immediate values appropriately.  It basically corrects two
-// immediate values towards each other to make them equal.
+//
+// This pass tries to make consecutive comparisons of values use the same
+// operands to allow the CSE pass to remove duplicate instructions. It adjusts
+// comparisons with immediate values by converting between inclusive and
+// exclusive forms (GE <-> GT, LE <-> LT) and correcting immediate values to
+// make them equal.
+//
+// The pass handles:
+//  * Cross-block: SUBS/ADDS followed by conditional branches
+//  * Intra-block: CSINC conditional instructions
+//
 //
 // Consider the following example in C:
 //
@@ -49,11 +51,16 @@
 //     b.le     .LBB0_6
 //     ...
 //
-// Currently only SUBS and ADDS followed by b.?? are supported.
+// See optimizeCrossBlock() and optimizeIntraBlock() for implementation details.
 //
 // TODO: maybe handle TBNZ/TBZ the same way as CMP when used instead for "a < 0"
-// TODO: handle other conditional instructions (e.g. CSET)
-// TODO: allow second branching to be anything if it doesn't require adjusting
+// TODO: For cross-block:
+//   - handle other conditional instructions (e.g. CSET)
+//   - allow second branching to be anything if it doesn't require adjusting
+// TODO: For intra-block:
+//   - handle CINC and CSET (CSINC aliases) as their conditions are inverted
+//   compared to CSINC.
+//   - handle other non-CSINC conditional instructions
 //
 //===----------------------------------------------------------------------===//
 
@@ -111,6 +118,9 @@ class AArch64ConditionOptimizer : public MachineFunctionPass {
   void modifyCmp(MachineInstr *CmpMI, const CmpInfo &Info);
   bool adjustTo(MachineInstr *CmpMI, AArch64CC::CondCode Cmp, MachineInstr *To,
                 int ToImm);
+  bool isPureCmp(MachineInstr &CmpMI);
+  bool optimizeIntraBlock(MachineBasicBlock &MBB);
+  bool optimizeCrossBlock(MachineBasicBlock &HBB);
   bool runOnMachineFunction(MachineFunction &MF) override;
 
   StringRef getPassName() const override {
@@ -323,125 +333,295 @@ bool AArch64ConditionOptimizer::adjustTo(MachineInstr *CmpMI,
   return false;
 }
 
-bool AArch64ConditionOptimizer::runOnMachineFunction(MachineFunction &MF) {
-  LLVM_DEBUG(dbgs() << "********** AArch64 Conditional Compares **********\n"
-                    << "********** Function: " << MF.getName() << '\n');
-  if (skipFunction(MF.getFunction()))
+bool AArch64ConditionOptimizer::isPureCmp(MachineInstr &CmpMI) {
+  unsigned ShiftAmt = AArch64_AM::getShiftValue(CmpMI.getOperand(3).getImm());
+  if (!CmpMI.getOperand(2).isImm()) {
+    LLVM_DEBUG(dbgs() << "Immediate of cmp is symbolic, " << CmpMI << '\n');
     return false;
+  } else if (CmpMI.getOperand(2).getImm() << ShiftAmt >= 0xfff) {
+    LLVM_DEBUG(dbgs() << "Immediate of cmp may be out of range, " << CmpMI
+                      << '\n');
+    return false;
+  } else if (!MRI->use_nodbg_empty(CmpMI.getOperand(0).getReg())) {
+    LLVM_DEBUG(dbgs() << "Destination of cmp is not dead, " << CmpMI << '\n');
+    return false;
+  }
 
-  TII = MF.getSubtarget().getInstrInfo();
-  DomTree = &getAnalysis<MachineDominatorTreeWrapperPass>().getDomTree();
-  MRI = &MF.getRegInfo();
-
-  bool Changed = false;
+  return true;
+}
 
-  // Visit blocks in dominator tree pre-order. The pre-order enables multiple
-  // cmp-conversions from the same head block.
-  // Note that updateDomTree() modifies the children of the DomTree node
-  // currently being visited. The df_iterator supports that; it doesn't look at
-  // child_begin() / child_end() until after a node has been visited.
-  for (MachineDomTreeNode *I : depth_first(DomTree)) {
-    MachineBasicBlock *HBB = I->getBlock();
+// This function transforms two CMP+CSINC pairs within the same basic block
+// when both conditions are the same (GT/GT or LT/LT) and immediates differ
+// by 1.
+//
+// Example transformation:
+//   cmp  w8, #10
+//   csinc w9, w0, w1, gt     ; w9 = (w8 > 10) ? w0 : w1+1
+//   cmp  w8, #9
+//   csinc w10, w0, w1, gt    ; w10 = (w8 > 9) ? w0 : w1+1
+//
+// Into:
+//   cmp  w8, #10
+//   csinc w9, w0, w1, gt     ; w9 = (w8 > 10) ? w0 : w1+1
+//   csinc w10, w0, w1, ge    ; w10 = (w8 >= 10) ? w0 : w1+1
+//
+// The second CMP is eliminated, enabling CSE to remove the redundant
+// comparison.
+bool AArch64ConditionOptimizer::optimizeIntraBlock(MachineBasicBlock &MBB) {
+  MachineInstr *FirstCmp = nullptr;
+  MachineInstr *FirstCSINC = nullptr;
+  MachineInstr *SecondCmp = nullptr;
+  MachineInstr *SecondCSINC = nullptr;
+
+  // Find two CMP + CSINC pairs
+  for (MachineInstr &MI : MBB) {
+    switch (MI.getOpcode()) {
+    // cmp is an alias for subs with a dead destination register.
+    case AArch64::SUBSWri:
+    case AArch64::SUBSXri:
+    // cmn is an alias for adds with a dead destination register.
+    case AArch64::ADDSWri:
+    case AArch64::ADDSXri: {
+      if (!FirstCmp) {
+        FirstCmp = &MI;
+      } else if (FirstCSINC && !SecondCmp) {
+        SecondCmp = &MI;
+      }
+      break;
+    }
 
-    SmallVector<MachineOperand, 4> HeadCond;
-    MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
-    if (TII->analyzeBranch(*HBB, TBB, FBB, HeadCond)) {
-      continue;
+    case AArch64::CSINCWr:
+    case AArch64::CSINCXr: {
+      // Found a CSINC, ensure it comes after the corresponding comparison
+      if (FirstCmp && !FirstCSINC) {
+        FirstCSINC = &MI;
+      } else if (SecondCmp && !SecondCSINC) {
+        SecondCSINC = &MI;
+      }
+      break;
+    }
     }
 
-    // Equivalence check is to skip loops.
-    if (!TBB || TBB == HBB) {
-      continue;
+    if (SecondCSINC)
+      break;
+  }
+
+  if (!SecondCmp || !SecondCSINC) {
+    LLVM_DEBUG(dbgs() << "Didn't find two CMP+CSINC pairs\n");
+    return false;
+  }
+
+  if (FirstCmp->getOperand(1).getReg() != SecondCmp->getOperand(1).getReg()) {
+    LLVM_DEBUG(dbgs() << "CMPs compare different registers\n");
+    return false;
+  }
+
+  if (!isPureCmp(*FirstCmp) || !isPureCmp(*SecondCmp)) {
+    LLVM_DEBUG(dbgs() << "One or both CMPs are not pure\n");
+    return false;
+  }
+
+  // Check that nothing else modifies the flags between the first CMP and second
+  // conditional
+  for (auto It = std::next(MachineBasicBlock::iterator(FirstCmp));
+       It != std::next(MachineBasicBlock::iterator(SecondCSINC)); ++It) {
+    if (&*It != SecondCmp &&
+        It->modifiesRegister(AArch64::NZCV, /*TRI=*/nullptr)) {
+      LLVM_DEBUG(dbgs() << "Flags modified between CMPs by: " << *It << '\n');
+      return false;
     }
+  }
 
-    SmallVector<MachineOperand, 4> TrueCond;
-    MachineBasicBlock *TBB_TBB = nullptr, *TBB_FBB = nullptr;
-    if (TII->analyzeBranch(*TBB, TBB_TBB, TBB_FBB, TrueCond)) {
-      continue;
+  // Check flags aren't read after second conditional within the same block
+  for (auto It = std::next(MachineBasicBlock::iterator(SecondCSINC));
+       It != MBB.end(); ++It) {
+    if (It->readsRegister(AArch64::NZCV, /*TRI=*/nullptr)) {
+      LLVM_DEBUG(dbgs() << "Flags read after second CSINC by: " << *It << '\n');
+      return false;
     }
+  }
 
-    MachineInstr *HeadCmpMI = findSuitableCompare(HBB);
-    if (!HeadCmpMI) {
-      continue;
+  // Since we may modify a cmp in this MBB, make sure NZCV does not live out.
+  for (auto *SuccBB : MBB.successors())
+    if (SuccBB->isLiveIn(AArch64::NZCV))
+      return false;
+
+  // Extract condition codes from both CSINCs (operand 3)
+  AArch64CC::CondCode FirstCond =
+      (AArch64CC::CondCode)(int)FirstCSINC->getOperand(3).getImm();
+  AArch64CC::CondCode SecondCond =
+      (AArch64CC::CondCode)(int)SecondCSINC->getOperand(3).getImm();
+
+  const int FirstImm = (int)FirstCmp->getOperand(2).getImm();
+  const int SecondImm = (int)SecondCmp->getOperand(2).getImm();
+
+  LLVM_DEBUG(dbgs() << "Comparing intra-block CSINCs: "
+                    << AArch64CC::getCondCodeName(FirstCond) << " #" << FirstImm
+                    << " and " << AArch64CC::getCondCodeName(SecondCond) << " #"
+                    << SecondImm << '\n');
+
+  // Check if both conditions are the same and immediates differ by 1
+  if (((FirstCond == AArch64CC::GT && SecondCond == AArch64CC::GT) ||
+       (FirstCond == AArch64CC::LT && SecondCond == AArch64CC::LT)) &&
+      std::abs(SecondImm - FirstImm) == 1) {
+    // Pick which comparison to adjust to match the other
+    // For GT: adjust the one with smaller immediate
+    // For LT: adjust the one with larger immediate
+    bool adjustFirst = (FirstImm < SecondImm);
+    if (FirstCond == AArch64CC::LT) {
+      adjustFirst = !adjustFirst;
     }
 
-    MachineInstr *TrueCmpMI = findSuitableCompare(TBB);
-    if (!TrueCmpMI) {
-      continue;
+    MachineInstr *CmpToAdjust = adjustFirst ? FirstCmp : SecondCmp;
+    MachineInstr *CSINCToAdjust = adjustFirst ? FirstCSINC : SecondCSINC;
+    AArch64CC::CondCode CondToAdjust = adjustFirst ? FirstCond : SecondCond;
+    int TargetImm = adjustFirst ? SecondImm : FirstImm;
+
+    CmpInfo AdjustedInfo = adjustCmp(CmpToAdjust, CondToAdjust);
+
+    if (std::get<0>(AdjustedInfo) == TargetImm &&
+        std::get<1>(AdjustedInfo) ==
+            (adjustFirst ? SecondCmp : FirstCmp)->getOpcode()) {
+      LLVM_DEBUG(dbgs() << "Successfully optimizing intra-block CSINC pair\n");
+
+      // Modify the selected CMP and CSINC
+      CmpToAdjust->getOperand(2).setImm(std::get<0>(AdjustedInfo));
+      CmpToAdjust->setDesc(TII->get(std::get<1>(AdjustedInfo)));
+      CSINCToAdjust->getOperand(3).setImm(std::get<2>(AdjustedInfo));
+
+      return true;
     }
+  }
 
-    AArch64CC::CondCode HeadCmp;
-    if (HeadCond.empty() || !parseCond(HeadCond, HeadCmp)) {
-      continue;
+  return false;
+}
+
+// Optimize across blocks
+bool AArch64ConditionOptimizer::optimizeCrossBlock(MachineBasicBlock &HBB) {
+  SmallVector<MachineOperand, 4> HeadCond;
+  MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
+  if (TII->analyzeBranch(HBB, TBB, FBB, HeadCond)) {
+    return false;
+  }
+
+  // Equivalence check is to skip loops.
+  if (!TBB || TBB == &HBB) {
+    return false;
+  }
+
+  SmallVector<MachineOperand, 4> TrueCond;
+  MachineBasicBlock *TBB_TBB = nullptr, *TBB_FBB = nullptr;
+  if (TII->analyzeBranch(*TBB, TBB_TBB, TBB_FBB, TrueCond)) {
+    return false;
+  }
+
+  MachineInstr *HeadCmpMI = findSuitableCompare(&HBB);
+  if (!HeadCmpMI) {
+    return false;
+  }
+
+  MachineInstr *TrueCmpMI = findSuitableCompare(TBB);
+  if (!TrueCmpMI) {
+    return false;
+  }
+
+  AArch64CC::CondCode HeadCmp;
+  if (HeadCond.empty() || !parseCond(HeadCond, HeadCmp)) {
+    return false;
+  }
+
+  AArch64CC::CondCode TrueCmp;
+  if (TrueCond.empty() || !parseCond(TrueCond, TrueCmp)) {
+    return false;
+  }
+
+  const int HeadImm = (int)HeadCmpMI->getOperand(2).getImm();
+  const int TrueImm = (int)TrueCmpMI->getOperand(2).getImm();
+
+  LLVM_DEBUG(dbgs() << "Head branch:\n");
+  LLVM_DEBUG(dbgs() << "\tcondition: " << AArch64CC::getCondCodeName(HeadCmp)
+                    << '\n');
+  LLVM_DEBUG(dbgs() << "\timmediate: " << HeadImm << '\n');
+
+  LLVM_DEBUG(dbgs() << "True branch:\n");
+  LLVM_DEBUG(dbgs() << "\tcondition: " << AArch64CC::getCondCodeName(TrueCmp)
+                    << '\n');
+  LLVM_DEBUG(dbgs() << "\timmediate: " << TrueImm << '\n');
+
+  if (((HeadCmp == AArch64CC::GT && TrueCmp == AArch64CC::LT) ||
+       (HeadCmp == AArch64CC::LT && TrueCmp == AArch64CC::GT)) &&
+      std::abs(TrueImm - HeadImm) == 2) {
+    // This branch transforms machine instructions that correspond to
+    //
+    // 1) (a > {TrueImm} && ...) || (a < {HeadImm} && ...)
+    // 2) (a < {TrueImm} && ...) || (a > {HeadImm} && ...)
+    //
+    // into
+    //
+    // 1) (a >= {NewImm} && ...) || (a <= {NewImm} && ...)
+    // 2) (a <= {NewImm} && ...) || (a >= {NewImm} && ...)
+
+    CmpInfo HeadCmpInfo = adjustCmp(HeadCmpMI, HeadCmp);
+    CmpInfo TrueCmpInfo = adjustCmp(TrueCmpMI, TrueCmp);
+    if (std::get<0>(HeadCmpInfo) == std::get<0>(TrueCmpInfo) &&
+        std::get<1>(HeadCmpInfo) == std::get<1>(TrueCmpInfo)) {
+      modifyCmp(HeadCmpMI, HeadCmpInfo);
+      modifyCmp(TrueCmpMI, TrueCmpInfo);
+      return true;
+    }
+  } else if (((HeadCmp == AArch64CC::GT && TrueCmp == AArch64CC::GT) ||
+              (HeadCmp == AArch64CC::LT && TrueCmp == AArch64CC::LT)) &&
+             std::abs(TrueImm - HeadImm) == 1) {
+    // This branch transforms machine instructions that correspond to
+    //
+    // 1) (a > {TrueImm} && ...) || (a > {HeadImm} && ...)
+    // 2) (a < {TrueImm} && ...) || (a < {HeadImm} && ...)
+    //
+    // into
+    //
+    // 1) (a <= {NewImm} && ...) || (a >  {NewImm} && ...)
+    // 2) (a <  {NewImm} && ...) || (a >= {NewImm} && ...)
+
+    // GT -> GE transformation increases immediate value, so picking the
+    // smaller one; LT -> LE decreases immediate value so invert the choice.
+    bool adjustHeadCond = (HeadImm < TrueImm);
+    if (HeadCmp == AArch64CC::LT) {
+      adjustHeadCond = !adjustHeadCond;
     }
 
-    AArch64CC::CondCode TrueCmp;
-    if (TrueCond.empty() || !parseCond(TrueCond, TrueCmp)) {
-      continue;
+    if (adjustHeadCond) {
+      return adjustTo(HeadCmpMI, HeadCmp, TrueCmpMI, TrueImm);
+    } else {
+      return adjustTo(TrueCmpMI, TrueCmp, HeadCmpMI, HeadImm);
     }
+  }
+  // Other transformation cases almost never occur due to generation of < or >
+  // comparisons instead of <= and >=.
 
-    const int HeadImm = (int)HeadCmpMI->getOperand(2).getImm();
-    const int TrueImm = (int)TrueCmpMI->getOperand(2).getImm();
+  return false;
+}
 
-    LLVM_DEBUG(dbgs() << "Head branch:\n");
-    LLVM_DEBUG(dbgs() << "\tcondition: " << AArch64CC::getCondCodeName(HeadCmp)
-                      << '\n');
-    LLVM_DEBUG(dbgs() << "\timmediate: " << HeadImm << '\n');
+bool AArch64ConditionOptimizer::runOnMachineFunction(MachineFunction &MF) {
+  LLVM_DEBUG(dbgs() << "********** AArch64 Conditional Compares **********\n"
+                    << "********** Function: " << MF.getName() << '\n');
+  if (skipFunction(MF.getFunction()))
+    return false;
 
-    LLVM_DEBUG(dbgs() << "True branch:\n");
-    LLVM_DEBUG(dbgs() << "\tcondition: " << AArch64CC::getCondCodeName(TrueCmp)
-                      << '\n');
-    LLVM_DEBUG(dbgs() << "\timmediate: " << TrueImm << '\n');
-
-    if (((HeadCmp == AArch64CC::GT && TrueCmp == AArch64CC::LT) ||
-         (HeadCmp == AArch64CC::LT && TrueCmp == AArch64CC::GT)) &&
-        std::abs(TrueImm - HeadImm) == 2) {
-      // This branch transforms machine instructions that correspond to
-      //
-      // 1) (a > {TrueImm} && ...) || (a < {HeadImm} && ...)
-      // 2) (a < {TrueImm} && ...) || (a > {HeadImm} && ...)
-      //
-      // into
-      //
-      // 1) (a >= {NewImm} && ...) || (a <= {NewImm} && ...)
-      // 2) (a <= {NewImm} && ...) || (a >= {NewImm} && ...)
-
-      CmpInfo HeadCmpInfo = adjustCmp(HeadCmpMI, HeadCmp);
-      CmpInfo TrueCmpInfo = adjustCmp(TrueCmpMI, TrueCmp);
-      if (std::get<0>(HeadCmpInfo) == std::get<0>(TrueCmpInfo) &&
-          std::get<1>(HeadCmpInfo) == std::get<1>(TrueCmpInfo)) {
-        modifyCmp(HeadCmpMI, HeadCmpInfo);
-        modifyCmp(TrueCmpMI, TrueCmpInfo);
-        Changed = true;
-      }
-    } else if (((HeadCmp == AArch64CC::GT && TrueCmp == AArch64CC::GT) ||
-                (HeadCmp == AArch64CC::LT && TrueCmp == AArch64CC::LT)) &&
-                std::abs(TrueImm - HeadImm) == 1) {
-      // This branch transforms machine instructions that correspond to
-      //
-      // 1) (a > {TrueImm} && ...) || (a > {HeadImm} && ...)
-      // 2) (a < {TrueImm} && ...) || (a < {HeadImm} && ...)
-      //
-      // into
-      //
-      // 1) (a <= {NewImm} && ...) || (a >  {NewImm} && ...)
-      // 2) (a <  {NewImm} && ...) || (a >= {NewImm} && ...)
-
-      // GT -> GE transformation increases immediate value, so picking the
-      // smaller one; LT -> LE decreases immediate value so invert the choice.
-      bool adjustHeadCond = (HeadImm < TrueImm);
-      if (HeadCmp == AArch64CC::LT) {
-          adjustHeadCond = !adjustHeadCond;
-      }
+  TII = MF.getSubtarget().getInstrInfo();
+  DomTree = &getAnalysis<MachineDominatorTreeWrapperPass>().getDomTree();
+  MRI = &MF.getRegInfo();
 
-      if (adjustHeadCond) {
-        Changed |= adjustTo(HeadCmpMI, HeadCmp, TrueCmpMI, TrueImm);
-      } else {
-        Changed |= adjustTo(TrueCmpMI, TrueCmp, HeadCmpMI, HeadImm);
-      }
-    }
-    // Other transformation cases almost never occur due to generation of < or >
-    // comparisons instead of <= and >=.
+  bool Changed = false;
+
+  // Visit blocks in dominator tree pre-order. The pre-order enables multiple
+  // cmp-conversions from the same head block.
+  // Note that updateDomTree() modifies the children of the DomTree node
+  // currently being visited. The df_iterator supports that; it doesn't look at
+  // child_begin() / child_end() until after a node has been visited.
+  for (MachineDomTreeNode *I : depth_first(DomTree)) {
+    MachineBasicBlock *HBB = I->getBlock();
+    Changed |= optimizeIntraBlock(*HBB);
+    Changed |= optimizeCrossBlock(*HBB);
   }
 
   return Changed;
diff --git a/llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll b/llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll
index 6449c3e11d667..4449c2b9193a4 100644
--- a/llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll
+++ b/llvm/test/CodeGen/AArch64/combine-comparisons-by-cse.ll
@@ -7,6 +7,110 @@
 @c = external global i32
 @d = external global i32
 
+
+; Test intra-block CSINC optimization with (a > 10) and (a >= 10)
+; Two CSINC instructions should share a single CMP after optimization
+define void @intra_block_csinc(i32 %x, i32 %y, ptr %out1, ptr %out2) #0 {
+; CHECK-LABEL: intra_block_csinc:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    adrp x8, :got:a
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    cmp w8, #10
+; CHECK-NEXT:    csinc w8, w0, w1, gt
+; CHECK-NEXT:    csinc w9, w0, w1, ge
+; CHECK-NEXT:    str w8, [x2]
+; CHECK-NEXT:    str w9, [x3]
+; CHECK-NEXT:    ret
+entry:
+  %val = load i32, ptr @a, align 4
+
+  ; First: result1 = (a > 10) ? x : (y + 1)
+  %cond1 = icmp sgt i32 %val, 10
+  %y_inc1 = add i32 %y, 1
+  %result1 = select i1 %cond1, i32 %x, i32 %y_inc1
+  store i32 %result1, ptr %out1
+
+  ; Second: result2 = (a >= 10) ? x : (y + 1)
+  ; Canonicalizes to (a > 9), then optimizes to reuse first CMP with adjusted condition
+  %cond2 = icmp sge i32 %val, 10
+  %y_inc2 = add i32 %y, 1
+  %result2 = select i1 %cond2, i32 %x, i32 %y_inc2
+  store i32 %result2, ptr %out2
+
+  ret void
+}
+
+; Negative test: different registers should not be optimized
+define void @intra_block_csinc_different_regs(i32 %x, i32 %y, ptr %out1, ptr %out2) #0 {
+; CHECK-LABEL: intra_block_csinc_different_regs:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    adrp x8, :got:a
+; CHECK-NEXT:    adrp x9, :got:b
+; CHECK-NEXT:    ldr x8, [x8, :got_lo12:a]
+; CHECK-NEXT:    ldr x9, [x9, :got_lo12:b]
+; CHECK-NEXT:    ldr w8, [x8]
+; CHECK-NEXT:    ldr w9, [x9]
+; CHECK-NEXT:    cmp w8, #10
+; CHECK-NEXT:    csinc w8, w0, w1, gt
+; CHECK-NEXT:    cmp w9, #9
+; CHECK-NEXT:    str w8, [x2]
+; CHECK-NEXT:    csinc w8, w0, w1, gt
+; CHECK-NEXT:    str w8, [x3]
+; CHECK-NEXT:    ret
+entry:
+  %val1 = load i32, ptr @a, align 4
+  %val2 = load i32, ptr @b, align 4
+
+  ; First: result1 = (a > 10) ? x : (y + 1)
+  %cond1 = icmp sgt i32 %val1, 10
+  %y_inc1 = add i32 %y, 1
+  %result1 = select i1 %cond1, i32 %x, i32 %y_inc1
+  store i32 %result1, ptr %out1
+
+  ; Second: result2 = (b > 9) ? x : (y + 1) - compares DIFFERENT register
+  ; Should NOT optimize - need both CMPs
+  %cond2 = icmp sgt i32 %val2, 9
+  %y_inc2 ...
[truncated]
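The adjustment table described in the pass's header comment above (GE <-> GT and LE <-> LT, with the immediate shifted by one) can be modeled in a few lines of Python. This is a simplified sketch of the idea only, not the pass's actual `adjustCmp`/`CmpInfo` machinery:

```python
# Simplified model of the condition/immediate adjustment used by
# AArch64ConditionOptimizer: swapping between exclusive and inclusive
# forms moves the immediate by one while preserving the predicate's
# truth value for every integer x.
def adjust(cond, imm):
    # GT n <=> GE n+1 ; GE n <=> GT n-1 ; LT n <=> LE n-1 ; LE n <=> LT n+1
    table = {"GT": ("GE", 1), "GE": ("GT", -1),
             "LT": ("LE", -1), "LE": ("LT", 1)}
    new_cond, delta = table[cond]
    return new_cond, imm + delta

def holds(cond, x, imm):
    return {"GT": x > imm, "GE": x >= imm,
            "LT": x < imm, "LE": x <= imm}[cond]

# Check the rewrite is truth-preserving across a range of values.
for cond in ("GT", "GE", "LT", "LE"):
    new_cond, new_imm = adjust(cond, 10)
    assert all(holds(cond, x, 10) == holds(new_cond, x, new_imm)
               for x in range(-50, 50))
```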

@hussam-alhassan (Contributor, Author)

Apologies if you're not the right people to ping:

@nasherm @davemgreen

@nasherm nasherm self-requested a review December 29, 2025 09:31
@nasherm (Contributor) left a comment


Thanks for the patch.

Do you have some performance numbers for this work?

@hussam-alhassan (Contributor, Author) commented Jan 7, 2026

Thank you for the review

I ran statistics on all LLVM tests and the optimisation triggered only twice, in the two new tests added here.

The current optimisation is quite narrow, but I plan on adding support for more conditionals and potentially relaxing some of the conservative checks (while maintaining correctness).

I can also run the llvm-test-suite or add the STATISTIC macro I used to this commit, if either helps.

@nasherm (Contributor) commented Jan 8, 2026

I ran statistics on all llvm tests and the optimisation triggered only twice, the two new tests added here.

The current optimisation is quite narrow, but I plan on adding support for more conditionals and potentially relaxing some of the conservative checks (while maintaining correctness).

Yeah I figured that the narrowness of the patch might make it hard to benchmark.

@nasherm (Contributor) left a comment

LGTM

@hussam-alhassan (Contributor, Author)

The workflows might need approval it seems

Actually, are maintainers automatically notified of workflow requests, or do I need to ping/send a message?

@nasherm (Contributor) commented Jan 8, 2026

The workflows might need approval it seems

I can approve the workflows

Actually, are maintainers automatically notified of workflow requests, or do I need to ping/send a message?

This is new to me. It seems like I have to approve the workflows since this is one of your first commits and you don't have commit access

@hussam-alhassan (Contributor, Author)

Sorry, I accidentally re-ran the workflows, so they need approval again. It also seems I can't merge once the workflows do finish.

@nasherm (Contributor) commented Jan 8, 2026

It seems I can't merge either once the workflows do finish.

Yes, your first couple of commits will need someone to merge on your behalf.

https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

@hussam-alhassan (Contributor, Author) commented Jan 8, 2026

Ah yes. In which case, could you please merge on my behalf?

@nasherm nasherm merged commit 4f7da2f into llvm:main Jan 8, 2026
10 checks passed
@github-actions bot commented Jan 8, 2026

@hussam-alhassan Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@hussam-alhassan (Contributor, Author)

Many thanks for the review!

@hussam-alhassan hussam-alhassan deleted the aarch64-intrablock-csinc branch January 8, 2026 19:11
kshitijvp pushed a commit to kshitijvp/llvm-project that referenced this pull request Jan 9, 2026
…izer (llvm#173734)

This patch extends the AArch64ConditionOptimizer pass to handle CSINC
instructions within a single basic block, complementing the existing
cross-block branch optimization.

The optimization finds two CMP+CSINC pairs comparing the same register
with immediates differing by 1, and adjusts one comparison to enable CSE
to eliminate the redundant CMP instruction.

Example transformation:
```
  cmp  w8, #10
  csinc w9, w0, w1, gt     ; w9 = (w8 > 10) ? w0 : w1+1
  cmp  w8, #9              ; Removed by CSE after adjustment
  csinc w10, w0, w1, gt    ; w10 = (w8 > 9) ? w0 : w1+1
```

After optimization:
```
  cmp  w8, #10
  csinc w9, w0, w1, gt     ; w9 = (w8 > 10) ? w0 : w1+1
  csinc w10, w0, w1, ge    ; w10 = (w8 >= 10) ? w0 : w1+1
```

The existing cross-block logic has also been extracted into its own
method.

Any feedback on code quality and better practices is highly welcome.

Co-authored-by: Hussam Alhassan <hsm.link@proton.me>
Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
…izer (llvm#173734)

This patch extends the AArch64ConditionOptimizer pass to handle CSINC
instructions within a single basic block, complementing the existing
cross-block branch optimization.

The optimization finds two CMP+CSINC pairs comparing the same register
with immediates differing by 1, and adjusts one comparison to enable CSE
to eliminate the redundant CMP instruction.

Example transformation:
```
  cmp  w8, #10
  csinc w9, w0, w1, gt     ; w9 = (w8 > 10) ? w0 : w1+1
  cmp  w8, #9              ; Removed by CSE after adjustment
  csinc w10, w0, w1, gt    ; w10 = (w8 > 9) ? w0 : w1+1
```

After optimization:
```
  cmp  w8, #10
  csinc w9, w0, w1, gt     ; w9 = (w8 > 10) ? w0 : w1+1
  csinc w10, w0, w1, ge    ; w10 = (w8 >= 10) ? w0 : w1+1
```

The existing cross-block logic has also been extracted into its own
method.

Any feedback on code quality and better practices is highly welcome.

Co-authored-by: Hussam Alhassan <hsm.link@proton.me>
