Skip to content

[Perf] UnifiedRadixCache LRU optimization proposals #24072

@Jialin

Description

@Jialin

Checklist

Motivation

Summary

# Proposal Worst-case (current) Worst-case (proposed)
1 O(1) LRU scan resume via per-node insertion counter — bump a counter on every MRU insert; eviction maintains a cursor and resumes after each eviction (including cascade) instead of restarting from tail. O(M × K) — M evictions each re-walk the same K skip nodes (locked nodes / non-leaf ancestors) O(M + K) — each skip node visited at most once across the whole eviction batch
2 Root-MRU chain ordering globally — flip reset_node_and_parents_mru so the root-side ancestor lands closest to head and the leaf closest to tail within the touched chain. Applies to FULL, Mamba, and SWA uniformly. O(M × N × D) — N recently-touched chains of depth D force (D−1) non-leaf skips per chain on every restart O(N × D) — non-leaf ancestors skipped exactly once during a single resume walk

The two proposals are complementary: Proposal 1 eliminates restart cost; Proposal 2 ensures the resume walk visits each chain's leaf before its ancestors.


Proposal 1: O(1) LRU scan resume via per-node insertion counter

Statement

UnifiedRadixCache eviction restarts its LRU scan from tail after every leaf eviction. When the path between tail and the next evictable leaf contains many "skip" nodes (locked nodes or non-leaf ancestors left over from cascade tombstone cleanup), each restart re-walks the same prefix of the list, producing O(M × K) total scan cost for M evictions over K average skips. We propose adding a monotonically increasing insertion counter to each node (bumped on every insert_mru / reset_node_mru) so the scan can resume at the correct position in O(1) after each eviction (including cascade), reducing total cost to O(M + K).

Where the restart happens (code pointers)

FullComponent.drive_eviction (full_component.py:58-70) — restarts on every iteration:

def drive_eviction(self, params, tracker):
    request = params.num_tokens
    lru = self.cache.lru_lists[self.component_type]
    while tracker[self.component_type] < request:
        x = lru.get_leaf_lru_no_lock()       # ← starts from tail every loop
        if x is None:
            break
        self.cache._evict_component_and_detach_lru(x, self, is_leaf=True, tracker=tracker)
        self.cache._cascade_evict(x, self, tracker)

SWAComponent.drive_eviction (swa_component.py:208-230) — restarts on every leaf eviction:

        if len(x.children) > 0:
            x_next = lru.get_prev_no_lock(x)              # internal: cursor-style continue
            self.cache._evict_component_and_detach_lru(x, self, is_leaf=False, tracker=tracker)
            self.cache._cascade_evict(x, self, tracker)
            x = x_next
        else:
            self.cache._evict_component_and_detach_lru(x, self, is_leaf=True, tracker=tracker)
            self.cache._cascade_evict(x, self, tracker)
            x = lru.get_lru_no_lock()                     # ← leaf: restart from tail

MambaComponent.drive_eviction (mamba_component.py:137-159) — same pattern as SWA, also restarts after leaf eviction.
UnifiedLRUList.get_leaf_lru_no_lock / get_lru_no_lock (unified_radix_cache.py:151-155) — both anchor on self.tail:

def get_lru_no_lock(self):
    return self.get_prev_no_lock(self.tail, check_id=False)        # ← anchored at tail
def get_leaf_lru_no_lock(self):
    return self.get_prev_leaf_no_lock(self.tail, check_id=False)   # ← anchored at tail

UnifiedLRUList.get_prev_leaf_no_lock (unified_radix_cache.py:140-149) — the inner walk that pays the per-restart cost; every restart re-skips the same locked / non-leaf nodes:

def get_prev_leaf_no_lock(self, node, check_id=True):
    ...
    x = node.lru_prev[ct]
    while x.component_data[ct].lock_ref > 0 or len(x.children) > 0:
        x = x.lru_prev[ct]                                # ← re-walked from tail every iter
    if x == self.head:
        return None
    return x

The fundamental issue: drive_eviction discards the scan position after each eviction, so the next iteration must rediscover it from scratch via another tail-anchored walk through the same skip nodes.

Worst-case scenario

LRU layout (skip nodes interleaved with evictable leaves):
head ┄ S₁ ┄ L₁ ┄ S₂ ┄ L₂ ┄ S₃ ┄ L₃ ┄ ... ┄ Sₖ ┄ Lₖ ┄ tail
       │    │    │    │    │    │           │    │
    skip  evict skip evict skip evict     skip evict
Eviction trace under current restart-from-tail:
  iter 1: tail ─►Lₖ                                 [0 skips]
  iter 2: tail ─►Sₖ─►L(k-1)                         [1 skip ]
  iter 3: tail ─►Sₖ─►S(k-1)─►L(k-2)                 [2 skips]
  ...
  iter k: tail ─►Sₖ─►...─►S₂─►L₁                    [k-1 skips]
  Total = 1 + 2 + ... + k = O(k²)

Proposed change

  1. Add lru_insert_seq: int to UnifiedTreeNode. Bump a per-UnifiedLRUList global counter inside _add_node / _add_node_after and assign it to the node.
  2. Maintain a per-eviction-call cursor inside drive_eviction; replace the unconditional lru.get_leaf_lru_no_lock() / lru.get_lru_no_lock() restart with a resume from the cursor, walking only the remaining portion toward head.
  3. On cascade, when _iteratively_delete_tombstone_leaf deletes ancestor P:
    • If P.lru_insert_seq < cursor.lru_insert_seq → P was behind the cursor; the deletion just updates list pointers, no extra scan needed.
    • If P.lru_insert_seq ≥ cursor.lru_insert_seq → P was ahead of the cursor; either evict opportunistically during cascade or advance the cursor past P's neighbors.

Proposal 2: Root-MRU chain ordering globally

Statement

reset_node_and_parents_mru currently uses leaf-MRU ordering: when a chain root → A → A1 → A11 is touched, A11 lands closest to head and A closest to tail within the chain. This forces leaf-eviction scans (FULL) to walk past every chain's non-leaf ancestors stacked between the leaf and tail. We propose flipping the ordering to root-MRU for all components uniformly. FULL benefits the most from cheaper leaf scans; Mamba and SWA gain the same scan-cost benefit while remaining functionally correct (their drive_eviction already walks any node in pure LRU order).

Worst-case scenario

Current leaf-MRU layout (FULL leaf-only scan from tail):
head ◄── A11 ◄── A1 ◄── A ◄── B11 ◄── B1 ◄── B ◄── C11 ◄── C1 ◄── C ◄── ...older... ◄── tail
         leaf    │      │     leaf    │      │     leaf    │      │
                 nonleaf nonleaf      nonleaf nonleaf      nonleaf nonleaf
                  ▲       ▲            ▲       ▲            ▲       ▲
                  └───────┴──── must skip these to find each leaf ──┘
Every restart-from-tail re-walks each chain's (D-1) non-leaf ancestors.
Proposed root-MRU layout for the same chains:
head ◄── A ◄── A1 ◄── A11 ◄── B ◄── B1 ◄── B11 ◄── C ◄── C1 ◄── C11 ◄── ...older... ◄── tail
         │     │      leaf    │     │      leaf    │     │      leaf
         nonleaf nonleaf      nonleaf nonleaf      nonleaf nonleaf
Scan from tail (with Proposal 1's resume cursor):
  ── first hit is C11 (leaf — done)
  ── resume past C11 → C1 (skip) → C (skip) → B11 (leaf — done)
  ── resume past B11 → B1 (skip) → B (skip) → A11 (leaf — done)
Every non-leaf ancestor skipped exactly ONCE total.

Proposed change

Replace the body of reset_node_and_parents_mru:

def reset_node_and_parents_mru(self, node, root_node, should_include):
    chain = []
    while node != root_node:
        if should_include(node):
            chain.append(node)
        node = node.parent
    # chain = [leaf, parent, grandparent, ..., root_child]
    # Insert root-side first so leaf lands closest to tail within the chain
    prev = self.head
    for n in reversed(chain):
        self._remove_node(n)
        self._add_node_after(prev, n)
        prev = n

Also flip the last_access_time direction in _match_post_processor (unified_radix_cache.py:549-553) for consistency — root gets the newest stamp, leaf gets the oldest within the chain. This field is currently only used by __lt__, so the change is cosmetic but removes a confusing inconsistency.

Related resources

No response

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions