[Perf] UnifiedRadixCache LRU optimization proposals

### Checklist

- [ ] If this is not a feature request but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- [ ] Please use English. Otherwise, it will be closed.

### Motivation

## Summary

| # | Proposal | Worst-case (current) | Worst-case (proposed) |
|---|---|---|---|
| 1 | **O(1) LRU scan resume via per-node insertion counter** — bump a counter on every MRU insert; eviction maintains a cursor and resumes after each eviction (including cascade) instead of restarting from `tail`. | **O(M × K)** — M evictions each re-walk the same K skip nodes (locked nodes / non-leaf ancestors) | **O(M + K)** — each skip node visited at most once across the whole eviction batch |
| 2 | **Root-MRU chain ordering globally** — flip `reset_node_and_parents_mru` so the root-side ancestor lands closest to `head` and the leaf closest to `tail` within the touched chain. Applies to FULL, Mamba, and SWA uniformly. | **O(M × N × D)** — N recently-touched chains of depth D force (D−1) non-leaf skips per chain on every restart | **O(N × D)** — non-leaf ancestors skipped exactly once during a single resume walk |

The two proposals are complementary: Proposal 1 eliminates restart cost; Proposal 2 ensures the resume walk visits each chain's leaf before its ancestors.

---
## Proposal 1: O(1) LRU scan resume via per-node insertion counter
### Statement
`UnifiedRadixCache` eviction restarts its LRU scan from `tail` after every leaf eviction. When the path between `tail` and the next evictable leaf contains many "skip" nodes (locked nodes or non-leaf ancestors left over from cascade tombstone cleanup), each restart re-walks the same prefix of the list, producing **O(M × K)** total scan cost for M evictions over K average skips. We propose adding a monotonically increasing **insertion counter** to each node (bumped on every `insert_mru` / `reset_node_mru`) so the scan can resume at the correct position in **O(1)** after each eviction (including cascade), reducing total cost to **O(M + K)**.
### Where the restart happens (code pointers)
**`FullComponent.drive_eviction`** (`full_component.py:58-70`) — restarts on **every** iteration:
```python
def drive_eviction(self, params, tracker):
    request = params.num_tokens
    lru = self.cache.lru_lists[self.component_type]
    while tracker[self.component_type] < request:
        x = lru.get_leaf_lru_no_lock()       # ← starts from tail every loop
        if x is None:
            break
        self.cache._evict_component_and_detach_lru(x, self, is_leaf=True, tracker=tracker)
        self.cache._cascade_evict(x, self, tracker)
```
**`SWAComponent.drive_eviction`** (`swa_component.py:208-230`) — restarts on every **leaf** eviction:
```python
        if len(x.children) > 0:
            x_next = lru.get_prev_no_lock(x)              # internal: cursor-style continue
            self.cache._evict_component_and_detach_lru(x, self, is_leaf=False, tracker=tracker)
            self.cache._cascade_evict(x, self, tracker)
            x = x_next
        else:
            self.cache._evict_component_and_detach_lru(x, self, is_leaf=True, tracker=tracker)
            self.cache._cascade_evict(x, self, tracker)
            x = lru.get_lru_no_lock()                     # ← leaf: restart from tail
```
**`MambaComponent.drive_eviction`** (`mamba_component.py:137-159`) — same pattern as SWA, also restarts after leaf eviction.
**`UnifiedLRUList.get_leaf_lru_no_lock` / `get_lru_no_lock`** (`unified_radix_cache.py:151-155`) — both anchor on `self.tail`:
```python
def get_lru_no_lock(self):
    return self.get_prev_no_lock(self.tail, check_id=False)        # ← anchored at tail
def get_leaf_lru_no_lock(self):
    return self.get_prev_leaf_no_lock(self.tail, check_id=False)   # ← anchored at tail
```
**`UnifiedLRUList.get_prev_leaf_no_lock`** (`unified_radix_cache.py:140-149`) — the inner walk that pays the per-restart cost; every restart re-skips the same locked / non-leaf nodes:
```python
def get_prev_leaf_no_lock(self, node, check_id=True):
    ...
    x = node.lru_prev[ct]
    while x.component_data[ct].lock_ref > 0 or len(x.children) > 0:
        x = x.lru_prev[ct]                                # ← re-walked from tail every iter
    if x == self.head:
        return None
    return x
```
The fundamental issue: `drive_eviction` discards the scan position after each eviction, so the next iteration must rediscover it from scratch via another `tail`-anchored walk through the same skip nodes.
### Worst-case scenario
```
LRU layout (skip nodes interleaved with evictable leaves):
head ┄ S₁ ┄ L₁ ┄ S₂ ┄ L₂ ┄ S₃ ┄ L₃ ┄ ... ┄ Sₖ ┄ Lₖ ┄ tail
       │    │    │    │    │    │           │    │
    skip  evict skip evict skip evict     skip evict
Eviction trace under current restart-from-tail:
  iter 1: tail ─►Lₖ                                 [0 skips]
  iter 2: tail ─►Sₖ─►L(k-1)                         [1 skip ]
  iter 3: tail ─►Sₖ─►S(k-1)─►L(k-2)                 [2 skips]
  ...
  iter k: tail ─►Sₖ─►...─►S₂─►L₁                    [k-1 skips]
  Total = 1 + 2 + ... + k = O(k²)
```
### Proposed change
1. Add `lru_insert_seq: int` to `UnifiedTreeNode`. Bump a per-`UnifiedLRUList` global counter inside `_add_node` / `_add_node_after` and assign it to the node.
2. Maintain a per-eviction-call cursor inside `drive_eviction`; replace the unconditional `lru.get_leaf_lru_no_lock()` / `lru.get_lru_no_lock()` restart with a resume from the cursor, walking only the *remaining* portion toward head.
3. On cascade, when `_iteratively_delete_tombstone_leaf` deletes ancestor `P`:
   - If `P.lru_insert_seq < cursor.lru_insert_seq` → P was behind the cursor; the deletion just updates list pointers, no extra scan needed.
   - If `P.lru_insert_seq ≥ cursor.lru_insert_seq` → P was ahead of the cursor; either evict opportunistically during cascade or advance the cursor past P's neighbors.
---
## Proposal 2: Root-MRU chain ordering globally
### Statement
`reset_node_and_parents_mru` currently uses **leaf-MRU** ordering: when a chain `root → A → A1 → A11` is touched, `A11` lands closest to `head` and `A` closest to `tail` within the chain. This forces leaf-eviction scans (FULL) to walk past every chain's non-leaf ancestors stacked between the leaf and tail. We propose flipping the ordering to **root-MRU** for all components uniformly. FULL benefits the most from cheaper leaf scans; Mamba and SWA gain the same scan-cost benefit while remaining functionally correct (their `drive_eviction` already walks any node in pure LRU order).
### Worst-case scenario
```
Current leaf-MRU layout (FULL leaf-only scan from tail):
head ◄── A11 ◄── A1 ◄── A ◄── B11 ◄── B1 ◄── B ◄── C11 ◄── C1 ◄── C ◄── ...older... ◄── tail
         leaf    │      │     leaf    │      │     leaf    │      │
                 nonleaf nonleaf      nonleaf nonleaf      nonleaf nonleaf
                  ▲       ▲            ▲       ▲            ▲       ▲
                  └───────┴──── must skip these to find each leaf ──┘
Every restart-from-tail re-walks each chain's (D-1) non-leaf ancestors.
Proposed root-MRU layout for the same chains:
head ◄── A ◄── A1 ◄── A11 ◄── B ◄── B1 ◄── B11 ◄── C ◄── C1 ◄── C11 ◄── ...older... ◄── tail
         │     │      leaf    │     │      leaf    │     │      leaf
         nonleaf nonleaf      nonleaf nonleaf      nonleaf nonleaf
Scan from tail (with Proposal 1's resume cursor):
  ── first hit is C11 (leaf — done)
  ── resume past C11 → C1 (skip) → C (skip) → B11 (leaf — done)
  ── resume past B11 → B1 (skip) → B (skip) → A11 (leaf — done)
Every non-leaf ancestor skipped exactly ONCE total.
```
### Proposed change
Replace the body of `reset_node_and_parents_mru`:
```python
def reset_node_and_parents_mru(self, node, root_node, should_include):
    chain = []
    while node != root_node:
        if should_include(node):
            chain.append(node)
        node = node.parent
    # chain = [leaf, parent, grandparent, ..., root_child]
    # Insert root-side first so leaf lands closest to tail within the chain
    prev = self.head
    for n in reversed(chain):
        self._remove_node(n)
        self._add_node_after(prev, n)
        prev = n
```
Also flip the `last_access_time` direction in `_match_post_processor` (`unified_radix_cache.py:549-553`) for consistency — root gets the newest stamp, leaf gets the oldest within the chain. This field is currently only used by `__lt__`, so the change is cosmetic but removes a confusing inconsistency.

### Related resources

_No response_

#	Proposal	Worst-case (current)	Worst-case (proposed)
1	O(1) LRU scan resume via per-node insertion counter — bump a counter on every MRU insert; eviction maintains a cursor and resumes after each eviction (including cascade) instead of restarting from `tail`.	O(M × K) — M evictions each re-walk the same K skip nodes (locked nodes / non-leaf ancestors)	O(M + K) — each skip node visited at most once across the whole eviction batch
2	Root-MRU chain ordering globally — flip `reset_node_and_parents_mru` so the root-side ancestor lands closest to `head` and the leaf closest to `tail` within the touched chain. Applies to FULL, Mamba, and SWA uniformly.	O(M × N × D) — N recently-touched chains of depth D force (D−1) non-leaf skips per chain on every restart	O(N × D) — non-leaf ancestors skipped exactly once during a single resume walk

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Perf] UnifiedRadixCache LRU optimization proposals #24072

Checklist

Motivation

Summary

Proposal 1: O(1) LRU scan resume via per-node insertion counter

Statement

Where the restart happens (code pointers)

Worst-case scenario

Proposed change

Proposal 2: Root-MRU chain ordering globally

Statement

Worst-case scenario

Proposed change

Related resources

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Perf] UnifiedRadixCache LRU optimization proposals #24072

Description

Checklist

Motivation

Summary

Proposal 1: O(1) LRU scan resume via per-node insertion counter

Statement

Where the restart happens (code pointers)

Worst-case scenario

Proposed change

Proposal 2: Root-MRU chain ordering globally

Statement

Worst-case scenario

Proposed change

Related resources

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions