
Add RoPE array-offset overload (prep for continuous batching)#305

Merged
davidkoski merged 8 commits into ml-explore:main from PicoMLX:main
Jan 15, 2026

Conversation

@ronaldmannak (Contributor) commented Dec 1, 2025

Proposed changes

Add array offset support to MLXFast.RoPE to match the Python MLX API.

Dependencies

  • ml-explore/mlx-c#85 (https://github.com/ml-explore/mlx-c/pull/85): the underlying C binding for array-offset RoPE

Motivation

The Python mlx.core.fast.rope function accepts offset as either an int or an array.
This enables several important use cases:

  • Continuous batching: Different sequences in a batch at different positions
  • Speculative decoding: Verifying multiple candidate tokens at different positions in parallel
  • Sliding window attention: Processing long sequences in chunks starting at different offsets

Changes

  • Add MLXFast.RoPE(..., offset: MLXArray, ...) overload
  • Add MLXNN.RoPE.callAsFunction(_:offset:) overload
  • Add unit tests

Example

// Batch of 3 sequences at positions [50, 20, 0]
let offsets = MLXArray([50, 20, 0])
let result = MLXFast.RoPE(queries, dimensions: 64, traditional: false, base: 10000, scale: 1.0, offset: offsets)
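The per-sequence offsets can also be passed through the layer-level overload added in MLXNN. A minimal sketch, assuming the RoPE initializer parameters mirror the existing Int-offset API (the shapes and helper calls here are illustrative, not from the PR):

```swift
import MLX
import MLXNN

// Sketch only: exercises the new MLXNN.RoPE.callAsFunction(_:offset:) overload.
// queries shape: [batch = 3, heads = 8, seqLen = 16, headDim = 64]
let queries = MLXArray.zeros([3, 8, 16, 64])

let rope = RoPE(dimensions: 64, traditional: false, base: 10000)

// One starting position per sequence in the batch.
let offsets = MLXArray([50, 20, 0])
let rotated = rope(queries, offset: offsets)
```

Each sequence's positions are then computed as `offset[i] + 0..<seqLen`, which is what makes continuous batching possible: sequences at different decode positions share one batched RoPE call.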

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@davidkoski (Collaborator)

Requires https://github.com/ml-explore/mlx-c/pull/85: the underlying C binding and core support for array-offset RoPE must be merged first.

Ahh, missed that one. Likely Ronan will do that a different way as he has some automated tools for keeping the API in sync.

@ronaldmannak (Contributor, Author)

@davidkoski Ah, that explains the lack of newlines in MLX-c :) Happy to wait until it's exposed using the automated tools. Is there any ETA?

@davidkoski (Collaborator)

Is there any ETA?

Not sure, but I can ping Ronan.

@ronaldmannak changed the title from "Expose mlx_fast_rope_offset_array" to "Add RoPE array-offset overload (prep for continuous batching)" on Jan 7, 2026
@ronaldmannak (Contributor, Author)

@davidkoski I think this is ready to go.
Be sure to check the MLX and MLX-C dependencies. I believe there are some issues (unrelated to this PR) in newer commits.

@davidkoski (Collaborator)

@davidkoski I think this is ready to go. Be sure to check the MLX and MLX-C dependencies. I believe there are some issues (unrelated to this PR) in newer commits.

Yeah, I think this can probably fit in after #319, which should pick up the new mlx/mlx-c dependencies.

@ronaldmannak (Contributor, Author)

Sounds good!

@davidkoski (Collaborator)

@ronaldmannak I think this can be rebased -- I just merged the v0.30.1 update. I forgot and already cut a tag but I can cut a new one once this is in.

@davidkoski (Collaborator) commented Jan 8, 2026

Also, take a look at:

public protocol OffsetLayer: Module {
    func callAsFunction(_ x: MLXArray, offset: Int) -> MLXArray
}

Should this have an array offset method? Will it apply to all of the RoPE variants in mlx-swift-lm? (See ml-explore/mlx-swift-lm#29)

@ronaldmannak (Contributor, Author)

Hi David, sorry for the slow reply. I had to spend some time following the thread through the code :)

It looks like OffsetLayer came in via mlx-swift #322 and is now used quite a bit in mlx-swift-lm #29. Since my PR (#305) adds RoPE support for offset: MLXArray, I agree we now have a small API mismatch: the protocol only models offset: Int.

Possible ways forward:

  • Add an MLXArray-offset overload to OffsetLayer:
    func callAsFunction(_ x: MLXArray, offset: MLXArray) -> MLXArray
  • Or introduce a separate protocol (e.g. ArrayOffsetLayer) for layers that support per-sequence offsets, keeping OffsetLayer as-is.

Either way, to minimize churn in the big mlx-swift-lm #29 change set, I’m inclined to update this PR after #29 has merged (unless you’d prefer to align it sooner). What direction do you prefer, and should this apply across all RoPE variants in mlx-swift-lm?
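For reference, the separate-protocol option could look roughly like this. The `ArrayOffsetLayer` name comes from the discussion above; this is a hypothetical sketch, not the final merged API:

```swift
import MLX
import MLXNN

// Hypothetical sketch: a separate protocol so that only layers able to
// handle per-sequence offsets need to conform, leaving OffsetLayer as-is.
public protocol ArrayOffsetLayer: Module {
    /// - offset: one position offset per sequence in the batch
    func callAsFunction(_ x: MLXArray, offset: MLXArray) -> MLXArray
}
```

A RoPE layer could then conform to both protocols, forwarding the array-offset call to the new MLXFast.RoPE overload.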

@davidkoski (Collaborator)

Or introduce a separate protocol (e.g. ArrayOffsetLayer) for layers that support per-sequence offsets, keeping OffsetLayer as-is.

I was thinking about that too -- I think maybe a separate protocol is a better idea. That way we don't force all layers to implement it (or end up in a situation where they can't).

Timing-wise, that sounds good. I hope to get all that merged early next week.

@ronaldmannak (Contributor, Author)

Early next week sounds good. I’ll wait for the changes to land.

I’m not sure whether every implementation could realistically support the offset: MLXArray overload, so introducing a separate protocol (for layers that can handle per-sequence offsets) does seem like the safest option.

@ronaldmannak (Contributor, Author)

@davidkoski I've added the protocol. I think it's good to go.

@davidkoski (Collaborator)

hit a swift-format item

@ronaldmannak (Contributor, Author)

@davidkoski yep sorry, fixed.

@davidkoski (Collaborator) left a review comment


Looks great, thank you!

@davidkoski davidkoski merged commit 0bb133a into ml-explore:main Jan 15, 2026
7 checks passed