Wired Memory Management System #348
Conversation
So devs can customize admission gating.

Tested end-to-end with the most recent commits in mlx-swift-lm. The pipeline from here to inference over there is effectively wired together, with unit tests demonstrating the system, showing how to use it effectively, and preserving default behavior to some degree.

I added a policy-only mode for CPU and future CUDA workflows (I see open issues about CUDA support with indications of inclusion) and made it so Apple Silicon devices in CPU-only mode can still use `maximumRecommendedWorkingSetBytes` as a reference cap 😄
> MLX only provides the generic interfaces. MLXLMCommon (from mlx-swift-lm)
> provides LLM-focused policies such as `WiredSumPolicy`, `WiredMaxPolicy`, and
> `WiredFixedPolicy`. You can use `GPU.maxRecommendedWorkingSetBytes()` as a
> portable upper bound when designing custom policies.
I wonder if one or more of these should be in mlx-swift? WiredSumPolicy or WiredMaxPolicy for example might be generic enough and could be used in domains outside of llms. I do agree that the interesting policies will be domain specific.
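To make the suggestion concrete, here is a minimal sketch of what domain-neutral sum/max policies could look like if they lived in mlx-swift. The protocol and type names below are illustrative stand-ins, not the PR's actual `WiredMemoryPolicy` API:

```swift
import Foundation

// Hypothetical, simplified policy shapes -- not the PR's actual API.
// A policy maps the set of outstanding reservations to a desired wired limit.
protocol SimpleWiredPolicy {
    mutating func add(id: UUID, size: Int)
    mutating func remove(id: UUID)
    var desiredLimit: Int { get }
}

// Sum of all outstanding sizes: safe when reservations are resident together.
struct SumPolicy: SimpleWiredPolicy {
    private var sizes: [UUID: Int] = [:]
    mutating func add(id: UUID, size: Int) { sizes[id] = size }
    mutating func remove(id: UUID) { sizes[id] = nil }
    var desiredLimit: Int { sizes.values.reduce(0, +) }
}

// Largest outstanding size: appropriate when work is effectively serialized.
struct MaxPolicy: SimpleWiredPolicy {
    private var sizes: [UUID: Int] = [:]
    mutating func add(id: UUID, size: Int) { sizes[id] = size }
    mutating func remove(id: UUID) { sizes[id] = nil }
    var desiredLimit: Int { sizes.values.max() ?? 0 }
}
```

Nothing in either policy is LLM-specific, which supports the argument for hosting the generic ones in mlx-swift while domain-specific policies stay downstream.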
```swift
try await ticket.withWiredLimit {
    // run inference
}
```
```swift
// Reserve model weights without keeping the limit elevated while idle.
let weights = policy.ticket(size: weightsBytes, kind: .reservation)
_ = await weights.start()
```
Should there be a way to cancel this? E.g. you might have a server that loads and unloads models (weights). It might make sense to cancel the policy rather than this reservation, or if this returned a `Cancellable` you could cancel the specific reservation.
I think if this was inside a Task you could cancel the task (I can see how that works), but this example looks like you might want to hold a ticket long term, and wrapping it in a Task is unwieldy.
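One possible shape of the Task-wrapping approach, sketched against a stand-in type (the `HeldTicket` stub approximates the PR's ticket, and the polling loop is purely illustrative):

```swift
// Stand-in for the PR's ticket type; only start/end matter for this sketch.
struct HeldTicket {
    func start() async -> Int { 0 }  // stub: would elevate the wired limit
    func end() async -> Int { 0 }    // stub: would release the reservation
}

// Hold a reservation for the lifetime of a Task; cancelling the Task releases it.
func holdReservation(_ ticket: HeldTicket) -> Task<Void, Never> {
    Task {
        _ = await ticket.start()
        // Park until cancelled -- this is the "unwieldy" part for long-term holds.
        while !Task.isCancelled {
            try? await Task.sleep(nanoseconds: 100_000_000)
        }
        _ = await ticket.end()
    }
}

// e.g. in a model server:
//   let hold = holdReservation(ticket)
//   ...later, when unloading the model...
//   hold.cancel()
```

A returned `Cancellable` (or an explicit cancel on the reservation itself) would avoid the parked Task entirely.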
> ### Choosing a baseline
>
> When wired memory is unsupported, the manager will use:
This might be unclear -- if wired memory is unsupported what do these values actually do? I would think it would just be a NOP.
**Source/MLX/WiredMemory.swift** (outdated)
```swift
let result = mlx_set_wired_limit(&previous, 0)
guard result == 0 else { return nil }
var tmp: size_t = 0
_ = mlx_set_wired_limit(&tmp, previous)
```
I am concerned that temporarily setting it to 0 may cause trouble. Can we avoid using this? What if the manager could answer this question, and we just document that use of multiple managers outside a testing context is ill-advised or undefined? The manager sets the limit, so it should surely have the answer.
```swift
///
/// These settings implement hysteresis to prevent small or frequent shrinks
/// while active work is running. Growing the limit is always allowed; shrinking
/// is gated by a minimum drop and a minimum time between changes.
```
Is this something you observed being a problem? If so, awesome -- it seems like it could be useful to keep active memory hot.
```swift
if let lastLimitChange {
    let elapsed = Date().timeIntervalSince(lastLimitChange)
    if elapsed < configuration.shrinkCooldown {
```
I have one concern here: the way I read this, we won't shrink unless another request comes through here. Consider:

- a request for N bytes of wired memory arrives
- the request completes; we try to reduce back to 0, but per the cooldown no shrink happens
- no more requests come in

Does the process still hold the wired memory? I think so, but perhaps there is a path I didn't see.
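The scenario can be sketched in isolation. This is not the PR's code, just a minimal model of cooldown-gated shrinking (names and thresholds are assumptions):

```swift
import Foundation

// Minimal model of hysteresis: grow freely, defer small or too-frequent shrinks.
struct ShrinkGate {
    var lastLimitChange: Date?
    let shrinkCooldown: TimeInterval = 2.0
    let minimumDrop = 64 << 20  // 64 MiB, illustrative

    // Returns the limit to actually apply for a requested target.
    mutating func nextLimit(current: Int, target: Int, now: Date) -> Int {
        if target >= current {
            lastLimitChange = now
            return target  // growing is always allowed
        }
        if current - target < minimumDrop { return current }  // drop too small
        if let last = lastLimitChange, now.timeIntervalSince(last) < shrinkCooldown {
            return current  // too soon: deferred until *another* call arrives
        }
        lastLimitChange = now
        return target
    }
}

var gate = ShrinkGate()
let t0 = Date()
_ = gate.nextLimit(current: 0, target: 200 << 20, now: t0)        // grow to 200 MiB
let held = gate.nextLimit(current: 200 << 20, target: 0, now: t0)
// held == 200 << 20: the shrink was deferred inside the cooldown window, and if
// nothing ever calls nextLimit again, the elevated limit persists.
```

If the manager behaves like this model, releasing a deferred shrink would need a timer or a follow-up event rather than waiting for the next request.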
```swift
/// wired memory control is unsupported (e.g. CPU-only execution). The
/// manager will not attempt to change wired memory, but tickets can still
/// gate admission and emit events.
public var policyOnlyWhenUnsupported: Bool
```
Is there a benefit to having the default be false? Or even having this config -- why not just always do this? It seems harmless (the compute cost is minimal even if the result is a NOP).
```swift
/// Debug label for the policy group, if applicable.
public let policy: String?
/// Baseline wired limit captured from the system.
public let baseline: Int?
```
Maybe the term baseline needs an explicit definition? My concept is that this is the idle floor: we don't go below it, even when there are no outstanding requests. But the unsupported case, and the code, seem to tie this to the max supported value -- I am not sure what the intent is.

I think the baseline today (without this code) is 0.
```swift
/// Stable grouping key for policies.
private enum PolicyKey: Hashable {
    case identifier(AnyHashable)
```
Do we need this enum? Why not just use AnyHashable as the key? This is internal to the actor so if the implementation requires adding another level it is safe to do so without callers being aware.
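For reference, `AnyHashable` does work directly as a dictionary key, so the wrapper enum is only needed if more cases are anticipated; a quick sketch:

```swift
import Foundation

// AnyHashable erases the concrete key type while preserving Hashable conformance,
// so heterogeneous policy identifiers can share one dictionary.
var groups: [AnyHashable: String] = [:]
groups[AnyHashable("policy-a")] = "string-identified policy"
groups[AnyHashable(UUID())] = "uuid-identified policy"

// Lookups use the same wrapping:
let found = groups[AnyHashable("policy-a")]
```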
```swift
public func end(id: UUID, policy: any WiredMemoryPolicy) async -> Int {
    if let waiter = waiters.removeValue(forKey: id) {
        waiter.resume()
    }
```
There is a call to `resumeWaiters()` at the end that will awaken all the waiters. Why have a one-off here? (Not saying it is wrong, but it isn't clear to me -- if we need it, I think maybe it needs a comment.)
```swift
        waiter.resume()
    }

    guard WiredMemoryBackend.isSupported || policyOnlyMode else {
```
I think `isSupported` will return false if this is a Task set for the CPU device. A few questions:

- will returning a 0 here affect the wired memory for the GPU tasks? Would a nil be better, to indicate "no change"?
- the return value is more informative -- policy is applied by `applyCurrentLimit()`. Would returning `currentLimit ?? baseline ?? 0` from the end of the function make more sense?
- is it ok that `resumeWaiters()` is not called (also consider this question if the one-off above isn't needed)? I think maybe this is ok because the wired memory probably doesn't change here.
```swift
}

guard let state = tickets.removeValue(forKey: id) else {
    emit(kind: .ticketEndIgnored, ticketID: id)
```
What is the case where this would happen? I wonder if this should be a `fatalError` -- is some invariant violated here (like we lost a ticket)?

I think at one point it was mentioned that `end` should be idempotent. Should it? Or is that papering over programming errors? `free()` isn't idempotent, and it is a similar resource-release idea.
```swift
    continuation.onTermination = { _ in
        Task { await self.removeEventContinuation(id: id) }
    }
}
```
I think this is safe -- the call to `AsyncStream` is synchronous, so the interior happens before it returns. It surprised me because I was used to this form:

```swift
let (stream, continuation) = AsyncStream<Generation>.makeStream()
```

I don't have a strong preference -- I think they are the same, just pointing it out as I had to read to make sure I understood.
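Both forms are available; a side-by-side sketch using `Int` as a placeholder element type:

```swift
// Form used in the PR: trailing-closure initializer. The closure runs
// synchronously during init, so capturing the continuation inside is safe.
let streamA = AsyncStream<Int> { continuation in
    continuation.yield(1)
    continuation.finish()
}

// Equivalent form (Swift 5.9+): makeStream() hands back the pair directly.
let (streamB, continuationB) = AsyncStream<Int>.makeStream()
continuationB.yield(1)
continuationB.finish()
```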
```swift
var baselineValue = ensureBaseline(refresh: baseline == nil || !hasActiveWork())
if tickets[id] != nil {
    emit(
        kind: .ticketStartIgnored,
```
Similar question as on `end` -- a double start might be a programming error. Is there a use case for allowing this?
**Source/MLX/WiredMemory.swift** (outdated)
```swift
///
/// If this was the last ticket, the manager restores the baseline and
/// clears internal state.
public func end(id: UUID, policy: any WiredMemoryPolicy) async -> Int {
```
This is marked as `async` but I do not see any `await` inside. Callers outside the actor have to await anyway, as that is what the actor requires, but this would also force callers inside the actor (if there were any) to 1) be async and 2) have a potential suspension point, which requires a little more care.

I think this should remove the `async`.
**Source/MLX/WiredMemory.swift** (outdated)

```swift
if WiredMemoryBackend.isSupported {
    return WiredMemoryBackend.readCurrentLimit() ?? 0
```
Could we use `currentLimit` here? I know there is an issue if you have multiple managers, but we can document that as undefined (it seems useless except for tests).

In fact, I wonder... we could make the `init()` on the actor non-public to enforce this. The tests could still create multiple. Or add an obvious path like `static func newTestPolicy()` that would be more noticeable if somebody tried multiple in a real program?
**Source/MLX/WiredMemory.swift** (outdated)

```swift
    return 0
}

private func ensureBaseline(refresh: Bool) -> Int {
```
This is not taking an action; it is computing the current baseline. `ensure` sounds like it is applying something. I see it sits on top of `resolve`, so we are already using that name :-). The key here is that it notices when the resolved baseline changes. `resolveBaselineAndEmit`? I am not great with names (and it is private, so ultimately it is just needed for clarity) -- see what you think.
**README.md** (outdated)

> ```
> popd
> ```
>
> ## Wired Memory Management
These docs are well written, but I don't think we need them at the top level of the README. I wonder if this would be better in `Source/MLX/Documentation.docc/MLX.md` (the top level of the MLX docs, where there are pointers to some of the articles with more information). This README is much higher level.
```swift
    size: Int,
    policy: any WiredMemoryPolicy,
    kind: WiredMemoryTicketKind
) async -> Int {
```
Unlike `end`, this one must be async, as it can suspend per the `admit` on the policy.
```diff
@@ -0,0 +1,121 @@
+# Wired Memory Management
+
+Coordinate a process-wide wired memory limit for GPU workloads.
```
I think we need a few additions:

- deprecate `Memory.withWired` and `GPU.withWired`
- the sync version of that can be documented as being a NOP
- the async version can use a static policy, e.g. `WiredSumPolicy` (document this)

We don't want a back door that will let people bypass the mechanism.
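A sketch of what such a deprecation shim might look like, routing the old entry point through the new mechanism instead of bypassing it. Everything here beyond the idea is an assumption -- the real `GPU.withWired` signature and the policy/ticket initializers may differ:

```swift
// Hypothetical shim: the old async entry point becomes sugar over a ticket
// obtained from a static policy, so callers cannot bypass the manager.
extension GPU {
    @available(*, deprecated, message: "Use WiredMemoryManager tickets instead")
    public static func withWired<R>(
        limit: Int, _ body: () async throws -> R
    ) async rethrows -> R {
        // Assumed API: policy.ticket(size:kind:) and ticket.withWiredLimit
        // appear elsewhere in this PR; this exact composition is a guess.
        let ticket = WiredSumPolicy().ticket(size: limit, kind: .active)
        return try await ticket.withWiredLimit(body)
    }
}
```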
Looks really good! Check out my comments & questions and see what you think.
- Added `WiredSumPolicy` and `WiredMaxPolicy`
- Removed backend probing for current limit and use caching instead
- Converted `WiredMemoryManager` to fully singleton with testability flags
- Made policy-only mode enabled by default
- Moved Wired Memory docs from README to correct location
Alright, I carried out some of your recommendations. There are just a few items left.

I learned a lot from this and appreciate your really thorough feedback!
```swift
    manager: .shared,
    kind: .active
)
return try await ticket.withWiredLimit(body)
```
No worries, I usually go the opposite way -- less is more until you know you need it. In this case it is a private implementation detail and can be changed as needed in the future without clients knowing (meaning specifically that the details are not critical to it working). Let's leave it as is.
The Swift policy (I claim, though now that I try to find it, I can't) is for contract violations (programming errors) to typically be fatal. E.g. array-indexing errors are fatal -- the developer messed something up. You may be right that there are use cases where this could happen. Swift does provide `CheckedContinuation` and `UnsafeContinuation`, potentially for a use case like this. I don't think we need to go that far, but looking at the possibilities: I wonder if
Thanks! This work has required a bit more thinking than the average port of an LLM, that is for sure!

Awesome! I agree this is the correct way to handle it, especially for the reason you mentioned (it triggers a breakpoint in debug). I think if a dev never hits that breakpoint during integration testing or E2E, they very likely have no programming errors, and we still cover those sneaky edge cases! Assertions have been added. There have been a good number of changes here since the first commit, so my downstream PR in mlx-swift-lm is going to require some refactoring and testing, but I'm happy with how this one turned out 😁
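For readers following along, the debug-trap pattern being described is roughly this (illustrative, not the PR's exact code): `assertionFailure` stops in debug builds but compiles away in release, so a double `end` traps during development and degrades to a recorded no-op in production:

```swift
import Foundation

var tickets: [UUID: Int] = [:]
var ignoredEnds = 0

func end(id: UUID) {
    guard tickets.removeValue(forKey: id) != nil else {
        // Debug builds: breakpoint here signals a programming error.
        assertionFailure("end(id:) for unknown or already-ended ticket \(id)")
        ignoredEnds += 1  // release builds: record the ignored call and carry on
        return
    }
    // ... release the ticket's contribution to the wired limit ...
}
```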
**davidkoski** left a comment
I think this is ready to go, thank you for all the work you put in on this!
* Wired Memory Management System
* Exposed `GPU.maxRecommendedWorkingSetBytes()`
* Added policy-only mode for CPU-only support (and future CUDA)
* Deprecated synchronous WiredMemory setting

Proposed changes
Implements option 3 for #347 by moving the wired memory coordinator into mlx-swift and exposing a generic manager/policy/ticket API.

- `WiredMemoryManager`, `WiredMemoryTicket`, `WiredMemoryPolicy`, and `WiredMemoryEvent`
- Policies are grouped by `id` (`Identifiable`)
- Documentation article (`wired-memory.md`) plus README/MLX doc updates

Checklist
Put an `x` in the boxes that apply.

- [x] I ran `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes