Conversation
```swift
    dims: headDim, base: config.ropeTheta, traditional: false,
    scalingConfig: config.ropeScaling,
    maxPositionEmbeddings: config.maxPositionEmbeddings)
}
```
Picking up changes after the initial port: ml-explore/mlx-lm@714157b...main
```swift
        return suScaledRope(x, offset: offset)
    }
    return x
}
```
```swift
} else {
    let (cachedKeys, cachedValues) = cache.update(keys: keys, values: values)
    // TODO dkoski
    // print("\(cachedKeys.shape) \(cachedValues.shape) \(queries.shape), \(mask.masks?[0].shape ?? [])")
```
WIP debug stuff :-)
```swift
    _ action: @Sendable (isolated ModelContainer) async throws -> sending R
) async rethrows -> sending R {
    try await action(self)
}
```
@DePasqualeOrg FYI, trying some different things out re: your recent cleanups around Sendable and thread safety. I have some tests that reproduce some threading issues (based on the LLMBasic example I made).
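For context, here is a minimal sketch of the pattern the `perform` signature above enables: an actor hands its own isolation to a caller-supplied closure. The `ModelContainer.perform` shape is from this PR; the caller and the `incrementGenerations` helper are hypothetical, added only to illustrate why the `isolated` parameter matters.

```swift
// Sketch: an actor that runs a caller-supplied closure inside its own isolation.
// Because the parameter is `isolated ModelContainer`, the closure body executes
// on the actor and can touch actor state synchronously; `sending R` lets the
// result safely leave the actor's isolation region.
actor ModelContainer {
    private var generationCount = 0

    func perform<R>(
        _ action: @Sendable (isolated ModelContainer) async throws -> sending R
    ) async rethrows -> sending R {
        try await action(self)
    }

    func incrementGenerations() { generationCount += 1 }
}

// Hypothetical caller: inside the closure, `container` is actor-isolated,
// so mutating its state needs no extra await and cannot race.
func example(container: ModelContainer) async {
    await container.perform { container in
        container.incrementGenerations()
    }
}
```

The design point is that callers get arbitrary multi-step access to the container's state under a single isolation hop, instead of many small awaits that could interleave with other tasks.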
```swift
import XCTest

/// Tests for the streamlined API using real models
public class ChatSessionTests: XCTestCase {
```
@DePasqualeOrg FYI, moved this into an IntegrationTests directory -- I am not sure these should run on CI, since the models involved are rather large, but I think the tests are valuable to run locally.
That makes sense. I thought about that when I modified this test, but I didn't realize that it could be excluded from CI.
```swift
let result = try await session.respond(to: "What is 2+2? Reply with just the number.")
print("One-shot result:", result)
XCTAssertTrue(result.contains("4") || result.lowercased().contains("four"))

func testChatSessionAsyncInterrupt() async throws {
```
@DePasqualeOrg FYI an example of some concurrency issues related to the issues you were working on.
This triggers a variety of crashes:

- thread safety -- hold lock while calling stream sync (mlx-swift#323)
- [BUG] gemma3text crashes if the attention mask is used (#27)
- a couple of others without issues filed, where the streaming response is still running for a short time after the loop terminates early and we end up with concurrent modification of the KVCache

I will use this to test actual fixes.
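The interrupt scenario can be sketched roughly like this. This is a hypothetical test shape, not the PR's actual test body: `respond(to:)` appears in the snippets above, but the `streamResponse(to:)` name and the session setup are assumptions.

```swift
// Sketch of the interrupt pattern: consume a few chunks of a streaming
// response, then break out of the loop early. If the underlying generation
// task is not cancelled promptly, it keeps mutating the shared KVCache while
// the next request starts -- the concurrent-modification crash described above.
func testChatSessionAsyncInterrupt() async throws {
    let session = ChatSession(Self.llmContainer)

    var chunks = 0
    for try await _ in session.streamResponse(to: "Tell me a long story.") {
        chunks += 1
        if chunks >= 3 { break }  // early exit: generation should be cancelled here
    }

    // Immediately issue another request; before the fixes this raced with the
    // still-running generation left over from the interrupted stream.
    _ = try await session.respond(to: "What is 2+2?")
}
```

The value of the test is the window between `break` and the next `respond`: any generation work that survives the loop exit shares mutable cache state with the new request.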
```swift
    Self.llmContainer, instructions: "You are a helpful assistant. Keep responses brief.")

@MainActor
func testViewModel() async throws {
    let model = ChatModel(model: model())
```
And this one simulates the activity from LLMBasic, which also causes thread safety issues.
- ml-explore/mlx-swift-examples#454
- fixes #27
- move ChatSession integration tests into a new test target so we can more easily control when it runs
- make a ChatSession _unit_ (more or less) test
- fix Sendable / thread safety issues uncovered by LLMBasic
- collect TestTokenizer and friends in its own file; fix warnings in tests
- UserInputProcessors -> structs
I think this work is complete - I am going to split it into a couple of PRs with a little more focus.
add a minimal LLM chat example + switch to mlx-swift 0.30.2 (mlx-swift-examples#454)

- fixes [BUG] gemma3text crashes if the attention mask is used (#27)
- move ChatSession integration tests into a new test target so we can more easily control when it runs
- make a ChatSession unit (more or less) test
- fix Sendable / thread safety issues uncovered by LLMBasic
Note that this requires changes in mlx-swift (so likely a new tag there):
Proposed changes

Please include a description of the problem or feature this PR is addressing. If there is a corresponding issue, include the issue #.

Checklist

Put an `x` in the boxes that apply.

- [ ] I have run `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes