
Row Order model definitions#9115

Closed
dhiltgen wants to merge 1 commit into ollama:main from dhiltgen:row_order

Conversation

@dhiltgen (Collaborator) commented Feb 14, 2025

Replaces #8731 on main.

This change switches the model API (and backend) to be row-order to make it easier to port model definitions from other frameworks that use row-order patterns. I've made the following changes to the Backend API interface definitions:

  • All APIs assume row-order instead of column-order
  • View no longer interleaves shape and stride - passed as two discrete int arrays

This requires a number of notable changes in the GGML backend:

  • Dimensions and shapes exposed in the API are reversed in the underlying GGML tensor to ensure operations work properly
  • The number of dimensions is tracked in the wrapper tensor type. GGML treats trailing dimensions of 1 as no-ops, so when a leading dimension has a shape of 1 (which, after reversal, becomes the trailing dimension), this tracked count is used instead of the number of dimensions reported by the underlying GGML tensor.
  • Permute revamped to be consistent with other row-order APIs (pytorch, etc.). GGML treats the shape as the "destination" on where to move to. Other APIs (and ours with this change) treat the shape as the "source" on where to get the data from.
  • Reshape updated to support a -1 as a dimension consistent with other APIs, where the value will be calculated and filled in automatically.

Other potential refinements that aren't currently included but which may make sense:

  • Soften the "must have 4 dimensions" requirement on routine parameters to be more consistent with other APIs, requiring only that arguments match the actual number of dimensions in the tensor
  • Switch to Matmul pattern

The cache also required some adjustments based on these changes.

@dhiltgen force-pushed the row_order branch 10 times, most recently from 46effa1 to bfed2c4 on March 3, 2025 22:50
@dhiltgen force-pushed the row_order branch 8 times, most recently from 89f3ea2 to 02e2ab6 on March 12, 2025 22:42
@dhiltgen marked this pull request as ready for review March 12, 2025 22:50
@dhiltgen changed the title from Draft: Row Order model definitions to Row Order model definitions Mar 12, 2025
@dhiltgen force-pushed the row_order branch 2 times, most recently from 7b3c313 to 1296b39 on March 20, 2025 15:26
In kvcache/causal.go:

-	maskTensor, err := ctx.Input().FromFloatSlice(mask, length, batchSize)
+	maskTensor, err := ctx.Input().FromFloatSlice(mask, batchSize, length)
Member

I think there is an issue because we are swapping the order of dimensions here but the actual mask data is still laid out in the original order:
https://github.com/dhiltgen/ollama/blob/1296b3999ec5d4c15f32f5ac8311da94cdb808c4/kvcache/causal.go#L243

This works for GGML because the mask is in its native format and we just swap the arguments of FromFloatSlice back. However, it's probably at least part of the cache drift issue in MLX since the mask is not actually row order.

Most of the other inputs (which are the ones that aren't in the backend's native format) are only a single dimension, so the order doesn't make a difference. However, the mask is 2D.

I think we should change the mask generation to be row-order native and, in GGML, do a permute in FromFloatSlice and FromIntSlice for multidimensional tensors. For the mask specifically, we may not actually need a contiguous copy, which would make it fast, though that is probably not generically true for all inputs.

@dhiltgen force-pushed the row_order branch 5 times, most recently from 7fe73e1 to 909b23a on March 27, 2025 17:47
@dhiltgen marked this pull request as draft March 27, 2025 17:48
@dhiltgen (Collaborator, Author)

Moving back to draft status.

Matmul has replaced Mulmat and now conforms to the behavior of PyTorch's matmul; however, the current implementation has a significant performance hit. Once I can get it back to comparable performance, I'll take it back out of draft.

@dhiltgen force-pushed the row_order branch 2 times, most recently from af75cd7 to e809a68 on April 1, 2025 00:08
This change switches the model API (and backend) to be row-order to make it
easier to port model definitions from other frameworks that use row-order
patterns.
@reneleonhardt

@dhiltgen Could other Ollama engineers help? 🙂

@TomLucidor

Considering how this is blocking the MLX work, please move this forward soon.

@dhiltgen (Collaborator, Author)

Obsoleted by the new MLX based engine.

@dhiltgen dhiltgen closed this Feb 24, 2026
@TomLucidor

@dhiltgen where is this MLX-based engine?

