
Port of bitnet1.58 with custom metal kernel #331

Merged
davidkoski merged 13 commits into ml-explore:main from johnmai-dev:20250613-port-bitnet1.58
Jul 3, 2025

Conversation

@johnmai-dev (Contributor) commented Jun 12, 2025

This PR ports bitnet1.58, contributed by @Blaizzy. Thanks to my idol @Blaizzy!
Source: ml-explore/mlx-lm#219


@Blaizzy commented Jun 12, 2025

Amazing job with the swift port @johnmai-dev! 🔥🚀

The quantization here is done normally like any other model. So you can consider it done ✅

@johnmai-dev (Contributor, Author)

> Amazing job with the swift port @johnmai-dev! 🔥🚀
>
> The quantization here is done normally like any other model. So you can consider it done ✅

Thank you! ❤️
Yes, you are right. I still need to adjust some details and expect to finish later today.

@johnmai-dev (Contributor, Author)

Hello @Blaizzy,

What is the difference between quantization and quantization_config in config.json? microsoft/bitnet-b1.58-2B-4T contains only quantization_config.

I see that apply_hf_quantization only uses quantization_config:
https://github.com/ml-explore/mlx-lm/blob/4fab6fcbc9dd63dea229692f91028d33f7532fd6/mlx_lm/quant/utils.py#L56-L72

    "quantization": {
        "group_size": 64,
        "bits": 4,
        "quant_method": "bitnet",
        "linear_class": "autobitlinear",
        "quantization_mode": "offline"
    },
    "quantization_config": {
        "group_size": 64,
        "bits": 4,
        "quant_method": "bitnet",
        "linear_class": "autobitlinear",
        "quantization_mode": "offline"
    },

Looking forward to your reply, thanks ❤️

@johnmai-dev (Contributor, Author)

> What is the difference between quantization and quantization_config in config.json?

Currently, the Quantization struct does not support decoding quant_method, linear_class, or quantization_mode:

    public struct Quantization: Codable, Sendable, Equatable {
        public init(groupSize: Int, bits: Int) {
            self.groupSize = groupSize
            self.bits = bits
        }

        public let groupSize: Int
        public let bits: Int

        public var asTuple: (Int, Int) { (groupSize, bits) }

        enum CodingKeys: String, CodingKey {
            case groupSize = "group_size"
            case bits = "bits"
        }
    }

I am considering whether to adjust Quantization or add a new QuantizationConfig struct.

    Failed: configurationDecodingError("config.json", "mlx-community/bitnet-b1.58-2B-4T-4bit", Swift.DecodingError.typeMismatch(Swift.Dictionary<Swift.String, Any>, Swift.DecodingError.Context(codingPath: [CodingKeys(stringValue: "quantization", intValue: nil), _DictionaryCodingKey(stringValue: "quant_method", intValue: nil)], debugDescription: "Expected to decode Dictionary<String, Any> but found a string instead.", underlyingError: nil)))
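One possible direction, sketched here purely as an illustration (the optional-field approach is my assumption, not necessarily what the PR ultimately shipped), is to decode the extra bitnet fields as optional strings. Swift's synthesized Decodable uses decodeIfPresent for optional properties, so configs without these keys would still parse and the typeMismatch above would not occur:

    // Hedged sketch: extend Quantization so the extra bitnet fields in
    // config.json ("quant_method", "linear_class", "quantization_mode")
    // decode as optional strings instead of failing.
    public struct Quantization: Codable, Sendable, Equatable {
        public let groupSize: Int
        public let bits: Int

        // Optional metadata present in bitnet-style configs; nil elsewhere.
        public let quantMethod: String?
        public let linearClass: String?
        public let quantizationMode: String?

        public var asTuple: (Int, Int) { (groupSize, bits) }

        enum CodingKeys: String, CodingKey {
            case groupSize = "group_size"
            case bits
            case quantMethod = "quant_method"
            case linearClass = "linear_class"
            case quantizationMode = "quantization_mode"
        }
    }

Existing callers that only use groupSize/bits would be unaffected, since the new properties are optional and the memberwise tuple stays the same.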

@johnmai-dev johnmai-dev marked this pull request as ready for review June 14, 2025 15:58
@johnmai-dev (Contributor, Author)

#295

@davidkoski (Collaborator)

Cut a tag on mlx-swift for the relu squared: 0.25.5

@awni (Member) commented Jun 25, 2025

> What is the difference between quantization and quantization_config in config.json?

I can say a little about that. MLX originally added the quantization field to config.json. But Hugging Face uses a field called quantization_config for metadata about the model (e.g. whether it's a quant of another model, to display in the UI). So we now add both, to maintain backward compatibility and to set the right field for Hugging Face. For any MLX model the two should be identical.

Non-MLX models will probably use only quantization_config, and that may or may not be compatible with MLX depending on the quant format.

@johnmai-dev (Contributor, Author)

> MLX originally added the quantization field to the config.json. […] So for any MLX model they should be the same.

Thank you for your answer! ♥️

@johnmai-dev (Contributor, Author)

Thank you very much, @angeloskath!!!
Speed increased 2x!!! 🚀🚀🚀


@johnmai-dev johnmai-dev marked this pull request as ready for review June 27, 2025 14:45

@johnmai-dev (Contributor, Author)

I think it's ready to merge.
cc @davidkoski @awni

@davidkoski (Collaborator) left a comment

This looks great! I like the use of the custom kernel -- this will make a good example.

@davidkoski davidkoski merged commit 2a14634 into ml-explore:main Jul 3, 2025
3 checks passed