forked from ggml-org/llama.cpp
Porting TurboQuant to Windows (MSVC): Compatibility fixes and Python IDs #39
Hi Tom!
First of all, thank you for the amazing work on TurboQuant. It's a massive game-changer! I managed to get it compiled and running on Windows with a mobile RTX 4070 (8GB), getting an incredible 17 t/s on Nemotron 30B (MoE).
However, the current feature/turboquant-kv-cache branch doesn't compile out of the box on Windows using the Microsoft C++ compiler (MSVC). I had to apply a few manual patches to get a successful build.
I'm sharing these fixes here in hopes they can be integrated to make Windows builds seamless:
1. Missing M_PI math constant
In ggml-turbo-quant.c, MSVC only exposes math constants such as M_PI when _USE_MATH_DEFINES is defined before the include:
```c
#define _USE_MATH_DEFINES
#include <math.h>
```
2. Missing variable in ops.cpp
The build failed because turbo3_cpu_wht_group_size was never defined. I had to add this definition:
```cpp
int turbo3_cpu_wht_group_size = 1;
```
3. MSVC linker/scope errors with g_innerq_scale_inv_host
MSVC throws "undeclared identifier" and linker errors because of how the extern declarations are handled across files. I had to clean up the existing extern declarations of g_innerq_scale_inv_host in the headers/CUDA files and declare it explicitly at the very top of llama-kv-cache.cpp:
```cpp
/* MSVC LINKER BYPASS */
float * g_innerq_scale_inv_host = nullptr;
bool turbo_innerq_needs_tensor_update(void) { return false; }
void turbo_innerq_mark_tensor_updated(void) {}
```
4. Flash Attention incompatibility
Currently, compiling with -DGGML_FLASH_ATTN=ON crashes the MSVC compiler itself on Windows, so I highly recommend that Windows users build with -DGGML_FLASH_ATTN=OFF for now.
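In practice the workaround is just a configure flag. Assuming the usual CMake flow for this branch (the flag name is taken from the report above), the build would look like:

```shell
# configure without flash attention (works around the MSVC compiler crash)
cmake -B build -DGGML_FLASH_ATTN=OFF
cmake --build build --config Release
```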
Python bindings note:
For anyone using llama-cpp-python, I discovered that the magic IDs to trigger the experimental cache are `type_k=41` and `type_v=41` for TURBO3_0.
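For reference, a minimal sketch of how those IDs would be passed through llama-cpp-python. The Llama constructor call is commented out because it needs an actual model file, and the TURBO3_0 name is just a local alias for readability, not an official constant:

```python
# Magic ggml type ID for the experimental TURBO3_0 KV cache (see note above).
TURBO3_0 = 41

# kwargs for llama_cpp.Llama; type_k/type_v select the K/V cache quantization.
kv_cache_kwargs = {
    "type_k": TURBO3_0,
    "type_v": TURBO3_0,
}

# from llama_cpp import Llama
# llm = Llama(model_path="model.gguf", n_gpu_layers=-1, **kv_cache_kwargs)
print(kv_cache_kwargs)
```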
I wrote a fully automated PowerShell build script with these patches for the Windows community and posted the guide on Reddit here:
[https://www.reddit.com/r/LocalLLaMA/comments/1s931oz/llamacpp_new_turboquant_3bit_kv_cache_is_insane/](https://www.reddit.com/r/LocalLLaMA/comments/1s931oz/llamacpp_new_turboquant_3bit_kv_cache_is_insane/)
Thanks again for this brilliant memory optimization!