SmolLM2-1.7B server inference regression after 91814d4 (Phi-3 CPU fallback) #77

## Description

After commit `91814d4` ("Phi-3.5 server support + Metal workaround"), SmolLM2-1.7B server inference produces garbage output. This model previously worked correctly.

## Steps to Reproduce

```bash
./build-metal/quant-server SmolLM2-1.7B-Instruct-Q8_0.gguf -p 8080 -j 8

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is gravity?"}],"max_tokens":30,"temperature":0.0}'
```

## Actual Output

```json
{"content":"<|im_endturernocturno<|im_ennd>\nWhat is the answer to this question: What is"}
```

Setting `TQ_NO_METAL=1` produces the same garbage output, so the regression does not appear to be specific to the Metal path.

## Expected Output

```
Gravity is the force that attracts two objects with mass towards each other...
```

(This worked correctly in builds before `91814d4`.)

## Root Cause Hypothesis

The `tq_matmul_force_cpu` thread-local variable and the `_phi3_force_cpu` flag in `tq_forward()` may not be correctly scoped. If either flag leaks across requests or is not reset after a forward pass, non-Phi-3 models could be routed to the wrong matmul path.
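To make the suspected leak concrete, here is a minimal sketch of the pattern, not the project's actual code. Only the name `tq_matmul_force_cpu` comes from this report; `model_info`, `matmul_uses_cpu`, `forward_leaky`, and `forward_scoped` are hypothetical stand-ins, and the real declaration of the flag may differ. The leaky variant sets the flag and never clears it, so a later call on the same thread inherits it. The scoped variant saves the flag, sets it from the current model, and restores it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Named after the variable in this report; the real declaration may differ. */
_Thread_local bool tq_matmul_force_cpu = false;

/* Hypothetical model descriptor, for illustration only. */
typedef struct {
    bool is_phi3;
} model_info;

/* Hypothetical stand-in for matmul dispatch: true means "routed to CPU". */
bool matmul_uses_cpu(void) {
    return tq_matmul_force_cpu;
}

/* Leaky pattern: the flag is set for Phi-3 but never cleared afterwards. */
bool forward_leaky(const model_info *m) {
    assert(m != NULL);
    if (m->is_phi3) {
        tq_matmul_force_cpu = true;
    }
    return matmul_uses_cpu();
}

/* Scoped pattern: save the flag, derive it from the current model, restore it. */
bool forward_scoped(const model_info *m) {
    assert(m != NULL);
    const bool saved = tq_matmul_force_cpu;
    tq_matmul_force_cpu = m->is_phi3;
    const bool used_cpu = matmul_uses_cpu();
    tq_matmul_force_cpu = saved;
    return used_cpu;
}
```

If the real code resembles `forward_leaky`, checking the flag's value at the start of `tq_forward()` for a non-Phi-3 model should show whether this hypothesis holds.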

Separately, the extern function `tq_matmul_gguf_cpu` may make buffer-sizing assumptions that do not hold for Q8_0 matrices.
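As a reference point for checking the sizing hypothesis, the sketch below computes the byte size of a Q8_0 row. It assumes the standard GGML Q8_0 block layout (32 int8 weights plus one 2-byte fp16 scale, so 34 bytes per block). Whether `tq_matmul_gguf_cpu` uses this arithmetic is not known from this report, and the helper names `q8_0_row_bytes` and `q8_0_matrix_bytes` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed standard GGML Q8_0 layout: one fp16 scale + 32 int8 weights. */
#define Q8_0_BLOCK_ELEMS 32u
#define Q8_0_BLOCK_BYTES 34u

/* Bytes needed for one row of n_cols Q8_0 weights.
 * Returns 0 when n_cols is not a multiple of the block size,
 * because a Q8_0 row must consist of whole blocks. */
size_t q8_0_row_bytes(size_t n_cols) {
    if (n_cols % Q8_0_BLOCK_ELEMS != 0) {
        return 0;
    }
    return (n_cols / Q8_0_BLOCK_ELEMS) * Q8_0_BLOCK_BYTES;
}

/* Bytes needed for an n_rows x n_cols Q8_0 matrix (0 on invalid width). */
size_t q8_0_matrix_bytes(size_t n_rows, size_t n_cols) {
    return n_rows * q8_0_row_bytes(n_cols);
}
```

For example, a row of 2048 weights (an illustrative width, not taken from this report) occupies 64 blocks, or 2176 bytes. A buffer sized for a different quantization's bytes-per-block would be wrong for such a row.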

## Unit tests

35/35 pass. The regression is only visible in end-to-end server inference.

## Environment

- Commit: `91814d4`
- Model: `SmolLM2-1.7B-Instruct-Q8_0.gguf` (MHA 32/32)
- Build: `cmake -DTQ_BUILD_METAL=ON`
- OS: macOS 15 (Apple M3, 16 GB)

---
*Reported by ClawTeam*
