Threadstone

Threadstone is offline terminal chat for local MLX language models on Apple Silicon. It starts a local mlx-vlm server for the model you choose, streams the answer into your terminal, keeps the conversation usable across crashes, and shuts the server down when you leave.

The useful part is ownership. The network is only for setup and model downloads. After that, chat runs against local snapshots with Hugging Face offline flags, no telemetry, no hosted API, and no account.

Quick Start

Requirements: macOS, Apple Silicon, Python 3.13+, Homebrew, and about 12 GB for the default model set.

git clone https://github.com/gabrimatic/threadstone.git
cd threadstone
./setup.sh

setup.sh creates ~/mlx-env, installs the MLX runtime, downloads the configured Qwen model snapshots, installs the threadstone CLI, and writes shell helpers for oracle, forge, and quench.

Start a chat:

oracle
oracle 4B
oracle "be terse" 2B

Use the installed CLI directly:

threadstone --list-models
threadstone --doctor
threadstone "answer like a systems engineer" 9B

Local Runtime

Threadstone uses the network during setup. Runtime chat stays on localhost and local model files.

Component	Runtime scope
Model snapshots	Hugging Face cache on disk
Inference server	`mlx_vlm.server` on `127.0.0.1`
Chat client	Python stdlib HTTP and SSE client
Session history	In memory for the current terminal session
Session restore	`~/.cache/threadstone/` per model and terminal tab
Telemetry	Disabled through environment flags
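The "Python stdlib HTTP and SSE client" row can be illustrated with a small parser. This is a minimal sketch, not Threadstone's actual client: the function names `iter_sse_data` and `iter_text_deltas` are hypothetical, and the chunk layout (`choices[0].delta.content`, terminated by `[DONE]`) assumes an OpenAI-style streaming response, which the source does not confirm.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each server-sent event from raw lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format allows one optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
    if buffer:
        yield "\n".join(buffer)


def iter_text_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments, ASSUMING OpenAI-style streaming chunks."""
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        text = chunk["choices"][0].get("delta", {}).get("content")
        if text:
            yield text
```

A response object from `urllib.request.urlopen` iterates line by line as bytes, so it could be passed to `iter_text_deltas` directly and each fragment printed as it arrives.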

Offline guard:

HF_HUB_OFFLINE=1
TRANSFORMERS_OFFLINE=1
HF_DATASETS_OFFLINE=1
HF_HUB_DISABLE_TELEMETRY=1
HF_HUB_DISABLE_IMPLICIT_TOKEN=1
HF_TOKEN=
DO_NOT_TRACK=1
DISABLE_TELEMETRY=1
ANONYMIZED_TELEMETRY=0
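As a rough illustration, the flags above could be merged into the environment handed to the server process. The names `OFFLINE_ENV` and `offline_env` are hypothetical, and launching the server through `subprocess` is an assumption rather than something this README states.

```python
import os

# Values copied from the offline guard list above.
OFFLINE_ENV = {
    "HF_HUB_OFFLINE": "1",
    "TRANSFORMERS_OFFLINE": "1",
    "HF_DATASETS_OFFLINE": "1",
    "HF_HUB_DISABLE_TELEMETRY": "1",
    "HF_HUB_DISABLE_IMPLICIT_TOKEN": "1",
    "HF_TOKEN": "",  # blank on purpose so no stored token is sent
    "DO_NOT_TRACK": "1",
    "DISABLE_TELEMETRY": "1",
    "ANONYMIZED_TELEMETRY": "0",
}


def offline_env(base=None):
    """Return a copy of the environment with the offline guard applied."""
    env = dict(os.environ if base is None else base)
    env.update(OFFLINE_ENV)  # guard values win over anything inherited
    return env
```

The returned dict can be passed as `env=` to `subprocess.Popen`, which keeps the guard scoped to the child process instead of mutating the parent shell's environment.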

Commands

Inside chat:

Command	Effect
`/read <path>`	Attach a file or directory listing to the next message
`/drop`	Cancel the pending attachment
`/history`	Show recent visible conversation turns
`/restore`	Restore the saved session for this terminal tab
`/clear`	Reset conversation state and pending attachment
`/help`	Show in-session commands
`exit` / `quit`	Stop the chat and terminate the owned server
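The checks behind `/read` can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `read_attachment` and `MAX_ATTACHMENT_BYTES` are hypothetical names, "50 KB" is taken as 50 × 1024 bytes, the cap is applied by truncating rather than rejecting, and "binary-looking" is approximated by the presence of a NUL byte.

```python
from pathlib import Path

MAX_ATTACHMENT_BYTES = 50 * 1024  # assumption: 1 KB = 1024 bytes


def read_attachment(path_str: str) -> str:
    """Return text for a file, or a listing for a directory."""
    path = Path(path_str).expanduser()
    if path.is_dir():
        # Directory: return a sorted listing, marking subdirectories with "/".
        names = sorted(
            p.name + ("/" if p.is_dir() else "") for p in path.iterdir()
        )
        return "\n".join(names)
    if not path.is_file():
        # Covers missing paths, sockets, FIFOs, and device nodes.
        raise ValueError(f"not a regular file: {path}")
    with path.open("rb") as handle:
        data = handle.read(MAX_ATTACHMENT_BYTES)  # never read past the cap
    if b"\x00" in data:
        raise ValueError(f"binary-looking content: {path}")
    # errors="replace" tolerates a multi-byte character cut by the cap.
    return data.decode("utf-8", errors="replace")
```

Reading at most the cap, rather than reading the whole file and slicing, keeps memory bounded even if a very large file is attached by mistake.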

Shell helpers from setup.sh, plus CLI diagnostics:

Command	Effect
`oracle [prompt] [size]`	Start chat with an owned server
`forge [size]`	Start a manual background server
`quench [size|all]`	Stop the manual background server for one size, or all of them
`threadstone --doctor`	Check platform, venv, model snapshots, offline flags, and port state
`threadstone --list-models`	Print configured models, ports, context limits, and paths

Models

Default models are 4-bit MLX snapshots from mlx-community.

Key	Model	Thinking	Max tokens	Context trim
`9B`	Qwen3.5-9B-MLX-4bit	Yes	4096	8000
`4B`	Qwen3.5-4B-MLX-4bit	Yes	4096	8000
`2B`	Qwen3.5-2B-MLX-4bit	No	2048	4000
`0.8B`	Qwen3.5-0.8B-MLX-4bit	No	1024	2000

Each model has its own default port, memory estimate, context threshold, and response budget in config.py.
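A registry of this kind might look like the sketch below. It is not the contents of config.py: `ModelSpec`, `MODELS`, and `resolve_model` are hypothetical names, the repository ids assume the snapshots live under `mlx-community/<model name>`, and case-insensitive keys are an assumption. Ports and memory estimates are left out because the README does not give their values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    key: str
    repo: str           # assumed Hugging Face repo id
    thinking: bool      # whether the model streams reasoning first
    max_tokens: int     # response budget
    context_trim: int   # approximate threshold that triggers trimming


# Values taken from the table above.
MODELS = {
    spec.key: spec
    for spec in (
        ModelSpec("9B", "mlx-community/Qwen3.5-9B-MLX-4bit", True, 4096, 8000),
        ModelSpec("4B", "mlx-community/Qwen3.5-4B-MLX-4bit", True, 4096, 8000),
        ModelSpec("2B", "mlx-community/Qwen3.5-2B-MLX-4bit", False, 2048, 4000),
        ModelSpec("0.8B", "mlx-community/Qwen3.5-0.8B-MLX-4bit", False, 1024, 2000),
    )
}


def resolve_model(key: str) -> ModelSpec:
    """Look up a model by size key; case-insensitivity is an assumption."""
    normalized = key.strip().upper()
    try:
        return MODELS[normalized]
    except KeyError:
        known = ", ".join(MODELS)
        raise ValueError(f"unknown model {key!r}; expected one of: {known}") from None
```

Freezing the dataclass keeps the per-model limits read-only at runtime, so a typo elsewhere in the program cannot silently change a response budget.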

Behavior

Threadstone owns the server lifecycle for normal chat.

Port recovery: if the default port is busy, Threadstone scans forward and starts the model on the next available localhost port.
RAM guard: startup checks free and reclaimable memory before launching a model, including other reachable model servers.
Crash recovery: if the server disappears during a turn, Threadstone restarts it and resends the pending message.
Thinking models: reasoning streams dimmed until </think>, then the final answer prints normally and only the final answer is sent back in later history.
Attachments: /read accepts text files and directory listings, rejects non-regular files, rejects binary-looking content, and caps file payloads at 50 KB.
Context trimming: old turns are trimmed when the approximate context crosses the configured threshold while preserving valid role alternation.
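Context trimming with role alternation preserved can be sketched as below. The names `approx_tokens` and `trim_history` are hypothetical, the four-characters-per-token estimate is an assumption, and so is the rule of dropping whole user/assistant turns from the oldest end while keeping any leading system message.

```python
def approx_tokens(messages):
    """Crude size estimate: ASSUMES about four characters per token."""
    return sum(len(m["content"]) for m in messages) // 4


def trim_history(messages, threshold):
    """Drop the oldest whole turns until the estimate fits the threshold."""
    # Keep a leading system message, if present, outside the trimming window.
    system = [m for m in messages[:1] if m["role"] == "system"]
    turns = messages[len(system):]
    while approx_tokens(system + turns) > threshold:
        # Find where the second-oldest turn begins (its user message).
        next_user = next(
            (i for i in range(1, len(turns)) if turns[i]["role"] == "user"),
            None,
        )
        if next_user is None:
            break  # only the newest turn is left; never drop it
        # Cutting at a user message keeps user/assistant alternation valid.
        turns = turns[next_user:]
    return system + turns
```

Cutting only at user-message boundaries means the trimmed history never starts with an orphaned assistant reply, which is one way to keep the alternation valid for chat templates that require it.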

Architecture

oracle / threadstone
    |
    v
threadstone.py
    |-- argument parsing, doctor, REPL, streaming, history
    |-- ServerManager starts and monitors mlx_vlm.server
    |-- /read attaches bounded local text context
    |
    v
config.py
    |-- model registry, ports, limits, memory estimates
    |-- snapshot resolution from the Hugging Face cache
    |-- offline environment guard
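The port recovery that ServerManager performs before starting `mlx_vlm.server` can be approximated with a forward scan. This is a sketch only: `port_is_free`, `find_port`, and the `max_tries` limit are hypothetical, and probing by binding a socket is one possible technique, not necessarily the one Threadstone uses.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def find_port(default_port, host="127.0.0.1", max_tries=20):
    """Scan forward from default_port and return the first free port."""
    # Clamp the upper bound so the scan never leaves the valid port range.
    last = min(default_port + max_tries, 65536)
    for port in range(default_port, last):
        if port_is_free(port, host):
            return port
    raise RuntimeError(
        f"no free port in {default_port}..{last - 1} on {host}"
    )
```

A probe like this is inherently racy, since another process can take the port between the check and the server start, so a caller would still need to handle a failed launch.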

Files:

File	Role
`threadstone.py`	CLI, server lifecycle, chat loop, streaming parser, attachment handling
`config.py`	Model registry, offline env, runtime limits, validation
`setup.sh`	One-time macOS installer and shell helper setup
`tests/`	Unit coverage for parsing, streaming, history, config, attachments, and snapshots
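The streaming parser role listed for threadstone.py includes separating reasoning from the final answer at `</think>`. The sketch below shows one way to do that when the tag may be split across stream chunks. `ThinkSplitter` is a hypothetical name, and the sketch assumes a thinking model's stream begins in reasoning mode with no opening tag to strip.

```python
class ThinkSplitter:
    """Label streamed text as 'thinking' or 'answer' around </think>."""

    TAG = "</think>"

    def __init__(self):
        self.in_thinking = True  # assumption: the stream starts with reasoning
        self._pending = ""

    def feed(self, chunk):
        """Return a list of (kind, text) pairs for this chunk."""
        if not self.in_thinking:
            return [("answer", chunk)] if chunk else []
        self._pending += chunk
        idx = self._pending.find(self.TAG)
        if idx != -1:
            thinking = self._pending[:idx]
            answer = self._pending[idx + len(self.TAG):]
            self._pending = ""
            self.in_thinking = False
            out = []
            if thinking:
                out.append(("thinking", thinking))
            if answer:
                out.append(("answer", answer))
            return out
        # Hold back a tail that could be the start of a split tag.
        keep = 0
        for n in range(min(len(self.TAG) - 1, len(self._pending)), 0, -1):
            if self._pending.endswith(self.TAG[:n]):
                keep = n
                break
        cut = len(self._pending) - keep
        emit, self._pending = self._pending[:cut], self._pending[cut:]
        return [("thinking", emit)] if emit else []
```

A caller could print "thinking" text dimmed and "answer" text normally, and store only the joined "answer" text in history so reasoning is not resent on later turns.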

Development

python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
python3 -m unittest discover -s tests -t . -v
python3 -m build --sdist --wheel
python3 -m twine check dist/*

Run the CLI without starting a model:

threadstone --version
threadstone --list-models

Run the local setup check after ./setup.sh:

threadstone --doctor --all-models

Package Release

Threadstone is a Python package, so the package registry is PyPI, not pub.dev. pub.dev is for Dart and Flutter packages.

Release path:

Update CHANGELOG.md and pyproject.toml.
Run tests, build, and twine check.
Create a GitHub release tag such as v1.1.0.
The release workflow builds the sdist and wheel, then publishes to PyPI through Trusted Publishing.

PyPI Trusted Publishing must be configured for:

Field	Value
Owner	`gabrimatic`
Repository	`threadstone`
Workflow	`release.yml`
Environment	`pypi`

Security

Runtime chat is local by design, but model files and dependencies still come from external package and model hosts during setup.

Report vulnerabilities through GitHub private vulnerability reporting. Do not open a public issue for security reports.
