float is an experimental local-first learning agent designed to run on locally managed hardware with a focus on privacy. float specializes in long-term memory, background reflection, and user-controlled data collection, and is intended to be modular without requiring hosted infrastructure.
Please note: float is still in the early stages of development. Feedback, testing, and suggestions are appreciated.
float leverages language models and a modular architecture to provide a platform for learning and interaction. I started working on this app to have a space to learn about AI and create a central, user-controlled platform for researching inference techniques and building domain-specific reinforcement learning sets. The long-term goal for float is a passive background knowledge agent that can learn, iterate, and model topics from serious research to daily life while keeping local data and user oversight first.
- Multi-mode chat (API, local, server) with model selection and conversation history stored under `data/conversations/`.
- Privacy levels for memories to prevent automatic RAG from uploading secrets to the cloud.
- Built-in tools with approvals and scheduling; tool calls and thought/tool streams show up in the Agent Console.
- Browser-first computer use with shared session-backed tools, screenshot results in chat, native OpenAI computer-tool passthrough for API mode, and an experimental Windows desktop runtime.
- Memory + RAG with Knowledge UI, plus `threads` semantic tagging. Text-based durable memories and flexible embeddings work in tandem.
- Attachments + media viewer for images, PDFs, and common audio formats.
- Calendar events + scheduled actions/tasks.
- Conversation export/import (markdown/json/text) and history management, including OpenAI-style export ZIP ingest from the History sidebar via file upload (MD/JSON/text/ZIP). ZIP ingest currently works by selecting a zip and saving a new conversation; this flow has not yet been manually tested.
- Reflections: bounded background thought tasks can review recent chats, memories, or a user-supplied question, then store useful notes or follow-ups without becoming an always-on autonomous process.
- Write history: action/write tracking with tunable retention helps inspect and recover from tool-side changes.
- Workflows chain together models to create a smooth and customizable experience; bounded recursion allows for more complex behavior.
- Streaming: live, voice, and video-based interaction, with plans to connect to a Float server (PC -> cloud GPU, or PC -> phone) securely.
- File management: float is intended to work with a desktop environment; control over files in the `data/` directory is a long-term goal.
- Persistence: float is intended to spend more time observing and thinking than responding. The current reflection system is bounded and inspectable; unattended background review, live observation, and long-form rolling conversations with context compacting are still being expanded.
- Proactive: float aims to grow into the ability to message the user directly for clarification while reasoning and to suggest tasks and events (for example, a "project review").
- Sensitivity detection: privacy-aware routing and masking so sensitive requests can stay local or be redacted before leaving the device.
float's composer understands inline command tokens so you can trigger tools or search helpers without leaving the textarea. Typing %{toolname} (e.g. %remember ...) immediately flags the name as a tool call, ./ starts file search, // starts embedded-memory search, and .// blends both result sets. Suggestions appear alphabetically (like a terminal's Tab completion), render with a hyperlink-like treatment, and backspace at the end of a linked argument removes the link without losing the text.
| Trigger | Behavior |
|---|---|
| `%{toolname}` (for example `%remember ...`) | Calls a registered tool with the rest of the line as the payload. Suggestions are alphabetical and Tab accepts the highlighted entry before you finish typing the arguments. |
| `./` (file search) and `//` (memory search) | Begin inline lookups scoped to repository/file contents or saved memories. Results look like links you can click to insert structured context; Tab cycles through matches in lexical order. |
| `.//` | Performs a blended search across both files and memories so a single lookup can pull from either surface without switching commands. |
These inline tokens remain clickable until you delete the trailing space or the entire link, and hitting Backspace when your cursor sits just after the linked text automatically unlinks it so you can edit the raw words.
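The trigger rules above can be sketched as a simple prefix check. This is only an illustration: `classify_token`, `sorted_suggestions`, and the returned labels are hypothetical names, not Float's actual composer code.

```python
def classify_token(token: str) -> tuple[str, str]:
    """Return (kind, argument) for an inline composer token.

    Hypothetical helper: the labels and precedence are assumptions.
    """
    # Check the longest prefix first so ".//" is not mistaken for "./".
    if token.startswith(".//"):
        return ("blended_search", token[3:])
    if token.startswith("./"):
        return ("file_search", token[2:])
    if token.startswith("//"):
        return ("memory_search", token[2:])
    if token.startswith("%") and len(token) > 1:
        return ("tool_call", token[1:])
    return ("text", token)


def sorted_suggestions(prefix: str, names: list[str]) -> list[str]:
    """Alphabetical suggestions whose names start with the typed prefix."""
    return sorted(n for n in names if n.startswith(prefix))
```

Testing the three-character `.//` prefix before `./` matters here; checked in the other order, a blended search would be misread as a file search.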
- Language Models: Local Transformers checkpoints (GPT-OSS and Gemma 4 lanes first), managed local providers (LM Studio/Ollama/custom OpenAI-compatible servers), and cloud API endpoints. Defaults focus on `gpt-5.4` (API) and `gpt-oss-20b` (local). Hugging Face is used for gated local downloads, and API endpoints can be changed to other providers.
- Data Store: SQLite is the canonical store for durable memory, knowledge chunks, and the lightweight graph/claim substrate; Chroma is the local retrieval mirror, and Weaviate remains an optional vector backend. Using tool calls or manual user input, float can update, edit, store, and reason about memories. Ideally, long-form content is preserved without forcing everything into a naive vector-only search path. File-system interaction with markdown in a float workspace is being tested.
- Tool Calling: Built-in tools cover discovery, memory, retrieval, web, managed file access, compaction, reflections, action history, and guarded computer/capture surfaces. MCP remains available for external tool servers but is not required for built-ins.
- Modular Design: Allows for easy replacement of internal models and features.
- Privacy: Locally managed data with encrypted memories and selectively masked API calls allows you to use the same knowledge base across models.
- Workspaces: each float deployment is based around an internal workspace folder that contains these databases and files, which can be synced selectively, including as nested source-owned workspaces, between devices or instances.
- Python 3.11+
- Node.js 16+ with npm (in WSL use NVM)
- Redis (optional, for Celery/background workers)
- Docker (optional backend-image experiment; not the recommended alpha path)
We recommend using Poetry to manage Python dependencies, extras, and project packaging.
# Install Poetry (if not already installed)
pip install pipx
pipx install poetry
pipx ensurepath
poetry install
npm install
poetry env activate
# this prints the command to activate the associated env: copy and paste it e.g. & "C:\users\...pypoetry\cache\virtualenvs\ ..."
pip install torch==2.7.1+cu128 torchvision==0.22.1+cu128 torchaudio==2.7.1+cu128 --index-url https://download.pytorch.org/whl/cu128

If pip/poetry stalls on large wheels, uv can install into the Poetry env:

poetry run uv pip install --upgrade --force-reinstall --index-url https://download.pytorch.org/whl/cu128 torch==2.7.1+cu128 torchvision==0.22.1+cu128 torchaudio==2.7.1+cu128

CPU-only fallback (fixes torchvision::nms mismatch errors):

poetry run uv pip install --upgrade --force-reinstall --index-url https://download.pytorch.org/whl/cpu torch==2.7.1+cpu torchvision==0.22.1+cpu torchaudio==2.7.1+cpu

Then launch the servers with the provided CLI:

poetry run float

Optional computer-use bootstrap:

powershell -ExecutionPolicy Bypass -File scripts/bootstrap_computer_use.ps1
poetry run python scripts/computer_use_smoke.py --target all

This installs the Playwright browser runtime, installs Chromium, and runs direct browser plus Windows smoke checks. On non-Windows hosts, the browser runtime is the primary target and the Windows runtime is intentionally not expected to run.
Once launched, navigate to Settings and ensure the URL points to https://api.openai.com/v1/responses, then add your OpenAI API key from platform.openai.com.
- Float keeps a small default model list, and will also poll the configured provider for available models (via `/api/openai/models`) so current and future provider entries show up in the selectors. The API default is currently `gpt-5.4`.
Some local models (e.g. Gemma) are gated on Hugging Face. Create a personal access token at https://huggingface.co/settings/tokens with read access, accept the model license on the repo page, and set it in Settings (HF Token) or via HF_TOKEN / HUGGINGFACE_HUB_TOKEN in the environment.
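A minimal sketch of how a token lookup across these sources could work. The helper name `resolve_hf_token` and the precedence order (Settings first, then `HF_TOKEN`, then `HUGGINGFACE_HUB_TOKEN`) are assumptions for illustration; Float's real resolution order may differ.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(
    settings_token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick a Hugging Face token from Settings or the environment.

    Assumption: a token saved in Settings wins, then HF_TOKEN, then
    HUGGINGFACE_HUB_TOKEN. Float's real precedence may differ.
    """
    env = os.environ if env is None else env
    for candidate in (
        settings_token,
        env.get("HF_TOKEN"),
        env.get("HUGGINGFACE_HUB_TOKEN"),
    ):
        # Ignore unset or whitespace-only values.
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```

If this returns `None`, gated downloads should be expected to fail until a token is set and the model license has been accepted.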
Float can run local inference directly on the machine or through a managed provider. Direct local Transformers is intended for checkpoints Float can load itself; managed providers and Server/LAN are for OpenAI-compatible runtimes that are already running elsewhere, including LM Studio, Ollama, or another local/server endpoint.
Managed local providers:
- LM Studio docs: LM Studio CLI
- Ollama docs: Ollama Docs
The provider path uses an OpenAI-compatible transport such as http://<host>:<port>/v1.
Server/LAN is separate: it points at an already-running OpenAI-compatible server via server_url and does not use the local provider manager.
If local Transformers fails on BF16/MXFP4/CUDA mismatches, switch to a managed quantized runtime such as lmstudio or ollama.
If the model you have is a raw .gguf, do not treat it like a direct local Transformers checkpoint. Run it behind LM Studio, Ollama, or another OpenAI-compatible server first.
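To make the OpenAI-compatible transport concrete, here is a hedged sketch that builds (but does not send) a chat request against a `/v1` base URL. It assumes the server exposes the common `POST /chat/completions` route; the host, port, and the `build_chat_request` helper are placeholders, not Float settings.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request.

    Assumes the server exposes the common OpenAI-compatible
    POST {base_url}/chat/completions route.
    """
    url = base_url.rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and port; point this at your own running server.
req = build_chat_request("http://localhost:1234/v1", "gpt-oss-20b", "Hello")
# To actually send it: urllib.request.urlopen(req)
```

The same request shape should apply whether the endpoint is a managed local provider or a Server/LAN `server_url`, since both speak the OpenAI-compatible protocol.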
Gemma 4 now follows an explicit three-lane split in Float:
- `Cloud API`: keep OpenAI Realtime for live voice and live-session transport.
- `Server/LAN`: use LM Studio or another OpenAI-compatible endpoint for larger Gemma 4 deployments such as `gemma-4-E4B-it`, `gemma-4-26B-A4B-it`, and `gemma-4-31B-it`.
- `Local (on-device)`: direct local Transformers support now targets `gemma-4-E2B-it` as the first real Gemma 4 checkpoint.
Direct local Gemma 4 uses Hugging Face's multimodal AutoProcessor + AutoModelForImageTextToText path, so gemma-4-E2B-it can handle text-only turns and still-image plus text turns locally. The larger Gemma 4 checkpoints remain provider/server-first in this pass and are intentionally not exposed as built-in direct-download recommendations. Gemma also informs local live/multimodal experiments, but live voice remains API-first in this patch.
Routing snapshot:
- Chat uses `api`, `local`, or `server` mode.
- Text embeddings use `rag_embedding_model` (`local:*`, `api:*`, or `simple`) and do not automatically follow `server_url`.
- TTS uses OpenAI `tts-1`/`tts-1-hd` or local `kitten`/`kokoro` style models.
- Live voice uses OpenAI Realtime by default. Gemma 4 is not a supported live-voice transport in this pass; use Gemma 4 through the local or server language-model lanes instead. LiveKit remains a fallback/experimental transport, and Pipecat is still an explored pipeline option rather than the default.
- The public capability-by-mode overview lives in `docs/feature_overviews/models-and-runtime-modes.md`; endpoint details live in `docs/api_reference.md`.
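As an illustration of the embedding routing values listed above, the sketch below splits a `rag_embedding_model` string into a backend and a model name. The function name and the strict validation are assumptions; only the `local:*`, `api:*`, and `simple` shapes come from this README.

```python
def parse_embedding_model(value: str) -> tuple[str, str]:
    """Split a rag_embedding_model value into (backend, model_name).

    Assumption: values look like "local:<name>", "api:<name>", or "simple".
    """
    if value == "simple":
        return ("simple", "")
    backend, sep, name = value.partition(":")
    if sep and backend in ("local", "api") and name:
        return (backend, name)
    raise ValueError(f"unrecognized rag_embedding_model: {value!r}")
```

The parser has no `server` backend, which matches the note that embeddings do not automatically follow `server_url`.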
Float should treat sync and live streaming as device-trust problems, not public-account problems.
The secure individual-focused model is:
- Explicit device pairing, not shared login credentials.
- Private transport only, such as LAN, VPN, or a user-operated tunnel.
- Short-lived session grants for sync and streaming.
- Per-feature scopes with revocation, so a device can be trusted for sync but not voice, or voice but not file access.
- No public exposure by default.
OAuth-style login can still make sense for hosted collaboration later, but for a personal float deployment the first-class path should be trusted devices and scoped sessions. Tailscale is recommended as an external layer if this type of remote access is desired.
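The trusted-device model above can be sketched as a short-lived grant with per-feature scopes and a revocation flag. `SessionGrant` and `issue_grant` are hypothetical names for illustration, not Float's actual sync API.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionGrant:
    """Hypothetical short-lived, scoped grant for one paired device."""

    device_id: str
    scopes: frozenset
    expires_at: float  # Unix timestamp
    revoked: bool = False

    def allows(self, scope: str, now: Optional[float] = None) -> bool:
        """True only if the grant is live, unrevoked, and covers the scope."""
        now = time.time() if now is None else now
        return (not self.revoked) and now < self.expires_at and scope in self.scopes


def issue_grant(
    device_id: str, scopes: set, ttl_seconds: int, now: Optional[float] = None
) -> SessionGrant:
    """Create a grant that expires ttl_seconds after `now`."""
    now = time.time() if now is None else now
    return SessionGrant(device_id, frozenset(scopes), now + ttl_seconds)
```

A device granted only `{"sync"}` would fail an `allows("voice")` check, which is the "trusted for sync but not voice" case described above.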
Workspaces now sit under that same model.
- Every device has one root workspace.
- Additional named workspaces can represent separate local roots such as `work` and `personal`.
- Sync can either:
  - `merge` selected workspaces into a target workspace, or
  - `import nested` so one device's workspace appears as a source-owned nested workspace on another device.
- Imported nested workspaces keep source metadata so syncing back to the origin can ignore that imported copy and avoid recursive trees.
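One way to picture the recursion guard in the last bullet: when syncing back to a device, skip any nested workspace whose source metadata points at that same device. The `Workspace` record and its `source_device` field are hypothetical stand-ins for whatever metadata Float actually stores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workspace:
    name: str
    source_device: Optional[str] = None  # set only on imported nested copies


def workspaces_to_send(workspaces: list[Workspace], target_device: str) -> list[Workspace]:
    """Skip nested copies that originally came from the target device."""
    return [w for w in workspaces if w.source_device != target_device]
```

Without a filter like this, each round trip could nest the origin's workspace inside its own imported copy.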
Float now includes a real trusted-device sync surface in Knowledge > Sync.
It is still early, but it is no longer just a hidden settings concept.
Current flow:
- Turn on `Visible on LAN` on the receiving device.
- Copy that device's advertised LAN URL.
- Generate a one-time pairing code there.
- Enter the URL and code on the other device.
- Choose scopes, workspace mapping, and sync mode.
- Preview pull/push differences by section.
- Apply only the sections you want.
Sections currently covered:
- conversations
- memories
- knowledge
- knowledge graph
- attachments
- calendar
- workspace preferences
Current merge behavior is last-write-wins by each section's stored update timestamp. Conversation renames follow the stable conversation sidecar id when available, so folder/title moves survive a sync instead of being treated like unrelated chats.
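A minimal sketch of last-write-wins merging by section. The record shape (`updated_at` plus `data`) and the tie-breaking rule are assumptions made for the example, not Float's stored format.

```python
def merge_last_write_wins(local: dict, remote: dict) -> dict:
    """Merge per-section records, keeping whichever side was updated last.

    Each value is assumed to look like {"updated_at": <number>, "data": ...}.
    Ties keep the local copy (an arbitrary choice for this sketch).
    """
    merged = dict(local)
    for section, record in remote.items():
        current = merged.get(section)
        if current is None or record["updated_at"] > current["updated_at"]:
            merged[section] = record
    return merged
```

Last-write-wins is simple but discards the older side of a conflicting edit, which is one reason to preview pull/push differences by section before applying them.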
The sync panel also exposes:
- current device visibility and pairing state,
- saved/paired/connected device states,
- inbound trusted-device state on the host,
- workspace-aware pull/push targets,
- source-linking and nested import behavior,
- import/export from the same surface.
For remote personal-GPU access, LM Studio's LM Link is the cleaner transport option. It is adjacent to Float sync rather than a required Float setting.
If you want one Float instance to reach another machine without exposing it publicly, layer the app on top of a private tunnel or tailnet such as Tailscale Serve.
For more detail, see:
- `docs/feature_overviews/device-sync-and-streaming.md`
- `docs/feature_overviews/conversations-history-and-storage.md`
These sections are mostly for development and local maintenance. They may move into a dedicated developer README as the public-facing README gets shorter.
Run notebooks in the Poetry environment by installing a kernel once:
poetry run python -m ipykernel install --user --name float-project --display-name "float (poetry)"
All flags are per-run; nothing is persisted except sticky port/browser state in .dev_state.json.
- `--dev`/`-dev`: enable dev mode for this run only (sets `FLOAT_DEV_MODE=true` for the process; does not write `.env`).
- `--server`/`--backend-only`: start backend only (skip frontend).
- `--ui`/`--frontend-only`: start frontend only (skip backend).
- `--skip-backend`: do not start the backend server.
- `--skip-frontend`: do not start the frontend server.
- `--backend-port <port>`: set backend port (default: auto-select).
- `--frontend-port <port>`: set frontend port (default: auto-select).
- `--sticky-ports`/`--no-sticky-ports`: reuse last ports or choose new ports each run.
- `--no-open`: do not open a browser tab on launch.
- `--open-once`: open the browser only the first time (sticky across restarts).
To create or update a Desktop launcher named lowercase float:
powershell -ExecutionPolicy Bypass -File scripts/create_desktop_shortcut.ps1

This creates `float.lnk` on your Desktop and uses the logo asset `docs/resources/floatlogo.png` (converted to `frontend/public/float.ico`) as the shortcut icon. The shortcut launches `poetry run float` from this repository root.
Set FLOAT_DEV_MODE=true before starting the backend (or run poetry run float --dev for a one-off session) to enable the Dev Panel route. Then navigate to /dev to:
- Run built-in test prompts (`/api/test-prompts`).
- Watch live thought/tool logs (`/api/ws/thoughts`).
Install nvm (see https://learn.microsoft.com/en-us/windows/dev-environment/javascript/nodejs-on-wsl).

Commands (in a WSL shell):

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/master/install.sh | bash
nvm install --lts
Then install redis and postgres
sudo apt-get install redis
pip install pipx
sudo apt-get install python3-venv
python3 -m pipx install poetry
python3 -m pipx ensurepath

You will need to open a new terminal or re-login for the PATH changes to take effect. Alternatively, you can source your shell's config file, e.g. `source ~/.bashrc`.
Backend tests are located under backend/app/tests. Use Poetry to run them:
poetry run pytest -vv backend/app/tests
# Run API tests (marked `api`)
poetry run pytest -vv -m api backend/app/tests
# Run local tests (marked `local`)
poetry run pytest -vv -m local backend/app/tests

Runtime artifacts live under `data/` (gitignored so installs stay private):

- `data/databases/{calendar_events,chroma}`: calendars now save JSON entries here and the Chroma vector store persists beside them.
- `data/files/{uploads|screenshots|downloaded|workspace}`: user uploads, captured media, tool downloads, and docs-ingest workspace files share one sandbox.
- `data/models/`: default cache/target for local model downloads (legacy `models/` folders are still detected if present).
- `data/workspace/`: scratch space Float can edit freely during personal-device streaming or tool executions.
See docs/data_directory.md for the full layout and usage notes; conversation history lives under data/conversations/ (legacy conversations/ is auto-migrated on startup when FLOAT_CONV_DIR is unset) and blobs/ remains beside the repo for now.
Float is intended to be customized in layers rather than only through one giant prompt edit.
- Add-ons are packages of skills (markdown descriptive files), tools (callable Python scripts), and assets (any other files needed, such as images or reference files).
- You can edit Float's base system prompt in `backend/app/config.py`.
- Built-in tools are part of the backend itself; users are not generally expected to edit them directly, and add-ons are the cleaner extension point for adding new behavior.
- Workflows are lightweight behavior profiles that decide how Float should approach a task, including its default reasoning style and which modules are enabled for that run.
- Themes are customizable from Settings: you can create a new theme, choose values for the eight palette slots, save it, rename it, edit it later, or delete it. User-created themes live under `data/themes/`.
For more detail, see docs/feature_overviews/personalization-and-modules.md.
Float now has two voice transport paths:
- OpenAI Realtime is the current cloud-default path. Set `OPENAI_API_KEY` and keep `FLOAT_STREAM_BACKEND=api` (the current default). `/api/voice/connect` will mint an ephemeral client secret and return the browser-facing Realtime connect URL; the frontend then establishes WebRTC directly to OpenAI.
- LiveKit remains available as a fallback. Set `FLOAT_STREAM_BACKEND=livekit` plus the LiveKit credentials below if you want the older room/token flow instead.
OpenAI Realtime optional settings:
OPENAI_API_KEY=your_openai_key
FLOAT_STREAM_BACKEND=api
OPENAI_REALTIME_MODEL=gpt-realtime
OPENAI_REALTIME_TURN_DETECTION=server_vad
OPENAI_REALTIME_TTL_SECONDS=600

LiveKit fallback settings:

FLOAT_STREAM_BACKEND=livekit
LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_SECRET=your_livekit_secret

In Realtime API mode the browser streams directly to OpenAI, so `/api/voice/stream` is not used. In LiveKit mode, `/api/voice/connect` returns the room token and the older worker-backed streaming path remains available. Live browser verification is still recommended for microphone permissions, turn-taking, and transcript/event surfacing.
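The two transport configurations can be validated up front with something like the sketch below. The `voice_config` helper is hypothetical, and the fallback values simply mirror the example settings in this README rather than Float's confirmed defaults.

```python
import os
from typing import Mapping, Optional


def voice_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect voice transport settings from environment variables.

    The fallback values below mirror the example settings in this README;
    treat them as assumptions rather than Float's actual defaults.
    """
    env = os.environ if env is None else env
    backend = env.get("FLOAT_STREAM_BACKEND", "api")
    if backend == "livekit":
        required = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_SECRET")
    elif backend == "api":
        required = ("OPENAI_API_KEY",)
    else:
        raise ValueError(f"unknown FLOAT_STREAM_BACKEND: {backend!r}")
    # Fail early with a clear message instead of at connect time.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    config = {"backend": backend}
    config.update({name: env[name] for name in required})
    if backend == "api":
        config["model"] = env.get("OPENAI_REALTIME_MODEL", "gpt-realtime")
        config["ttl_seconds"] = int(env.get("OPENAI_REALTIME_TTL_SECONDS", "600"))
    return config
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real process environment.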
Float's runtime selectors are lane-based rather than a fixed model zoo. The table below describes the current surfaces users are most likely to touch.
| Surface | Current primary paths | Notes |
|---|---|---|
| Chat LLM | Cloud API default `gpt-5.4`; direct local default `gpt-oss-20b`; custom OpenAI-compatible API, Server/LAN, LM Studio, or Ollama endpoints. | Chat can route through API, direct local Transformers, managed local providers, or Server/LAN. Runtime parity is still in progress. |
| Gemma 4 | Direct local `gemma-4-E2B-it`; larger Gemma 4 checkpoints through Server/LAN or managed providers. | Still-image plus text is the current local target. Live/multimodal work is experimental. |
| Speech and live voice | OpenAI Realtime for live voice; OpenAI TTS/STT by default; local voice paths remain experimental. | Browser microphone and transcript behavior still need live smoke testing after changes. |
| Retrieval and memory | SQLite durable store, Chroma retrieval mirror, optional provider embeddings, and optional Weaviate backend. | Durable memories and vector retrieval are deliberately separate so private or long-form text does not have to be mirrored everywhere. |
| Attachments and media | Images, PDFs, and common audio files through chat attachments and the media viewer. | Captions, retrieval indexing, and media metadata are visible in the UI but still evolving. |
| Tool discovery | `help`, `tool_help`, and `tool_info`. | Default help is curated. Ask for brief detail, a family, or an exact `tool_info` lookup when a model needs schema details. |
| Core tools | `remember`, `recall`, `search_web`, `crawl`, `read_file`, `list_dir`, `write_file`, tasks/calendar, and action history. | Tool calls are surfaced through chat and the Agent Console, with approvals and recovery paths for risky or malformed calls. |
| Background reflection | `reflect`, `list_reflections`, and optional `reflect_after_save` from memory tools. | Reflection is bounded background thinking over explicit context, not an unbounded autonomous daemon. |
| Conversation compaction | `compact_conversation_plan`, `compact_conversation_preview`, and `compact_conversation_write`. | Compaction is staged so a model can plan or preview before writing a compacted conversation artifact. |
| Guarded/experimental tools | Computer-use, capture, `mcp.call`, shell/patch helpers, and local-model routing. | These are intentionally guarded, hidden from the default tool menu when appropriate, or still being stabilized. |
Float structures LLM exchanges with the
Harmony envelope. Messages are built
using the openai-harmony utilities and contain typed content blocks.
from openai_harmony import Message, Role
msg = Message.from_role_and_content(Role.USER, "Hello Harmony!")
print(msg.to_dict())
# {"role": "user", "name": None,
#  "content": [{"type": "text", "text": "Hello Harmony!"}]}

Pass `response_format="harmony"` to `LLMService.generate` to receive Harmony-formatted responses.
- System prompt: `backend/app/config.py` (override with `SYSTEM_PROMPT` in `.env`).
- Built-in tool registry: `backend/app/tools/__init__.py` (`BUILTIN_TOOLS`) and UI schemas in `backend/app/tool_specs.py`.
- Public feature overviews: `docs/feature_overviews/README.md`, `docs/feature_overviews/tools-and-actions.md`, `docs/feature_overviews/models-and-runtime-modes.md`, and `docs/feature_overviews/personalization-and-modules.md`.
- Workflow runbooks and provider/mode coverage: `docs/feature_overviews/voice-live-and-passthrough.md` and `docs/feature_overviews/models-and-runtime-modes.md`. Model readiness defaults live in `backend/config/model_catalog.yaml`.
- Modules/add-ons overview: `docs/feature_overviews/personalization-and-modules.md`.
- SAE inspection/steering scaffolding is experimental and intended for user-supplied compatible weights.
Float-specific Codex skills live in a separate repository so the app code and the Codex-facing skill prompts can evolve independently:
https://github.com/CherryResearch/float-codex-skills
The Poetry launcher is the recommended alpha path. The backend Dockerfile remains available for image experiments, but Docker is not the supported deployment path for this patch.
- Build and Run the Backend Image:
  # Build using the backend Dockerfile
  docker build -f docker/backend.Dockerfile -t float-backend .
  docker run -p 8000:8000 float-backend
  The included `Dockerfile` uses Poetry to install dependencies from `pyproject.toml` (without dev dependencies) for reproducible builds. Ensure `poetry.lock` is committed alongside `pyproject.toml` to lock versions in production.
- API Proxy Features:
  - GET `/api/responses`: Proxy to OpenAI Responses API for listing responses.
  - GET `/api/responses/{response_id}/completions`: Proxy to OpenAI Responses API for retrieving a specific response's completions.
  - GET `/api/transformers/models`: List available GPT-OSS transformer models.
  - POST `/api/transformers/generate`: Generate text with a selected transformer model.
- Model Context Management: Manage and display the current model context.
- Tool Integration: Add and manage tools for enhanced functionality.
- Privacy-Focused: Designed to operate with a focus on user privacy.
- Thought Streaming: `/api/stream/thoughts` provides live thoughts and tool logs for the agent console. Integration with external providers is in progress.
- Agent Console Snapshot: `GET /api/agents/console` hydrates the right-rail cards when reconnecting or refreshing.
- Approval Levels: UI setting to require confirmation for risky actions.
- Data Visualization: Demo charts in the frontend are built with D3.
- File Attachments: Upload images (jpg, png, gif, webp), PDFs, and common audio formats and preview them in the media viewer.
curl -X POST http://localhost:8000/tools/register -d '{"name":"read_file"}'
curl -X POST http://localhost:8000/tools/invoke \
  -d '{"name":"read_file","args":{"path":"README.md"}}'

Float exposes built-in tools through the backend registry. MCP integration is available for external tool servers, but built-in tools do not require an MCP server. When the language model wants to run a tool it emits a small JSON block, for example:

{"name": "search_web", "args": {"query": "weather", "topn": 3}}

The backend matches the tool by exact name, executes it, and returns a structured result or error into the same chat flow. The default discovery menu is intentionally curated: use `help` for a compact menu, `help` with `detail="brief"` or a family name for descriptions, and `tool_info` for exact schemas. Compatibility aliases still exist for older calls, but models should prefer canonical names such as `remember`, `recall`, and `compact_conversation.*`.
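The exact-name dispatch described above can be sketched roughly as follows. The `registry` dict, `run_tool_call`, and `fake_search_web` are stand-ins for illustration; they are not Float's `BUILTIN_TOOLS` registry or its real result format.

```python
import json


def run_tool_call(raw: str, registry: dict) -> dict:
    """Parse a {"name": ..., "args": {...}} block and run the matching tool.

    `registry` maps exact tool names to plain Python callables; it is a
    stand-in for illustration, not Float's BUILTIN_TOOLS registry.
    """
    try:
        call = json.loads(raw)
        name, args = call["name"], call.get("args", {})
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as exc:
        return {"ok": False, "error": f"malformed tool call: {exc}"}
    tool = registry.get(name)  # exact-name match only
    if tool is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": tool(**args)}
    except Exception as exc:  # surface tool failures as structured errors
        return {"ok": False, "error": str(exc)}


def fake_search_web(query: str, topn: int = 3) -> list:
    """Toy tool that fabricates placeholder results for the demo."""
    return [f"{query} result {i}" for i in range(1, topn + 1)]
```

Returning a structured error instead of raising keeps malformed or unknown calls inside the same chat flow, so the model can see the failure and retry.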
The LLMService can work with external models as well. Set environment variables like OPENAI_MODEL or LOCAL_LLM_URL and choose the api, local, or server mode to switch between them.
The Settings selectors accept any API/local endpoint pair; defaults are predictable and user settings persist. Current main defaults:
- `gpt-5.4` (OpenAI API default)
- `gpt-oss-20b` (local default)
Additional local/server options are allowed through custom endpoints, managed providers, or manually selected local checkpoints. Presets are convenience defaults, not a closed list. Hugging Face cache clutter and modality filtering are still being improved, so some downloads may need a manual HF fetch and then selection from data/models/.
GPT-OSS can handle roughly 7B-20B locally on a modern GPU. 120B-class models usually require a remote GPT-OSS server or multi-GPU setup.
External contributions are accepted only after the contributor agrees to the repository's assignment terms in CLA.md. Accepted contributions are assigned to the project operator under the Contributor Assignment Agreement.
This repository is licensed under the GNU Affero General Public License, version 3 only. See LICENSE.

