LIGHT SPEED UP
Patent Pending

If your product retrieves docs, tickets, transcripts, or memory before every model call, you are probably paying to send context the model never actually uses.
STYX replaces similarity-heavy retrieval with deterministic context extraction. No embeddings, no vector database, no GPU. Proven to beat GraphRAG on answer quality in blind A/B testing while collapsing prompt size by up to 98%.
Whole chunks, transcripts, and knowledge pages get attached to every request, even when most of that text never changes the answer.
Deterministic compression keeps the reasoning signal, drops the ballast, and gives the model only what it actually needs to perform.
Better unit economics, smaller context windows, and a product story technical teams and buyers can both understand in one pass.
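STYX's actual extraction algorithm is proprietary (patent pending) and is not shown here. As a toy illustration of what "deterministic, no embeddings, no GPU" means in practice, a rule-based extractor might keep only the sentences that share content words with the query. Everything below is a hypothetical sketch, not STYX's method:

```python
import re

def extract_context(query: str, document: str, window: int = 1) -> str:
    """Toy deterministic extractor (NOT STYX's algorithm): keep only
    sentences that share a content word with the query, plus `window`
    neighboring sentences. Same input always yields the same output --
    no embeddings, no vector database, no GPU."""
    stop = {"the", "a", "an", "is", "of", "to", "in", "and", "for", "on"}
    terms = {w for w in re.findall(r"[a-z0-9]+", query.lower()) if w not in stop}
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    keep = set()
    for i, s in enumerate(sentences):
        words = set(re.findall(r"[a-z0-9]+", s.lower()))
        if words & terms:
            keep.update(range(max(0, i - window),
                              min(len(sentences), i + window + 1)))
    return " ".join(sentences[i] for i in sorted(keep))
```

The point of the sketch is the property, not the heuristic: a deterministic pipeline is reproducible and auditable per request, which similarity search over embeddings is not.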
What the proof shows: lower prompt spend without sacrificing answer quality, validated on 60,900+ documents
500 blind A/B judgments across five LLMs. Answer position randomized. Zero evaluation errors.
Higher answer quality with dramatically less prompt weight.
| Method | Wins | Rate |
|---|---|---|
| STYX | 305 | 61.0% |
| GraphRAG | 179 | 35.8% |
| Tie | 16 | 3.2% |
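The position-randomized blind judging described above can be sketched as a small harness. The `judge` callable and the win-tallying scheme are assumptions for illustration; the published evaluation's exact protocol is not reproduced here:

```python
import random

def blind_ab(judge, pairs, seed=0):
    """Position-randomized pairwise evaluation. For each (answer_a, answer_b)
    pair, the judge sees the two answers in shuffled order and returns
    'first', 'second', or 'tie'; the verdict is mapped back to the real
    method afterward, so the judge never knows which system produced what."""
    rng = random.Random(seed)
    wins = {"A": 0, "B": 0, "tie": 0}
    for a, b in pairs:
        flipped = rng.random() < 0.5          # randomize presentation order
        first, second = (b, a) if flipped else (a, b)
        verdict = judge(first, second)        # judge sees no method labels
        if verdict == "tie":
            wins["tie"] += 1
        elif verdict == "first":
            wins["B" if flipped else "A"] += 1
        else:
            wins["A" if flipped else "B"] += 1
    return wins
```

Randomizing position removes the well-known first-answer bias of LLM judges, which is why the table's win rates are credible as a like-for-like comparison.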
Near-parity answers without paying to ship the whole document.
| Method | Wins | Rate |
|---|---|---|
| STYX (compressed) | 239 | 47.8% |
| Raw (full context) | 238 | 47.6% |
| Tie | 23 | 4.6% |
The advantage holds across the model stack, not just one favorite model.
The more structure hiding inside the source material, the stronger STYX gets.
Standardized public benchmarks, local 14B model, no cloud APIs. Early proof that the compression story also translates into formal benchmark settings.
99.8% precision on public benchmarks
Independent LLM judge assessment
Maximum compression, minimum loss
Benchmarks: GraphRAG-Bench (4,072 questions) & ai-forever RAGBench (600 questions). Patent pending.
If retrieved context is part of your AI cost stack, STYX changes the unit economics.
Retrieval-Augmented Generation became the default way to give language models external knowledge. The problem is that similarity-based retrieval sends too much text, too often, and companies end up paying for context that never meaningfully changes the answer.
If retrieval context is part of your COGS, STYX attacks that line item directly. Across 60,900+ documents from 30 major open-source projects, measured with OpenAI's production tokenizer, the reduction stayed in the 90-98% range across every domain tested.
The business translation is simple: at the top of that range, up to 98% of your retrieval token spend is the waste STYX is designed to remove.
Public API pricing as of early 2026. Per 1M tokens. Verify at each provider's pricing page.
| Provider | Model | Input / 1M | Output / 1M | Source |
|---|---|---|---|---|
| OpenAI | GPT-5.2 | $1.75 | $7.00 | openai.com/pricing |
| OpenAI | GPT-5.2 Pro | $21.00 | $84.00 | openai.com/pricing |
| Anthropic | Claude Sonnet 4.5 | $3.00 | $15.00 | anthropic.com/pricing |
| Anthropic | Claude Opus 4.6 | $5.00 | $25.00 | anthropic.com/pricing |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | ai.google.dev/pricing |
| xAI | Grok 3 | $3.00 | $15.00 | x.ai/api |
| Meta | Llama 4 (via API) | $0.10–$0.90 | varies | Various API providers |
| Microsoft | Azure OpenAI | OpenAI parity | +15–40% infra | azure.microsoft.com |
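Plugging the input prices above into back-of-envelope arithmetic shows how compression moves the line item. The volumes below (8,000 retrieval tokens per request, 100,000 requests per day) are hypothetical examples, not customer data:

```python
def monthly_prompt_savings(tokens_per_request: int,
                           requests_per_day: int,
                           input_price_per_m: float,
                           compression: float = 0.98) -> dict:
    """Back-of-envelope savings from compressing retrieved context.
    `compression` is the fraction of retrieval tokens removed
    (0.98 = the top of the 90-98% range reported above)."""
    monthly_tokens = tokens_per_request * requests_per_day * 30
    baseline = monthly_tokens / 1e6 * input_price_per_m
    compressed = baseline * (1 - compression)
    return {"baseline_usd": round(baseline, 2),
            "compressed_usd": round(compressed, 2),
            "saved_usd": round(baseline - compressed, 2)}

# Example: $3.00 / 1M input tokens (Claude Sonnet 4.5 row above)
print(monthly_prompt_savings(8_000, 100_000, 3.00))
```

At those assumed volumes, a $72,000/month retrieval input bill drops to $1,440, with the remaining output-token and infrastructure costs unchanged.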
Any workflow that retrieves context before sending it to a language model, from enterprise software to scientific discovery to last-mile delivery.
AI copilots retrieve code snippets and document fragments by similarity, including irrelevant context that wastes tokens and degrades suggestion quality.
Delivers only decision-relevant context, enabling faster and more accurate completions with fewer tokens per request.
Incident tools retrieve logs, tickets, and runbooks by similarity, overwhelming responders with noise during outages when speed matters most.
Delivers only context relevant to the current failure mode, reducing mean time to resolution.
Research assistants retrieve papers and datasets by keyword similarity. A query about particle decay returns thousands of tangentially related physics papers. CERN alone generates 1 PB/second of collision data — AI tools reviewing this literature drown in noise.
Extracts only the structurally relevant findings for the current research question, letting scientists focus on discovery instead of filtering.
Brain imaging studies generate massive structured datasets. AI assistants retrieving related research pull in irrelevant studies about similar brain regions but different conditions, wasting compute on context that doesn't inform the analysis.
Surfaces only the data points and prior research structurally bound to the current hypothesis, accelerating insights from complex neural datasets.
Climate models consume vast datasets — satellite imagery, ocean temperatures, atmospheric readings. AI tools retrieving context for predictions pull in geographically or temporally irrelevant data by similarity alone.
Delivers only the measurements and model outputs that are structurally relevant to the current prediction window, reducing compute waste in climate modeling pipelines.
RAG systems retrieve patient records and medical literature by keyword similarity, returning tangential information that increases cognitive load for clinicians.
Surfaces only the context needed for the current clinical decision, reducing noise in time-sensitive environments.
Compliance systems retrieve regulatory text and transaction records by keyword, often returning hundreds of irrelevant documents per query.
Filters noise before retrieval, delivering only the regulations or transactions that impact the current compliance decision.
Contract review retrieves clauses by similarity, but similar text does not mean relevant text. Attorneys waste hours reviewing noise.
Focuses extraction on what matters for the legal question at hand, reducing review time and improving risk analysis.
Delivery platforms like DoorDash, Uber, and FedEx use AI to optimize routes, ETAs, and dispatch. These systems retrieve traffic data, driver history, and demand patterns — most of which is noise for any single delivery decision.
Extracts only the variables that structurally affect the current route decision — reducing token cost per dispatch query and enabling faster, cheaper real-time optimization.
Mapping services process billions of data points — road conditions, POIs, satellite imagery, user reports. AI features that answer natural language queries about locations retrieve massive context by proximity, not relevance.
Delivers only the geospatial data structurally bound to the user's query, making AI-powered map features faster and cheaper to run at scale.
Retail AI retrieves product catalogs, inventory levels, supplier data, and demand forecasts by similarity. A query about restocking one SKU pulls in data about hundreds of unrelated products.
Extracts only the inventory, supplier, and demand data relevant to the specific restocking decision, cutting token costs across millions of daily queries.
Support chatbots retrieve multiple help articles by similarity, often none of which address the actual issue. Costs scale linearly with ticket volume.
Extracts only the steps needed to resolve the specific problem, reducing tokens per interaction and improving resolution rates.
AI tutoring systems retrieve lesson content and student history to personalize instruction. Similarity-based retrieval pulls in material the student has already mastered or topics outside the current learning objective.
Delivers only the content structurally aligned with the student's current knowledge gap, making personalized education more efficient and affordable to scale.
AI-driven NPCs and game assistants retrieve dialogue trees, player history, and world state by proximity. Open-world games with millions of concurrent players generate massive token overhead for every NPC interaction.
Extracts only the narrative and player context structurally relevant to the current interaction, enabling richer AI characters at lower cost per query.
AI editing assistants retrieve video clips, transcripts, and archives by keyword similarity. Journalists and editors waste hours filtering noise from massive content libraries to find the right 10-second clip.
Surfaces only the media assets structurally bound to the current editorial decision, accelerating production workflows across news, film, and streaming.