TheTokenCompany/the-token-company-python

The Token Company Python SDK

Compress LLM prompts to reduce costs and latency. 100K tokens compressed in ~85ms.

Docs · Website · Dashboard · Node.js SDK

Install

pip install the-token-company

Quick start

from thetokencompany import TheTokenCompany

client = TheTokenCompany(api_key="ttc-...")
result = client.compress("Your long prompt text here...", model="bear-2")

print(result.output)             # compressed text
print(result.tokens_saved)       # tokens removed
print(result.compression_ratio)  # e.g. 1.8

SDK wrappers

Drop-in wrappers auto-compress all non-assistant messages before they are sent to your LLM. Assistant messages pass through unchanged so the provider's KV cache stays warm.
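The pass-through rule described above can be pictured with a small offline sketch. This is not the wrappers' actual implementation: `compress_messages`, `compress_fn`, and `squeeze_whitespace` are hypothetical names, and the stand-in compressor only collapses whitespace so the example runs without an API key.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def compress_messages(
    messages: List[Message],
    compress_fn: Callable[[str], str],
) -> List[Message]:
    """Return a copy of `messages` with non-assistant content compressed.

    Hypothetical helper: illustrates the rule, not the SDK's internals.
    """
    out = []
    for msg in messages:
        if msg["role"] == "assistant":
            # Assistant turns are copied unchanged.
            out.append(dict(msg))
        else:
            out.append({**msg, "content": compress_fn(msg["content"])})
    return out


def squeeze_whitespace(text: str) -> str:
    # Stand-in for a real compressor; only collapses runs of whitespace.
    return " ".join(text.split())


history = [
    {"role": "system", "content": "You are   a helpful   assistant."},
    {"role": "user", "content": "Summarize   these results."},
    {"role": "assistant", "content": "Here  is the   summary."},
]
compressed = compress_messages(history, squeeze_whitespace)
```

The sketch only handles plain string content; messages whose content is a list of parts would need extra handling.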

OpenAI / OpenRouter

from openai import OpenAI
from thetokencompany.openai import with_compression

client = with_compression(OpenAI(), compression_api_key="ttc-...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant..."},
        {"role": "user", "content": "Summarize these results..."},
    ],
)

The wrapper also works with AsyncOpenAI: it detects async clients automatically.

Anthropic

from anthropic import Anthropic
from thetokencompany.anthropic import with_compression

client = with_compression(Anthropic(), compression_api_key="ttc-...")

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant...",
    messages=[{"role": "user", "content": "Summarize these results..."}],
)

Both messages and the system parameter are compressed.

Async

import asyncio
from thetokencompany import AsyncTheTokenCompany

async def main():
    # `async with` must run inside a coroutine
    async with AsyncTheTokenCompany(api_key="ttc-...") as client:
        result = await client.compress("Your long prompt text...")

asyncio.run(main())
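One reason to reach for the async client is to compress several prompts concurrently. The sketch below shows one way to do that with a bounded number of in-flight requests. `FakeAsyncCompressor` and `compress_all` are hypothetical names; the fake client stands in for the real one so the example runs offline, and it returns a plain string where the real client returns a response object.

```python
import asyncio
from typing import List


class FakeAsyncCompressor:
    """Offline stand-in exposing an awaitable compress(); not the real client."""

    async def compress(self, text: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return " ".join(text.split())


async def compress_all(client, prompts: List[str], limit: int = 4) -> List[str]:
    # Cap the number of concurrent requests with a semaphore.
    sem = asyncio.Semaphore(limit)

    async def one(prompt: str) -> str:
        async with sem:
            return await client.compress(prompt)

    # gather() returns results in the same order as its inputs.
    return await asyncio.gather(*(one(p) for p in prompts))


results = asyncio.run(
    compress_all(FakeAsyncCompressor(), ["a   b", "c    d", "e f"])
)
```

The `limit` default of 4 is arbitrary; pick a value that fits your rate limits.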

Models

| Model | Description |
| --- | --- |
| bear-2 | Latest, recommended |
| bear-1.2 | Previous generation |

Aggressiveness

Control compression intensity with aggressiveness (0.0 – 1.0, default 0.5):

result = client.compress(text, model="bear-2", aggressiveness=0.8)
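If the aggressiveness value comes from user input or configuration, it may help to validate it before making a request. The helper below is a hypothetical sketch, not part of the SDK; whether the SDK or the API performs its own range check is not stated here.

```python
DEFAULT_AGGRESSIVENESS = 0.5  # default stated in the text above


def check_aggressiveness(value: float = DEFAULT_AGGRESSIVENESS) -> float:
    """Return `value` as a float if it lies in [0.0, 1.0], else raise ValueError."""
    value = float(value)
    if not 0.0 <= value <= 1.0:
        raise ValueError(
            f"aggressiveness must be between 0.0 and 1.0, got {value}"
        )
    return value
```

The checked value could then be passed along as `aggressiveness=check_aggressiveness(user_value)`.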

App ID

Tag compression requests with an application identifier for usage tracking:

# Set on the client — applies to all requests
client = TheTokenCompany(api_key="ttc-...", app_id="my-chatbot")

# Or per-request (overrides the client-level value)
result = client.compress(text, model="bear-2", app_id="my-chatbot")

Also supported in wrappers:

client = with_compression(OpenAI(), compression_api_key="ttc-...", app_id="my-chatbot")
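The precedence between the client-level and per-request values can be summarized in a few lines. `resolve_app_id` is a hypothetical helper written for illustration; it mirrors the documented behavior rather than the SDK's code.

```python
from typing import Optional


def resolve_app_id(
    client_app_id: Optional[str],
    request_app_id: Optional[str],
) -> Optional[str]:
    # A per-request value wins; otherwise fall back to the client-level value.
    return request_app_id if request_app_id is not None else client_app_id
```

For example, a client created with `app_id="my-chatbot"` and a request sent with `app_id="batch-job"` would be tagged `batch-job` under this rule.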

Gzip

Enable gzip compression of request payloads for better performance on large inputs (up to 2.2x faster on 1M+ tokens):

client = TheTokenCompany(api_key="ttc-...", gzip=True)
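To get a feel for why gzipping a large request body helps, the sketch below compresses a JSON payload with Python's standard `gzip` module and compares sizes. It does not show the SDK's wire format: the `"input"` key is illustrative only, and the repeated sentence stands in for a long prompt, so real prompts will usually shrink less.

```python
import gzip
import json

# Repetitive text stands in for a large prompt; the "input" key is illustrative.
payload = json.dumps(
    {"input": "The quick brown fox jumps over the lazy dog. " * 2000}
)
raw = payload.encode("utf-8")
packed = gzip.compress(raw)

print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes")

# A receiver would undo this with gzip.decompress before parsing the JSON.
restored = gzip.decompress(packed)
```

Fewer bytes on the wire means less upload time, which matters most when inputs are large.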

Protect text from compression

Use protect() to wrap content in <ttc_safe> tags. Protected text passes through compression unchanged:

from thetokencompany import protect

prompt = f"{protect('system:')} You are a helpful assistant.\n{protect('user:')} Hello!"
result = client.compress(prompt, model="bear-2")
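To see what the compressor receives, here is a minimal stand-in for the wrapping step. `protect_sketch` is a hypothetical function, and the closing `</ttc_safe>` tag is an assumption based on the tag name above; the SDK's own `protect()` may differ in detail.

```python
def protect_sketch(text: str) -> str:
    # Assumed tag layout: an opening <ttc_safe> and a matching </ttc_safe>.
    return f"<ttc_safe>{text}</ttc_safe>"


prompt = (
    f"{protect_sketch('system:')} You are a helpful assistant.\n"
    f"{protect_sketch('user:')} Hello!"
)
print(prompt)
```

Only the role labels are wrapped here, so the rest of each line remains eligible for compression.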

Response

CompressResponse fields:

| Field | Type | Description |
| --- | --- | --- |
| output | str | Compressed text |
| output_tokens | int | Token count after compression |
| input_tokens | int | Token count before compression |
| tokens_saved | int | Tokens removed |
| compression_ratio | float | Ratio (e.g. 1.8x) |
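The fields are presumably related in the obvious way. The sketch below mirrors them in a small dataclass under two assumptions that this README does not confirm: `tokens_saved` equals `input_tokens - output_tokens`, and `compression_ratio` equals `input_tokens / output_tokens`. `CompressResponseSketch` is a hypothetical class, not the SDK's `CompressResponse`.

```python
from dataclasses import dataclass


@dataclass
class CompressResponseSketch:
    """Illustrative mirror of the documented fields; not the SDK's class."""

    output: str
    output_tokens: int
    input_tokens: int

    @property
    def tokens_saved(self) -> int:
        # Assumption: tokens removed = count before minus count after.
        return self.input_tokens - self.output_tokens

    @property
    def compression_ratio(self) -> float:
        # Assumption: ratio = count before / count after.
        return self.input_tokens / self.output_tokens


r = CompressResponseSketch(output="...", output_tokens=1000, input_tokens=1800)
print(r.tokens_saved, r.compression_ratio)
```

Under these assumptions, a ratio of 1.8 means the input had 1.8 times as many tokens as the output.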

License

MIT
