Inspiration

The inspiration came from recognizing a common problem in the LLM era: token limits and costs. Many developers and researchers work with verbose prompts that eat up valuable tokens without adding semantic value. We wanted to explore how compression affects LLM performance while building a practical tool that visualizes token importance and helps optimize prompt efficiency. The goal was to bridge the gap between academic research on prompt compression and real-world applications that developers could use in their daily workflows.

What it does

Intelligent Compression: Uses advanced NLP models to identify and remove redundant tokens while maintaining context

Visual Token Analysis: Color-coded visualization showing which tokens were kept (green) or removed (red) during compression

LLM Integration: Supports multiple LLM providers (OpenAI, Anthropic, Ollama, custom APIs) to test compressed prompts

Conversation History: Maintains context across multiple interactions with compressed messages

Configurable Parameters: Adjustable compression rates, temperature, token limits, and other LLM settings

Multi-page Interface: Clean separation between compression, API configuration, and project information
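
The compression and visualization features above can be sketched together in a few lines. This is a toy illustration only: it scores tokens by a crude heuristic (stopwords score lowest), whereas the real app delegates scoring to LLMLingua-2's learned token classifier. The function name and scoring rule are assumptions for illustration.

```python
def compress(prompt: str, rate: float = 0.5, stopwords=None):
    """Toy compressor: keep only the top-scoring fraction `rate` of tokens.
    Importance here is a crude heuristic (stopwords penalized, longer tokens
    favored); the real app uses LLMLingua-2's token classifier instead."""
    stopwords = stopwords or {"the", "a", "an", "of", "to", "is", "and", "that"}
    tokens = prompt.split()
    # Score each token: stopwords score 0, other tokens score by length.
    scores = [0 if t.lower().strip(".,") in stopwords else len(t) for t in tokens]
    keep_n = max(1, round(len(tokens) * rate))
    # Indices of the top-scoring tokens, preserved in original order.
    keep = set(sorted(range(len(tokens)), key=lambda i: -scores[i])[:keep_n])
    compressed = " ".join(t for i, t in enumerate(tokens) if i in keep)
    # Color-coded HTML: green for kept tokens, red strikethrough for removed.
    html = " ".join(
        f'<span style="color:green">{t}</span>' if i in keep
        else f'<span style="color:red;text-decoration:line-through">{t}</span>'
        for i, t in enumerate(tokens)
    )
    return compressed, html

compressed, html = compress(
    "Please summarize the main points of the attached report", rate=0.5
)
```

The same kept/removed index set drives both the compressed prompt and the green/red visualization, which is what keeps the visual feedback exactly in sync with what the model actually receives.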

How we built it

Front-end: Streamlit for the interactive web interface with custom HTML/CSS for enhanced styling

Core Compression: Microsoft's LLMLingua-2 with BERT-base multilingual model for intelligent token selection

LLM Integration: LiteLLM library for unified API access across multiple providers

Configuration Management: JSON-based profile system using platformdirs for cross-platform config storage

UI/UX: Multi-column layouts, real-time streaming responses, and visual token highlighting
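
The profile system mentioned above reduces to reading and writing small JSON files in a per-user config directory. A minimal sketch follows; the function names are hypothetical, and in the real app the directory would come from `platformdirs.user_config_dir(...)` rather than the hard-coded path used here.

```python
import json
from pathlib import Path

# Placeholder location: the real app resolves this per-OS via
# platformdirs.user_config_dir("prompt-compressor").
CONFIG_DIR = Path.home() / ".config" / "prompt-compressor"

def save_profile(name: str, profile: dict, config_dir: Path = CONFIG_DIR) -> Path:
    """Persist one named LLM-provider profile (model, API base, etc.) as JSON."""
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / f"{name}.json"
    path.write_text(json.dumps(profile, indent=2))
    return path

def load_profile(name: str, config_dir: Path = CONFIG_DIR) -> dict:
    """Load a previously saved profile by name."""
    return json.loads((config_dir / f"{name}.json").read_text())
```

Keeping each provider profile in its own file makes profiles easy to list, delete, and share, and `platformdirs` handles the Linux/macOS/Windows path differences that made persistence a challenge.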

Challenges we ran into

Multi-Provider API Integration: Handling different authentication methods and parameter formats across various LLM providers

Configuration Persistence: Building a robust profile management system that works across different operating systems

Streaming UI Updates: Implementing real-time token streaming while maintaining Streamlit's reactive paradigm required creative workarounds
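
The streaming workaround boils down to wrapping the provider's chunk iterator in a plain generator that the UI can drain incrementally (e.g. via Streamlit's `st.write_stream`). The sketch below uses a fake chunk source; in the app the chunks would come from LiteLLM's streaming response, and the delta-extraction line is an assumption based on the OpenAI-style streaming schema.

```python
def stream_tokens(chunks):
    """Yield text deltas one at a time so the UI can render them as they arrive.
    With a real LiteLLM stream, each item would be unwrapped via something like
    chunk.choices[0].delta.content (assumed OpenAI-style schema)."""
    for chunk in chunks:
        if chunk:  # skip empty deltas
            yield chunk

# Simulated provider stream; in the app this comes from a streaming completion.
fake_stream = ["Compres", "sed ", "prompts ", "save ", "tokens."]
full_response = "".join(stream_tokens(fake_stream))
```

Accumulating the joined text after the stream finishes is also what lets the compressed message and its full response be appended to the conversation history in one piece.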

Accomplishments that we're proud of

Clean User Experience: Delivered a polished interface that makes complex NLP concepts accessible to non-experts

Intuitive Visualization: Created clear, actionable visual feedback showing exactly which tokens were compressed

Flexible Architecture: Built a modular system that supports any LLM provider through configurable profiles

Seamless Integration: Successfully combined academic research (LLMLingua) with practical developer tools

What we learned

Prompt Engineering Insights: Gained deep understanding of how compression affects different types of content (code, natural language, structured data)

Streamlit Mastery: Advanced techniques for state management, custom styling, and real-time updates in Streamlit applications

User-Centered Design: Importance of visual feedback and intuitive interfaces when dealing with complex technical concepts

What's next for Prompt Compressor

Performance Metrics: Track compression ratios, response quality, and cost savings over time

API Service: Transform into a REST API for integration with existing development workflows

Browser Extension: One-click prompt compression for web-based LLM interfaces

Integration Ecosystem: Plugins for popular development tools, IDEs, and AI platforms
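
The planned performance metrics could start from arithmetic this simple. A hedged sketch, with a hypothetical function name and a placeholder price; real per-model input-token prices would be looked up per provider.

```python
def compression_metrics(original_tokens: int, compressed_tokens: int,
                        price_per_1k: float = 0.01) -> dict:
    """Hypothetical metrics helper: compression ratio, tokens saved, and an
    estimated input-cost saving. price_per_1k is a placeholder rate in USD."""
    saved = original_tokens - compressed_tokens
    return {
        "compression_ratio": compressed_tokens / original_tokens,
        "tokens_saved": saved,
        "cost_saved_usd": saved / 1000 * price_per_1k,
    }

m = compression_metrics(original_tokens=2000, compressed_tokens=800)
```

Tracking these three numbers per request over time would be enough to chart cost savings and correlate compression rate with response quality.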

Built With

  • llmlingua
  • litellm
  • python
  • streamlit