Inspiration
The inspiration came from recognizing a common problem in the LLM era: token limits and costs. Many developers and researchers work with verbose prompts that eat up valuable tokens without adding semantic value. We wanted to explore how compression affects LLM performance while building a practical tool that visualizes token importance and helps optimize prompt efficiency. The goal was to bridge the gap between academic research on prompt compression and real-world applications that developers could use in their daily workflows.
What it does
Intelligent Compression: Uses advanced NLP models to identify and remove redundant tokens while preserving context
Visual Token Analysis: Color-coded visualization showing which tokens were kept (green) or removed (red) during compression
LLM Integration: Supports multiple LLM providers (OpenAI, Anthropic, Ollama, custom APIs) to test compressed prompts
Conversation History: Maintains context across multiple interactions with compressed messages
Configurable Parameters: Adjustable compression rates, temperature, token limits, and other LLM settings
Multi-page Interface: Clean separation between compression, API configuration, and project information
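The color-coded visualization can be sketched as a small helper that wraps each token in a styled HTML span, which Streamlit can render with `unsafe_allow_html=True`. This is a hypothetical minimal sketch, not the project's actual code; the function name and colors are our own:

```python
def tokens_to_html(original_tokens, kept_tokens):
    """Render kept tokens in green and removed tokens in red as HTML spans.

    Simplification: duplicate tokens share one kept/removed decision,
    since membership is checked by value rather than position.
    """
    kept = set(kept_tokens)
    spans = []
    for tok in original_tokens:
        color = "#2e7d32" if tok in kept else "#c62828"  # green if kept, red if dropped
        spans.append(f'<span style="color:{color}">{tok}</span>')
    return " ".join(spans)

# In a Streamlit app, this would be displayed with:
# import streamlit as st
# st.markdown(tokens_to_html(original, kept), unsafe_allow_html=True)
```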
How we built it
Front-end: Streamlit for the interactive web interface with custom HTML/CSS for enhanced styling
Core Compression: Microsoft's LLMLingua-2 with BERT-base multilingual model for intelligent token selection
LLM Integration: LiteLLM library for unified API access across multiple providers
Configuration Management: JSON-based profile system using platformdirs for cross-platform config storage
UI/UX: Multi-column layouts, real-time streaming responses, and visual token highlighting
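LLMLingua-2 frames compression as token classification: each token receives an importance score, and the lowest-scoring tokens are dropped until the target rate is met. The selection step can be illustrated with a toy sketch (the scores here are hand-picked stand-ins for the real BERT classifier's output, and `compress` is our own illustrative helper, not the library's API):

```python
def compress(tokens, scores, rate=0.5):
    """Keep the top `rate` fraction of tokens by importance score,
    preserving original order (mirrors LLMLingua-2's selection step)."""
    keep_n = max(1, int(len(tokens) * rate))
    # Indices of the highest-scoring tokens
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep_n]
    top_set = set(top)
    return [t for i, t in enumerate(tokens) if i in top_set]

tokens = ["please", "kindly", "summarize", "the", "attached", "report"]
scores = [0.20, 0.10, 0.90, 0.30, 0.60, 0.95]
print(compress(tokens, scores, rate=0.5))  # → ['summarize', 'attached', 'report']
```

In the real pipeline, the scores come from the fine-tuned multilingual BERT model rather than being supplied by hand, and the kept/dropped decision per token is exactly what drives the green/red visualization.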
Challenges we ran into
Multi-Provider API Integration: Handling different authentication methods and parameter formats across various LLM providers
Configuration Persistence: Building a robust profile management system that works across different operating systems
Streaming UI Updates: Implementing real-time token streaming while maintaining Streamlit's reactive paradigm required creative workarounds
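A common shape for the streaming workaround (a hedged sketch of the pattern, not the project's exact code) is to fold incoming text deltas into one growing string and rewrite a single `st.empty()` placeholder after each chunk, which keeps the update loop inside one Streamlit rerun:

```python
def accumulate_stream(chunks, on_update):
    """Fold streamed text deltas into a growing string, invoking a UI
    callback after each chunk (e.g. a Streamlit placeholder's .markdown)."""
    text = ""
    for delta in chunks:
        if delta:  # providers may emit empty or None deltas
            text += delta
            on_update(text)
    return text

# With Streamlit and LiteLLM, the callback would be a placeholder:
# import streamlit as st
# placeholder = st.empty()
# final_text = accumulate_stream(
#     (c.choices[0].delta.content for c in litellm.completion(..., stream=True)),
#     placeholder.markdown,
# )
```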
Accomplishments that we're proud of
Clean User Experience: Delivered a polished interface that makes complex NLP concepts accessible to non-experts
Intuitive Visualization: Created clear, actionable visual feedback showing exactly which tokens were compressed
Flexible Architecture: Built a modular system that supports any LLM provider through configurable profiles
Seamless Integration: Successfully combined academic research (LLMLingua) with practical developer tools
What we learned
Prompt Engineering Insights: Gained deep understanding of how compression affects different types of content (code, natural language, structured data)
Streamlit Mastery: Advanced techniques for state management, custom styling, and real-time updates in Streamlit applications
User-Centered Design: Importance of visual feedback and intuitive interfaces when dealing with complex technical concepts
What's next for Prompt Compressor
Performance Metrics: Track compression ratios, response quality, and cost savings over time
API Service: Transform into a REST API for integration with existing development workflows
Browser Extension: One-click prompt compression for web-based LLM interfaces
Integration Ecosystem: Plugins for popular development tools, IDEs, and AI platforms
Built With
- llmlingua
- litellm
- python
- streamlit