cat-language-translator

Screenshots: home page (the side button is the sound card); custom settings; settings; video upload analyzed by a multimodal large model; translation into natural language; knowledge card; additional analysis information; audio collection.

🐱 Inspiration

The inspiration for this project comes from personal experience. Last year I adopted a stray cat, but no matter how hard I tried, it kept its distance from me and sometimes even let out threatening meows. I assumed this was just feline aloofness, until a friend who knows animal behavior visited my home.
After watching for a while, my friend told me, "Your cat has actually been 'speaking' to you all along; you just haven't understood its language." It turned out that every sound and every movement the cat made had a specific meaning. Once I learned to understand its "language", our relationship changed dramatically.
This experience made me realize that the communication barrier between humans and animals is a widespread problem. If AI could "translate" animal language, it could help countless families build better relationships with their pets.

🛠️ What it does

Cat Language Translator is a multimodal AI application that "translates" a cat's vocalizations and behavior in real time. Its core features are:

🎤 Real-time sound recognition: captures and analyzes cat vocalizations (meowing, purring, hissing, etc.)
📷 Visual behavior analysis: monitors a cat's body language and behavior patterns through the camera
🧠 Intelligent knowledge base: explains cat behaviors and suggests ways to interact
🔗 Multimodal fusion: combines audio and visual information for more accurate behavior analysis
⚡ Real-time feedback: displays analysis results and recommended responses in real time
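The page does not say how the audio and visual results are combined. As a minimal sketch, assuming both the sound classifier and the vision model return a probability for each of the same hypothetical intent labels, a weighted "late fusion" could look like this. The labels, `fuseScores`, `topIntent`, and the 0.6 audio weight are illustrative assumptions, not the app's actual code.

```typescript
// Hypothetical intent labels shared by the audio and visual models (illustrative only).
type Intent = "attention" | "content" | "fear" | "hunger";
type Scores = Record<Intent, number>;

const INTENTS: Intent[] = ["attention", "content", "fear", "hunger"];

// Weighted late fusion: blend per-label probabilities from the two modalities,
// then renormalize so the fused scores sum to 1.
function fuseScores(audio: Scores, visual: Scores, audioWeight = 0.6): Scores {
  const visualWeight = 1 - audioWeight;
  const fused = {} as Scores;
  let total = 0;
  for (const intent of INTENTS) {
    fused[intent] = audioWeight * audio[intent] + visualWeight * visual[intent];
    total += fused[intent];
  }
  for (const intent of INTENTS) {
    // Fall back to a uniform distribution if both inputs were all zeros.
    fused[intent] = total > 0 ? fused[intent] / total : 1 / INTENTS.length;
  }
  return fused;
}

// Return the label with the highest fused score.
function topIntent(scores: Scores): Intent {
  return INTENTS.reduce((best, i) => (scores[i] > scores[best] ? i : best));
}
```

With an audio result that leans toward "attention" (0.7) and a visual result that leans toward "content" (0.6), the fused attention score is 0.6 × 0.7 + 0.4 × 0.2 = 0.5, so the audio evidence wins. Late fusion keeps each model independent, which suits a setup where one model runs locally and the other is a remote API.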

🏗️ How we built it

We built the application with a modern technology stack:

Front-end architecture:

Vue.js 3 + TypeScript for a responsive user interface
Tailwind CSS for a polished visual design
Web Audio API for processing real-time audio streams
Canvas API for video frame capture and image processing
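The page only says that the Web Audio API handles real-time audio streams. A common pattern, sketched here under that assumption, is to read frames from an `AnalyserNode` with `getFloatTimeDomainData()` and forward only frames loud enough to contain a vocalization. The browser wiring is left in comments so the helpers run outside a browser; `frameRms`, `rmsToDbfs`, `isSoundFrame`, and the -40 dBFS threshold are hypothetical choices.

```typescript
// Root-mean-square amplitude of one frame of samples in [-1, 1],
// e.g. the array filled by AnalyserNode.getFloatTimeDomainData().
function frameRms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Convert RMS to decibels relative to full scale (dBFS); silence maps to -Infinity.
function rmsToDbfs(rms: number): number {
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

// Hypothetical energy gate: a frame counts as "sound" if it is louder than the threshold.
function isSoundFrame(samples: Float32Array, thresholdDbfs = -40): boolean {
  return rmsToDbfs(frameRms(samples)) > thresholdDbfs;
}

// Browser wiring (not executed here), assuming microphone permission is granted:
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const ctx = new AudioContext();
//   const analyser = ctx.createAnalyser();
//   ctx.createMediaStreamSource(stream).connect(analyser);
//   const buf = new Float32Array(analyser.fftSize);
//   analyser.getFloatTimeDomainData(buf);
//   if (isSoundFrame(buf)) { /* forward the frame to the classifier */ }
```

Gating on energy keeps silent frames away from the classifier and the speech API, which is one inexpensive way to reduce both latency and request volume.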

AI integration:

Integrated the Qwen2.5-VL-32B-Instruct vision-language model through iFlytek
Used the iFlytek speech recognition API for audio analysis
Developed our own cat sound classification algorithm
Multimodal data fusion
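The team's own cat sound classification algorithm is not described. As an illustrative stand-in only, the sketch below extracts two classic time-domain features, zero-crossing rate (ZCR) and RMS energy, and assigns a frame to the nearest of a few hypothetical class centroids. The centroid values are placeholders, not measured data; a real classifier would learn them, or use richer features such as MFCCs, from labeled recordings.

```typescript
type CatSound = "meow" | "purr" | "hiss";

interface Features {
  zcr: number; // zero-crossing rate: fraction of adjacent sample pairs that change sign
  rms: number; // root-mean-square energy
}

function extractFeatures(samples: Float32Array): Features {
  let crossings = 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
    // Count a crossing whenever the sign differs from the previous sample (0 counts as positive).
    if (i > 0 && (samples[i - 1] >= 0) !== (samples[i] >= 0)) crossings++;
  }
  const n = samples.length;
  return {
    zcr: n > 1 ? crossings / (n - 1) : 0,
    rms: n > 0 ? Math.sqrt(sumSquares / n) : 0,
  };
}

// Placeholder centroids (NOT measured values): hisses are noisy (high ZCR),
// purrs are low-frequency (low ZCR), meows sit in between.
const CENTROIDS: Record<CatSound, Features> = {
  purr: { zcr: 0.02, rms: 0.1 },
  meow: { zcr: 0.1, rms: 0.3 },
  hiss: { zcr: 0.45, rms: 0.2 },
};

// Nearest-centroid classification using Euclidean distance in feature space.
function classify(features: Features): CatSound {
  let best: CatSound = "meow";
  let bestDist = Infinity;
  for (const label of Object.keys(CENTROIDS) as CatSound[]) {
    const c = CENTROIDS[label];
    const d = Math.hypot(features.zcr - c.zcr, features.rms - c.rms);
    if (d < bestDist) {
      bestDist = d;
      best = label;
    }
  }
  return best;
}
```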

System architecture:

Modular service design (audioService, catSoundClassifier, etc.)
Real-time data stream processing
Permission management for microphone and camera access
Responsive interface that adapts to different screen sizes
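The page names `audioService` and `catSoundClassifier` as separate modules but does not show their interfaces. A minimal sketch of how such modules might be composed follows; every interface, method name, and type here is a hypothetical assumption. Each service is described by an interface, and the pipeline receives implementations through its constructor, so tests can swap in fakes for the microphone or the remote model.

```typescript
// Hypothetical interfaces; method names are illustrative, not the project's actual API.
interface AudioService {
  // Resolves to the next captured frame, or null once capture has stopped.
  nextFrame(): Promise<Float32Array | null>;
}

interface AnalysisResult {
  label: string;
  confidence: number;
}

interface CatSoundClassifier {
  classify(frame: Float32Array): Promise<AnalysisResult>;
}

class AnalysisPipeline {
  private readonly audio: AudioService;
  private readonly classifier: CatSoundClassifier;
  private readonly minConfidence: number;

  constructor(audio: AudioService, classifier: CatSoundClassifier, minConfidence = 0.5) {
    this.audio = audio;
    this.classifier = classifier;
    this.minConfidence = minConfidence;
  }

  // Pull frames until the audio service stops, keeping only confident results.
  async run(): Promise<AnalysisResult[]> {
    const results: AnalysisResult[] = [];
    let frame = await this.audio.nextFrame();
    while (frame !== null) {
      const result = await this.classifier.classify(frame);
      if (result.confidence >= this.minConfidence) results.push(result);
      frame = await this.audio.nextFrame();
    }
    return results;
  }
}
```

Because the pipeline depends only on the interfaces, the same code can run against a live microphone, a recorded file, or an in-memory fake.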

⚠️ Challenges we ran into

🔧 1. Technical challenges:

Reducing the latency of real-time audio processing
Synchronizing and fusing multimodal data
Making browser permission requests user-friendly
Cross-platform compatibility issues
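Synchronizing the two modalities usually comes down to matching events by timestamp. The page does not say how the app does this; as a sketch under that assumption, the helper below pairs each audio event with the video frame closest in time and drops pairs that are farther apart than a tolerance. Timestamps are in milliseconds (for example from `performance.now()`); `alignByTimestamp` and the 200 ms tolerance are hypothetical.

```typescript
interface Timed {
  t: number; // capture time in milliseconds
}

// Pair each audio event with its nearest video frame using a single forward pointer.
// Both arrays must be sorted by t. Events with no frame within toleranceMs are skipped.
function alignByTimestamp<A extends Timed, V extends Timed>(
  audio: A[],
  video: V[],
  toleranceMs = 200,
): Array<[A, V]> {
  const pairs: Array<[A, V]> = [];
  let j = 0;
  for (const a of audio) {
    // Advance while the next frame is at least as close; with sorted inputs
    // the best frame never moves backwards, so the whole pass is O(n + m).
    while (j + 1 < video.length && Math.abs(video[j + 1].t - a.t) <= Math.abs(video[j].t - a.t)) {
      j++;
    }
    if (video.length > 0 && Math.abs(video[j].t - a.t) <= toleranceMs) {
      pairs.push([a, video[j]]);
    }
  }
  return pairs;
}
```

For example, an audio event at 100 ms pairs with a frame at 90 ms, while an event at 1000 ms is dropped if the nearest frames are at 500 ms and 1500 ms.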

🧪 2. Algorithm challenges:

Feature extraction and accurate classification of cat sounds
Stable visual behavior recognition under different lighting conditions
Behavior pattern recognition across different contexts
Optimizing the real-time inference performance of AI models
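One simple and common mitigation for varying light, offered here as an assumption rather than the team's documented approach, is to normalize each captured frame's brightness before sending it to the vision model. The sketch works on RGBA pixel data in the layout returned by the Canvas API's `getImageData()` and scales the RGB channels so that mean luma approaches a target. `meanLuma`, `normalizeBrightness`, the target of 128, and the gain cap of 4 are hypothetical.

```typescript
// Mean luma (Rec. 601 weights) of RGBA pixel data, as stored in ImageData.data.
function meanLuma(rgba: Uint8ClampedArray): number {
  let sum = 0;
  const pixels = rgba.length / 4;
  for (let i = 0; i < rgba.length; i += 4) {
    sum += 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
  }
  return pixels > 0 ? sum / pixels : 0;
}

// Scale RGB by one gain so mean luma approaches the target; alpha is left untouched.
// Uint8ClampedArray rounds and clamps assigned values to 0..255 automatically.
function normalizeBrightness(rgba: Uint8ClampedArray, target = 128, maxGain = 4): Uint8ClampedArray {
  const mean = meanLuma(rgba);
  const out = new Uint8ClampedArray(rgba); // copy, so the input frame is not modified
  if (mean === 0) return out; // fully black frame: nothing to scale
  const gain = Math.min(target / mean, maxGain); // cap gain to avoid amplifying sensor noise
  for (let i = 0; i < out.length; i += 4) {
    out[i] = rgba[i] * gain;
    out[i + 1] = rgba[i + 1] * gain;
    out[i + 2] = rgba[i + 2] * gain;
  }
  return out;
}
```

A single global gain is crude (it cannot fix uneven lighting within a frame), but it is cheap enough to run on every captured frame in the browser.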

🧑‍💻 3. User experience challenges:

Presenting complex AI analysis results intuitively
Designing a user-friendly permission request flow
Ensuring the app runs smoothly on a wide range of devices
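For the permission request flow, one approach consistent with the challenge above (a sketch, not the app's documented behavior) is to check the current state first and only trigger the browser prompt after explaining why the microphone or camera is needed. The states `"granted"`, `"denied"`, and `"prompt"` are those reported by the Permissions API's `navigator.permissions.query()`. Support for the `camera` and `microphone` names varies by browser, so the sketch adds an `"unknown"` state that is handled like `"prompt"`. The action kinds and messages are hypothetical UI choices.

```typescript
// States reported by the Permissions API, plus "unknown" when query() is unsupported.
type PermissionStateLike = "granted" | "denied" | "prompt" | "unknown";

type PermissionAction =
  | { kind: "start" } // already granted: start capturing right away
  | { kind: "explainThenRequest"; message: string } // show a rationale, then call getUserMedia
  | { kind: "showSettingsHelp"; message: string }; // denied: getUserMedia typically fails without prompting

function nextPermissionStep(device: "microphone" | "camera", state: PermissionStateLike): PermissionAction {
  switch (state) {
    case "granted":
      return { kind: "start" };
    case "denied":
      return {
        kind: "showSettingsHelp",
        message: `Access to the ${device} is blocked. Enable it in your browser's site settings to analyze your cat.`,
      };
    case "prompt":
    case "unknown":
      return {
        kind: "explainThenRequest",
        message: `We use the ${device} only to analyze your cat's ${device === "microphone" ? "sounds" : "body language"}.`,
      };
  }
}
```

Explaining the purpose before the native prompt appears makes a first-time "Allow" more likely, and the "denied" branch avoids repeatedly calling `getUserMedia` when it cannot succeed.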

🏆 Accomplishments that we're proud of

✅ Technical breakthrough: achieved real-time fusion of multimodal AI analysis, meeting the performance standards of a commercial-grade application
🎨 User experience: created an intuitive, user-friendly interface that makes complex AI technology easy to use
💡 Innovation: brings speech recognition, visual analysis, and animal behavior together in one application, opening up a new application area
🐾 Practical value: addresses a real communication problem between humans and pets, with significant market potential
🔧 Technical depth: integrates multiple cutting-edge AI technologies into a single working system

📚 What we learned

During development, we gained valuable experience:

On a technical level:

Gained hands-on experience designing and implementing multimodal AI systems
Learned real-time data stream processing and performance optimization
Learned the advanced capabilities and limitations of browser APIs
Adopted best practices for modern front-end frameworks

Domain knowledge:

Studied animal behavior and cat communication patterns in depth
Learned how speech recognition and computer vision are applied in practice
Learned the importance of user experience design

Project management:

Learned how to balance technical complexity with user-friendliness
Came to appreciate the importance of iterative development and continuous optimization

🚀 What's next for Cat Language Translator

We plan to pursue the following goals:

📅 Short-term goals (3–6 months):

Expand to other pets (dogs, birds, rabbits, etc.)
Improve the accuracy and response speed of the AI models
Add personalized user settings and learning features
Develop mobile apps

📈 Mid-term goals (6–12 months):

Implement voice (timbre) cloning so users can talk to their pets in their own voice
Build a pet behavior database and community platform
Integrate IoT devices for smart-home connectivity
Develop pet health monitoring features

🌌 Long-term vision (1–3 years):

Expand into the field of wildlife conservation
Develop AR/VR interactive experiences
Create an animal-language learning game platform
Establish a global network for animal behavior research

🌍 Our ultimate goal is to create an AI ecosystem that truly understands animal language, making communication between humans and animals natural and profound, and contributing to building a more harmonious relationship between humans and nature.

Built With

api
css
deeplearning
javascript
qwen
tailwind
typescript
vite
vue
websocket

Created by

I am responsible for the architecture design and full-stack development. We are continuing to develop the project. The planned capabilities include letting users clone their own voices, so that their speech can be turned into animal sounds in their own distinctive tone, and even enabling different species to communicate. For example, cats and dogs often fight because they misread each other's intentions, so we want to convert a cat's meows into barks.

mask mask
Hongyu Chen
Chen Yutong
Xiangrui Hou