Inspiration
In the age of content creators and founder-led brands, LinkedIn has become the new TikTok for professionals. Founders need to consistently publish valuable content to build their reach and personal brand. But managing social media while running a business is overwhelming: you're juggling Slack messages, drafting posts, and trying to stay on top of communications.
We asked ourselves: What if you could just talk to an AI assistant and have it handle everything for you? That's how Bob was born: a voice-first digital delegate that can post to LinkedIn, manage your messages, generate images for posts, and even help you craft content from your team's Slack discussions.
What it does
Yell To Bob is a real-time voice assistant that acts as your personal social media manager and communication delegate:
- Voice-First Interaction: Talk naturally to Bob using real-time voice recognition powered by Deepgram Nova-3
- LinkedIn Publishing: Create and publish LinkedIn posts with AI-generated content and images
- AI Image Generation: Generate professional images for your posts using Gemini's image capabilities
- Slack Integration: Read team channels, summarize discussions, and even create LinkedIn posts based on your team's conversations
- Web Search: Get real-time information from the web to inform your content
- Calendar Integration: Check your schedule and create events through voice commands
- Conversation Memory: Recall previous conversations using MongoDB vector search
- X/Twitter Support: Post to X with the same voice-first experience
How we built it
Architecture Overview
+-----------------+      +-----------------+      +-----------------+
|  React + Vite   |----->|     LiveKit     |----->| Python Backend  |
|    Frontend     |<-----|    Real-time    |<-----|   Multi-Agent   |
|   (Voice UI)    |      |   Audio/Video   |      |     System      |
+-----------------+      +-----------------+      +-----------------+
                                  |
         +------------------------+------------------------+
         |                        |                        |
         v                        v                        v
+-----------------+      +-----------------+      +-----------------+
|   Gemini 2.0    |      |   ElevenLabs    |      |    Stagehand    |
|    Flash LLM    |      |       TTS       |      |     Browser     |
+-----------------+      +-----------------+      |   Automation    |
                                                  +-----------------+
Key Technologies
- Frontend: React + Vite with LiveKit client for real-time voice streaming
- Real-time Communication: LiveKit for low-latency audio streaming
- Speech-to-Text: Deepgram Nova-3 for accurate real-time transcription
- LLM: Google Gemini 2.0 Flash for intent routing and response generation
- Text-to-Speech: ElevenLabs for natural-sounding voice responses
- Multi-Agent System: Custom Python framework with specialized agents (LinkedIn, Slack, X/Twitter)
- Browser Automation: Stagehand for automated social media posting
- State Management: Redis for cross-agent state sharing
- Conversation Memory: MongoDB with vector search for recalling past conversations
- Observability: Arize Phoenix for LLM tracing and monitoring
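The intent-routing role listed for the LLM can be pictured with a small sketch. Everything in it is hypothetical: the agent names, the keyword lists, and the `route` function are illustrative only, and a plain keyword match stands in for the Gemini call the project actually uses.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    keywords: tuple[str, ...]


# Hypothetical agents; the keyword lists are illustrative only.
AGENTS = (
    Agent("linkedin", ("linkedin", "post", "publish")),
    Agent("slack", ("slack", "channel", "thread")),
    Agent("x", ("tweet", "twitter")),
)


def route(transcript: str, default: str = "general") -> str:
    """Return the name of the agent whose keywords best match the transcript."""
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    best_name, best_score = default, 0
    for agent in AGENTS:
        score = sum(1 for kw in agent.keywords if kw in words)
        if score > best_score:  # ties keep the earlier agent
            best_name, best_score = agent.name, score
    return best_name
```

Calling `route("Summarize the design channel in Slack.")` selects the `slack` agent without the user naming a service explicitly; a transcript that matches nothing falls back to the default.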
Challenges we ran into
Real-time Voice Latency: Achieving low-latency voice interactions required careful optimization of the audio pipeline between LiveKit, STT, LLM, and TTS components.
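One way to see where latency accumulates is to time each stage separately. The sketch below is an assumption about how that could be done, not the project's code: `timed_pipeline` and the three `fake_*` stages are hypothetical placeholders for the real STT, LLM, and TTS calls.

```python
import time
from typing import Callable

Stage = tuple[str, Callable[[str], str]]


def timed_pipeline(stages: list[Stage], payload: str) -> tuple[str, dict[str, float]]:
    """Run stages in order; return the final output and per-stage latency in ms."""
    timings: dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)
        timings[name] = (time.perf_counter() - start) * 1000.0
    return payload, timings


# Placeholder stages standing in for the real STT, LLM, and TTS services.
def fake_stt(audio: str) -> str:
    return audio.removeprefix("audio:")


def fake_llm(text: str) -> str:
    return f"reply to {text!r}"


def fake_tts(text: str) -> str:
    return f"speech:{text}"


output, timings = timed_pipeline(
    [("stt", fake_stt), ("llm", fake_llm), ("tts", fake_tts)],
    "audio:post this",
)
```

The `timings` dictionary gives one number per stage, which makes it easier to decide which component to optimize first.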
Multi-Agent Coordination: Building a system where multiple specialized agents (LinkedIn, Slack, X) could seamlessly hand off conversations while maintaining context was complex.
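A hand-off that keeps context might look roughly like the following. This is a sketch under stated assumptions: `SharedContext` is an in-memory stand-in for the shared store (Redis in this write-up), and the agent functions, keys, and summary text are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """In-memory stand-in for the cross-agent store (hypothetical)."""
    data: dict[str, str] = field(default_factory=dict)
    active_agent: str = "slack"


def linkedin_agent(ctx: SharedContext, request: str) -> str:
    # Read what the previous agent left behind instead of asking the user again.
    summary = ctx.data.get("slack_summary", "")
    return f"Draft post: {summary}"


def slack_agent(ctx: SharedContext, request: str) -> str:
    # Store a (made-up) summary so another agent can pick it up later.
    ctx.data["slack_summary"] = "Team agreed to ship v2 on Friday."
    if "linkedin" in request.lower():
        ctx.active_agent = "linkedin"  # hand off, context travels with ctx
        return linkedin_agent(ctx, request)
    return ctx.data["slack_summary"]
```

Because both agents read and write the same context object, the LinkedIn agent can draft from the Slack summary without the user repeating it.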
Browser Automation Reliability: Automating LinkedIn posting through browser automation (Stagehand) required handling various edge cases, session management, and anti-bot measures.
LangGraph Workflow Integration: Implementing multi-step workflows for LinkedIn drafting with user confirmation loops using LangGraph required careful state management.
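The draft-and-confirm loop can be illustrated as a small state machine. Note that this is plain Python, not LangGraph, and it is only a sketch: `DraftState`, `write_draft`, `run_workflow`, and the status names are hypothetical, and the draft text is a placeholder for an LLM call.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DraftState:
    topic: str
    draft: str = ""
    revisions: int = 0
    # drafting -> awaiting_confirmation -> published | cancelled
    status: str = "drafting"


def write_draft(state: DraftState, feedback: str = "") -> DraftState:
    # Placeholder for the LLM call that would write or revise the post.
    note = f" (revised: {feedback})" if feedback else ""
    state.draft = f"Post about {state.topic}{note}"
    state.status = "awaiting_confirmation"
    return state


def run_workflow(topic: str, replies: Iterable[str], max_revisions: int = 3) -> DraftState:
    """Revise until the user says 'yes', says 'cancel', or revisions run out."""
    state = write_draft(DraftState(topic))
    for reply in replies:
        answer = reply.strip().lower()
        if answer == "yes":
            state.status = "published"
            return state
        if answer == "cancel" or state.revisions >= max_revisions:
            state.status = "cancelled"
            return state
        state.revisions += 1
        state = write_draft(state, feedback=reply)
    return state  # still awaiting confirmation; nothing is published
```

The key property is that publishing only happens on an explicit "yes"; running out of replies leaves the draft waiting rather than posting it.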
Voice User Experience: We had to design natural conversation flows in which the AI knows when to ask for confirmation and when to proceed autonomously.
Accomplishments that we're proud of
- End-to-End Voice Pipeline: Successfully built a complete voice-to-action pipeline that can take a voice command and execute a LinkedIn post with an AI-generated image
- Intelligent Agent Routing: The system intelligently routes conversations to specialized agents without users needing to specify which service they want to use
- LangGraph Workflows: Implemented sophisticated multi-step workflows for content creation with built-in user approval flows
- Cross-Agent Memory: Agents can recall and use information from previous conversations using vector search
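The idea behind recalling past conversations by similarity can be shown with a toy example. This sketch does not use MongoDB or a real embedding model: `embed` is a bag-of-words stand-in, and `recall` and the `MEMORIES` entries are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(query: str, memories: list[str], k: int = 1) -> list[str]:
    """Return the k stored memories most similar to the query."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]


# Made-up conversation snippets for illustration.
MEMORIES = [
    "User asked for a LinkedIn post about the product launch",
    "User checked the calendar for Friday meetings",
    "User summarized the design channel in Slack",
]
```

A query such as `recall("what did we say about the launch post", MEMORIES)` ranks the launch-related memory first; a vector database performs the same kind of nearest-neighbor lookup over stored embeddings.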
What we learned
- The importance of designing voice-first UX: it's fundamentally different from chat interfaces
- How to build modular multi-agent systems that can scale to handle different platforms
- Real-time audio streaming is challenging but incredibly rewarding when it works
- LangGraph is powerful for building stateful, multi-step AI workflows
- The value of observability (Arize Phoenix) when debugging complex AI pipelines
What's next for Yell To Bob
- More Platforms: Add Instagram, TikTok, and YouTube support
- Advanced Slack Features: Reply to messages, join calls, and summarize meetings
- Mobile App: Native iOS/Android apps for on-the-go voice commands
- Email Integration: Manage and respond to emails through voice
- Meeting Notes: Automatically generate LinkedIn posts from meeting transcripts
- Analytics Dashboard: Track the performance of posts created through Bob