Inspiration
This project was born out of the desire to combine speech-to-text AI with local LLMs to create a product that significantly improves the developer experience.
What it does
SpeechScaffold is a tool that developers run alongside their IDE. It allows a developer to rapidly build and iterate on the style and functionality of their React components simply by describing the desired changes out loud. Because it integrates directly with the IDE, the developer can instantly edit and use the code they generate.
How I built it
I used React for the frontend, since it is a very popular web framework and therefore the most likely to draw good results from Code Llama and/or Mistral. The AI agent, written in Python, is a heavily modified version of an existing AI assistant; with the modifications, it can fetch the current state of the code, iterate on it based on the developer's voice commands, and output a new, improved version of the code each time.
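The fetch-iterate-output loop described above might look roughly like the sketch below. `iterate_component` and its injected `transcribe`/`generate` callables are hypothetical names standing in for the Whisper and Ollama calls, not the project's actual API; they are passed in so the loop logic stays self-contained.

```python
def iterate_component(source: str, transcribe, generate) -> str:
    """One round of the voice-driven edit loop.

    `transcribe` stands in for the speech-to-text step (e.g. Whisper),
    and `generate` for the LLM call (e.g. Ollama); both are injected
    here so this sketch runs without either dependency.
    """
    # Speech -> text, e.g. "make the button blue".
    command = transcribe()

    # Combine the current code and the voice command into one prompt.
    prompt = (
        "You are editing a React component.\n"
        f"Current code:\n{source}\n\n"
        f"Requested change: {command}\n"
        "Return only the full, updated component."
    )

    # The LLM's reply becomes the new version of the component.
    return generate(prompt)
```

Injecting the two callables also makes each round easy to test with stubs before wiring in the real models.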
Challenges we ran into
I originally used Svelte for the framework, since it is more modern than React and often requires much simpler code. However, the LLMs do not output Svelte at the same quality, likely because it is a newer framework. Thus, I had to retool the entire pipeline to React.
Accomplishments that we're proud of
I am proud that the tool achieves fairly consistent and accurate results, which makes it several times faster than writing the same components by hand.
What we learned
I learned a lot about implementing OpenAI's Whisper model, running inference with Ollama and streaming its output, parsing and cleaning LLM output to meet the rigorous formatting requirements of code, and prompt engineering for code iteration.
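One part of that parsing-and-cleaning step, pulling bare component source out of a Markdown-fenced LLM reply, could be sketched like this. The helper name and regex are illustrative assumptions, not the project's actual code:

```python
import re

def extract_code(reply: str) -> str:
    """Return the contents of the first fenced code block in an LLM
    reply, or the whole reply (stripped) if no fence is present.
    """
    # Match ``` optionally followed by a language tag, then capture
    # everything up to the closing fence (DOTALL spans newlines).
    match = re.search(r"```[^\n]*\n(.*?)```", reply, re.DOTALL)
    if match:
        return match.group(1).strip()
    # No fence: assume the model returned raw code.
    return reply.strip()
```

A cleanup pass like this matters because models often wrap code in prose ("Sure! Here is the component…") that would break the file if written to disk verbatim.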
What's next for SpeechScaffold
The next improvements include expanding the list of special commands by adding memory with undo and multiple generations with a choice of the best, and improving generation speed by using external APIs rather than self-hosting the LLM.
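The planned memory-with-undo feature could be backed by a simple stack of past generations. A minimal sketch, assuming each accepted generation is pushed as a snapshot (the class and method names are hypothetical):

```python
class GenerationHistory:
    """Stack of component snapshots supporting undo back to the
    original source."""

    def __init__(self, initial: str):
        # The stack always keeps at least the initial snapshot.
        self._stack = [initial]

    def push(self, code: str) -> None:
        """Record a newly accepted generation."""
        self._stack.append(code)

    def undo(self) -> str:
        """Discard the latest generation and return the previous one.
        The initial snapshot can never be undone away."""
        if len(self._stack) > 1:
            self._stack.pop()
        return self._stack[-1]

    @property
    def current(self) -> str:
        return self._stack[-1]
```

A stack keeps the "undo" voice command a constant-time operation regardless of how many iterations the session has accumulated.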