Inspiration
Cursorless is a voice interface for manipulating text, such as code. We saw its potential as a bold new interface for text editing, but it is unintuitive: learning Cursorless amounts to learning a new language with unnatural, complicated syntax. That was the inspiration behind Verbalist. We want to harness the power of voice (and AI) to greatly improve productivity while editing text, especially code. Many other AI products access user data, so we also wanted Verbalist to keep the user's data secure.
What it does
Verbalist is a VSCode extension that lets users edit their code by voice. After downloading and configuring the extension, a user can record short voice snippets describing the high-level actions they want to take on the text. Our AI models then decide which specific actions to execute to carry out those high-level actions, all without processing the contents of the file.
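To make the "no file contents" idea concrete, here is a minimal sketch of how a model's reply could be checked before anything is executed. The action names, the JSON shape, and the `parseActions` helper are all assumptions made for illustration, not Verbalist's actual schema. The point is that every field is a line number or a search string, so the only thing the extension needs to share about the file is its line count.

```typescript
// Hypothetical action schema: every field is a line number or a search
// string, so the model never needs to see the file's text.
type EditAction =
  | { kind: "deleteLines"; start: number; end: number }
  | { kind: "moveLines"; start: number; end: number; after: number }
  | { kind: "findReplace"; find: string; replace: string };

// Parse the model's reply (assumed to be a JSON array) and reject anything
// that does not match the schema or points outside the file.
function parseActions(reply: string, lineCount: number): EditAction[] {
  const raw: unknown = JSON.parse(reply);
  if (!Array.isArray(raw)) {
    throw new Error("expected a JSON array of actions");
  }

  // Return the value as an integer line number in [min, lineCount], or throw.
  const line = (value: unknown, min: number = 1): number => {
    if (
      typeof value !== "number" ||
      !Number.isInteger(value) ||
      value < min ||
      value > lineCount
    ) {
      throw new Error(`invalid line number: ${String(value)}`);
    }
    return value;
  };

  // Read a 1-based, inclusive line range from an action object.
  const range = (a: Record<string, unknown>): { start: number; end: number } => {
    const start = line(a["start"]);
    const end = line(a["end"]);
    if (start > end) {
      throw new Error("range start is after range end");
    }
    return { start, end };
  };

  return raw.map((item: unknown, i: number): EditAction => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`action ${i} is not an object`);
    }
    const a = item as Record<string, unknown>;
    switch (a["kind"]) {
      case "deleteLines":
        return { kind: "deleteLines", ...range(a) };
      case "moveLines": {
        const { start, end } = range(a);
        // `after` is the line the block should follow; 0 means top of file.
        const after = line(a["after"], 0);
        if (after >= start && after <= end) {
          throw new Error("cannot move a block into itself");
        }
        return { kind: "moveLines", start, end, after };
      }
      case "findReplace": {
        const find = a["find"];
        const replace = a["replace"];
        if (typeof find !== "string" || find === "" || typeof replace !== "string") {
          throw new Error(`action ${i} has an invalid find or replace value`);
        }
        return { kind: "findReplace", find, replace };
      }
      default:
        throw new Error(`unknown action kind: ${String(a["kind"])}`);
    }
  });
}
```

Validating up front means a reply that names a line past the end of the file is rejected before any edit is attempted, rather than failing halfway through a sequence of actions.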
Challenges and what we learned
We learned some of the limitations of using large language models on difficult, real-world tasks. For example, the LLM we used often struggled to identify a correct, intuitive sequence of actions to carry out the user's request. We spent a long time refining prompts and learned that our final results were very sensitive to prompt quality. We also spent a while setting up the interaction between our main extension TypeScript file and our Python file, which handled the recording and AI processing. Through this process, we learned how to set up inter-process communication and made extensive use of the standard libraries (e.g., input/output streams) of both Python and TypeScript.
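As a rough illustration of the stream handling involved, the sketch below shows one common way to frame messages between two processes: one JSON object per line. This framing and the `createLineReader` name are assumptions for the example, not a description of Verbalist's actual protocol. The tricky part it addresses is that a stream delivers data in arbitrary chunks, so a message can arrive split across two reads.

```typescript
// Turns a stream of text chunks into whole JSON messages.
// Assumption: the Python side writes one JSON object per line to stdout.
function createLineReader(
  onMessage: (message: unknown) => void
): (chunk: string) => void {
  // Holds any partial line left over from the previous chunk.
  let buffer = "";

  return (chunk: string): void => {
    buffer += chunk;
    let newline = buffer.indexOf("\n");
    while (newline !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line !== "") {
        onMessage(JSON.parse(line));
      }
      newline = buffer.indexOf("\n");
    }
  };
}
```

In an extension, the returned function could be attached to a child process started with Node's `child_process.spawn`, for example by calling `child.stdout.setEncoding("utf8")` and then `child.stdout.on("data", feed)`. On the Python side, each message would be printed on its own line and flushed so that it is not held back in an output buffer.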
Accomplishments we're proud of
Our extension allows users to use natural language to manipulate collections of lines and perform simple find-and-replace operations. We also built on top of the VSCode text editing API to allow for higher-level operations without providing any file contents to the AI.
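The kind of line-level operations described above can be sketched as pure functions over an array of lines. This is a simplified stand-in rather than our extension code: the action names and the `applyAction` helper are hypothetical, and a real extension would express the same changes through the VSCode API (for instance, the edit builder passed to `TextEditor.edit`) instead of rebuilding an array. Note that find-and-replace runs locally, so the search text is the only thing that needs to come from the model.

```typescript
// Hypothetical actions; line numbers are 1-based and ranges are inclusive.
type EditAction =
  | { kind: "deleteLines"; start: number; end: number }
  | { kind: "moveLines"; start: number; end: number; after: number }
  | { kind: "findReplace"; find: string; replace: string };

// Apply one action to a document held as an array of lines and return the
// new array. Actions are assumed to be validated already: ranges lie inside
// the file, `after` is outside the moved block, and `find` is not empty.
function applyAction(lines: readonly string[], action: EditAction): string[] {
  switch (action.kind) {
    case "deleteLines":
      return [...lines.slice(0, action.start - 1), ...lines.slice(action.end)];
    case "moveLines": {
      const block = lines.slice(action.start - 1, action.end);
      const rest = [...lines.slice(0, action.start - 1), ...lines.slice(action.end)];
      // `after` refers to original line numbers (0 means top of file), so
      // shift it down when the removed block sat above the target.
      const index =
        action.after < action.start ? action.after : action.after - block.length;
      rest.splice(index, 0, ...block);
      return rest;
    }
    case "findReplace":
      // Replace every occurrence on every line.
      return lines.map((line) => line.split(action.find).join(action.replace));
  }
}
```

A spoken request such as "move lines two and three to the end" would then become a single `moveLines` action with `start: 2`, `end: 3`, and `after` set to the last line number.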
What's next
The concepts behind this prototype can be extended into a fully functional extension that adds functionality not present in any other software today. We can implement more detailed high-level actions for the AI to perform, such as renaming a variable, surrounding an expression with parentheses, or performing actions across multiple files. The voice interface can become a natural extension of the keyboard, one that allows programmers to spend less time thinking and more time doing.
Built With
- extension-api
- groq
- llama3
- numpy
- openai
- typescript
- vscode-api
- wave
- whisper