\documentclass[12pt]{article} \usepackage{amsmath} \usepackage{hyperref} \usepackage{enumitem}
\title{Project Story} \author{} \date{}
\begin{document}
\maketitle
\section*{About the Project} This project began from a frustration: \textbf{controlling CAD software through computer vision is slow and cumbersome}. In our initial experiments, we combined reasoning models with vision models so that an AI agent could interpret the user interface of tools like \textit{Vectorworks}. The idea was for the agent to ``see'' the screen, reason about it, and then act --- but in practice, this approach was \textbf{slower than designing by hand}.
Our inspiration came from the observation that many AI-for-design demos suffer from this bottleneck: they try to fuse multiple modalities (vision, reasoning, and sometimes speech) into a single model architecture. This often introduces latency, complexity, and fragility into the workflow --- especially for professional CAD environments, where every second counts.
\section*{Key Insight} Rather than forcing all modalities into a single architecture, \textbf{we can simplify}. Instead of having an AI model interpret the screen visually and guess what to click, we can \textbf{directly call the CAD software's functions and tools via its API}.
For example: \begin{itemize}[leftmargin=2em] \item Instead of using computer vision to locate the ``Rectangle Tool'' in the toolbar, just call the \texttt{DrawRectangle()} function from the API. \item Rather than relying on image recognition to track selected objects, query the model database directly. \end{itemize}
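Both bullets reduce to the same pattern: skip the pixels and talk to the model directly. Below is a minimal Python sketch of that idea; the wrapper names (\texttt{draw\_rectangle}, \texttt{selected\_objects}) are hypothetical stand-ins, not the actual Vectorworks scripting API signatures.

```python
# Hedged sketch of the "call the API directly" approach. Both function
# names are hypothetical placeholders for real CAD scripting calls.

def draw_rectangle(x1, y1, x2, y2):
    """Create a rectangle via the CAD API -- no toolbar, no screenshot."""
    return {"type": "rectangle", "bounds": (x1, y1, x2, y2)}

def selected_objects(model_db):
    """Query the model database for the selection directly,
    instead of inferring it from pixels."""
    return [obj for obj in model_db if obj.get("selected")]

# The agent acts in one function call rather than a see-reason-click loop.
rect = draw_rectangle(0, 0, 100, 50)
```

The key property is that each action is a single deterministic call, where the vision pipeline needed a screenshot, an inference pass, and a simulated click for the same result.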
This inspired the concept of \textbf{building a Model Context Protocol (MCP) server} for Vectorworks (or similar software), enabling \textbf{tool calling via OpenAI} or other LLMs without any computer vision layer.
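Concretely, the MCP server would expose each CAD command as a tool the LLM can invoke. The sketch below uses the OpenAI function-calling schema shape; the tool name \texttt{draw\_rectangle} and its parameter set are our assumptions for illustration, not an actual Vectorworks API surface.

```python
# Illustrative tool definition in the OpenAI function-calling format.
# Name, description, and parameters are assumptions for this sketch.
draw_rect_tool = {
    "type": "function",
    "function": {
        "name": "draw_rectangle",
        "description": "Draw a rectangle on the active design layer, "
                       "coordinates in document units.",
        "parameters": {
            "type": "object",
            "properties": {
                "x1": {"type": "number"},
                "y1": {"type": "number"},
                "x2": {"type": "number"},
                "y2": {"type": "number"},
            },
            "required": ["x1", "y1", "x2", "y2"],
        },
    },
}
```

A list of such definitions would be passed as the \texttt{tools} parameter of a chat-completions request; the model then replies with a structured tool call instead of guessing screen coordinates.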
\section*{How We Built It} \begin{enumerate}[leftmargin=2em] \item \textbf{Initial Prototype} -- Tested an AI+CV approach to manipulate CAD interfaces visually. \item \textbf{Bottleneck Analysis} -- Measured latency and accuracy; found CV interpretation + reasoning added significant delays. \item \textbf{Architecture Redesign} -- Dropped the combined vision-reasoning pipeline and moved toward a direct tool-calling model. \item \textbf{MCP Server Plan} -- Outlined how a Vectorworks MCP server could map high-level user intents (natural language) into precise CAD API calls. \item \textbf{Refactoring BIMgent} -- Began rewriting our BIM Agent to take advantage of direct API calls instead of UI simulation. \end{enumerate}
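Steps 3--5 above reduce to a thin dispatch layer: the LLM emits a tool call (a name plus JSON arguments), and the server routes it to the matching API wrapper. A minimal sketch, with a hypothetical handler standing in for a real CAD API call:

```python
import json

# Registry mapping tool-call names emitted by the LLM to CAD API wrappers.
HANDLERS = {}

def tool(fn):
    """Decorator: register a function as a callable tool."""
    HANDLERS[fn.__name__] = fn
    return fn

@tool
def draw_rectangle(x1, y1, x2, y2):
    # Hypothetical wrapper; a real server would invoke the CAD API here.
    return f"rect({x1},{y1},{x2},{y2})"

def dispatch(name, arguments_json):
    """Route one LLM tool call (name + JSON arguments) to its handler."""
    args = json.loads(arguments_json)
    return HANDLERS[name](**args)
```

Because the arguments arrive as structured JSON rather than pixel positions, each action is deterministic and can be validated before execution.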
\section*{Challenges Faced} \begin{itemize}[leftmargin=2em] \item \textbf{UI Complexity} -- CAD software UIs are dense, with many overlapping tool modes; this confused the vision model. \item \textbf{Latency} -- Vision-based reasoning cycles added seconds per action, making the experience slower than manual design. \item \textbf{Tool Mapping} -- Translating natural language into precise CAD commands requires careful schema design for MCP. \item \textbf{BIMgent Legacy Code} -- Significant portions of the existing BIMgent architecture need rewriting to remove computer vision dependencies. \end{itemize}
\section*{Lessons Learned} \begin{itemize}[leftmargin=2em] \item \textbf{Direct API access beats computer vision for CAD automation}. \item \textbf{Simplicity $>$ Modality Fusion} in time-sensitive, precision-required environments. \item AI for design works best when it’s not ``guessing'' what’s on the screen but operating with full programmatic access. \item An MCP-based integration layer could become the universal bridge for LLM control of CAD tools. \end{itemize}
\section*{Future Directions} We plan to: \begin{itemize}[leftmargin=2em] \item Fully implement the \textbf{Vectorworks MCP server} as a proof of concept. \item Extend the approach to other CAD/BIM tools (AutoCAD, Revit, Rhino). \item Rebuild \textbf{BIMgent} to leverage this simplified architecture, removing unnecessary vision layers. \item Explore a unified intent schema so that \textbf{one LLM agent can work across multiple CAD applications} with zero retraining. \end{itemize}
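The ``one agent, many CAD applications'' goal could be prototyped with an application-agnostic intent object plus per-application backends. Everything below is speculative: the backend strings are placeholders, not real AutoCAD, Revit, or Rhino calls.

```python
from dataclasses import dataclass

@dataclass
class DrawRectIntent:
    """Application-agnostic intent: the LLM targets this schema once."""
    x1: float
    y1: float
    x2: float
    y2: float

# Placeholder backends; a real bridge would call each app's API or MCP server.
BACKENDS = {
    "vectorworks": lambda i: f"vw.rect({i.x1}, {i.y1}, {i.x2}, {i.y2})",
    "autocad": lambda i: f"acad.rectang({i.x1}, {i.y1}, {i.x2}, {i.y2})",
}

def execute(intent, app):
    """Translate one intent for the chosen application -- zero retraining."""
    return BACKENDS[app](intent)
```

The LLM would only ever learn the intent schema; supporting a new CAD tool means writing one new backend, not retraining the agent.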
\end{document}
Built With
- cad
- mcp
- openai
- python