Energy-aware AI platform that lets users control model effort to reduce compute waste.
Built for the PatriotHacks 2025 Hackathon | Devpost Submission
- Node.js 18+
- LM Studio installed
- Meta-Llama-3.1-8B-Instruct LLM installed
- Local hardware capable of running the selected models
git clone https://github.com/akrossu/Aurelient.git
cd aurelientnpm run install:all
# (optional) The backend/.env file is avaliable publically
NODE_ENV=deploy
LMSTUDIO_MODEL=lmstudio-community/Meta-Llama-3.1-8B-Instruct
LMSTUDIO_PORT=1234
APP_PORT=4000
npm run start:backend
npm run start:frontend
- Access the Developer console
- Select Meta-Llama-3.1-8B-Instruct Model to load
- Start LM Studio Server
We’ve all been there: you ask an LLM a simple question, and its response is far deeper than you need it to be. Or you ask something nuanced, and it barely scratches the surface. You end up spending more time coaching the model on how to answer than actually getting the answer.
Right now, LLMs decide how much reasoning and computation to spend purely from your prompt phrasing. That guess is often wrong, and expensive. Overthinking a response wastes compute, underthinking one wastes your time.
Seeing how often people have to coach LLMs into giving the right amount of detail made us realize that wasted user effort and wasted compute are usually the same problem. Cloud AI platforms burn a lot of energy answering prompts more deeply than needed, and users lose time trying to control them.
At the same time, new work like Saad-Falcon, et al [1] shows that modern local models are far more efficient than expected, with consumer hardware reaching up to 88.7% single-turn reasoning performance. We saw this as an opportunity for something better.
That’s when Aurelient was born: a platform built around giving users control and using only the compute that a response actually needs.
Our hostable platform allows users to choose how much effort they want to put into a query. Prompts like "Write a simple math solution" should not have to use so much energy. That's why we give the option to locally host one or more AI models and access them anywhere through a web interface. By preprocessing your prompt with a very lightweight tool-use model, we can predict and refine how larger models respond to prompts. This is significant because through preprocessing, we are able to correctly allocate resources and minimize compute overhead, while giving the user more control over how the model responds.
We started with a straightforward React application built around LM Studio's local API. From there, we designed a prompt preprocessor which uses a small Llama instruct model to estimate the prompt's semantic complexity and assign it a complexity and confidence score. We then use this score to give a tailored inference depth recommendation to the user.
The interface provides real-time information and allows the user to easily adjust how much processing power the model uses when responding to the prompt. Once the user submits their prompt, we use the inference depth to fine-tune the responding model's behavior to optimize for energy efficiency without compromising on quality of response.
- Frontend: React, Vite, Tailwind
- Backend / API: Express REST API, LM Studio Server
- Models: Llama 8B instruct model, various larger conversation models
- Tools / Infrastructure: Node.js, local GPU hardware (RTX 4070 super, Apple M3)
This is far more than just a wrapper. Handling model selection, resources, and the mathematical precision needed for large model operations introduced substantial complexity. Building a strong foundation from recent research and public findings took a lot of time.
Three days of project time pushed all of us to our limits. We are proud to have produced a strong, functional prototype of our platform. Because of this, we have developed a clear vision for where this platform can go.
We are a team of computer science student, but no one on the team knows anything about AI. Through reading papers, research, reading documentation, and being able to mess with parameters of our own local models, we gained so much insight and understanding of what goes into how LLMs actually operate.
Aurelient has enormous room to grow. This project time pushed us to implement what was necessary. There are still opportunities, such as dynamic model loading, smarter query routing, and more granular tuning controls for both basic and advanced users. Our team is excited to share this project with other passionate builders and AI enthusiasts everywhere.
- Zachary Merritt: Full Stack Developer, AI Researcher, Build & Release Engineer, UI/UX Designer, Technical Writer
- Ethan Pierce: Full Stack Developer, API Engineer, Prompt Engineer, Technical Program Manager, Engineering Lead
- Ashley Chi: Frontend developer
- Priya Kallat: Backend developer
[1] Saad-Falcon, Jon, et al. “Intelligence per Watt: Measuring Intelligence Efficiency of Local AI.” ArXiv.org, 2025, https://arxiv.org/abs/2511.07885.


