Archibald.ai

Inspiration

More than 2.6M students take AP courses each year (Forbes). AP courses are widely considered to be highly challenging due to the significant amount of information and concepts included in them. In our experience taking many AP courses, targeted tutoring and easily digestible information make all the difference between succeeding and struggling.

However, many students do not have access to such tutoring. Our AP World History teachers weren't able to grade more than two LEQs (long essay questions) with proper feedback, because grading LEQs for ~150 students takes so long. Additionally, AP classes cover so much material that the content ends up losing its structure, and it is difficult to learn without proper structure.

What it does

Chat with your AP courses: We currently offer 6 AP courses that you can chat with. Archibald is trained on tens of thousands of pages from textbooks and curated websites, so you get a citation and a deep explanation for every question. Our math model seamlessly accesses Wolfram Alpha, meaning even the toughest Calculus problems are easy for Archibald. Struggling with an integral? Ask Archibald. Don't want to make a SPICE chart for the Incan civilization? Ask Archibald. Can't understand a CSA concept? Archibald can explain it simply.

Get Grades on your LEQs and SAQs instantly: Most teachers simply don't have the time to grade their hundreds of students' LEQs and SAQs with comprehensive feedback and suggestions. This disadvantages many students when it comes to the written portion of the History APs. We provide:

Constructive line-by-line feedback with kudos and improvement suggestions
An unbelievably accurate LEQ/SAQ number grade (seriously, you have to see it to believe it)
What points you earned/lost on the rubric including an explanation for why
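
A minimal sketch of how a rubric-based grade report like the one above could be represented. The type names, field names, and the `summarizeGrade` helper are hypothetical and chosen for illustration; they are not Archibald's actual data model.

```typescript
// Hypothetical shape for one rubric row returned by the grader.
interface RubricRow {
  criterion: string;   // e.g. "Thesis" or "Contextualization"
  earned: number;      // points awarded for this row
  possible: number;    // maximum points for this row
  explanation: string; // why the points were earned or lost
}

interface GradeReport {
  total: number;     // sum of earned points
  possible: number;  // sum of possible points
  lost: RubricRow[]; // rows where at least one point was lost
}

// Validate the rows, then total them and collect the rows that lost points.
function summarizeGrade(rows: RubricRow[]): GradeReport {
  for (const row of rows) {
    if (row.earned < 0 || row.earned > row.possible) {
      throw new Error(`Invalid score for ${row.criterion}`);
    }
  }
  return {
    total: rows.reduce((sum, row) => sum + row.earned, 0),
    possible: rows.reduce((sum, row) => sum + row.possible, 0),
    lost: rows.filter((row) => row.earned < row.possible),
  };
}
```

Keeping the explanation on each row means the interface can show the number grade and the reasoning for every lost point from the same object.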

Definitions: Hover over terms in the model's response to see a definition. We have scraped definitions for over 15 thousand AP terms.

Automatic MCQ generator: For our AP Biology and AP Computer Science A courses, we have an MCQ generator: you tell it what topic you need an MCQ on, and it will generate a question with four choices. After you choose one, it informs you if you were correct or not, and gives a complete explanation for the correct answer.
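
The question flow described above (four choices, a correctness check, then an explanation) can be sketched as follows. This assumes the model returns the question as JSON; the `Mcq` shape and the `parseMcq` and `gradeAnswer` helpers are hypothetical names for illustration, not the project's actual code.

```typescript
// Hypothetical shape for a generated question; field names are assumptions.
interface Mcq {
  topic: string;
  prompt: string;
  choices: [string, string, string, string]; // exactly four choices
  correctIndex: 0 | 1 | 2 | 3;
  explanation: string;
}

interface McqResult {
  correct: boolean;
  message: string;
}

// Validate untrusted model output (for example, parsed JSON) before showing it.
function parseMcq(raw: unknown): Mcq {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("MCQ must be an object");
  }
  const { topic, prompt, choices, correctIndex, explanation } = raw as Record<string, unknown>;
  if (typeof topic !== "string" || typeof prompt !== "string" || typeof explanation !== "string") {
    throw new Error("MCQ is missing a text field");
  }
  if (!Array.isArray(choices) || choices.length !== 4 || !choices.every((c) => typeof c === "string")) {
    throw new Error("MCQ must have exactly four string choices");
  }
  if (typeof correctIndex !== "number" || !Number.isInteger(correctIndex) || correctIndex < 0 || correctIndex > 3) {
    throw new Error("correctIndex must be an integer from 0 to 3");
  }
  return {
    topic,
    prompt,
    explanation,
    choices: choices as [string, string, string, string],
    correctIndex: correctIndex as 0 | 1 | 2 | 3,
  };
}

// Tell the student whether they were right and always include the explanation.
function gradeAnswer(question: Mcq, chosenIndex: number): McqResult {
  const correct = chosenIndex === question.correctIndex;
  const verdict = correct
    ? "Correct!"
    : `Not quite. The correct answer is: ${question.choices[question.correctIndex]}`;
  return { correct, message: `${verdict} ${question.explanation}` };
}
```

Validating the structure before display guards against a malformed response, such as three choices or an out-of-range answer index, reaching the student.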

How we built it

Front-end:

React + Tailwind CSS + Framer Motion for some animation. Some components are from the Aceternity library, but most are completely custom.
GIGAMD, our custom, completely hand-written Markdown parser & renderer, built specially for this hackathon.

Back-end: Next.js + Supabase Postgres & Auth.

AI:

Google Cloud Vertex AI for access to the Claude 3.5 Sonnet LLM
Langchain.js (our custom patched version including a bug fix and major feature addition)
Supabase PGVector as our vector store for RAG
Wolfram Alpha API for our AP Calculus models
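
To illustrate the retrieval step of a RAG pipeline like the one listed above, here is a minimal in-memory sketch. In the real project the vector store is Supabase PGVector; this example replaces it with a plain array and cosine similarity purely for illustration. The `Chunk` shape, the function names, and the toy embeddings used with it are all assumptions.

```typescript
// A stored piece of source text with its embedding vector (hypothetical shape).
interface Chunk {
  id: string;
  source: string; // where the text came from, used for the citation
  text: string;
  embedding: number[];
}

interface ScoredChunk {
  chunk: Chunk;
  score: number;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Rank every chunk against the query embedding and keep the best k.
function topK(query: number[], chunks: Chunk[], k: number): ScoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Number the retrieved passages so the model can cite them as [1], [2], ...
function buildContext(hits: ScoredChunk[]): string {
  return hits
    .map((hit, i) => `[${i + 1}] (${hit.chunk.source}) ${hit.chunk.text}`)
    .join("\n");
}
```

A vector database performs the ranking step inside the database with an index rather than scanning every row, but the idea is the same: embed the question, find the nearest stored passages, and pass them to the LLM along with their sources.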

Scraper:

We wrote custom web scrapers to collect information on/about AP courses
Our custom scrapers have scraped thousands of pages worth of information from the internet
Additionally, we have over 15 thousand AP terms with definitions scraped

Challenges we ran into

Budget: We wanted to make this for $0. This was not easy.
Email Server: We send users a confirmation e-mail when they sign up. Finding an SMTP email server that would allow us to do this for free was difficult
Because of our "budget constraints", we were forced to use Google Cloud Vertex AI because it provided us with a free trial of $150 in cloud credits. However, the LLM framework we chose, Langchain.js, did not support non-Gemini models (aka did not support the good models) hosted on Vertex AI. This turns out to be a highly requested feature, and we ended up patching Langchain to support Claude 3.5 Sonnet on Vertex AI. The community has since evolved our patch into a pull request.
We also ended up writing a patch to a known but unsolved Langchain bug that we encountered. A pull request will be created after this hackathon.
Our MCQ generation model is still a bit rough. It was rushed, and hence its CSA problem generator lacks markdown formatting (we ran into unforeseen issues at the last minute implementing this). However, it still works quite well, and is very fun to play with.
GIGAMD, our custom markdown parser and renderer, was born out of frustration with the inflexibility of existing markdown renderers. We implemented it as a one-pass recursive descent parser, and only later learned that markdown parsers are supposed to be two-pass :( This resulted in a _very_ difficult time, and we ended up having to make some concessions when it comes to formatting lists (so 95% of responses from our LLM are perfectly formatted, but lists nested in bullet points... don't render properly).
Streaming: You would be surprised how difficult it is to stream the output from the AI model to the front-end in real time. It led to all sorts of nasty, hard-to-reproduce bugs.
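
One reason streamed output is tricky is that chunk boundaries are arbitrary: a chunk can end in the middle of a line or a markdown token such as `**bold**`. The sketch below shows one common way to handle this, by buffering until a full line has arrived before handing it to a renderer. It is not Archibald's actual streaming code, and the `LineBuffer` and `collectLines` names are hypothetical.

```typescript
// Holds back the unfinished tail of the stream until its newline arrives.
class LineBuffer {
  private pending = "";

  // Add a chunk and return only the lines that this chunk completed.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    // The last part has no trailing newline yet, so keep it for later.
    this.pending = parts.pop() ?? "";
    return parts;
  }

  // Call once the stream ends to release whatever is left.
  flush(): string[] {
    const rest = this.pending;
    this.pending = "";
    return rest === "" ? [] : [rest];
  }
}

// Consume any async stream of text chunks and emit complete lines in order.
async function collectLines(
  chunks: AsyncIterable<string>,
  onLine: (line: string) => void,
): Promise<void> {
  const buffer = new LineBuffer();
  for await (const chunk of chunks) {
    for (const line of buffer.push(chunk)) onLine(line);
  }
  for (const line of buffer.flush()) onLine(line);
}
```

The trade-off is latency: text appears a line at a time instead of token by token. A renderer that re-parses the full accumulated text on every chunk avoids that delay, at the cost of showing half-finished formatting.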

Accomplishments that we're proud of

Building a complete prototype that works extremely well
Beautiful UI with animations
Indexing tens of thousands of pages from textbooks through our RAG pipeline
Scraping thousands of web pages
Custom markdown parser
Langchain bug & feature patches

What we learned

Some of our team members had never worked with Next.js before
Most team members were unfamiliar with RAG pipelines and how these models are supposed to work
We learned to not use Langchain again. It is very popular and widely used in the industry, but it is over-abstracted and inflexible. There are abstractions on top of abstractions, which led us to spend more time debugging and patching Langchain than debugging our own models (metaphorically)
Parse markdown in two passes, not one
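
As an illustration of the two-pass idea, the sketch below handles nested bullet lists, the case that gave us trouble. The first pass builds the block structure from indentation, and the second pass handles inline formatting inside each item. This is not GIGAMD's implementation: it covers only `-`/`*` bullets, bold, and italics, uses regular expressions for the inline pass, and every name in it is hypothetical.

```typescript
interface ListItem {
  text: string;         // raw inline markdown for this bullet
  children: ListItem[]; // nested bullets
}

// Pass 1: build a tree of list items from indentation alone.
function parseList(lines: string[]): ListItem[] {
  const root: ListItem[] = [];
  // Stack of open levels; the sentinel at indent -1 is never popped.
  const stack: { indent: number; children: ListItem[] }[] = [{ indent: -1, children: root }];
  for (const line of lines) {
    const match = /^( *)[-*] (.*)$/.exec(line);
    if (!match) continue; // non-list lines are ignored in this sketch
    const indent = match[1].length;
    const item: ListItem = { text: match[2], children: [] };
    // Close levels that are as deep or deeper; the remaining top is the parent.
    while (stack[stack.length - 1].indent >= indent) stack.pop();
    stack[stack.length - 1].children.push(item);
    stack.push({ indent, children: item.children });
  }
  return root;
}

// Pass 2: inline formatting, applied to one item's text at a time.
function renderInline(text: string): string {
  const escaped = text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return escaped
    .replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>")
    .replace(/\*(.+?)\*/g, "<em>$1</em>");
}

// Walk the tree from pass 1 and emit nested <ul> elements.
function renderList(items: ListItem[]): string {
  if (items.length === 0) return "";
  const body = items
    .map((item) => `<li>${renderInline(item.text)}${renderList(item.children)}</li>`)
    .join("");
  return `<ul>${body}</ul>`;
}
```

Because nesting is resolved before any inline parsing happens, the inline pass never has to know how deep it is, which is the part that is awkward to get right in a single pass.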

What's next for Archibald.ai

We are going to continue to work on this project!

Built With

gigamd
langchain
llm
next
postgresql
react
supabase
wolfram-technologies

Submitted to

GIA Hacks 2
- Winner Top 5
- Winner Wolfram | One
- Winner PM Interview Course
- Winner PM Resume Course
- Winner Axure RP Team Edition
- Winner 1Password Families
- Winner .xyz domains
- Winner Art of Problem Solving Coupons
- Winner Interview Cake
- Winner Next Step
- Winner Chess.com Diamond Membership

Created by

I worked mainly on designing our website UI with Next.js and TailwindCSS to make it as interactive and dynamic as possible. I also helped integrate the backend by making sure messages and sources were saved correctly.

Lucas Chen
Led the development of the AI models. Created GIGAMD and implemented streaming. Wrote the Langchain patches. Also created the UI for the chat window.

Aditya Kumar
I worked mainly on the back-end, where I set up user authentication, Google OAuth authentication, chat history, and a custom SMTP server. I also patched a library originally used to display LaTeX math due to a missing export in its code. Additionally, I scraped hundreds of thousands of lines of data and over 15 thousand vocabulary terms for our model.

Ali Macky

Updates

Lucas Chen started this project — Aug 23, 2024 11:51 PM EDT

Leave feedback in the comments!
