Althea - AI Agent for Domain-Specific and Scientifically-Complex Research Tasks
Inspiration
Imagine a world where the pace of scientific discovery is not constrained by the laborious process of sifting through an ever-growing mountain of research papers. Today’s reality is starkly different. Scientists are often mired in an overwhelming sea of literature, where the quest for relevant information can be as daunting as the research itself. This bottleneck in the scientific process is not just a nuisance; it's a barrier to progress.
The need for a solution is echoed by the Organisation for Economic Co-operation and Development (OECD), which recognizes the potential of AI to revolutionize scientific productivity. Moreover, studies have shown that AI, when made aware of human expertise, can not only accelerate the discovery process but also identify areas that humans might overlook, thus pushing the boundaries of science. The inspiration for Althea was grounded not only in data but also in our frustrations as researchers: we have felt firsthand the daunting task of navigating endless databases, deciphering complex studies, and stitching together disparate pieces of knowledge into a coherent narrative, all before we could even begin our own research.
Althea is our response to this challenge: an AI agent designed to empower researchers by shouldering the burden of literature review. By harnessing AI, we envision Althea as a tool that could not only expedite the review process but also enhance the quality of our insights and findings. By automating the most tedious parts of the literature review, Althea promises to return to us what we value most: time to think, to analyze, and to innovate.
(Also: Mira Murati validated the need for Althea during her talk at the TreeHacks Opening Ceremony by mentioning her interest in building a research agent!)
What it does
Althea is an AI-powered Research Agent and Assistant that aims to revolutionize the way researchers and scientists conduct their work. Leveraging the capabilities of Large Language Models (LLMs), Althea assists throughout the research process, from conducting thorough literature reviews and identifying research gaps to validating ideas. Our system retrieves from a specialized dataset of biochemistry research papers to offer accurate, domain-specific information, allowing scientists to quickly obtain answers to scientific queries. It uniquely integrates a citation network for building a robust knowledge base and employs semantic chunking to facilitate efficient information retrieval from both the citation network and wider internet resources, including Wikipedia.
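The citation-network idea can be sketched as a breadth-first expansion: start from a few seed papers and follow their references for a fixed number of hops, adding every paper reached to the knowledge base. This is a minimal sketch under assumptions, not Althea's actual code: the `CITATIONS` mapping is toy data, and the function name `expand_citation_network` and its `max_depth` parameter are hypothetical.

```python
from collections import deque

# Hypothetical toy data (assumption): each paper ID maps to the IDs it cites.
# A real system would load these edges from a bibliographic source.
CITATIONS: dict[str, list[str]] = {
    "seed": ["A", "B"],
    "A": ["C"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": [],
    "E": [],
}


def expand_citation_network(
    seeds: list[str], citations: dict[str, list[str]], max_depth: int = 2
) -> set[str]:
    """Breadth-first walk along citation edges, at most max_depth hops from any seed."""
    seen = set(seeds)
    queue = deque((paper, 0) for paper in seeds)
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not follow this paper's references
        for ref in citations.get(paper, []):
            if ref not in seen:  # visit each paper once, even if cited many times
                seen.add(ref)
                queue.append((ref, depth + 1))
    return seen


if __name__ == "__main__":
    print(sorted(expand_citation_network(["seed"], CITATIONS, max_depth=2)))
    # prints ['A', 'B', 'C', 'D', 'seed']
```

The depth limit keeps the knowledge base focused: each extra hop can grow the collection quickly, so a small `max_depth` trades coverage for relevance.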
How we built it
Althea is built on LLMs enhanced with a Retrieval-Augmented Generation (RAG) system, which retrieves relevant passages from the scientific literature so the model can understand and process complex research.
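A minimal sketch of the retrieval step in a RAG pipeline, assuming the corpus is a plain list of text chunks: score each chunk against the query, keep the top k, and prepend them to the prompt sent to the LLM. To stay self-contained it uses bag-of-words cosine similarity as a stand-in for learned embeddings; the names `retrieve` and `build_prompt` are hypothetical, not part of Althea or any library.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a toy stand-in (assumption) for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (linear scan, no index)."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Ground the LLM by placing the retrieved passages ahead of the question."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    chunks = [
        "Glycine is the smallest amino acid, with a single hydrogen as its side chain.",
        "Tryptophan is an aromatic amino acid and a precursor of serotonin.",
        "The Krebs cycle takes place in the mitochondria.",
    ]
    print(build_prompt("Which amino acid is a precursor of serotonin?", chunks, k=1))
```

In a production system the bag-of-words vectors would typically be replaced by embedding-model vectors and the linear scan by a vector index, but the shape of the pipeline (score, select top k, assemble prompt) stays the same.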
To demonstrate domain expertise, we crafted a dataset of ~500 biochemistry papers focusing on amino acids. The corpus totals over 1 million lines and 12.7 million tokens. We segmented it into 20,000 semantic chunks, enabling Althea to retrieve precise and relevant information rapidly.
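For scale, 12.7 million tokens split into 20,000 chunks works out to roughly 635 tokens per chunk on average (12,700,000 / 20,000 = 635). Althea's exact chunking method is not described here; the sketch below shows one common approach, assuming the chunker groups consecutive sentences and starts a new chunk when adjacent sentences stop sharing vocabulary or the chunk exceeds a size cap. Word-set Jaccard similarity stands in for embedding similarity, and the `threshold` and `max_words` defaults are hypothetical values, not the ones used for Althea's dataset.

```python
import re


def sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def semantic_chunks(text: str, threshold: float = 0.3, max_words: int = 600) -> list[str]:
    """Group consecutive sentences into chunks.

    A sentence joins the current chunk only if it overlaps enough with the
    previous sentence (a stand-in, by assumption, for embedding similarity)
    and the chunk stays under max_words whitespace-separated words.
    """
    chunks: list[list[str]] = []
    prev: set[str] = set()
    length = 0
    for sent in sentences(text):
        cur = words(sent)
        n = len(sent.split())
        if chunks and jaccard(prev, cur) >= threshold and length + n <= max_words:
            chunks[-1].append(sent)  # same topic: extend the current chunk
            length += n
        else:
            chunks.append([sent])  # topic shift or size cap: start a new chunk
            length = n
        prev = cur
    return [" ".join(c) for c in chunks]


if __name__ == "__main__":
    text = (
        "Glycine is the simplest amino acid. Glycine is the only achiral amino acid. "
        "Reflex is a Python web framework. Reflex is a Python framework for web apps."
    )
    for chunk in semantic_chunks(text):
        print(chunk)
```

Comparing only adjacent sentences keeps the sketch short; variants compare each sentence with a running average of the whole chunk, which is less sensitive to a single off-topic sentence.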
To support this architecture, we used a citation network to construct the knowledge base and integrated several AI agents and assistants for querying. Together, these agents draw on three sources to improve accuracy: the knowledge base of selected scientific literature, the open internet for up-to-date information, and Wikipedia for reliable information on specific topics.
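One way to combine several agents is to fan the same query out to every source in parallel and collect the answers keyed by source, so a downstream LLM call can synthesize them with attribution. In this sketch the three agent functions are hypothetical stubs that return placeholder strings; real versions would call the knowledge base, a web-search service, and Wikipedia, and Althea's actual orchestration may differ (for example, routing each query to a single agent instead).

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


# Hypothetical agent stubs (assumption): placeholders for real knowledge-base,
# web-search, and Wikipedia calls.
def knowledge_base_agent(query: str) -> str:
    return f"[knowledge base] passages matching '{query}'"


def web_search_agent(query: str) -> str:
    return f"[web] recent results for '{query}'"


def wikipedia_agent(query: str) -> str:
    return f"[wikipedia] summary of '{query}'"


AGENTS: dict[str, Callable[[str], str]] = {
    "knowledge_base": knowledge_base_agent,
    "web": web_search_agent,
    "wikipedia": wikipedia_agent,
}


def ask_all(query: str, agents: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Send the query to every agent concurrently and return answers keyed by source."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = {name: pool.submit(agent, query) for name, agent in agents.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    for source, answer in ask_all("glycine", AGENTS).items():
        print(source, "->", answer)
```

Running the agents concurrently keeps latency close to that of the slowest single agent rather than the sum of all three; threads suit this because real agents would spend most of their time waiting on network I/O.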
The project was developed with Reflex, a full-stack Python web framework, chosen for its support for rapid development and deployment.
Challenges we ran into
- Fine-Tuning the Model: Fine-tuning our language models to decode the complex jargon of biochemistry literature was a challenge. Enhancing the model's capabilities for expert-level interpretation within a tight timeframe was difficult, but it set a direction for our future efforts.
- Incorporating Reflex: While Reflex offered promising features for rapid development and deployment, integrating our complex backend logic with its ecosystem presented a steep learning curve. Specifically, wrapping React components was challenging.
Accomplishments that we're proud of
- Built a Complex Retrieval System: Developed a sophisticated retrieval system to enhance understanding and processing of scientific literature without fine-tuning LLMs.
- Crafted a Specialized Dataset: Curated a dataset of ~500 biochemistry papers, totaling over 1 million lines and 12.7 million tokens, for domain-specific insights.
- Efficient Semantic Chunking: Segmented the dataset into 20,000 semantic chunks for rapid and accurate information retrieval.
- Citation Network Integration: Employed a citation network to build a robust knowledge base, enhancing research capabilities.
- Diverse AI Agent Use: Integrated multiple AI agents for querying scientific literature and sourcing up-to-date internet and Wikipedia information.
- Reflex Framework Deployment: Successfully used Reflex for rapid development and deployment, showcasing adaptability to new technologies.
- Achieved Domain-specific Accuracy: Ensured highly accurate information retrieval, tailored to biochemistry, improving research efficiency.
What's next for Althea
- Broaden Scientific Horizons: Extend Althea's reach by incorporating datasets from diverse scientific disciplines, making it a universal tool for researchers worldwide.
- Innovative Fine-Tuning Service: Launch a groundbreaking, cost-effective service for custom fine-tuning AI models, tailored to meet the unique needs of individual research projects.
- Enhanced Citation Network: Enhance Althea's citation analysis capabilities, offering researchers deeper insights and fostering connections across a vast array of scientific literature.
- Collaboration and Community Building: Evolve Althea into a collaborative ecosystem where researchers can share data, insights, and discoveries, creating a vibrant, global community of scientific exchange.
- Empower Global Research: Prioritize accessibility and sustainability, aiming to bring Althea's advanced research assistance to underrepresented regions and democratize scientific discovery.