An agentic approach to document search. Ask questions about your files and let the agent figure out what to read.
- You ask a question
- The agent lists files, reads relevant ones, searches semantically
- It loops until it has enough context
- It synthesizes an answer
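The loop described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: `run_agent`, the `decide` callback (standing in for the LLM call), and the action dictionary shape are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

def run_agent(
    question: str,
    decide: Callable[[str, List[str]], dict],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> str:
    """Ask `decide` for the next action until it returns an answer."""
    context: List[str] = []  # tool results gathered so far
    for _ in range(max_steps):
        action = decide(question, context)
        if action["type"] == "answer":
            return action["text"]
        # Otherwise run the requested tool and record what it returned.
        result = tools[action["tool"]](action["input"])
        context.append(f'{action["tool"]}({action["input"]!r}) -> {result}')
    return "Stopped: step limit reached before an answer was produced."

# A scripted stand-in for the model (assumption: a real agent would call an LLM here).
def scripted_decide(question: str, context: List[str]) -> dict:
    if not context:
        return {"type": "tool", "tool": "list_files", "input": "."}
    if len(context) == 1:
        return {"type": "tool", "tool": "read_file", "input": "notes.md"}
    return {"type": "answer", "text": f"Answer based on {len(context)} tool results."}

demo_tools = {
    "list_files": lambda path: "notes.md",
    "read_file": lambda path: "The deploy script lives in scripts/deploy.sh.",
}

print(run_agent("Where is the deploy script?", scripted_decide, demo_tools))
```

The `max_steps` cap is a design choice for the sketch: it keeps a confused model from looping forever.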
git clone https://github.com/thirtyninetythree/soma-agent
cd soma-agent
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
Copy .env.example to .env and add your API key:
cp .env.example .env
python main.py --directory /path/to/your/docs
Options:
- --provider: claude (default), gemini, openai
- --directory: folder to index (default: current folder)
The agent has three tools:
- list_files: list directory contents
- read_file: read file contents
- semantic_search: search indexed documents by meaning
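As a rough idea of what the two filesystem tools might look like, here is a sketch built on `pathlib`. The signatures, the trailing slash on folders, and the `max_chars` truncation are assumptions made for illustration; the project's real tools may behave differently.

```python
from pathlib import Path
from typing import List

def list_files(directory: str) -> List[str]:
    """Return sorted entry names; folders get a trailing slash (an assumed convention)."""
    entries = []
    for entry in sorted(Path(directory).iterdir()):
        entries.append(entry.name + "/" if entry.is_dir() else entry.name)
    return entries

def read_file(path: str, max_chars: int = 20_000) -> str:
    """Return the file's text, truncated so one large file cannot flood the context."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return text[:max_chars]
```

Returning plain strings keeps the tool results easy to pass back to the model as text.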
Supported file types: .txt, .md, .py, .js, .ts, .json, .csv, .html, .css, .yml, .yaml
On startup, the agent:
- Walks the directory for supported files
- Splits each file into 500-character chunks with a 100-character overlap
- Embeds chunks using all-MiniLM-L6-v2 (~90MB model)
- Saves vectors to .rag_index/
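The chunking step can be illustrated with a sliding window. This is a sketch under the stated sizes (500 characters, 100 overlap); the function name `chunk_text` is hypothetical, and the project may split on different boundaries.

```python
from typing import List

def chunk_text(text: str, size: int = 500, overlap: int = 100) -> List[str]:
    """Split text into windows of `size` chars, each sharing `overlap` chars with the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks

sample = "".join(chr(97 + i % 26) for i in range(1200))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap. The chunks could then be embedded, for example with the `encode` method of a `SentenceTransformer` model, assuming the project uses the sentence-transformers library to load all-MiniLM-L6-v2.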
Changed files are re-indexed automatically; changes are detected by comparing file hashes.
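One way to detect changed files by hash is sketched below. SHA-256 and the in-memory dictionary of stored hashes are assumptions for the example; the README does not say which hash function or storage format the project uses, and `file_hash` and `find_changed_files` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple

def file_hash(path: str) -> str:
    """Hex digest of the file's bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def find_changed_files(
    paths: List[str], stored_hashes: Dict[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Return the paths whose hash is new or different, plus the up-to-date hash map."""
    changed = []
    current = {}
    for path in paths:
        digest = file_hash(path)
        current[str(path)] = digest
        if stored_hashes.get(str(path)) != digest:
            changed.append(path)  # new file or modified content
    return changed, current
```

Only the files in `changed` would need to be re-chunked and re-embedded; the returned hash map would be saved for the next run.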
License: MIT