AstroGroot
An automated astronomy research library powered by AI
AstroGroot collects, processes, and indexes astronomy content from multiple sources (arXiv, YouTube, NASA) and makes it searchable through semantic vector search and AI-powered summaries.
- arXiv Papers: Automatic collection of astronomy research papers
- YouTube Videos: Educational astronomy content with transcript extraction
- NASA Content: APOD (Astronomy Picture of the Day) and NASA Image Library
- AI Processing: Claude-powered summarization and translation
- Semantic Search: Vector-based search using ChromaDB embeddings
- Web Dashboard: Browse and search your library
- MCP Server: Integration with Claude Desktop via Model Context Protocol
├── deno.json              # Project config & dependencies
├── main.tsx               # Hono app entry (routes, API, static)
├── drizzle.config.ts      # Drizzle ORM configuration
├── docker-compose.yml     # ChromaDB & Redis services
├── .env.example           # Environment variables template
│
├── db/                    # Database Layer (Drizzle + Turso)
│   ├── client.ts          # Turso/LibSQL connection
│   └── schema.ts          # Database schema
│
├── lib/                   # Shared Libraries
│   ├── vector.ts          # ChromaDB wrapper
│   ├── mcp.ts             # MCP request handler (getStats, listMethods, etc.)
│   ├── ai/
│   │   ├── client.ts      # Anthropic SDK client
│   │   └── processor.ts   # AI summarization & translation
│   └── collectors/
│       ├── nasa.ts        # NASA API integration
│       ├── arxiv.ts       # arXiv API integration
│       └── youtube.ts     # YouTube transcript extraction
│
├── components/            # Hono JSX UI (server-rendered)
│   ├── layout.tsx         # Shared layout (starfield, nav, styles)
│   ├── search-bar.tsx     # Search form + filters
│   └── pages/
│       ├── dashboard.tsx  # Dashboard (stats, about)
│       ├── search.tsx     # Search page
│       └── not-found.tsx  # 404 page
│
├── static/                # Static assets
│   └── astrogroot-logo.png  # Logo (transparent)
│
└── workers/               # Background Processing
    └── crawler.ts         # Automated data collection worker
- Deno 2.0 or higher
- Docker and Docker Compose
- Turso database account
- Anthropic API key
- (Optional) YouTube Data API key
- Clone the repository
git clone https://github.com/yourusername/astrogroot.git
cd astrogroot
- Set up environment variables
cp .env.example .env
# Edit .env with your API keys and credentials
Required environment variables:
# Database (Turso)
TURSO_DATABASE_URL=libsql://your-database.turso.io
TURSO_AUTH_TOKEN=your-turso-auth-token
# AI Processing (Anthropic)
ANTHROPIC_API_KEY=sk-ant-api-key-here
# Vector Store (ChromaDB)
CHROMA_HOST=http://localhost:8000
CHROMA_AUTH_TOKEN=astrogroot-token
# Optional
NASA_API_KEY=DEMO_KEY
YOUTUBE_API_KEY=your-youtube-api-key
- Start infrastructure services
docker-compose up -d
This starts:
- ChromaDB (vector database) on port 8000. The crawler uses a built-in embedding function (no extra dependency). For stronger semantic search, you can install chromadb-default-embed (see ChromaDB docs).
- Redis (optional, for task queues) on port 6379
- Initialize the database
# Generate migrations
deno task db:generate
# Push schema to Turso
deno task db:push
- Install dependencies
Deno will automatically install dependencies on first run, but you can pre-cache them:
deno cache --reload deno.json
Start the Hono development server:
deno task dev
Visit the URL shown in the terminal (e.g. http://localhost:8000 or http://localhost:8001) to access the dashboard. If port 8000 is in use (e.g. by ChromaDB when using Docker), the server will try the next available port.
Tip: When running Docker (ChromaDB on 8000), set PORT=8001 in .env so the web app uses 8001 and avoids port conflict.
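The port behaviour described above can be pictured roughly as follows. This is a minimal sketch, not the project's actual startup code: `choosePort` and the injected `isFree` callback are hypothetical names, and the real server may probe ports differently.

```typescript
// Hypothetical helper: pick the first usable port, starting from PORT (or 8000).
// `isFree` is injected so the logic can be exercised without opening sockets.
function choosePort(
  envPort: string | undefined,
  isFree: (port: number) => boolean,
  maxAttempts = 10,
): number {
  const parsed = Number(envPort);
  // Fall back to 8000 when PORT is unset or not a valid port number.
  const start =
    Number.isInteger(parsed) && parsed > 0 && parsed < 65536 ? parsed : 8000;
  for (let port = start; port < start + maxAttempts; port++) {
    if (isFree(port)) return port;
  }
  throw new Error(`No free port found in ${start}..${start + maxAttempts - 1}`);
}

// Example: ChromaDB already occupies 8000, so the app lands on 8001.
const chromaTaken = (port: number): boolean => port !== 8000;
const appPort = choosePort(undefined, chromaTaken);
```

Setting PORT=8001 explicitly, as the tip suggests, skips the probing step and makes the URL predictable.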
Verify everything is running:
# Check Docker containers (ChromaDB, Redis)
docker compose ps
# Check web app (replace 8001 with your app port if different)
curl -s http://localhost:8001/api/health
# → {"ok":true,"service":"astrogroot","timestamp":"..."}
The dashboard shows Library Statistics as 0 until you run the crawler to collect data. The app is running if you see the dashboard and /api/health returns ok: true.
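If you prefer to script the health check rather than read the curl output, something like the following could work. It is a sketch that assumes the response body has the shape shown in the curl example; `isHealthy` is a hypothetical helper, not part of the project.

```typescript
// Shape of the health payload shown above; only `ok` is checked strictly.
interface HealthResponse {
  ok: boolean;
  service?: string;
  timestamp?: string;
}

// Returns true only when the body is valid JSON with `ok: true`.
function isHealthy(body: string): boolean {
  try {
    const parsed = JSON.parse(body) as Partial<HealthResponse> | null;
    return parsed !== null && typeof parsed === "object" && parsed.ok === true;
  } catch {
    return false; // not JSON, e.g. an HTML error page from another service
  }
}

// Hypothetical usage with fetch (port 8001 assumed, as in the curl example):
// const res = await fetch("http://localhost:8001/api/health");
// const up = res.ok && isHealthy(await res.text());
```

Checking the parsed `ok` field, rather than only the HTTP status, helps catch the case where another service (such as ChromaDB on port 8000) answers on the port you expected the web app to use.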
The crawler collects data from arXiv, YouTube, and NASA sources.
Single run (collect data once):
deno task worker
Scheduled mode (runs every 24 hours):
deno run --allow-all workers/crawler.ts scheduled
- Navigate to http://localhost:8000/search (use your app's port if it differs, e.g. 8001 when ChromaDB occupies 8000)
- Enter your query (e.g., "black hole formation", "exoplanet detection")
- Filter by content type (papers, videos, NASA)
- Results are ranked by semantic similarity using vector embeddings
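To make "ranked by semantic similarity" concrete, here is a small illustration using cosine similarity. ChromaDB performs the ranking itself, and the distance metric it applies depends on how the collection is configured, so treat this as a sketch of the idea rather than what the search page executes. All names are hypothetical.

```typescript
// Cosine similarity between two equal-length vectors (1 = same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedDoc {
  id: string;
  embedding: number[];
}

// Rank documents by similarity to the query embedding, best match first.
function rankBySimilarity(
  query: number[],
  docs: EmbeddedDoc[],
  limit = 5,
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Toy 3-dimensional embeddings; real embeddings have hundreds of dimensions.
const ranked = rankBySimilarity([1, 0, 0], [
  { id: "exoplanets", embedding: [0, 1, 0] },
  { id: "black-holes", embedding: [0.9, 0.1, 0] },
  { id: "galaxies", embedding: [0.5, 0.5, 0] },
]);
```

In this toy data the "black-holes" vector points almost in the same direction as the query, so it ranks first, while the orthogonal "exoplanets" vector ranks last.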
AstroGroot includes an MCP (Model Context Protocol) server for integration with Claude Desktop.
Available MCP methods:
- getStats - Get library statistics (papers, videos, NASA counts)
- listMethods - List all available methods
- search - Search the library (advertised; implementation in progress)
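A handler for these methods might be organised as a simple dispatch table. The sketch below is hypothetical and is not the contents of lib/mcp.ts: the request shape follows the curl example in this README, while the response shape, `createMcpHandler`, and the injected `getStats` function are assumptions made for illustration.

```typescript
// Request shape taken from the curl example; the response shape is an assumption.
interface McpRequest {
  method: string;
  params?: Record<string, unknown>;
}
type McpResponse = { result: unknown } | { error: string };

interface LibraryStats {
  papers: number;
  videos: number;
  nasa: number;
}

// Hypothetical dispatcher; `getStats` is injected in place of a real database query.
function createMcpHandler(getStats: () => LibraryStats) {
  const methods: Record<string, (params: Record<string, unknown>) => unknown> = {
    getStats: () => getStats(),
    listMethods: () => ["getStats", "listMethods", "search"],
    // Search is advertised but still in progress, so report that explicitly.
    search: () => {
      throw new Error("search is not implemented yet");
    },
  };
  return (req: McpRequest): McpResponse => {
    const method = Object.prototype.hasOwnProperty.call(methods, req.method)
      ? methods[req.method]
      : undefined;
    if (!method) return { error: `Unknown method: ${req.method}` };
    try {
      return { result: method(req.params ?? {}) };
    } catch (err) {
      return { error: err instanceof Error ? err.message : String(err) };
    }
  };
}

const handle = createMcpHandler(() => ({ papers: 0, videos: 0, nasa: 0 }));
const stats = handle({ method: "getStats" });
```

Keeping the methods in one table means listMethods can be derived from the same source of truth if the list grows.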
Example MCP request (adjust the port to match your app, e.g. 8001 when ChromaDB occupies 8000):
curl -X POST http://localhost:8000/api/mcp \
-H "Content-Type: application/json" \
-d '{
"method": "search",
"params": {
"query": "gravitational waves",
"type": "papers",
"limit": 5
}
}'
The web app is ready for Deno Deploy. On Deploy, the app uses Deno.serve(app.fetch) (no port binding); locally it binds to PORT or 8000.
1. Create a project on Deno Deploy and connect your GitHub repo.
2. Configure the build:
   - Entrypoint: main.tsx
   - Root directory: (leave default, or set if in a subdirectory)
3. Set environment variables in the Deploy dashboard (Project → Settings → Environment Variables):
| Variable | Required | Description |
|---|---|---|
| TURSO_DATABASE_URL | Yes | Turso database URL (e.g. libsql://your-db.turso.io) |
| TURSO_AUTH_TOKEN | Yes | Turso auth token |
| CHROMA_HOST | For search | ChromaDB URL (e.g. a remote Chroma instance). If unset, search may fail. |
| CHROMA_AUTH_TOKEN | Optional | If your Chroma server uses auth |
| NASA_API_KEY | Optional | NASA API key (defaults to DEMO_KEY) |
| ANTHROPIC_API_KEY | Optional | Only if you add AI features that call Claude from the server |
4. Deploy. The dashboard, search page, and API routes will be served. The crawler/worker does not run on Deploy (serverless); run it elsewhere (e.g. cron + deno task worker) to populate the database and Chroma.
5. Optional: Use a remote ChromaDB (e.g. Chroma Cloud or a VPS) and set CHROMA_HOST so /api/search works on Deploy.
For a complete production setup, deploy ChromaDB and the Crawler to Fly.io. This complements Deno Deploy (web app) with persistent vector storage and scheduled data collection.
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Deno Deploy   │────▶│      Turso      │◀────│     Fly.io      │
│    (Web App)    │     │  (SQLite Edge)  │     │                 │
│                 │     └─────────────────┘     │  ┌───────────┐  │
│                 │                             │  │ ChromaDB  │  │
│                 │────────────────────────────▶│  │ (vectors) │  │
│                 │                             │  └───────────┘  │
└─────────────────┘                             │  ┌───────────┐  │
                                                │  │  Crawler  │  │
                                                │  │ (worker)  │  │
                                                │  └───────────┘  │
                                                └─────────────────┘
- Install the Fly CLI
- Sign up / log in: fly auth login
# Create the ChromaDB app with persistent volume
fly launch --config fly.chromadb.toml --no-deploy
# Create persistent volume for vector data (10GB)
fly volumes create chromadb_data --size 10 --config fly.chromadb.toml
# Set authentication token (use a secure random string)
fly secrets set CHROMA_SERVER_AUTH_CREDENTIALS=your-secure-token-here --config fly.chromadb.toml
# Deploy
fly deploy --config fly.chromadb.toml
Note your ChromaDB URL: https://astrogroot-chromadb.fly.dev
# Create the crawler app
fly launch --config fly.toml --no-deploy
# Set environment secrets
fly secrets set \
TURSO_DATABASE_URL=libsql://your-db.turso.io \
TURSO_AUTH_TOKEN=your-turso-token \
ANTHROPIC_API_KEY=sk-ant-your-key \
CHROMA_HOST=https://astrogroot-chromadb.fly.dev \
CHROMA_AUTH_TOKEN=your-secure-token-here \
NASA_API_KEY=your-nasa-key \
--config fly.toml
# Deploy
fly deploy --config fly.toml
In your Deno Deploy dashboard, set:
CHROMA_HOST=https://astrogroot-chromadb.fly.dev
CHROMA_AUTH_TOKEN=your-secure-token-here
# View crawler logs
fly logs --config fly.toml
# View ChromaDB logs
fly logs --config fly.chromadb.toml
# SSH into crawler for debugging
fly ssh console --config fly.toml
| Service | Specs | Cost |
|---|---|---|
| ChromaDB | 1 shared CPU, 1GB RAM, 10GB disk | ~$5-7/mo |
| Crawler | 1 shared CPU, 512MB RAM | ~$3-5/mo |
| Total | | ~$8-12/mo |
# Generate new migrations
deno task db:generate
# Push schema changes
deno task db:push
# Open Drizzle Studio (database GUI)
deno task db:studio
- Database Layer: Drizzle ORM with Turso (LibSQL)
- Vector Store: ChromaDB for semantic search
- AI Processing: Anthropic Claude for summarization
- Web Framework: Hono with server-side JSX (Deno)
- Background Workers: Deno native with scheduled execution
- Create a new collector in lib/collectors/
- Define the data schema in db/schema.ts
- Update the crawler in workers/crawler.ts
- Add vector storage in the appropriate collection
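As a rough picture of the first and third steps, a new collector could implement a small shared contract that the crawler loops over. Everything in this sketch is hypothetical: the `Collector` and `CollectedItem` interfaces, the `example-observatory` source, and `runCollectors` are illustrative names, and the existing collectors in lib/collectors/ may be structured differently.

```typescript
// Hypothetical shape of one collected item; the real columns live in db/schema.ts.
interface CollectedItem {
  id: string;
  source: string;
  title: string;
  text: string;
  url: string;
}

// Hypothetical contract that each collector would satisfy.
interface Collector {
  name: string;
  collect(maxItems: number): Promise<CollectedItem[]>;
}

// Example collector for an imaginary source, returning canned data.
const exampleCollector: Collector = {
  name: "example-observatory",
  async collect(maxItems) {
    const items: CollectedItem[] = [
      { id: "ex-1", source: "example-observatory", title: "First light", text: "Placeholder text", url: "https://example.org/1" },
      { id: "ex-2", source: "example-observatory", title: "Second night", text: "Placeholder text", url: "https://example.org/2" },
    ];
    return items.slice(0, maxItems);
  },
};

// The crawler could then run every registered collector in turn.
async function runCollectors(
  collectors: Collector[],
  maxItems: number,
): Promise<Map<string, CollectedItem[]>> {
  const results = new Map<string, CollectedItem[]>();
  for (const collector of collectors) {
    try {
      results.set(collector.name, await collector.collect(maxItems));
    } catch {
      // One failing source should not stop the others.
      results.set(collector.name, []);
    }
  }
  return results;
}
```

A shared contract like this keeps the crawler unaware of source-specific details, so adding a source would mostly mean writing one new module and registering it.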
Adjust crawler behavior via environment variables:
CRAWLER_INTERVAL_HOURS=24 # How often to run (default: 24)
MAX_ITEMS_PER_SOURCE=50 # Max items per source per run (default: 50)
The crawler collects from these astronomy categories by default:
- astro-ph.CO - Cosmology and Nongalactic Astrophysics
- astro-ph.EP - Earth and Planetary Astrophysics
- astro-ph.GA - Astrophysics of Galaxies
- astro-ph.HE - High Energy Astrophysical Phenomena
- astro-ph.IM - Instrumentation and Methods
- astro-ph.SR - Solar and Stellar Astrophysics
- gr-qc - General Relativity and Quantum Cosmology
- physics.space-ph - Space Physics
Modify in lib/collectors/arxiv.ts.
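For orientation, a category list like the one above can be turned into a single arXiv API query. The sketch below uses the public arXiv query endpoint and its `search_query`, `sortBy`, `sortOrder`, and `max_results` parameters; `buildArxivQueryUrl` is a hypothetical helper, and the project's collector may build its requests differently.

```typescript
// Default categories, as listed in this README.
const ARXIV_CATEGORIES = [
  "astro-ph.CO", "astro-ph.EP", "astro-ph.GA", "astro-ph.HE",
  "astro-ph.IM", "astro-ph.SR", "gr-qc", "physics.space-ph",
];

// Build an arXiv API URL matching any of the given categories, newest first.
// The default of 50 mirrors the MAX_ITEMS_PER_SOURCE default (an assumption).
function buildArxivQueryUrl(categories: string[], maxResults = 50): string {
  if (categories.length === 0) {
    throw new Error("At least one category is required");
  }
  const searchQuery = categories.map((c) => `cat:${c}`).join(" OR ");
  const params: [string, string][] = [
    ["search_query", searchQuery],
    ["sortBy", "submittedDate"],
    ["sortOrder", "descending"],
    ["max_results", String(maxResults)],
  ];
  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `https://export.arxiv.org/api/query?${query}`;
}

const arxivUrl = buildArxivQueryUrl(ARXIV_CATEGORIES);
```

Adding or removing a category then only changes the array, not the request-building logic.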
- Collection: Crawler fetches data from arXiv, YouTube, NASA
- Processing: Claude AI generates summaries and extracts key points
- Storage: Data saved to Turso database
- Indexing: Embeddings stored in ChromaDB for semantic search
- Query: Users search via web UI or MCP server
- Retrieval: Vector search finds relevant content
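The ingestion half of this flow (collection through indexing) can be sketched as a pipeline with injected stages. This is an illustrative outline under stated assumptions, not the code in workers/crawler.ts: `PipelineStages`, `runPipeline`, and the item shapes are hypothetical, and the real worker may batch, retry, or order the steps differently.

```typescript
interface RawItem {
  id: string;
  title: string;
  text: string;
}
interface ProcessedItem extends RawItem {
  summary: string;
}

// Each stage is injected, so real clients (Claude, Turso, ChromaDB)
// can be swapped for stubs when testing.
interface PipelineStages {
  collect(): Promise<RawItem[]>;
  summarize(item: RawItem): Promise<string>;
  save(item: ProcessedItem): Promise<void>;
  index(item: ProcessedItem): Promise<void>;
}

// Runs collection, processing, storage, and indexing for every item.
// Returns the number of items handled.
async function runPipeline(stages: PipelineStages): Promise<number> {
  const items = await stages.collect();
  for (const item of items) {
    const summary = await stages.summarize(item);
    const processed: ProcessedItem = { ...item, summary };
    await stages.save(processed); // relational record first
    await stages.index(processed); // then the vector index
  }
  return items.length;
}
```

Saving before indexing means that a search hit always has a stored record behind it, at the cost of an item being briefly unsearchable if indexing fails.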
- Research: Quickly find relevant astronomy papers and summaries
- Education: Discover educational videos on specific topics
- Exploration: Browse NASA imagery and explanations
- Integration: Use MCP server to query from Claude Desktop
- Personal Library: Build a curated astronomy knowledge base
Contributions are welcome! Please feel free to submit a Pull Request.
- Follow the existing code structure
- Add tests for new features
- Update documentation
- Use Deno's built-in formatter: deno fmt
- Use Deno's linter: deno lint
MIT License - see LICENSE file for details
- arXiv for open access to research papers
- NASA for public APIs and imagery
- Anthropic for Claude AI
- ChromaDB for vector database
- Turso for serverless SQLite
- Deno for the modern JavaScript runtime
For issues, questions, or contributions:
- Create an issue on GitHub
- Join our discussions
- Check the documentation
Built with ❤️ using Deno, Hono, Claude AI, and open astronomy data
