topben/astrogroot

AstroGroot

An automated astronomy research library powered by AI

AstroGroot collects, processes, and indexes astronomy content from multiple sources (arXiv, YouTube, NASA) and makes it searchable through semantic vector search and AI-powered summaries.

✨ Features

  • 📄 arXiv Papers: Automatic collection of astronomy research papers
  • 🎥 YouTube Videos: Educational astronomy content with transcript extraction
  • 🚀 NASA Content: APOD (Astronomy Picture of the Day) and NASA Image Library
  • 🤖 AI Processing: Claude-powered summarization and translation
  • 🔍 Semantic Search: Vector-based search using ChromaDB embeddings
  • 🌐 Web Dashboard: Browse and search your library
  • 🔌 MCP Server: Integration with Claude Desktop via Model Context Protocol

🏗️ Architecture

├── deno.json                # Project config & dependencies
├── main.tsx                 # Hono app entry (routes, API, static)
├── drizzle.config.ts        # Drizzle ORM configuration
├── docker-compose.yml       # ChromaDB & Redis services
├── .env.example             # Environment variables template
│
├── db/                      # Database Layer (Drizzle + Turso)
│   ├── client.ts            # Turso/LibSQL connection
│   └── schema.ts            # Database schema
│
├── lib/                     # Shared Libraries
│   ├── vector.ts            # ChromaDB wrapper
│   ├── mcp.ts               # MCP request handler (getStats, listMethods, etc.)
│   ├── ai/
│   │   ├── client.ts        # Anthropic SDK client
│   │   └── processor.ts     # AI summarization & translation
│   └── collectors/
│       ├── nasa.ts          # NASA API integration
│       ├── arxiv.ts         # arXiv API integration
│       └── youtube.ts       # YouTube transcript extraction
│
├── components/              # Hono JSX UI (server-rendered)
│   ├── layout.tsx           # Shared layout (starfield, nav, styles)
│   ├── search-bar.tsx       # Search form + filters
│   └── pages/
│       ├── dashboard.tsx    # Dashboard (stats, about)
│       ├── search.tsx       # Search page
│       └── not-found.tsx    # 404 page
│
├── static/                  # Static assets
│   └── astrogroot-logo.png  # Logo (transparent)
│
└── workers/                 # Background Processing
    └── crawler.ts           # Automated data collection worker

🚀 Quick Start

Prerequisites

Installation

  1. Clone the repository
git clone https://github.com/topben/astrogroot.git
cd astrogroot
  2. Set up environment variables
cp .env.example .env
# Edit .env with your API keys and credentials

Required environment variables:

# Database (Turso)
TURSO_DATABASE_URL=libsql://your-database.turso.io
TURSO_AUTH_TOKEN=your-turso-auth-token

# AI Processing (Anthropic)
ANTHROPIC_API_KEY=sk-ant-api-key-here

# Vector Store (ChromaDB)
CHROMA_HOST=http://localhost:8000
CHROMA_AUTH_TOKEN=astrogroot-token

# Optional
NASA_API_KEY=DEMO_KEY
YOUTUBE_API_KEY=your-youtube-api-key
  3. Start infrastructure services
docker compose up -d

This starts:

  • ChromaDB (vector database) on port 8000. The crawler uses a built-in embedding function (no extra dependency). For stronger semantic search, you can install chromadb-default-embed (see ChromaDB docs).
  • Redis (optional, for task queues) on port 6379
  4. Initialize the database
# Generate migrations
deno task db:generate

# Push schema to Turso
deno task db:push
  5. Install dependencies

Deno will automatically install dependencies on first run, but you can pre-cache them:

deno cache --reload main.tsx
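
A startup check can catch a missing variable from the environment step early. The sketch below is one way to do it, assuming the five variables listed under "Required environment variables" are the required set; `missingEnvVars` and `REQUIRED_VARS` are hypothetical names, not part of the project.

```typescript
// Hypothetical startup check; only the variable names come from .env.example.
type Env = Record<string, string | undefined>;

const REQUIRED_VARS = [
  "TURSO_DATABASE_URL",
  "TURSO_AUTH_TOKEN",
  "ANTHROPIC_API_KEY",
  "CHROMA_HOST",
  "CHROMA_AUTH_TOKEN",
] as const;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => !env[name]?.trim());
}

// Example with a partially filled environment.
const missing = missingEnvVars({
  TURSO_DATABASE_URL: "libsql://your-database.turso.io",
  TURSO_AUTH_TOKEN: "your-turso-auth-token",
  CHROMA_HOST: "http://localhost:8000",
});
console.log(missing); // lists ANTHROPIC_API_KEY and CHROMA_AUTH_TOKEN
```

Under Deno, the real environment could be passed in as `missingEnvVars(Deno.env.toObject())`; taking the environment as a parameter keeps the function easy to test.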

📖 Usage

Running the Web Server

Start the Hono development server:

deno task dev

Visit the URL shown in the terminal (e.g. http://localhost:8000 or http://localhost:8001) to access the dashboard. If port 8000 is in use (e.g. by ChromaDB when using Docker), the server will try the next available port.

Tip: When running Docker (ChromaDB on 8000), set PORT=8001 in .env so the web app uses 8001 and avoids port conflict.
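
The "try the next available port" behavior described above can be pictured roughly as follows. This is only an illustration: how main.tsx actually probes ports is not shown in this README, and `pickPort` and its `isInUse` predicate are hypothetical.

```typescript
// Illustrative only: `isInUse` is a hypothetical predicate supplied by the caller.
function pickPort(
  preferred: number,
  isInUse: (port: number) => boolean,
  maxTries = 10,
): number {
  for (let offset = 0; offset < maxTries; offset++) {
    const candidate = preferred + offset;
    if (!isInUse(candidate)) return candidate;
  }
  throw new Error(`No free port in ${preferred}..${preferred + maxTries - 1}`);
}

// ChromaDB occupies 8000, so the web app lands on 8001.
const busy = new Set([8000]);
const port = pickPort(8000, (p) => busy.has(p));
console.log(port); // 8001
```

Setting PORT explicitly, as the tip suggests, avoids relying on this fallback and keeps the URL predictable.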

Verify everything is running:

# Check Docker containers (ChromaDB, Redis)
docker compose ps

# Check web app (replace 8001 with your app port if different)
curl -s http://localhost:8001/api/health
# → {"ok":true,"service":"astrogroot","timestamp":"..."}

The dashboard shows Library Statistics as 0 until you run the crawler to collect data. The app is running if you see the dashboard and /api/health returns ok: true.
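
For scripted checks (for example in a deploy pipeline), the health response can be validated in code. The sketch below assumes the response shape matches the sample output above; `isHealthy` is a hypothetical helper.

```typescript
// Shape taken from the sample /api/health output above.
interface HealthResponse {
  ok: boolean;
  service: string;
  timestamp: string;
}

// Returns true only for a well-formed, healthy AstroGroot response body.
function isHealthy(body: string): boolean {
  try {
    const data = JSON.parse(body) as Partial<HealthResponse>;
    return data.ok === true && data.service === "astrogroot";
  } catch (_err) {
    return false; // body was not valid JSON, or not an object
  }
}

console.log(isHealthy('{"ok":true,"service":"astrogroot","timestamp":"..."}')); // true
console.log(isHealthy("<html>502 Bad Gateway</html>")); // false
```

The body would typically come from `await (await fetch(url)).text()`, with the URL pointing at your app's port.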

Running the Crawler

The crawler collects data from arXiv, YouTube, and NASA sources.

Single run (collect data once):

deno task worker

Scheduled mode (runs every 24 hours):

deno run --allow-all workers/crawler.ts scheduled

Using the Search Interface

  1. Navigate to http://localhost:8000/search (substitute your app's port, e.g. 8001, if you set PORT)
  2. Enter your query (e.g., "black hole formation", "exoplanet detection")
  3. Filter by content type (papers, videos, NASA)
  4. Results are ranked by semantic similarity using vector embeddings
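
To make "ranked by semantic similarity" concrete, here is a toy ranking by cosine similarity. It is an illustration only: ChromaDB performs the actual search internally, the distance metric used by the project's collections is not stated in this README, and the two-dimensional vectors stand in for real embeddings with hundreds of dimensions.

```typescript
// Cosine similarity: 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Doc {
  id: string;
  embedding: number[];
}

// Scores every document against the query and sorts best match first.
function rank(query: number[], docs: Doc[]): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score);
}

// Made-up embeddings for three documents.
const results = rank([1, 0], [
  { id: "unrelated", embedding: [0, 1] },
  { id: "close-match", embedding: [0.9, 0.1] },
  { id: "partial-match", embedding: [0.5, 0.5] },
]);
console.log(results.map((r) => r.id)); // close-match, partial-match, unrelated
```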

MCP Server Integration

AstroGroot includes an MCP (Model Context Protocol) server for integration with Claude Desktop.

Available MCP methods:

  • getStats - Get library statistics (papers, videos, NASA counts)
  • listMethods - List all available methods
  • search - Search the library (advertised; implementation in progress)

Example MCP request (search is still in progress, so the response may not contain results yet):

curl -X POST http://localhost:8000/api/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "method": "search",
    "params": {
      "query": "gravitational waves",
      "type": "papers",
      "limit": 5
    }
  }'
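
A handler for requests of this shape could dispatch on the method name. The sketch below is not the code in lib/mcp.ts: the request and response types, the `handleMcp` function, and the placeholder statistics are all assumptions, and `search` is left unregistered because its implementation is described as in progress.

```typescript
// Hypothetical request/response shapes; not the actual lib/mcp.ts code.
interface McpRequest {
  method: string;
  params?: Record<string, unknown>;
}

type McpResult = { result: unknown } | { error: string };
type Handler = (params: Record<string, unknown>) => unknown;

const handlers = new Map<string, Handler>();
// Placeholder counts; a real handler would query the database.
handlers.set("getStats", () => ({ papers: 0, videos: 0, nasa: 0 }));
handlers.set("listMethods", () => Array.from(handlers.keys()));

// Looks up the method and returns either its result or an error message.
function handleMcp(req: McpRequest): McpResult {
  const handler = handlers.get(req.method);
  if (!handler) return { error: `Unknown method: ${req.method}` };
  return { result: handler(req.params ?? {}) };
}

console.log(handleMcp({ method: "listMethods" }));
console.log(handleMcp({ method: "search", params: { query: "gravitational waves" } }));
```

Keeping the handlers in a map means `listMethods` always reflects what is actually registered.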

🚀 Deploying to Deno Deploy

The web app is ready for Deno Deploy. On Deploy, the app uses Deno.serve(app.fetch) (no port binding); locally it binds to PORT or 8000.

1. Create a project on Deno Deploy and connect your GitHub repo.

2. Configure the build:

  • Entrypoint: main.tsx
  • Root directory: (leave default, or set if in a subdirectory)

3. Set environment variables in the Deploy dashboard (Project → Settings → Environment Variables):

| Variable | Required | Description |
| --- | --- | --- |
| TURSO_DATABASE_URL | Yes | Turso database URL (e.g. libsql://your-db.turso.io) |
| TURSO_AUTH_TOKEN | Yes | Turso auth token |
| CHROMA_HOST | For search | ChromaDB URL (e.g. a remote Chroma instance). If unset, search may fail. |
| CHROMA_AUTH_TOKEN | Optional | If your Chroma server uses auth |
| NASA_API_KEY | Optional | NASA API key (defaults to DEMO_KEY) |
| ANTHROPIC_API_KEY | Optional | Only if you add AI features that call Claude from the server |

4. Deploy. The dashboard, search page, and API routes will be served. The crawler/worker does not run on Deploy (serverless); run it elsewhere (e.g. cron + deno task worker) to populate the database and Chroma.

5. Optional: Use a remote ChromaDB (e.g. Chroma Cloud or a VPS) and set CHROMA_HOST so /api/search works on Deploy.

🚀 Deploying to Fly.io (ChromaDB + Crawler)

For a complete production setup, deploy ChromaDB and the Crawler to Fly.io. This complements Deno Deploy (web app) with persistent vector storage and scheduled data collection.

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Deno Deploy    │────▶│     Turso       │◀────│    Fly.io       │
│  (Web App)      │     │  (SQLite Edge)  │     │                 │
│                 │     └─────────────────┘     │  ┌───────────┐  │
│                 │                             │  │ ChromaDB  │  │
│                 │────────────────────────────▶│  │ (vectors) │  │
│                 │                             │  └───────────┘  │
└─────────────────┘                             │  ┌───────────┐  │
                                                │  │ Crawler   │  │
                                                │  │ (worker)  │  │
                                                │  └───────────┘  │
                                                └─────────────────┘

Prerequisites

  1. Install the Fly CLI
  2. Sign up / log in: fly auth login

Step 1: Deploy ChromaDB

# Create the ChromaDB app with persistent volume
fly launch --config fly.chromadb.toml --no-deploy

# Create persistent volume for vector data (10GB)
fly volumes create chromadb_data --size 10 --config fly.chromadb.toml

# Set authentication token (use a secure random string)
fly secrets set CHROMA_SERVER_AUTH_CREDENTIALS=your-secure-token-here --config fly.chromadb.toml

# Deploy
fly deploy --config fly.chromadb.toml

Note your ChromaDB URL: https://astrogroot-chromadb.fly.dev

Step 2: Deploy the Crawler

# Create the crawler app
fly launch --config fly.toml --no-deploy

# Set environment secrets
fly secrets set \
  TURSO_DATABASE_URL=libsql://your-db.turso.io \
  TURSO_AUTH_TOKEN=your-turso-token \
  ANTHROPIC_API_KEY=sk-ant-your-key \
  CHROMA_HOST=https://astrogroot-chromadb.fly.dev \
  CHROMA_AUTH_TOKEN=your-secure-token-here \
  NASA_API_KEY=your-nasa-key \
  --config fly.toml

# Deploy
fly deploy --config fly.toml

Step 3: Update Deno Deploy

In your Deno Deploy dashboard, set:

  • CHROMA_HOST=https://astrogroot-chromadb.fly.dev
  • CHROMA_AUTH_TOKEN=your-secure-token-here

Monitoring

# View crawler logs
fly logs --config fly.toml

# View ChromaDB logs
fly logs --config fly.chromadb.toml

# SSH into crawler for debugging
fly ssh console --config fly.toml

Cost Estimate

| Service | Specs | Cost |
| --- | --- | --- |
| ChromaDB | 1 shared CPU, 1GB RAM, 10GB disk | ~$5-7/mo |
| Crawler | 1 shared CPU, 512MB RAM | ~$3-5/mo |
| Total | | ~$8-12/mo |

🧪 Development

Database Management

# Generate new migrations
deno task db:generate

# Push schema changes
deno task db:push

# Open Drizzle Studio (database GUI)
deno task db:studio

Project Structure

  • Database Layer: Drizzle ORM with Turso (LibSQL)
  • Vector Store: ChromaDB for semantic search
  • AI Processing: Anthropic Claude for summarization
  • Web Framework: Hono with server-side JSX (Deno)
  • Background Workers: Deno native with scheduled execution

Adding New Data Sources

  1. Create a new collector in lib/collectors/
  2. Define the data schema in db/schema.ts
  3. Update the crawler in workers/crawler.ts
  4. Add vector storage in the appropriate collection
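
As a starting point for step 1, a new collector might follow a small shared interface. Everything in this sketch is hypothetical (`Collector`, `CollectedItem`, `collectAll`); the existing collectors in lib/collectors/ may be shaped differently, so treat it as one possible design and adapt it to the code you find there.

```typescript
// Hypothetical interface; the project's real collectors may be shaped differently.
interface CollectedItem {
  id: string;
  title: string;
  source: string;
  text: string;
}

interface Collector {
  source: string;
  collect(limit: number): Promise<CollectedItem[]>;
}

// Example collector returning canned data instead of calling a remote API.
const exampleCollector: Collector = {
  source: "example",
  async collect(limit) {
    const items: CollectedItem[] = [
      { id: "ex-1", title: "First item", source: "example", text: "..." },
      { id: "ex-2", title: "Second item", source: "example", text: "..." },
    ];
    return items.slice(0, limit);
  },
};

// Runs every collector concurrently and merges the results, as a crawler loop might.
async function collectAll(
  collectors: Collector[],
  limit: number,
): Promise<CollectedItem[]> {
  const batches = await Promise.all(collectors.map((c) => c.collect(limit)));
  return ([] as CollectedItem[]).concat(...batches);
}

collectAll([exampleCollector], 1).then((items) => console.log(items.length));
```

A common interface lets the crawler treat every source the same way, so adding a source becomes a matter of appending one more collector to the list.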

🔧 Configuration

Crawler Settings

Adjust crawler behavior via environment variables:

CRAWLER_INTERVAL_HOURS=24      # How often to run (default: 24)
MAX_ITEMS_PER_SOURCE=50        # Max items per source per run (default: 50)
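
One way to read these two settings with their documented defaults is sketched below. The helper names are hypothetical, and how workers/crawler.ts actually parses the variables is not shown in this README.

```typescript
interface CrawlerSettings {
  intervalHours: number;
  maxItemsPerSource: number;
}

// Accepts positive whole numbers only; anything else falls back to the default.
function positiveInt(raw: string | undefined, fallback: number): number {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Defaults (24 hours, 50 items) are the ones documented above.
function readCrawlerSettings(
  env: Record<string, string | undefined>,
): CrawlerSettings {
  return {
    intervalHours: positiveInt(env["CRAWLER_INTERVAL_HOURS"], 24),
    maxItemsPerSource: positiveInt(env["MAX_ITEMS_PER_SOURCE"], 50),
  };
}

const settings = readCrawlerSettings({ CRAWLER_INTERVAL_HOURS: "6" });
console.log(settings); // intervalHours 6, maxItemsPerSource 50
```

Falling back on invalid input keeps a typo in .env from producing a zero or NaN interval.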

arXiv Categories

The crawler collects from these astronomy categories by default:

  • astro-ph.CO - Cosmology and Nongalactic Astrophysics
  • astro-ph.EP - Earth and Planetary Astrophysics
  • astro-ph.GA - Astrophysics of Galaxies
  • astro-ph.HE - High Energy Astrophysical Phenomena
  • astro-ph.IM - Instrumentation and Methods for Astrophysics
  • astro-ph.SR - Solar and Stellar Astrophysics
  • gr-qc - General Relativity and Quantum Cosmology
  • physics.space-ph - Space Physics

Modify in lib/collectors/arxiv.ts.
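
For orientation, category filters like these are typically expressed in the public arXiv API as a `search_query` of `cat:` terms joined with `OR`. The sketch below builds such a URL; whether lib/collectors/arxiv.ts constructs its query this way is an assumption, and `buildArxivQueryUrl` is a hypothetical helper.

```typescript
const CATEGORIES = [
  "astro-ph.CO",
  "astro-ph.EP",
  "astro-ph.GA",
  "astro-ph.HE",
  "astro-ph.IM",
  "astro-ph.SR",
  "gr-qc",
  "physics.space-ph",
];

// Builds an arXiv API query for the newest submissions in the given categories.
function buildArxivQueryUrl(categories: string[], maxResults: number): string {
  const url = new URL("https://export.arxiv.org/api/query");
  url.searchParams.set(
    "search_query",
    categories.map((c) => `cat:${c}`).join(" OR "),
  );
  url.searchParams.set("sortBy", "submittedDate");
  url.searchParams.set("sortOrder", "descending");
  url.searchParams.set("max_results", String(maxResults));
  return url.toString();
}

console.log(buildArxivQueryUrl(CATEGORIES, 50));
```

Using `URL` and `searchParams` handles the percent-encoding of spaces and colons, which is easy to get wrong with string concatenation.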

📊 Data Flow

  1. Collection: Crawler fetches data from arXiv, YouTube, NASA
  2. Processing: Claude AI generates summaries and extracts key points
  3. Storage: Data saved to Turso database
  4. Indexing: Embeddings stored in ChromaDB for semantic search
  5. Query: Users search via web UI or MCP server
  6. Retrieval: Vector search finds relevant content
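
The write path (steps 1 to 4) can be sketched as a small pipeline with each stage injected as a function. The stage names and signatures here are hypothetical stand-ins for the collectors, the Claude processor, Turso, and ChromaDB; the real crawler may order or batch the work differently.

```typescript
// All stage functions are hypothetical stand-ins supplied by the caller.
interface RawItem {
  id: string;
  text: string;
}

interface Stages {
  collect: () => Promise<RawItem[]>;
  summarize: (text: string) => Promise<string>;
  store: (item: RawItem, summary: string) => Promise<void>;
  index: (id: string, summary: string) => Promise<void>;
}

// Processes items one at a time: summarize, then store, then index.
async function runPipeline(stages: Stages): Promise<number> {
  const items = await stages.collect();
  for (const item of items) {
    const summary = await stages.summarize(item.text);
    await stages.store(item, summary);
    await stages.index(item.id, summary);
  }
  return items.length;
}

// Stub stages that only log, to show the call order.
runPipeline({
  collect: async () => [{ id: "demo-1", text: "A paper abstract." }],
  summarize: async (text) => `Summary of: ${text}`,
  store: async (item) => console.log("stored", item.id),
  index: async (id) => console.log("indexed", id),
}).then((count) => console.log("processed", count));
```

Injecting the stages keeps the flow testable without network access or API keys.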

🌟 Use Cases

  • Research: Quickly find relevant astronomy papers and summaries
  • Education: Discover educational videos on specific topics
  • Exploration: Browse NASA imagery and explanations
  • Integration: Use MCP server to query from Claude Desktop
  • Personal Library: Build a curated astronomy knowledge base

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Guidelines

  1. Follow the existing code structure
  2. Add tests for new features
  3. Update documentation
  4. Use Deno's built-in formatter: deno fmt
  5. Use Deno's linter: deno lint

📝 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

  • arXiv for open access to research papers
  • NASA for public APIs and imagery
  • Anthropic for Claude AI
  • ChromaDB for vector database
  • Turso for serverless SQLite
  • Deno for the modern JavaScript runtime

📞 Support

For issues, questions, or contributions:

  • Create an issue on GitHub
  • Join our discussions
  • Check the documentation

Built with ❤️ using Deno, Hono, Claude AI, and open astronomy data

About

open source project for Rocket
