Simple website-based data pipeline that:
- Scrapes public updates from 3 mining company websites
- Uses OpenRouter (only paid API) to extract structured investment fields + short summaries
- Stores raw + structured results in PostgreSQL
- Avoids duplicates
- Provides a small web dashboard + on-demand database summary
- Can run daily scheduled parsing
- Backend: Python + FastAPI + SQLAlchemy
- DB: PostgreSQL
- Scraping: requests + BeautifulSoup (Playwright optional later)
- AI: OpenRouter API (Chat Completions)
- Frontend: React (Vite)
docker compose up -d dbcd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# edit backend/.env and set OPENROUTER_API_KEY
uvicorn app.main:app --reload --port 8000cd frontend
npm install
npm run devOpen the dashboard at http://localhost:5173.
Create backend/.env (or set env vars) with:
OPENROUTER_API_KEY: requiredOPENROUTER_MODEL: optional (default in.env.example)DATABASE_URL: required (default points to docker compose Postgres)ENABLE_SCHEDULER:trueto run daily parsing on backend startupSCHEDULE_CRON: optional cron string (default daily at 06:30)
- Scraping uses basic HTML parsing; if a site blocks requests or requires JS, we can add Playwright later.
- “Detect new” is done by URL uniqueness + content hash; “updated” triggers if the content hash changes.
- LLM extraction is best-effort. Raw text is always stored.