Skip to content

najibna/OR-Royalties-Extractor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OR Royalties Extractor (MVP)

Simple website-based data pipeline that:

  • Scrapes public updates from 3 mining company websites
  • Uses OpenRouter (only paid API) to extract structured investment fields + short summaries
  • Stores raw + structured results in PostgreSQL
  • Avoids duplicates
  • Provides a small web dashboard + on-demand database summary
  • Can run daily scheduled parsing

Tech

  • Backend: Python + FastAPI + SQLAlchemy
  • DB: PostgreSQL
  • Scraping: requests + BeautifulSoup (Playwright optional later)
  • AI: OpenRouter API (Chat Completions)
  • Frontend: React (Vite)

Quickstart (Docker Postgres + local apps)

1) Start Postgres

docker compose up -d db

2) Backend

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# edit backend/.env and set OPENROUTER_API_KEY
uvicorn app.main:app --reload --port 8000

3) Frontend

cd frontend
npm install
npm run dev

Open the dashboard at http://localhost:5173.


Configuration

Create backend/.env (or set env vars) with:

  • OPENROUTER_API_KEY: required
  • OPENROUTER_MODEL: optional (default in .env.example)
  • DATABASE_URL: required (default points to docker compose Postgres)
  • ENABLE_SCHEDULER: true to run daily parsing on backend startup
  • SCHEDULE_CRON: optional cron string (default daily at 06:30)

Notes / MVP limitations

  • Scraping uses basic HTML parsing; if a site blocks requests or requires JS, we can add Playwright later.
  • “Detect new” is done by URL uniqueness + content hash; “updated” triggers if the content hash changes.
  • LLM extraction is best-effort. Raw text is always stored.

About

OR Royalties Extractor

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors