A lightweight scaffold for scraping the AIDR disaster events API, normalizing data, and persisting it with a CLI orchestrator.
- Create a virtual environment and install dependencies:

```
python -m venv .venv
# Windows PowerShell
.\.venv\Scripts\Activate.ps1
# macOS/Linux
source .venv/bin/activate
pip install -r requirements.txt
```

- Make the `src` package importable and load environment settings:
```
# Windows PowerShell
$env:PYTHONPATH = "src"
copy .env.example .env
# macOS/Linux
export PYTHONPATH=src
cp .env.example .env
```

- Run the scraper pipeline (fetch -> normalize -> store):
```
python -m aidr_scraper.main scrape --start-year 2005 --end-year 2025
```

- Refresh analytics/materialized views and show category counts:
```
python -m aidr_scraper.main refresh-views
python -m aidr_scraper.main analytics
```

- Preview a few rows as CSV (defaults to stdout, or pass --output to save):
```
python -m aidr_scraper.main sample-csv --limit 5
python -m aidr_scraper.main sample-csv --limit 10 --output data/sample.csv
```

Configuration is read from environment variables (loaded from `.env`; a sample file is sketched after this list):

- `DATABASE_URL` (optional): SQLAlchemy database URL. Defaults to `sqlite:///data/aidr.db`.
- `AIDR_API_URL` (optional): Override the AIDR resource search endpoint.
- `AIDR_TIMEOUT` (optional): Request timeout in seconds (default 30).
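Putting those together, a sample `.env` might look like the following; the values are illustrative, and the endpoint URL is a placeholder rather than the real AIDR API address:

```
# Sample .env — every setting is optional; values shown are illustrative defaults.
DATABASE_URL=sqlite:///data/aidr.db
# AIDR_API_URL=https://api.example.org/aidr/resources/search  # placeholder endpoint
AIDR_TIMEOUT=30
```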
Project layout:

- `src/aidr_scraper/` - package with fetch, normalize, storage, transform, and CLI orchestration.
- `migrations/` - SQL DDL scripts to bootstrap the database schema.
- `web_scraper.py` - reference script the scaffold was based on.
Notes:

- The CLI uses Typer for ergonomics and python-dotenv to load `.env` automatically; a minimal wiring sketch follows these notes.
- BeautifulSoup is used to safely strip any HTML fragments in summaries returned by the API (see the stripping sketch below).
- The storage layer uses SQLAlchemy with an idempotent upsert to deduplicate events (see the upsert sketch below).
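As a rough illustration of the Typer + python-dotenv combination, here is a minimal sketch of what the CLI wiring could look like; the structure and the command body are assumptions for illustration, not the scaffold's actual `main.py`:

```python
# Minimal sketch of a Typer CLI with python-dotenv (illustrative, not the scaffold's code).
import typer
from dotenv import load_dotenv

load_dotenv()  # reads .env so settings like DATABASE_URL land in os.environ

app = typer.Typer(help="AIDR scraper pipeline")

@app.command()
def scrape(start_year: int = 2005, end_year: int = 2025) -> None:
    """Run fetch -> normalize -> store for the given year range."""
    # The real command would call the fetch/normalize/storage modules here.
    typer.echo(f"Scraping events from {start_year} to {end_year}")

if __name__ == "__main__":
    app()
```

Typer derives the `--start-year`/`--end-year` flags from the parameter names and defaults, which is how the `scrape` invocation shown earlier gets its options.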
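The HTML stripping boils down to parsing each summary fragment and keeping only its text. A sketch of that idea, where `strip_html` and the sample fragment are made up for illustration:

```python
from bs4 import BeautifulSoup

def strip_html(fragment: str) -> str:
    """Return plain text from an HTML fragment, e.g. an API summary field."""
    # "html.parser" is the stdlib-backed parser, so no extra dependency is needed.
    return BeautifulSoup(fragment, "html.parser").get_text(separator=" ", strip=True)

print(strip_html("<p>Flooding in <b>Region X</b> &amp; nearby areas</p>"))
# -> "Flooding in Region X & nearby areas"
```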
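For the idempotent upsert, one plausible shape on the default SQLite backend is SQLAlchemy's dialect-level `ON CONFLICT DO UPDATE` support; the `events` table and its columns below are hypothetical stand-ins for the real schema in `migrations/`:

```python
# Sketch of an idempotent upsert on SQLite (hypothetical schema, not the real one).
from sqlalchemy import Column, MetaData, String, Table, create_engine
from sqlalchemy.dialects.sqlite import insert

metadata = MetaData()
events = Table(
    "events",
    metadata,
    Column("event_id", String, primary_key=True),  # natural key used for deduplication
    Column("title", String),
)

# In-memory DB for the demo; the scaffold defaults to sqlite:///data/aidr.db.
engine = create_engine("sqlite://")
metadata.create_all(engine)

def upsert_event(row: dict) -> None:
    # INSERT ... ON CONFLICT(event_id) DO UPDATE: re-running the scraper
    # updates the existing row instead of inserting a duplicate.
    stmt = insert(events).values(**row)
    stmt = stmt.on_conflict_do_update(
        index_elements=[events.c.event_id],
        set_={"title": stmt.excluded.title},
    )
    with engine.begin() as conn:
        conn.execute(stmt)

upsert_event({"event_id": "evt-1", "title": "Flooding in Region X"})
upsert_event({"event_id": "evt-1", "title": "Flooding in Region X (updated)"})  # no duplicate row
```

Keying the conflict on the event's natural identifier is what makes repeated `scrape` runs safe: the same year range can be fetched again without inflating the table.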