Automatically scrapes recipes from ~20 cooking websites each week and emails a curated, balanced dinner menu to your household.
What it does: once a week (typically via cron), Recipe Emailer scrapes fresh recipes from a list of cooking sites, picks a balanced set of meals (by default two land-protein mains and one seafood main, plus a vegetable side for any meal that needs one), and emails the menu (with images, ingredients, and instructions) to your recipients. It remembers what it has already sent so meals don't repeat, and it nudges each week's picks toward seasonally appropriate recipes (light/no-cook dishes in summer, hearty/oven dishes in winter) using a small local model that needs no network or API keys at run time. It's built to run unattended on low-powered hardware like a Raspberry Pi: failures are logged and skipped rather than crashing the weekly run.
- 🌱 Seasonal AI selection: each recipe is scored for seasonal fit by a distilled, pure-numpy model that ships in the repo (seasonal_model.json). Inference is instant, with no Ollama, network, or GPU needed at run time; picks are biased toward in-season recipes and toward oven use in winter / oven-light cooking in summer. Applies to both mains and sides. (Replaces an earlier runtime-LLM approach that took ~60s per recipe.)
- 🩺 Site-health monitoring: when a site's scraper regex silently breaks or a site is unreachable, the maintainer gets an email instead of a quietly shrinking menu. Tracks an 8-run rolling window in site_health.json.
- 🧠 Streaming scrape: recipes are scraped one page at a time instead of buffering every page in memory, fixing out-of-memory crashes on the Pi.
- 🔤 Encoding fix: scraped pages are decoded by their declared or sniffed charset, eliminating UTF-8 mojibake (e.g. 5â€"6 now renders as 5–6).
- 🧰 Operational hardening: location-independent, venv-aware cook.sh; self-capping logs (cronjob.log is trimmed, recipe_emailer.log is rotated to the last 8 runs); numpy pinned as a runtime dependency.
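The encoding fix comes down to choosing a charset before decoding instead of assuming UTF-8. Below is a minimal, stdlib-only sketch of that idea. The helper name decode_page and its precedence order (HTTP Content-Type header first, then an HTML meta charset declaration sniffed from the first 2 KB, then UTF-8 with replacement characters) are assumptions for illustration, not a copy of web_scraper.py.

```python
import codecs
import re
from typing import Optional

# Hypothetical pattern: "charset=..." inside a Content-Type header value.
_HEADER_CHARSET = re.compile(r"charset=[\"']?([\w.:-]+)", re.IGNORECASE)
# Hypothetical pattern: <meta charset="..."> or <meta ... content="...; charset=...">.
_META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w.:-]+)", re.IGNORECASE)


def _usable(name: Optional[str]) -> Optional[str]:
    """Return the charset name if Python has a codec for it, else None."""
    if not name:
        return None
    try:
        codecs.lookup(name)
    except LookupError:
        return None
    return name


def decode_page(body: bytes, content_type: str = "") -> str:
    """Decode raw HTML using its declared charset rather than assuming UTF-8."""
    # 1. Charset declared in the HTTP Content-Type header.
    found = _HEADER_CHARSET.search(content_type)
    charset = _usable(found.group(1)) if found else None
    # 2. Otherwise sniff a meta charset declaration near the top of the page.
    if charset is None:
        meta = _META_CHARSET.search(body[:2048])
        charset = _usable(meta.group(1).decode("ascii")) if meta else None
    # 3. Fall back to UTF-8, swapping undecodable bytes for U+FFFD instead of raising.
    return body.decode(charset or "utf-8", errors="replace")
```

Mojibake like â€ appears when UTF-8 bytes are decoded with a single-byte charset such as Windows-1252, so honoring the declared charset first matters more than the fallback.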
- ✅ Type-hinted throughout - mypy type-checked in CI
- ✅ Production logging - structured, leveled logging throughout
- ✅ Custom exceptions - no sys.exit() in business logic
- ✅ Comprehensive testing - 159 tests, ~77% coverage
- ✅ Modern tooling - black, ruff, mypy, pytest configured
```bash
# Clone the repository
git clone https://github.com/wassupluke/recipe-emailer.git
cd recipe-emailer

# Install dependencies
pip install -r requirements.txt

# Or install in development mode with dev tools
pip install -e ".[dev]"
```

Create a .env file with your credentials:

```
SENDER=your-email@gmail.com
PASSWD=your-google-app-password
BCC=recipient1@example.com,recipient2@example.com
```

Note: For Gmail, you need an App Password, not your regular password.
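python-dotenv's load_dotenv() copies the .env entries into os.environ, but the values still have to be checked before anything tries to log in. A minimal sketch, assuming a hypothetical ConfigError exception and EmailSettings dataclass (config.py may use different names and messages), that fails fast on missing variables and splits the comma-separated BCC list:

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("SENDER", "PASSWD", "BCC")


class ConfigError(Exception):
    """Hypothetical custom exception raised when a required setting is missing."""


@dataclass(frozen=True)
class EmailSettings:
    sender: str
    password: str
    bcc: list[str]


def load_settings(env: Mapping[str, str] = os.environ) -> EmailSettings:
    """Validate the .env-derived variables and normalize the BCC list."""
    missing = [name for name in REQUIRED if not env.get(name, "").strip()]
    if missing:
        raise ConfigError(f"missing required setting(s): {', '.join(missing)}")
    # "a@x.com, b@x.com," -> ["a@x.com", "b@x.com"]
    bcc = [addr.strip() for addr in env["BCC"].split(",") if addr.strip()]
    return EmailSettings(sender=env["SENDER"].strip(), password=env["PASSWD"], bcc=bcc)
```

In the entry point this would run right after load_dotenv(); taking the mapping as a parameter (defaulting to os.environ) keeps it testable without touching the real environment.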
```bash
# Normal mode - sends emails to configured recipients
python main.py

# Debug mode - sends only to sender, selects single website
python main.py -d
# or
python main.py --debug
```

cook.sh activates the virtualenv and runs the emailer, appending output to cronjob.log. It resolves its own location, so it works no matter where you cloned the repo. Make it executable, then add a crontab entry:

```bash
chmod +x cook.sh
crontab -e
```

```
# Run the Recipe Emailer at 8a on Fridays
0 8 * * Fri /home/wassu/code/recipe-emailer/cook.sh
```

Adjust the path to wherever you cloned the repo.
Each run writes a self-contained index.html (the same content as the email). cook.sh commits + pushes it to a gh-pages branch of this repo, so your weekly menu is served at https://<user>.github.io/<repo>/ without ever adding generated pages to your main history.

One-time setup:

```bash
# Create an empty, history-less gh-pages branch (cook.sh adds index.html on its first run)
git switch --orphan gh-pages
git commit --allow-empty -m "init gh-pages"
git push -u origin gh-pages
git switch main
```

Then enable Pages in the repo settings (Settings → Pages → Build from branch → gh-pages / root). After that, every cook.sh run publishes automatically.

index.html is gitignored on main and pushed via a throwaway git worktree, so the publish step never touches your working tree or the recipe JSON files. To disable publishing, delete the marked block in cook.sh; the rest of the run (scrape, email) is unaffected.
- Multi-site scraping: Scrapes 20 recipe websites automatically
- Smart selection: Balances protein types (seafood vs. land-based)
- Veggie checking: Ensures meals have adequate vegetables, adds sides if needed
- Seasonal bias: A distilled local model nudges picks toward in-season recipes
- Site-health monitoring: Emails the maintainer when a scraper's regex breaks
- Deduplication: Tracks used recipes, avoids repeats
- Error resilience: Continues on failures, tracks problematic URLs
- Type safety: Type hints throughout, mypy type-checked in CI
- Error handling: Custom exceptions, detailed error messages
- Logging: Structured, self-capping file logs + console output
- Testing: Comprehensive test suite (159 tests)
- Documentation: Docstring coverage enforced at ≥95% (interrogate)
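Site-health monitoring boils down to keeping a short per-site history and alerting on a pattern of failures. A minimal sketch, assuming site_health.json maps each site to its last 8 run statuses and that an alert fires after 3 consecutive non-"ok" runs; the status strings, the threshold, and the function names are illustrative, not the actual rules in site_health.py.

```python
from __future__ import annotations

import json
from collections import deque
from pathlib import Path

WINDOW = 8        # runs remembered per site (the README's 8-run window)
ALERT_AFTER = 3   # hypothetical: consecutive failed runs before emailing the maintainer


def record_run(history: dict[str, list[str]], results: dict[str, str]) -> dict[str, list[str]]:
    """Append this run's status for each scraped site, keeping only the last WINDOW runs."""
    updated = {site: list(runs) for site, runs in history.items()}
    for site, status in results.items():
        window = deque(updated.get(site, []), maxlen=WINDOW)  # oldest entries fall off
        window.append(status)
        updated[site] = list(window)
    return updated


def sites_to_alert(history: dict[str, list[str]]) -> list[str]:
    """Sites whose most recent ALERT_AFTER runs all failed (e.g. regex miss or unreachable)."""
    return sorted(
        site
        for site, runs in history.items()
        if len(runs) >= ALERT_AFTER and all(s != "ok" for s in runs[-ALERT_AFTER:])
    )


def load_history(path: Path) -> dict[str, list[str]]:
    """Read the rolling window from disk, starting empty if the file doesn't exist yet."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_history(path: Path, history: dict[str, list[str]]) -> None:
    """Persist the rolling window for the next run."""
    path.write_text(json.dumps(history, indent=2))
```

Returning a new dict instead of mutating the loaded history means a run that crashes midway leaves the stored window untouched.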
Flat layout: every module is a top-level file at the repo root.

```
main.py                  Entry point and pipeline orchestration
config.py                Configuration and constants
file_utils.py            JSON load/save (the recipe "database")
websites.py              Per-site scrape configs (regex + index URLs)
web_scraper.py           HTTP fetch + HTML -> recipe parsing
recipe_processor.py      Streaming batch scrape across sites
recipe_selector.py       Protein selection + veggie/side checking
seasonal_tagging.py      Per-recipe oven-use + seasonality tags
seasonal_model.py        Pure-numpy seasonal "student" inference
seasonal_selection.py    Season/heat-weighted recipe selection
seasonal_label.py        Teacher labeling for training (desktop/GPU)
train_seasonal_model.py  Train + export seasonal_model.json (desktop)
backfill_seasonality.py  One-off tagger for the existing backlog
site_health.py           Scraper regex-failure / reachability monitoring
html_generator.py        Email HTML generation
email_sender.py          SMTP email delivery
website_publisher.py     Write the standalone index.html page for publishing
debug_utils.py           Debug-mode utilities
pyproject.toml           Project + tooling configuration
tests/                   Test suite
```
- Load config (config.py): env vars + constants.
- Load state (file_utils.py): read the tracking JSON files; check debug mode.
- Scrape (recipe_processor.py → web_scraper.py): stream recipes from each site one page at a time; record regex/reachability problems (site_health.py).
- Tag (seasonal_tagging.py / seasonal_model.py): add oven-use + seasonality scores to any untagged recipes (instant, pure-numpy).
- Select (recipe_selector.py + seasonal_selection.py): balance proteins, ensure veggies/sides, bias toward in-season picks.
- Render (html_generator.py): build the email HTML.
- Send (email_sender.py): SMTP delivery.
- Write page (website_publisher.py): write the standalone index.html; cook.sh then commits + pushes it to the gh-pages branch.
- Persist (file_utils.py): move sent recipes to used, save tracking JSON.
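The Persist step, together with deduplication, can be sketched as moving recipes between two JSON-backed dicts. This sketch assumes both files map recipe URL to recipe data and that writes go through a temp file plus rename; the function names and the URL-keyed layout are hypothetical, not taken from file_utils.py.

```python
from __future__ import annotations

import json
from pathlib import Path


def move_to_used(
    unused: dict[str, dict], used: dict[str, dict], sent_urls: list[str]
) -> tuple[dict[str, dict], dict[str, dict]]:
    """Return new (unused, used) dicts with each sent recipe moved into the history."""
    remaining, history = dict(unused), dict(used)
    for url in sent_urls:
        recipe = remaining.pop(url, None)
        if recipe is not None:
            history[url] = recipe
    return remaining, history


def filter_new(scraped: dict[str, dict], used: dict[str, dict]) -> dict[str, dict]:
    """Drop freshly scraped recipes that were already sent in an earlier week."""
    return {url: recipe for url, recipe in scraped.items() if url not in used}


def save_json(path: Path, data: dict) -> None:
    """Write via a temp file and rename, so a crash never leaves half-written JSON."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
    tmp.replace(path)  # atomic on POSIX when both paths share a filesystem
```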
```bash
# Run all tests with coverage
pytest

# Run specific test file
pytest tests/test_file_utils.py -v

# Run with detailed coverage report
pytest --cov --cov-report=html

# View HTML coverage report (macOS; use xdg-open on Linux)
open htmlcov/index.html
```

Current coverage: ~77% across 159 tests.
Target: 95%+ (all modules)
```bash
# Type checking
mypy .

# Formatting
black .

# Linting
ruff check .

# Auto-fix linting issues
ruff check . --fix

# Run all checks
mypy . && ruff check . && black --check .
```

- All tests passing: pytest
- Type checks clean: mypy .
- Linting clean: ruff check .
- Formatted: black .
- Documentation updated
- Changelog updated
| Operation | Time | Notes |
|---|---|---|
| Startup | ~0.09s | Load config, imports |
| URL extraction | ~8s | ~20 sites |
| Recipe parsing | ~142s | network I/O dominated |
| Seasonal tagging | <1s | pure-numpy, cached model load |
| Email generation | ~0.03s | HTML formatting |
| Total | ~150s | Average full run (scrape-bound) |
Subsequent runs within FILE_AGE_THRESHOLD reuse the cached recipe pool and skip
scraping entirely, completing in a few seconds.
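Reusing the cached pool hinges on a simple age check. A minimal sketch, assuming freshness is judged from the recipe file's modification time (the real check may differ); pool_is_fresh is a hypothetical name:

```python
from __future__ import annotations

import time
from pathlib import Path

FILE_AGE_THRESHOLD = 12  # hours; the config.py default


def pool_is_fresh(
    path: Path, max_age_hours: float = FILE_AGE_THRESHOLD, now: float | None = None
) -> bool:
    """True if the cached recipe file exists and was written less than max_age_hours ago."""
    if not path.exists():
        return False  # no cache yet, so a scrape is needed
    current = time.time() if now is None else now
    age_hours = (current - path.stat().st_mtime) / 3600
    return age_hours < max_age_hours
```

The optional now argument lets tests pin the clock instead of sleeping.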
Set in .env (loaded via python-dotenv):
| Variable | Required | Description |
|---|---|---|
| SENDER | ✅ | Gmail address for sending |
| PASSWD | ✅ | Gmail app password |
| BCC | ✅ | Comma-separated recipients |
All in config.py. Defaults shown.
Scraping
- FILE_AGE_THRESHOLD (12): hours before the recipe pool is re-scraped.
- NORMAL_TIMEOUT / DEBUG_TIMEOUT (9 / 20): per-request HTTP timeout, in seconds.
- SCRAPE_FLUSH_INTERVAL (100): how often (in URLs) the streaming scrape flushes progress to disk.
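The streaming scrape and SCRAPE_FLUSH_INTERVAL fit together roughly as follows: a generator fetches and parses one URL at a time, so raw HTML is never accumulated, and the consumer writes progress to disk every N URLs. This is a sketch under assumptions: the fetch callable stands in for the download-and-parse step in web_scraper.py, and iter_parsed / scrape_to_disk are hypothetical names.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable, Iterable, Iterator, Optional, Tuple

SCRAPE_FLUSH_INTERVAL = 100  # URLs between progress flushes (config.py default)

Fetch = Callable[[str], Optional[dict]]  # hypothetical: URL -> parsed recipe or None


def iter_parsed(urls: Iterable[str], fetch: Fetch) -> Iterator[Tuple[str, Optional[dict]]]:
    """Fetch and parse one URL at a time; each page's HTML is discarded before the next."""
    for url in urls:
        try:
            recipe = fetch(url)
        except Exception as exc:  # log and skip rather than crash the weekly run
            print(f"skipping {url}: {exc}")
            recipe = None
        yield url, recipe


def scrape_to_disk(
    urls: Iterable[str], fetch: Fetch, out: Path, flush_every: int = SCRAPE_FLUSH_INTERVAL
) -> int:
    """Consume the stream, saving parsed recipes to out every flush_every URLs."""
    saved: dict[str, dict] = {}
    for processed, (url, recipe) in enumerate(iter_parsed(urls, fetch), start=1):
        if recipe is not None:
            saved[url] = recipe
        if processed % flush_every == 0:
            out.write_text(json.dumps(saved))  # partial progress survives a crash
    out.write_text(json.dumps(saved))  # final flush
    return len(saved)
```

Only the small parsed dicts stay in memory; what streaming avoids is buffering every downloaded page at once, which is what exhausted the Pi's RAM.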
Meal selection
- LANDFOOD_COUNT_WITH_SEAFOOD (2): land mains to send when seafood is available.
- SEAFOOD_COUNT (1): seafood mains to send when available.
- LANDFOOD_COUNT_NO_SEAFOOD (3): total land mains when no seafood is available.
- SEAFOOD_PROTEINS / LANDFOOD_PROTEINS: ingredient keywords that classify a recipe's protein type.
- VEGGIES: vegetable keywords; a main without one gets a side dish added.
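The selection rules above amount to keyword classification followed by fixed quotas. A minimal sketch, assuming plain substring matching over ingredient lines and tiny illustrative keyword lists (the real lists in config.py are longer, and recipe_selector.py may match differently); mentions, pick_mains, needs_side, and the recipe dict shape are hypothetical:

```python
from __future__ import annotations

import random
from typing import Iterable

# Illustrative keyword lists; config.py defines the real ones.
SEAFOOD_PROTEINS = ("salmon", "shrimp", "cod", "tuna")
LANDFOOD_PROTEINS = ("chicken", "beef", "pork", "turkey")
VEGGIES = ("broccoli", "spinach", "zucchini", "bell pepper", "carrot")

LANDFOOD_COUNT_WITH_SEAFOOD = 2
SEAFOOD_COUNT = 1
LANDFOOD_COUNT_NO_SEAFOOD = 3


def mentions(recipe: dict, keywords: Iterable[str]) -> bool:
    """True if any keyword appears in the recipe's ingredient lines."""
    text = " ".join(recipe["ingredients"]).lower()
    return any(word in text for word in keywords)


def pick_mains(pool: list[dict], rng: random.Random) -> list[dict]:
    """Apply the seafood/land quotas, falling back to all-land mains when needed."""
    seafood = [r for r in pool if mentions(r, SEAFOOD_PROTEINS)]
    land = [r for r in pool if r not in seafood and mentions(r, LANDFOOD_PROTEINS)]
    if seafood:
        picks = rng.sample(seafood, min(SEAFOOD_COUNT, len(seafood)))
        picks += rng.sample(land, min(LANDFOOD_COUNT_WITH_SEAFOOD, len(land)))
    else:
        picks = rng.sample(land, min(LANDFOOD_COUNT_NO_SEAFOOD, len(land)))
    return picks


def needs_side(recipe: dict) -> bool:
    """A main with no vegetable keyword gets a side dish added."""
    return not mentions(recipe, VEGGIES)
```

Substring matching is the simplest choice but can misfire ("beef" also matches "beefsteak tomato"); word-boundary regexes are a common refinement.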
Seasonal scoring (see Seasonal AI Selection)
- HEAT_WEIGHT (0.5): how strongly winter-oven / summer-no-oven tilts the score.
- SELECTION_SHARPNESS (3.0): exponent on selection weights; higher = stronger bias toward in-season recipes (with an exponent of 3, a recipe scored 0.8 gets (0.8/0.2)³ = 64× the weight of one scored 0.2).
- MIN_SCORE (0.01): floor so weighted-random never sees a zero weight.
- SPRING/SUMMER/FALL/WINTER_CENTER: day-of-year season centers used to blend today's date into per-season weights.
- SEASONAL_MODEL_FILENAME / SEASONAL_LABELS_FILENAME: the committed student model artifact and the teacher labels used to train it.
- OLLAMA_HOST / SEASONAL_MODEL / OLLAMA_TIMEOUT: teacher/training only (the Ollama endpoint, teacher model, and request timeout used by seasonal_label.py). Unused on the host at run time.
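Putting the seasonal settings together: today's date becomes a blend of per-season weights, each recipe's four scores are combined with that blend, and the result is floored at MIN_SCORE and raised to SELECTION_SHARPNESS before a weighted draw. A minimal sketch under stated assumptions: the day-of-year centers (roughly the equinoxes and solstices), the linear falloff, and every function name are hypothetical, and the HEAT_WEIGHT oven tilt is left out entirely; seasonal_selection.py holds the real formula.

```python
from __future__ import annotations

import random

# Hypothetical day-of-year centers; config.py has its own *_CENTER values.
SEASON_CENTERS = {"spring": 80, "summer": 172, "fall": 266, "winter": 355}
SELECTION_SHARPNESS = 3.0
MIN_SCORE = 0.01


def season_weights(day_of_year: int) -> dict[str, float]:
    """Blend today's date into per-season weights that sum to 1."""
    raw = {}
    for season, center in SEASON_CENTERS.items():
        gap = abs(day_of_year - center)
        gap = min(gap, 365 - gap)  # wrap around New Year
        raw[season] = max(0.0, 1.0 - gap / 91.25)  # fades to 0 about one season away
    total = sum(raw.values())
    return {season: w / total for season, w in raw.items()}


def season_fit(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Date-weighted average of a recipe's per-season scores (0.5 if a season is missing)."""
    return sum(w * scores.get(season, 0.5) for season, w in weights.items())


def selection_weight(fit: float) -> float:
    """Floor at MIN_SCORE, then sharpen so in-season recipes dominate the draw."""
    return max(fit, MIN_SCORE) ** SELECTION_SHARPNESS


def pick(recipes: list[dict], day_of_year: int, k: int, rng: random.Random) -> list[dict]:
    """Weighted random draw of k distinct recipes."""
    weights = season_weights(day_of_year)
    pool, chosen = list(recipes), []
    for _ in range(min(k, len(pool))):
        w = [selection_weight(season_fit(r["seasons"], weights)) for r in pool]
        index = rng.choices(range(len(pool)), weights=w, k=1)[0]
        chosen.append(pool.pop(index))
    return chosen
```

With these definitions selection_weight(0.8) / selection_weight(0.2) equals (0.8/0.2)**3 = 64, which is how a sharpness of 3 turns modest score gaps into strong preferences.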
Email
- SUBJECT ("Weekly Meals") / HEALTH_SUBJECT: email subject lines.
- SMTP_SERVER / SMTP_PORT: Gmail SMTP (SSL) defaults.
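Delivery over Gmail's SSL endpoint needs only the standard library. A minimal sketch, assuming config.py's Gmail defaults are the usual smtp.gmail.com on port 465 and that recipients go in Bcc with the sender as the visible To address (an assumption, not confirmed by email_sender.py); build_message and send are hypothetical names:

```python
from __future__ import annotations

import smtplib
import ssl
from email.message import EmailMessage

SMTP_SERVER = "smtp.gmail.com"  # Gmail's SMTP host
SMTP_PORT = 465                 # implicit TLS ("SSL") port
SUBJECT = "Weekly Meals"


def build_message(sender: str, bcc: list[str], html: str, subject: str = SUBJECT) -> EmailMessage:
    """Build a multipart message with a plain-text fallback and the HTML menu."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = sender  # recipients stay hidden from one another in Bcc
    msg["Bcc"] = ", ".join(bcc)
    msg.set_content("This week's menu is best viewed in an HTML-capable mail client.")
    msg.add_alternative(html, subtype="html")
    return msg


def send(msg: EmailMessage, password: str) -> None:
    """Log in over SSL and send; send_message() does not transmit the Bcc header."""
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL(SMTP_SERVER, SMTP_PORT, context=context) as server:
        server.login(str(msg["From"]), password)
        server.send_message(msg)
```

Only send() touches the network, so build_message can be unit-tested offline.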
Currently scrapes 20 recipe websites, including:
- Recipe Runner
- Paleo Running Momma
- Skinny Taste
- Two Peas and Their Pod
- Well Plated
- The Spruce Eats
- Eating Bird Food
- Budget Bytes
- Minimalist Baker
- Pinch of Yum
- Love and Lemons
- (and more: see websites.py)
Meal selection is biased toward seasonally-appropriate recipes and toward oven
use in winter / grilling in summer. Each recipe gets four per-season scores from
a small distilled "student" model: a TF-IDF + ridge regression trained
offline and shipped in the repo as seasonal_model.json. Inference is pure
numpy: no network, no Ollama, and no GPU at runtime. (numpy is the only added
runtime dependency and is installed by pip install -r requirements.txt.)
There is no setup on the host (e.g. the Raspberry Pi): the model file is
committed, so a normal python main.py run scores any newly-scraped recipes
inline and instantly. If the model file is missing or a recipe has no usable
text, scoring falls back to a neutral 0.5 per season and nothing breaks.
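To make "TF-IDF + ridge regression in pure numpy" concrete, here is a minimal inference sketch. It assumes a hypothetical JSON layout with vocab (token to column index), idf, coef (one column per season), and intercept keys; only the idf key is confirmed, by the troubleshooting check in this README. The tokenizer is simplified and would have to match the training-time one exactly in a real export.

```python
from __future__ import annotations

import json
import re
from pathlib import Path

import numpy as np

SEASONS = ("spring", "summer", "fall", "winter")
NEUTRAL = 0.5                   # fallback score per season
_TOKEN = re.compile(r"[a-z]+")  # simplified tokenizer (assumption)


def load_model(path: Path) -> dict | None:
    """Load the exported student, or return None so callers fall back to neutral."""
    if not path.exists():
        return None
    raw = json.loads(path.read_text())
    return {
        "vocab": raw["vocab"],                                   # token -> column index
        "idf": np.asarray(raw["idf"], dtype=float),              # shape (n_terms,)
        "coef": np.asarray(raw["coef"], dtype=float),            # shape (n_terms, 4)
        "intercept": np.asarray(raw["intercept"], dtype=float),  # shape (4,)
    }


def predict(model: dict | None, text: str) -> dict[str, float]:
    """TF-IDF vectorize text, apply the ridge weights, clip each score to [0, 1]."""
    neutral = dict.fromkeys(SEASONS, NEUTRAL)
    if model is None:
        return neutral
    vec = np.zeros(len(model["idf"]))
    for token in _TOKEN.findall(text.lower()):
        index = model["vocab"].get(token)
        if index is not None:
            vec[index] += 1.0  # raw term counts
    if not vec.any():
        return neutral  # no known words: neutral fallback
    vec *= model["idf"]
    vec /= np.linalg.norm(vec)  # L2 norm, as scikit-learn's TfidfVectorizer does by default
    scores = np.clip(vec @ model["coef"] + model["intercept"], 0.0, 1.0)
    return {season: float(score) for season, score in zip(SEASONS, scores)}
```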
The student is distilled from a local "teacher" LLM via Ollama.
You only need this to refresh the model on newly-scraped recipes; the Pi never
runs the teacher. Do it on a machine with a GPU + Ollama, with the dev extras
installed (pip install -e ".[dev]", which brings in scikit-learn):
```bash
ollama pull llama3.1:8b
# 1. label the recipe corpus with the teacher (resumable):
SEASONAL_MODEL=llama3.1:8b python seasonal_label.py   # -> seasonal_labels.json
# 2. train + export the numpy student, then commit it:
python train_seasonal_model.py                         # -> seasonal_model.json
git add seasonal_model.json seasonal_labels.json && git commit
```
Teacher-only env vars: OLLAMA_HOST (default http://localhost:11434) and
SEASONAL_MODEL (the teacher model, e.g. llama3.1:8b). These affect labeling
only and have no effect on the Pi at runtime.
Problem: ModuleNotFoundError: No module named 'recipe_scrapers'
Solution:

```bash
pip install -r requirements.txt
```

Problem: ValueError: EMAIL_SENDER not configured
Solution: Create a .env file with the required variables (see Configuration above).
Problem: Gmail authentication fails
Solution: Use an App Password, not your regular Gmail password
Problem: No recipes found
Solution: Check the age of the recipe files: a pool younger than 12 hours (FILE_AGE_THRESHOLD) is reused instead of re-scraped. Delete the JSON files to force a refresh.
Problem: Meals don't look seasonal, or "Seasonal student prediction failed" appears in the log
Solution: The seasonal model needs numpy, which a bare git pull won't install. Run pip install -r requirements.txt, then confirm the model loads:

```bash
python -c "import numpy, json; print('vocab', len(json.load(open('seasonal_model.json'))['idf']))"
```

If the model is missing or numpy isn't installed, scoring falls back to neutral (no seasonal bias) rather than crashing.
For troubleshooting, use debug mode:

```bash
python main.py -d
```

This will:
- Prompt you to select a single website
- Send emails only to the sender (not BCC list)
- Use longer timeouts for requests
- Skip saving updated recipe files
Two log files, both self-capping so they can't grow without bound (all *.log
files are gitignored):
- recipe_emailer.log: structured logger output (timestamps + levels), written on every run whether launched by hand or by cron. Rotated per run, retaining the last 8 runs (the same window the site-health email reports), so it's easy to inspect what the last few runs did.
- cronjob.log: full stdout capture from cron via cook.sh (the logger lines plus scraping-progress prints), trimmed to the last ~2000 lines.
Console output (stdout) mirrors the logger at INFO and above.
- DEBUG: Detailed diagnostic information
- INFO: General informational messages
- WARNING: Warning messages (recoverable issues)
- ERROR: Error messages (serious issues)
```
2026-06-01 18:08:10 - __main__ - INFO - Recipe Emailer started at 2026-06-01 18:08:10
2026-06-01 18:08:10 - __main__ - INFO - Loading existing recipe data
2026-06-01 18:08:10 - __main__ - INFO - Seasonal tagging: tagged 0 new recipe(s)
2026-06-01 18:08:10 - recipe_selector - INFO - Selected 3 recipes: 1 seafood, 2 landfood
2026-06-01 18:08:10 - __main__ - INFO - selected https://.../tzatziki-chicken-salad/ [single_main]: season_fit=0.873 final_score=0.676
2026-06-01 18:08:11 - __main__ - INFO - Sending email
2026-06-01 18:08:12 - __main__ - INFO - ✓ Process completed successfully in 2.63s
```
The "selected ... season_fit=... final_score=..." lines record why each meal was picked, which is useful because chosen recipes are removed from the pool after a run.
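Both log caps can be reproduced with the standard library. A minimal sketch, assuming per-run rotation is done by forcing a rollover at startup: logging.handlers.RotatingFileHandler and its doRollover() method are real stdlib APIs, but per_run_file_handler and trim_to_last_lines are hypothetical names, and cook.sh may well trim cronjob.log with a shell command rather than Python.

```python
from __future__ import annotations

import logging
from collections import deque
from logging.handlers import RotatingFileHandler
from pathlib import Path

RUNS_KEPT = 8  # current run plus 7 rotated backups
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def per_run_file_handler(path: Path) -> RotatingFileHandler:
    """Start each run in a fresh file: log -> log.1 -> ... -> log.7, oldest dropped."""
    handler = RotatingFileHandler(path, maxBytes=0, backupCount=RUNS_KEPT - 1, delay=True)
    if path.exists() and path.stat().st_size > 0:
        handler.doRollover()  # rotate once per run instead of by size
    handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt="%Y-%m-%d %H:%M:%S"))
    return handler


def trim_to_last_lines(path: Path, max_lines: int = 2000) -> None:
    """Keep only the last max_lines lines of a capture file such as cronjob.log."""
    if not path.exists():
        return
    with path.open(encoding="utf-8", errors="replace") as fh:
        tail = deque(fh, maxlen=max_lines)  # streams the file, holding at most max_lines
    path.write_text("".join(tail), encoding="utf-8")
```

The format string and date format reproduce the sample log lines above.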
MIT License - see LICENSE file
wassupluke
- hhursev/recipe-scrapers - Recipe parsing library
- theskumar/python-dotenv - Environment management
- tqdm/tqdm - Progress bars
- Seasonal AI selection: distilled pure-numpy model biases picks toward in-season recipes (no runtime LLM)
- Site-health monitoring: alerts the maintainer when a scraper's regex breaks
- Streaming scrape: fixes Raspberry Pi out-of-memory on large scrapes
- Encoding fix: scraped pages decode by declared/sniffed charset (no more UTF-8 mojibake)
- Complete codebase refactor
- 100% type coverage
- Comprehensive testing
- Production logging
- Zero technical debt
- Original functional version
Made with ❤️ for easier meal planning