A Django-based API and crawler for the multilingual Diff blog, which is hosted on WordPress.com and uses the Polylang plugin for translations.
Polylang breaks the WordPress.com API for Diff, and there is no other official API. This project works around that by crawling every language alternate via RSS, extracting post metadata (image, title, ID, categories, description, publication date), and exposing a stable, language-aware JSON API for consumers.
- Crawls all language alternates from the main page's <head>.
- Paginates through /feed/?paged=N for each language.
- Extracts post metadata, including the first inline image.
- Stores posts and categories in a Django database.
- Management command for periodic ingestion, with options to limit, filter by language, and update existing posts.
- JSON API to fetch posts by category, with optional language filtering.
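
The first two crawl steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual crawler: it assumes the language alternates appear as `<link rel="alternate" hreflang="...">` tags in the page head, and the names `AlternateLinkParser`, `find_language_alternates`, and `feed_page_url` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlternateLinkParser(HTMLParser):
    """Collects <link rel="alternate" hreflang="..."> entries from a page."""

    def __init__(self):
        super().__init__()
        self.alternates = {}  # language code -> URL of that language's home page

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        lang = attr.get("hreflang")
        href = attr.get("href")
        # Assumption: only links carrying an hreflang are language alternates;
        # other rel="alternate" links (for example the RSS feed) are skipped.
        if "alternate" in rel and lang and href:
            self.alternates[lang] = href


def find_language_alternates(html):
    """Return a {language: url} mapping parsed from the page markup."""
    parser = AlternateLinkParser()
    parser.feed(html)
    return parser.alternates


def feed_page_url(language_home, page):
    """Build the paginated feed URL (/feed/?paged=N) for one language."""
    base = language_home if language_home.endswith("/") else language_home + "/"
    return urljoin(base, f"feed/?paged={page}")
```

A crawler built this way would request `feed_page_url(home, 1)`, then page 2, and so on for each language, stopping at the first page that returns no items.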
- Install dependencies:

      pip install -r requirements.txt

- Run migrations:

      python manage.py makemigrations
      python manage.py migrate

- Ingest posts:

      python manage.py crawl_diff_feeds
      # or limit/language/update
      python manage.py crawl_diff_feeds --languages en,pt --limit 50 --update-existing

- Run the server:

      python manage.py runserver
- Get all posts for a category:

      GET /tags/<slug>/

- Filter by language:

      GET /tags/<slug>/?languages=en,pt
      # or
      GET /tags/<slug>/?language=en&language=pt

- Response:

      {
        "category": { "slug": "wikimedia", "name": "Wikimedia" },
        "filters": { "languages": ["en", "pt"] },
        "count": 2,
        "posts": [
          {
            "external_id": "...",
            "language": "en",
            "title": "...",
            "description": "...",
            "image_url": "https://...",
            "link": "https://...",
            "pub_date": "2025-09-11T12:34:56+00:00",
            "categories": [{ "slug": "wikimedia", "name": "Wikimedia" }]
          }
        ]
      }
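
The two language-filter forms accepted above (a comma-separated `languages` value, or a repeated `language` parameter) could be normalized into one list along these lines. This is only a sketch under assumptions: the function name `parse_language_filter` is hypothetical, and lowercasing and de-duplicating the codes is an illustrative choice, not documented behaviour of the API.

```python
from urllib.parse import parse_qs


def parse_language_filter(query_string):
    """Merge ?languages=en,pt and ?language=en&language=pt into one list."""
    params = parse_qs(query_string)
    codes = []
    # Comma-separated form: languages=en,pt
    for value in params.get("languages", []):
        codes.extend(value.split(","))
    # Repeated form: language=en&language=pt
    codes.extend(params.get("language", []))

    # Assumption: codes are normalized to lowercase and de-duplicated,
    # keeping the order in which they first appear.
    result = []
    for code in codes:
        code = code.strip().lower()
        if code and code not in result:
            result.append(code)
    return result
```

An empty result would then mean "no language filter", so the endpoint returns posts in every language for the category.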
- --languages en,pt: Only ingest these languages
- --limit 50: Limit number of posts per language
- --update-existing: Update existing posts if metadata changes
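
Django management commands declare their options on an `argparse` parser, so the three flags above can be illustrated with plain `argparse`. This is a hedged sketch, not the project's command: the defaults, help strings, and the `parse_options` helper are assumptions made for the example.

```python
import argparse


def add_arguments(parser):
    """Declare the crawl options; a Django command would do this in add_arguments()."""
    parser.add_argument("--languages", default="",
                        help="Comma-separated language codes to ingest, e.g. en,pt")
    parser.add_argument("--limit", type=int, default=None,
                        help="Maximum number of posts to ingest per language")
    parser.add_argument("--update-existing", action="store_true",
                        help="Update stored posts when their metadata changes")


def parse_options(argv):
    """Parse command-line arguments into a plain dict (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="crawl_diff_feeds")
    add_arguments(parser)
    opts = parser.parse_args(argv)
    # Assumption: an empty --languages value means "all languages".
    languages = [c.strip() for c in opts.languages.split(",") if c.strip()]
    return {
        "languages": languages,
        "limit": opts.limit,
        "update_existing": opts.update_existing,
    }
```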
- Images are always taken from the first inline image in the post content.
- Data is refreshed by running the management command.
- The API is read-only and public.
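
The "first inline image" rule noted above can be illustrated as follows. The sketch assumes the post body arrives in the feed's `content:encoded` element and uses the item's `guid` as the external ID; both are assumptions about the feed, and `FirstImageParser`, `first_inline_image`, and `parse_item` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace of <content:encoded> in RSS feeds.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


class FirstImageParser(HTMLParser):
    """Remembers the src of the first <img> that has one."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


def first_inline_image(html):
    """Return the first image URL in the post HTML, or None if there is none."""
    parser = FirstImageParser()
    parser.feed(html or "")
    return parser.src


def parse_item(item):
    """Extract post metadata from one RSS <item> element."""
    content = item.findtext(CONTENT_NS + "encoded") or ""
    return {
        "external_id": item.findtext("guid"),  # assumption: guid serves as the ID
        "title": item.findtext("title"),
        "link": item.findtext("link"),
        "pub_date": item.findtext("pubDate"),
        "categories": [c.text for c in item.findall("category")],
        "image_url": first_inline_image(content),
    }
```

Posts without any inline image end up with `image_url` set to `None` in this sketch; how the real project stores that case is not stated here.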
See LICENSE file.