
wikimediabrasil/diffapi


Diff RSS API

A Django-based API and crawler for the multilingual Diff blog, which is hosted on WordPress.com and uses the Polylang plugin for translations.

Rationale

Diff uses Polylang for translations, which breaks the WordPress.com API for this site, and there is no other official API. This project works around that by crawling every language alternate via RSS, extracting post metadata (image, title, ID, categories, description, publication date), and serving it through a stable, language-aware JSON API.
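The discovery step can be sketched with Python's standard-library `html.parser`. This is a minimal sketch, not the project's code: it assumes Polylang advertises each translation as a `<link rel="alternate" hreflang="pt" href="...">` tag inside the page `<head>`, and that each language's feed lives at `<alternate URL>/feed/`. The names `AlternateLinkParser`, `find_language_alternates` and `feed_url_for` are hypothetical.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collect hreflang -> href pairs from <link rel="alternate"> tags in <head>."""

    def __init__(self):
        super().__init__()
        self.alternates = {}
        self._in_head = False

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are routed here by HTMLParser as well.
        if tag == "head":
            self._in_head = True
        elif tag == "link" and self._in_head:
            a = dict(attrs)
            rels = (a.get("rel") or "").lower().split()
            lang, href = a.get("hreflang"), a.get("href")
            # Skip RSS links (no hreflang) and the language-neutral x-default entry.
            if "alternate" in rels and lang and href and lang != "x-default":
                self.alternates[lang] = href

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False


def find_language_alternates(html):
    """Return {language_code: page_url} for every alternate found in <head>."""
    parser = AlternateLinkParser()
    parser.feed(html)
    parser.close()
    return parser.alternates


def feed_url_for(page_url):
    """Hypothetical mapping from a language's home page to its RSS feed."""
    return page_url.rstrip("/") + "/feed/"
```

Each URL returned by `feed_url_for` would then be handed to the pagination loop described under Features.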

Features

  • Crawls all language alternates from the main page's <head>.
  • Paginates through /feed/?paged=N for each language.
  • Extracts post metadata, including the first inline image.
  • Stores posts and categories in a Django database.
  • Management command for periodic ingestion, with options to limit posts per language, filter by language, and update existing posts.
  • JSON API to fetch posts by category, with optional language filtering.
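The feed crawl can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: WordPress serves RSS 2.0 with one `<item>` per post, the full post body is in `content:encoded`, and requesting a page past the end yields either an empty channel or an error (modelled here as `fetch` returning `None`). `parse_feed_items`, `iter_feed_pages` and the injected `fetch` callable are hypothetical names, not the project's API.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from typing import Callable, Iterator, Optional

# Namespace of the RSS content module, used by WordPress for <content:encoded>.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def parse_feed_items(xml_text: str) -> list:
    """Extract basic post metadata from every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        posts.append({
            "external_id": item.findtext("guid"),
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
            "content": item.findtext(CONTENT_NS + "encoded"),
            # RFC 822 date -> ISO 8601, the format used by the API's pub_date
            "pub_date": parsedate_to_datetime(pub).isoformat() if pub else None,
            "categories": [c.text for c in item.findall("category") if c.text],
        })
    return posts


def iter_feed_pages(
    feed_url: str,
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield posts from feed_url?paged=1, 2, ... until a page is missing or empty."""
    for page in range(1, max_pages + 1):
        xml_text = fetch(f"{feed_url}?paged={page}")
        if xml_text is None:  # assumption: fetch returns None on HTTP 404
            return
        posts = parse_feed_items(xml_text)
        if not posts:
            return
        yield from posts
```

In practice `fetch` could wrap `urllib.request.urlopen`; injecting it keeps the loop testable offline, and `max_pages` guards against a feed that never returns an empty page.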

Setup

  1. Install dependencies:
    pip install -r requirements.txt
  2. Run migrations:
    python manage.py makemigrations
    python manage.py migrate
  3. Ingest posts:
    python manage.py crawl_diff_feeds
    # or restrict languages, cap posts per language, and refresh existing posts
    python manage.py crawl_diff_feeds --languages en,pt --limit 50 --update-existing
  4. Run the server:
    python manage.py runserver

API Usage

  • Get all posts for a category:
    GET /tags/<slug>/
    
  • Filter by language:
    GET /tags/<slug>/?languages=en,pt
    # or
    GET /tags/<slug>/?language=en&language=pt
    
  • Response:
    {
      "category": { "slug": "wikimedia", "name": "Wikimedia" },
      "filters": { "languages": ["en", "pt"] },
      "count": 2,
      "posts": [
        {
          "external_id": "...",
          "language": "en",
          "title": "...",
          "description": "...",
          "image_url": "https://...",
          "link": "https://...",
          "pub_date": "2025-09-11T12:34:56+00:00",
          "categories": [{ "slug": "wikimedia", "name": "Wikimedia" }]
        }
      ]
    }
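Both filter spellings can be normalized into a single list of language codes. This sketch uses `urllib.parse.parse_qs` so it runs outside Django; it assumes the two forms may be mixed in one request and that duplicates are dropped while preserving order. `parse_language_filter` is a hypothetical name, not the project's view code.

```python
from urllib.parse import parse_qs


def parse_language_filter(query_string):
    """Return an ordered, de-duplicated list of language codes.

    Accepts ?languages=en,pt, ?language=en&language=pt, or a mix of both.
    An empty list means "no language filter".
    """
    params = parse_qs(query_string)
    raw_values = params.get("languages", []) + params.get("language", [])
    codes = []
    for value in raw_values:
        for code in value.split(","):  # comma-separated form
            code = code.strip()
            if code and code not in codes:
                codes.append(code)
    return codes
```

Inside a Django view, `request.GET.getlist("languages")` and `request.GET.getlist("language")` could supply the same raw values that `parse_qs` produces here.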

Management Command Options

  • --languages en,pt: only ingest these languages (comma-separated)
  • --limit 50: limit the number of posts ingested per language
  • --update-existing: update stored posts whose metadata has changed
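The three flags map naturally onto `argparse`, which is what Django's `BaseCommand.add_arguments(self, parser)` hook receives. The sketch below is a standalone approximation, not the project's command: defaults, help texts and the hypothetical helpers `parse_language_list` and `select_feeds` are assumptions.

```python
import argparse
from typing import Dict, List, Optional


def parse_language_list(value: str) -> List[str]:
    """Turn "en,pt" into ["en", "pt"], ignoring blank entries."""
    return [code.strip() for code in value.split(",") if code.strip()]


def add_arguments(parser: argparse.ArgumentParser) -> None:
    # Same shape as the body of BaseCommand.add_arguments in a Django command.
    parser.add_argument("--languages", type=parse_language_list, default=None,
                        help="Comma-separated language codes to ingest (default: all)")
    parser.add_argument("--limit", type=int, default=None,
                        help="Maximum number of posts per language")
    parser.add_argument("--update-existing", action="store_true",
                        help="Update stored posts whose metadata changed")


def select_feeds(alternates: Dict[str, str],
                 languages: Optional[List[str]]) -> Dict[str, str]:
    """Keep only the requested languages; None means every language."""
    if languages is None:
        return dict(alternates)
    return {lang: url for lang, url in alternates.items() if lang in languages}
```

With `--limit`, the per-language post iterator could then be truncated with `itertools.islice(posts, limit)`.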

Notes

  • Images are always taken from the first inline image in the post content.
  • Data is refreshed by running the management command.
  • The API is read-only and public.
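The image rule can be sketched with `html.parser`, assuming the post body comes from the feed's `content:encoded` field and that "first inline image" means the `src` of the first `<img>` tag in document order. `FirstImageParser` and `first_inline_image` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class FirstImageParser(HTMLParser):
    """Remember the src of the first <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> tags.
        if tag == "img" and self.src is None:
            src = dict(attrs).get("src")
            if src:
                self.src = src


def first_inline_image(content_html: Optional[str]) -> Optional[str]:
    """Return the first inline image URL, or None if the post has no image."""
    parser = FirstImageParser()
    parser.feed(content_html or "")
    parser.close()
    return parser.src
```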

License

See the LICENSE file.
