Forage

CLI tool to scrape posts, comments, and reactions from private Facebook groups using browser automation.

Installation

From PyPI

# Install with pip
pip install ForageFacebook

# Or with uv
uv pip install ForageFacebook

# Install Playwright browsers
playwright install chromium

From Source

# Clone and install with uv
git clone https://github.com/jwmoss/forage.git
cd forage
uv sync

# Install Playwright browsers
uv run playwright install chromium

Quick Start

# Step 1: Log into Facebook (opens browser window)
uv run forage login

# Step 2: Scrape a group (last 7 days by default)
uv run forage scrape https://www.facebook.com/groups/your-group-id -o data.json

Usage

Login

# Default: opens Chromium browser
uv run forage login

# Use Firefox instead
uv run forage login --browser firefox

Scrape Posts

# Basic scrape (last 7 days, with comments)
uv run forage scrape your-group-slug

# Last 14 days, save to file
uv run forage scrape your-group-slug --days 14 -o posts.json

# Specific date range
uv run forage scrape your-group-slug --since 2024-01-01 --until 2024-01-15

# Skip comments (faster)
uv run forage scrape your-group-slug --skip-comments

# Only popular comments (5+ reactions)
uv run forage scrape your-group-slug --min-reactions 5

# Top 10 comments per post
uv run forage scrape your-group-slug --top-comments 10

# Watch the browser (debugging)
uv run forage scrape your-group-slug --no-headless -v

# Slower scraping to avoid rate limits
uv run forage scrape your-group-slug --delay 5.0

# Read group from stdin (for scripting)
echo "your-group-slug" | uv run forage scrape -

CLI Reference

forage [global flags] <command> [args]

Global Flags:
  -v, --verbose   Show progress and debug info
  -q, --quiet     Suppress non-error output
  --no-color      Disable colored output
  --version       Show version
  --help          Show help

Commands:
  login           Open browser for interactive Facebook login
  scrape          Scrape posts from a Facebook group

scrape flags

Flag	Default	Description
`--days`	`7`	Posts from last N days
`--since`	-	Start date (ISO 8601: YYYY-MM-DD)
`--until`	-	End date (ISO 8601: YYYY-MM-DD)
`--limit`	`0`	Max posts (0 = unlimited)
`--delay`	`2.0`	Seconds between page loads
`--min-reactions`	`0`	Min reactions for comments
`--top-comments`	`0`	Top N comments per post
`--skip-comments`	`false`	Skip comment fetching
`--skip-reactions`	`false`	Skip reaction counts
`-o, --output`	`-`	Output file (default: stdout)
`-f, --format`	`json`	Output format: json, sqlite, csv
`--no-headless`	`false`	Show browser window
`--browser`	`chromium`	Browser: chromium, firefox, webkit

SQLite Export

Export directly to SQLite for easier analysis:

# Export to SQLite database
uv run forage scrape your-group-slug -f sqlite -o data.db

# Query with sqlite3
sqlite3 data.db "SELECT content, reactions_total FROM posts ORDER BY reactions_total DESC LIMIT 10"

# Join posts with comments
sqlite3 data.db "SELECT p.content, c.content FROM posts p JOIN comments c ON c.post_id = p.id"

CSV Export

Export to CSV for spreadsheet analysis:

# Export to CSV (creates posts.csv and posts.comments.csv)
uv run forage scrape your-group-slug -f csv -o posts.csv

# Open in Excel/Numbers/Sheets or analyze with csvkit
csvstat posts.csv
csvcut -c author_name,content,reactions_total posts.csv | head -20

Output Format

{
  "group": {
    "id": "123456",
    "name": "My Group",
    "url": "https://www.facebook.com/groups/123456"
  },
  "scraped_at": "2024-01-20T15:30:00Z",
  "date_range": {
    "since": "2024-01-13",
    "until": "2024-01-20"
  },
  "posts": [
    {
      "id": "pfbid...",
      "author": {
        "name": "Jane Doe",
        "profile_url": "https://facebook.com/jane.doe"
      },
      "content": "Post text here...",
      "timestamp": "2024-01-19T12:00:00Z",
      "reactions": {
        "total": 42,
        "like": 0,
        "love": 0,
        "haha": 0,
        "wow": 0,
        "sad": 0,
        "angry": 0
      },
      "comments_count": 15,
      "comments": [
        {
          "id": "comment_...",
          "author": {"name": "John Smith", "profile_url": "..."},
          "content": "Comment text...",
          "timestamp": null,
          "reactions": {"total": 5},
          "replies": []
        }
      ]
    }
  ]
}

Data Analysis Examples

# Top 10 posts by reactions
uv run forage scrape mygroup --skip-comments | \
  jq '.posts | sort_by(.reactions.total) | reverse | .[0:10]'

# All post content
uv run forage scrape mygroup --skip-comments | \
  jq '.posts[].content'

# Posts with 50+ reactions
uv run forage scrape mygroup | \
  jq '.posts | map(select(.reactions.total >= 50))'

# Count posts per author
uv run forage scrape mygroup | \
  jq '.posts | group_by(.author.name) | map({author: .[0].author.name, count: length}) | sort_by(.count) | reverse'

Development

# Install dev dependencies
uv sync --extra dev

# Run type checker
uv run ty check src/

# Run tests
uv run pytest

Architecture

src/forage/
├── cli.py       # Click CLI commands
├── auth.py      # Session management (login, cookies)
├── scraper.py   # Core scraping logic
├── parser.py    # HTML parsing for posts/comments
└── models.py    # Pydantic data models

Limitations

Requires manual login (no automated auth)
Facebook's HTML structure changes frequently
Rate limiting may require slower scraping
Individual reaction types not broken out (only total)
Session cookies expire after ~30 days

Roadmap

Planned features and improvements:

High Priority

Cookie import - Import cookies from browser extensions (EditThisCookie, Netscape format)
Incremental scraping - Only fetch posts newer than last scrape
Progress persistence - Resume interrupted scrapes

Medium Priority

Multiple groups - Scrape multiple groups in one command
Media extraction - Download images/videos from posts
Reaction breakdown - Extract individual reaction types (like, love, etc.)
Author statistics - Aggregate stats per author
Scheduled scraping - Cron-friendly mode with locking

Nice to Have

Web UI - Local web interface for browsing scraped data
Webhook notifications - Notify on new posts matching criteria
Public group support - Scrape without login for public groups
Parallel scraping - Speed up multi-group scrapes

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Security

See SECURITY.md for security considerations and best practices.

Support

If you find this tool useful, consider sponsoring development:

License

MPL-2.0 (Mozilla Public License 2.0)

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
.github		.github
scripts		scripts
src/forage		src/forage
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Forage

Table of Contents

Installation

From PyPI

From Source

Quick Start

Usage

Login

Scrape Posts

CLI Reference

scrape flags

SQLite Export

CSV Export

Output Format

Data Analysis Examples

Development

Architecture

Limitations

Roadmap

High Priority

Medium Priority

Nice to Have

Contributing

Security

Support

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 10

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages