YashTotale/goodreads-user-scraper

Goodreads User Scraper

Export Goodreads profile, shelves, books, and authors to JSON

Usage

Use pipx or uv — both install the CLI from PyPI.

Install once, then run

Best for repeat use. Installs the CLI into an isolated environment and adds the goodreads-user-scraper command to your shell.

pipx install goodreads-user-scraper      # or: uv tool install goodreads-user-scraper
goodreads-user-scraper --user_id <your id>

Run once without installing

Best for one-off use. Downloads and runs the CLI in a temporary environment: no install step, no $PATH changes.

pipx run goodreads-user-scraper --user_id <your id>
# or: uvx goodreads-user-scraper --user_id <your id>

Output

Data is written to --output_dir (default goodreads-data/):

goodreads-data/
├── user.json                          # profile: name, average rating, rating/review counts
└── books/
    ├── 4395.The_Grapes_of_Wrath.json  # one JSON file per book
    └── …

Each books/*.json file looks like the example below. The rating, dates_read, shelves, and exclusive_shelf fields come from your library (exclusive_shelf is the single status shelf, e.g. read, or null); the author is nested under author:

{
  "book_id_title": "4395.The_Grapes_of_Wrath",
  "book_id": "4395",
  "book_title": "The Grapes of Wrath",
  "book_description": "The Grapes of Wrath is a landmark of American literature. A portrait of the conflict between the powerful and the powerless…",
  "book_url": "https://www.goodreads.com/book/show/4395.The_Grapes_of_Wrath",
  "book_image": "https://m.media-amazon.com/images/S/compressed.photo.goodreads.com/books/1511302892i/4395.jpg",
  "book_series_uri": null,
  "year_first_published": "1939",
  "num_pages": 455,
  "genres": ["Classics", "Fiction", "Historical Fiction", "Literature", "Novels", "School", "Historical"],
  "num_ratings": 1011464,
  "num_reviews": 31088,
  "average_rating": 4.03,
  "author": {
    "author_id_title": "585.John_Steinbeck",
    "author_id": "585",
    "author_name": "John Steinbeck",
    "author_url": "https://www.goodreads.com/author/show/585.John_Steinbeck",
    "author_image": "https://images.gr-assets.com/authors/1182118389p5/585.jpg",
    "author_description": "John Ernst Steinbeck was an American writer. He won the 1962 Nobel Prize in Literature…"
  },
  "rating": 5,
  "dates_read": ["May 03, 2020"],
  "shelves": ["read", "2020", "2020s-favorites"],
  "exclusive_shelf": "read"
}

The two description fields are truncated here; the rest is real output. Without a cookie only user.json is written (see Authentication); --skip_authors omits the nested author.
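
As a rough illustration of consuming this output, the sketch below reads every per-book file and lists the titles you rated five stars. The helper names load_books and five_star_titles are hypothetical (they are not part of the CLI), and the default path assumes you kept the default --output_dir.

```python
import json
from pathlib import Path


def load_books(output_dir="goodreads-data"):
    """Read every <output_dir>/books/*.json file into a list of dicts.

    Hypothetical helper, not part of goodreads-user-scraper.
    """
    books_dir = Path(output_dir) / "books"
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(books_dir.glob("*.json"))
    ]


def five_star_titles(books):
    """Return the titles of books whose personal rating is 5."""
    return [book["book_title"] for book in books if book.get("rating") == 5]
```

Calling five_star_titles(load_books()) after a run returns a plain list of titles; the same pattern works for any other field shown in the example.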

What the CLI looks like in other states
[Demo images: nothing to do, no cookie, invalid cookie]

Arguments

| Flag | Description | Default |
| --- | --- | --- |
| --user_id | Required. The user whose data to scrape (find your user id). | |
| --output_dir | Directory where scraped data is written. | goodreads-data |
| --cookie | Your Goodreads session cookie (the full Cookie: request-header value); required for shelf scraping, see Authentication. | None |
| --cookie_file | Path to a text file containing your session cookie. | None |
| --skip_user_info | Skip scraping user information. | |
| --skip_shelves | Skip scraping shelves. Books (and their authors) are scraped from your shelves, so this skips them too. | |
| --skip_authors | Skip scraping authors. | |

Authentication

Shelf scraping requires a cookie — Goodreads hides shelf data behind login. Without one you get the profile only; with one you also get shelves, books, and authors.

Getting your session cookie

  1. Sign in to Goodreads in your browser.
  2. Open DevTools (Ctrl+Shift+I on Windows/Linux, Cmd+Option+I on macOS) and switch to the Network tab.
  3. Refresh the page, then click any goodreads.com request in the list.
  4. In the request Headers, find the Cookie: header and copy its full value.

Passing the cookie

In order of precedence (first one set wins):

  1. --cookie "<cookie string>"
  2. GOODREADS_COOKIE environment variable
  3. --cookie_file <path-to-file>
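
The precedence above can be sketched as a small function. This is only an illustration of the documented order, not the CLI's actual code; the name resolve_cookie and its parameters are hypothetical, and stripping whitespace from the file contents is an assumption.

```python
import os
from pathlib import Path


def resolve_cookie(cli_cookie=None, cookie_file=None, env=None):
    """Pick a cookie using the documented order: flag, env var, then file.

    Hypothetical helper; the real CLI may be implemented differently.
    """
    env = os.environ if env is None else env
    if cli_cookie:  # 1. --cookie
        return cli_cookie
    if env.get("GOODREADS_COOKIE"):  # 2. GOODREADS_COOKIE
        return env["GOODREADS_COOKIE"]
    if cookie_file:  # 3. --cookie_file (assumed: surrounding whitespace is ignored)
        return Path(cookie_file).read_text(encoding="utf-8").strip()
    return None  # no cookie: shelf scraping would be skipped
```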

Cookies typically last several weeks. If you see a "Cookie appears invalid or expired" error, re-grab the cookie from your browser.

If no cookie is provided, shelf scraping is skipped with a warning. Pass --skip_shelves to suppress the warning.

FAQ

Missing profile or shelf data?

  • Your own account: pass your session cookie (see Authentication). Your profile, shelves, and books all scrape, even on a private profile.
  • Another user's account: what you can scrape depends on their profile privacy setting. Shelves always require your cookie (see Authentication).
    • Anyone: the profile scrapes even without a cookie.
    • Goodreads members only: pass your cookie; any signed-in account works.
    • Friends only: pass your cookie, and your account must be their friend.

Hit a rate limit or timeout?

Transient errors (timeouts, 429, 5xx) are retried with exponential backoff. If a book still can't be fetched, the run finishes the rest, logs the skips, and exits with a non-zero status so you know the export is incomplete — re-run to fetch the missing books (already-saved books are skipped). A profile or shelf-listing failure stops the run early, since nothing else can proceed.
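
For readers unfamiliar with the pattern, here is a generic sketch of retrying with exponential backoff. It is not the scraper's implementation: TransientError, retry_with_backoff, the attempt limit, and the base delay are all assumptions chosen for illustration.

```python
import time


class TransientError(Exception):
    """Stand-in for a timeout, HTTP 429, or 5xx response (hypothetical)."""


def retry_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure.

    With the assumed defaults the waits are 1s, 2s, and 4s; the last
    failure is re-raised so the caller can log the skip and move on.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing sleep as a parameter keeps the function easy to exercise without real delays.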

Can I export to a SQLite database (or another format)?

The scraper outputs JSON, which converts cleanly to other formats. For SQLite, sqlite-utils infers the schema and handles indexes and upserts. Combine the per-book files and load them into a table keyed on book_id:

goodreads-user-scraper --user_id <id> --cookie "<cookie>"
jq -s . goodreads-data/books/*.json | sqlite-utils upsert books.db books - --pk book_id
sqlite-utils create-index --if-not-exists books.db books book_title

Re-running the scraper fetches only new books and upsert updates the table in place, so the pipeline is safe to rerun on a schedule. The nested author and shelves come through as JSON columns — query them with SQLite's JSON functions (json_extract, json_each). See #38 for context.
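
As one possible way to query those JSON columns, the sketch below uses Python's built-in sqlite3 module with json_each and json_extract. It assumes a books table shaped like the one the pipeline above would produce (author and shelves stored as JSON text) and a SQLite build that includes the JSON functions; titles_on_shelf is a hypothetical helper name.

```python
import sqlite3


def titles_on_shelf(conn, shelf):
    """Return (book_title, author_name) pairs for books on the given shelf.

    Assumes books.shelves holds a JSON array and books.author a JSON object.
    """
    rows = conn.execute(
        """
        SELECT books.book_title,
               json_extract(books.author, '$.author_name')
        FROM books, json_each(books.shelves)
        WHERE json_each.value = ?
        ORDER BY books.book_title
        """,
        (shelf,),
    )
    return rows.fetchall()
```

For example, titles_on_shelf(sqlite3.connect("books.db"), "read") lists everything on the read shelf.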

Contributing

Contributions are welcome! See CONTRIBUTING.md for local development setup. To report a bug or request a feature, open an issue; for usage questions, start a thread in Discussions.

Licensed under MIT.
