Export Goodreads profile, shelves, books, and authors to JSON
Use pipx or uv — both install the CLI from PyPI.
Best for repeat use. Installs the CLI into an isolated environment and adds the `goodreads-user-scraper` command to your shell.

```sh
pipx install goodreads-user-scraper  # or: uv tool install goodreads-user-scraper
goodreads-user-scraper --user_id <your id>
```

Best for one-off use. Downloads and runs the CLI in a temporary environment: no install step, no `$PATH` changes.

```sh
pipx run goodreads-user-scraper --user_id <your id>
# or: uvx goodreads-user-scraper --user_id <your id>
```

Data is written to `--output_dir` (default `goodreads-data/`):
```
goodreads-data/
├── user.json                          # profile: name, average rating, rating/review counts
└── books/
    ├── 4395.The_Grapes_of_Wrath.json  # one JSON file per book
    └── …
```
Each `books/*.json` file looks like this. The `rating`, `dates_read`, `shelves`, and `exclusive_shelf` fields come from your library (`exclusive_shelf` is the single status shelf, such as `read`, or `null`), and the author is nested under `author`:
```json
{
  "book_id_title": "4395.The_Grapes_of_Wrath",
  "book_id": "4395",
  "book_title": "The Grapes of Wrath",
  "book_description": "The Grapes of Wrath is a landmark of American literature. A portrait of the conflict between the powerful and the powerless…",
  "book_url": "https://www.goodreads.com/book/show/4395.The_Grapes_of_Wrath",
  "book_image": "https://m.media-amazon.com/images/S/compressed.photo.goodreads.com/books/1511302892i/4395.jpg",
  "book_series_uri": null,
  "year_first_published": "1939",
  "num_pages": 455,
  "genres": ["Classics", "Fiction", "Historical Fiction", "Literature", "Novels", "School", "Historical"],
  "num_ratings": 1011464,
  "num_reviews": 31088,
  "average_rating": 4.03,
  "author": {
    "author_id_title": "585.John_Steinbeck",
    "author_id": "585",
    "author_name": "John Steinbeck",
    "author_url": "https://www.goodreads.com/author/show/585.John_Steinbeck",
    "author_image": "https://images.gr-assets.com/authors/1182118389p5/585.jpg",
    "author_description": "John Ernst Steinbeck was an American writer. He won the 1962 Nobel Prize in Literature…"
  },
  "rating": 5,
  "dates_read": ["May 03, 2020"],
  "shelves": ["read", "2020", "2020s-favorites"],
  "exclusive_shelf": "read"
}
```

The two description fields are truncated here; the rest is real output. Without a cookie, only `user.json` is written (see Authentication); `--skip_authors` omits the nested `author`.
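Because each book is a standalone JSON file, a short script can summarize an export. The sketch below is not part of the scraper: `load_books` and `summarize` are hypothetical helpers that assume the default `goodreads-data/` layout and the field names shown above. It also assumes an unrated book has a `rating` of `0` or `null`, since the example only shows a rated book.

```python
import json
from collections import Counter
from pathlib import Path


def load_books(output_dir: str = "goodreads-data") -> list[dict]:
    """Read every books/*.json file written by the scraper into a list of dicts."""
    books_dir = Path(output_dir) / "books"
    # sorted() keeps the order stable across runs
    return [json.loads(p.read_text(encoding="utf-8")) for p in sorted(books_dir.glob("*.json"))]


def summarize(books: list[dict]) -> dict:
    """Count books per exclusive shelf, average your ratings, and list authors."""
    by_shelf = Counter(b.get("exclusive_shelf") for b in books)
    # Assumption: unrated books carry a rating of 0 or null, so both are ignored
    rated = [b["rating"] for b in books if b.get("rating")]
    # "author" is missing (or null) when the scraper ran with --skip_authors
    authors = {b["author"]["author_name"] for b in books if b.get("author")}
    return {
        "by_shelf": dict(by_shelf),
        "mean_rating": sum(rated) / len(rated) if rated else None,
        "authors": sorted(authors),
    }
```

For example, `summarize(load_books())` after a run with the default `--output_dir`.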
| Flag | Description | Default |
|---|---|---|
| `--user_id` | **Required.** The user whose data to scrape (find your user id). | n/a |
| `--output_dir` | Directory where scraped data is written. | `goodreads-data` |
| `--cookie` | Your Goodreads session cookie (the full `Cookie:` request-header value); required for shelf scraping. See Authentication. | `None` |
| `--cookie_file` | Path to a text file containing your session cookie. | `None` |
| `--skip_user_info` | Skip scraping user information. | off |
| `--skip_shelves` | Skip scraping shelves. Books (and their authors) are scraped from your shelves, so this skips them too. | off |
| `--skip_authors` | Skip scraping authors. | off |
Shelf scraping requires a cookie — Goodreads hides shelf data behind login. Without one you get the profile only; with one you also get shelves, books, and authors.
- Sign in to Goodreads in your browser.
- Open DevTools (Cmd/Ctrl+Shift+I) and switch to the Network tab.
- Refresh the page, then click any `goodreads.com` request in the list.
- In the request Headers, find the `Cookie:` header and copy its full value.
Pass the cookie in one of three ways. In order of precedence (the first one that is set wins):

1. `--cookie "<cookie string>"`
2. the `GOODREADS_COOKIE` environment variable
3. `--cookie_file <path-to-file>`
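The precedence rule amounts to a chain of fallbacks. This is a hypothetical sketch (`resolve_cookie` is not the scraper's function) that assumes an empty value counts as unset and that the cookie file holds only the header value, possibly followed by a trailing newline.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cookie(cli_cookie: Optional[str], cookie_file: Optional[str],
                   env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first cookie source that is set: --cookie, GOODREADS_COOKIE, --cookie_file."""
    if cli_cookie:
        return cli_cookie.strip()
    env_cookie = env.get("GOODREADS_COOKIE")
    if env_cookie:
        return env_cookie.strip()
    if cookie_file:
        # Assumption: the file contains the raw Cookie header value and nothing else
        return Path(cookie_file).read_text(encoding="utf-8").strip()
    return None  # no cookie: only the profile can be scraped
```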
Cookies typically last several weeks. If you see a "Cookie appears invalid or expired" error, re-grab the cookie from your browser.
If no cookie is provided, shelf scraping is skipped with a warning. Pass --skip_shelves to suppress the warning.
Missing profile or shelf data?
- Your own account: pass your session cookie (see Authentication). Your profile, shelves, and books all scrape, even on a private profile.
- Another user's account: what you can scrape depends on their profile privacy setting. Shelves always require your cookie (see Authentication).
  - Anyone: the profile scrapes even without a cookie.
  - Goodreads members only: pass your cookie; any signed-in account works.
  - Friends only: pass your cookie, and your account must be their friend.
Hit a rate-limit or timeout?
Transient errors (timeouts, 429, 5xx) are retried with exponential backoff. If a book still can't be fetched, the run finishes the rest, logs the skips, and exits with a non-zero status so you know the export is incomplete — re-run to fetch the missing books (already-saved books are skipped). A profile or shelf-listing failure stops the run early, since nothing else can proceed.
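For readers unfamiliar with the retry policy, here is a generic sketch of exponential backoff for transient errors. It is not the scraper's code: `with_backoff`, `HTTPStatusError`, the five-attempt limit, the one-second base delay, and the 10% jitter are illustrative assumptions, and the built-in `TimeoutError` stands in for whatever timeout exception a real HTTP client raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HTTPStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_transient(exc: Exception) -> bool:
    """Timeouts, 429, and 5xx are worth retrying; anything else fails immediately."""
    if isinstance(exc, TimeoutError):
        return True
    return isinstance(exc, HTTPStatusError) and (exc.status == 429 or 500 <= exc.status < 600)


def with_backoff(fetch: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception as exc:
            # Give up on permanent errors, or once the last attempt has failed
            if not is_transient(exc) or attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% random jitter
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")
```

Taking `sleep` as a parameter lets a test exercise the retry path without real delays.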
Can I export to a SQLite database (or another format)?
The scraper outputs JSON, which converts cleanly to other formats. For SQLite, sqlite-utils infers the schema and handles indexes and upserts. Combine the per-book files and load them into a table keyed on book_id:
```sh
goodreads-user-scraper --user_id <id> --cookie "<cookie>"
jq -s . goodreads-data/books/*.json | sqlite-utils upsert books.db books - --pk book_id
sqlite-utils create-index --if-not-exists books.db books book_title
```

Re-running the scraper fetches only new books, and `upsert` updates the table in place, so the pipeline is safe to rerun on a schedule. The nested `author` and `shelves` fields come through as JSON columns; query them with SQLite's JSON functions (`json_extract`, `json_each`). See #38 for context.
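If you'd rather not install sqlite-utils, Python's standard-library `sqlite3` module can do a comparable load. This sketch is an illustration, not a drop-in replacement: the column set, `load_into_sqlite`, and `books_on_shelf` are hypothetical; it stores `shelves` and `author` as JSON text the way sqlite-utils does; and it uses `INSERT OR REPLACE`, which rewrites the whole row instead of updating changed columns like `sqlite-utils upsert`. The query needs an SQLite build with the JSON functions, which recent versions include by default.

```python
import json
import sqlite3
from pathlib import Path


def load_into_sqlite(conn: sqlite3.Connection, books_dir: str = "goodreads-data/books") -> int:
    """Load each books/*.json file into a `books` table keyed on book_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        " book_id TEXT PRIMARY KEY, book_title TEXT, rating INTEGER,"
        " exclusive_shelf TEXT, shelves TEXT, author TEXT)"
    )
    rows = []
    for path in sorted(Path(books_dir).glob("*.json")):
        book = json.loads(path.read_text(encoding="utf-8"))
        rows.append((
            book["book_id"], book["book_title"], book.get("rating"), book.get("exclusive_shelf"),
            # Nested values are stored as JSON text so SQLite's JSON functions can read them
            json.dumps(book.get("shelves", [])), json.dumps(book.get("author")),
        ))
    # INSERT OR REPLACE replaces any existing row with the same book_id, so reruns don't duplicate
    conn.executemany("INSERT OR REPLACE INTO books VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def books_on_shelf(conn: sqlite3.Connection, shelf: str) -> list[tuple]:
    """Return (title, author name) for every book on `shelf`, using json_each and json_extract."""
    return conn.execute(
        "SELECT b.book_title, json_extract(b.author, '$.author_name')"
        " FROM books AS b, json_each(b.shelves) AS s"
        " WHERE s.value = ? ORDER BY b.book_title",
        (shelf,),
    ).fetchall()
```

For example, `conn = sqlite3.connect("books.db")`, then `load_into_sqlite(conn)` and `books_on_shelf(conn, "2020")`.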
Contributions are welcome! See CONTRIBUTING.md for local development setup. To report a bug or request a feature, open an issue; for usage questions, start a thread in Discussions.
Licensed under MIT.



