Goodreads User Scraper

Export Goodreads profile, shelves, books, and authors to JSON

Usage

Use pipx or uv — both install the CLI from PyPI.

Install once, then run

Best for repeat use. Installs the CLI into an isolated environment and adds the goodreads-user-scraper command to your shell.

pipx install goodreads-user-scraper      # or: uv tool install goodreads-user-scraper
goodreads-user-scraper --user_id <your id>

Run once without installing

Best for one-off use. Downloads and runs the CLI in a temporary environment: no install step, no $PATH changes.

pipx run goodreads-user-scraper --user_id <your id>
# or: uvx goodreads-user-scraper --user_id <your id>

Output

Data is written to --output_dir (default goodreads-data/):

goodreads-data/
├── user.json                          # profile: name, average rating, rating/review counts
└── books/
    ├── 4395.The_Grapes_of_Wrath.json  # one JSON file per book
    └── …

Each books/*.json file looks like this. The rating, dates_read, shelves, and exclusive_shelf fields come from your library (exclusive_shelf is the book's single status shelf, such as read, or null), and the author is nested under author:

{
  "book_id_title": "4395.The_Grapes_of_Wrath",
  "book_id": "4395",
  "book_title": "The Grapes of Wrath",
  "book_description": "The Grapes of Wrath is a landmark of American literature. A portrait of the conflict between the powerful and the powerless…",
  "book_url": "https://www.goodreads.com/book/show/4395.The_Grapes_of_Wrath",
  "book_image": "https://m.media-amazon.com/images/S/compressed.photo.goodreads.com/books/1511302892i/4395.jpg",
  "book_series_uri": null,
  "year_first_published": "1939",
  "num_pages": 455,
  "genres": ["Classics", "Fiction", "Historical Fiction", "Literature", "Novels", "School", "Historical"],
  "num_ratings": 1011464,
  "num_reviews": 31088,
  "average_rating": 4.03,
  "author": {
    "author_id_title": "585.John_Steinbeck",
    "author_id": "585",
    "author_name": "John Steinbeck",
    "author_url": "https://www.goodreads.com/author/show/585.John_Steinbeck",
    "author_image": "https://images.gr-assets.com/authors/1182118389p5/585.jpg",
    "author_description": "John Ernst Steinbeck was an American writer. He won the 1962 Nobel Prize in Literature…"
  },
  "rating": 5,
  "dates_read": ["May 03, 2020"],
  "shelves": ["read", "2020", "2020s-favorites"],
  "exclusive_shelf": "read"
}

The two description fields are truncated here; the rest is real output. Without a cookie, only user.json is written (see Authentication), and --skip_authors omits the nested author.
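Because every book is a standalone JSON file, the export is easy to work with from a script. Below is a minimal Python sketch that reads every books/*.json file and tallies the books by exclusive_shelf. load_books and summarize are hypothetical helpers, not part of the CLI, and the sketch assumes an unrated book has a rating of null or 0, which this README does not specify.

```python
import json
from collections import Counter
from pathlib import Path


def load_books(output_dir: str = "goodreads-data") -> list:
    """Read every books/*.json file under output_dir into a list of dicts."""
    books_dir = Path(output_dir) / "books"
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(books_dir.glob("*.json"))
    ]


def summarize(books: list) -> tuple:
    """Count books per exclusive (status) shelf and average your own ratings."""
    by_status = Counter(book.get("exclusive_shelf") for book in books)
    # Assumption: unrated books carry a null or 0 rating, so falsy ratings are skipped
    ratings = [book["rating"] for book in books if book.get("rating")]
    average = sum(ratings) / len(ratings) if ratings else None
    return by_status, average
```

For example, by_status, average = summarize(load_books()) gives a Counter keyed by shelf name (with None for books that have no status shelf) and the mean of your non-empty ratings.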

What the CLI looks like in other states

Scenario demos (images not reproduced here): Nothing to do; No cookie; Invalid cookie.

Arguments

Flag	Description	Default
`--user_id`	Required. The user whose data to scrape (find your user id).	—
`--output_dir`	Directory where scraped data is written.	`goodreads-data`
`--cookie`	Your Goodreads session cookie (the full `Cookie:` request-header value); required for shelf scraping — see Authentication.	None
`--cookie_file`	Path to a text file containing your session cookie.	None
`--skip_user_info`	Skip scraping user information.	—
`--skip_shelves`	Skip scraping shelves. Books (and their authors) are scraped from your shelves, so this skips them too.	—
`--skip_authors`	Skip scraping authors.	—
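How these flags interact with the cookie can be summarized as a small decision function. This is a hypothetical sketch (plan_run and RunPlan are illustrative names, not part of the scraper) derived only from the behavior described in this README: books come from shelves, shelves need a cookie, authors are nested inside books, and a missing cookie triggers a warning unless --skip_shelves is passed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunPlan:
    """What a run would produce (hypothetical model of the documented behavior)."""
    user_info: bool       # user.json
    books: bool           # books/*.json, gathered from your shelves
    authors: bool         # nested "author" object inside each book
    warn_no_cookie: bool  # the "no cookie" warning


def plan_run(skip_user_info: bool = False, skip_shelves: bool = False,
             skip_authors: bool = False, has_cookie: bool = False) -> RunPlan:
    """Predict the outputs of a run from the skip flags and cookie presence."""
    scrape_books = not skip_shelves and has_cookie  # shelves sit behind login
    return RunPlan(
        user_info=not skip_user_info,
        books=scrape_books,
        authors=scrape_books and not skip_authors,  # authors come from books
        warn_no_cookie=not skip_shelves and not has_cookie,
    )
```

For instance, plan_run(skip_shelves=True) predicts a profile-only run with no warning, while plan_run() without a cookie predicts the same output plus the warning.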

Authentication

Shelf scraping requires a cookie — Goodreads hides shelf data behind login. Without one you get the profile only; with one you also get shelves, books, and authors.

Getting your session cookie

1. Sign in to Goodreads in your browser.
2. Open DevTools (Cmd+Option+I on macOS, Ctrl+Shift+I on Windows and Linux) and switch to the Network tab.
3. Refresh the page, then click any goodreads.com request in the list.
4. In the request Headers, find the Cookie: header and copy its full value.
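Before passing the value to the CLI, you can sanity-check what you copied. The sketch below uses a hypothetical helper, cookie_names (not part of the scraper), that splits a Cookie header value into its name=value pairs. A full header from a signed-in session normally holds several pairs, so getting back only one name suggests you copied a single cookie rather than the whole header. The helper also tolerates an accidentally pasted "Cookie:" prefix; whether the scraper itself strips that prefix is not documented here, so remove it before use.

```python
from __future__ import annotations


def cookie_names(cookie_header: str) -> list[str]:
    """Return the cookie names in a Cookie request-header value like "a=1; b=2"."""
    header = cookie_header.strip()
    # Drop a "Cookie:" prefix in case the whole header line was copied by mistake
    if header.lower().startswith("cookie:"):
        header = header[len("cookie:"):]
    names = []
    for pair in header.split(";"):
        # Split on the first "=" only; cookie values may themselves contain "="
        name, sep, _value = pair.strip().partition("=")
        if sep and name:
            names.append(name)
    return names
```

For example, cookie_names("a=1; b=2") returns ["a", "b"].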

Passing the cookie

In order of precedence (the first one set wins):

1. --cookie "<cookie string>"
2. GOODREADS_COOKIE environment variable
3. --cookie_file <path-to-file>

Cookies typically last several weeks. If you see a "Cookie appears invalid or expired" error, re-grab the cookie from your browser.

If no cookie is provided, shelf scraping is skipped with a warning. Pass --skip_shelves to suppress the warning.
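The precedence rules amount to a first-match lookup. Here is a minimal sketch using a hypothetical resolve_cookie helper; the scraper's real function names and its handling of empty values are not documented here. The sketch treats an empty or whitespace-only value as unset and strips surrounding whitespace, which matters for cookie files that end with a newline.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_cookie(cookie_arg: Optional[str] = None,
                   cookie_file: Optional[str] = None) -> Optional[str]:
    """Return the cookie from the first source that is set, or None if none is."""
    # 1. --cookie on the command line
    if cookie_arg and cookie_arg.strip():
        return cookie_arg.strip()
    # 2. GOODREADS_COOKIE environment variable
    env_value = os.environ.get("GOODREADS_COOKIE", "").strip()
    if env_value:
        return env_value
    # 3. --cookie_file; strip() drops the trailing newline many editors add
    if cookie_file:
        file_value = Path(cookie_file).read_text(encoding="utf-8").strip()
        if file_value:
            return file_value
    # No cookie: the caller would skip shelf scraping and print a warning
    return None
```

Keeping the lookup in one function makes the order explicit: a value on the command line always overrides the environment, and the file is consulted only as a last resort.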

FAQ

Missing profile or shelf data?

- Your own account: pass your session cookie (see Authentication); your profile, shelves, and books all scrape, even on a private profile.
- Another user's account: what you can scrape depends on their profile privacy setting. Shelves always require your cookie (see Authentication).
  - Anyone: the profile scrapes even without a cookie.
  - Goodreads members only: pass your cookie; any signed-in account works.
  - Friends only: pass your cookie, and your account must be their friend.
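The privacy rules above reduce to a small decision function for the profile. This is an illustrative sketch only: Privacy and profile_scrapable are hypothetical names, the enum values do not correspond to anything Goodreads exposes, and the function covers the profile alone, since shelves additionally require a cookie in every case.

```python
from enum import Enum


class Privacy(Enum):
    """Profile privacy settings described in the FAQ (labels are illustrative)."""
    ANYONE = "anyone"
    MEMBERS_ONLY = "members_only"
    FRIENDS_ONLY = "friends_only"


def profile_scrapable(privacy: Privacy, has_cookie: bool,
                      is_friend: bool = False, own_account: bool = False) -> bool:
    """Return True if the profile can be scraped under the FAQ's rules."""
    if privacy is Privacy.ANYONE:
        return True  # public profile: no cookie needed
    if not has_cookie:
        return False  # every other setting needs a signed-in session
    if own_account or privacy is Privacy.MEMBERS_ONLY:
        return True  # your own cookie, or any signed-in account for members-only
    return is_friend  # friends-only: your account must be their friend
```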

Hit a rate-limit or timeout?

Transient errors (timeouts, 429, 5xx) are retried with exponential backoff. If a book still can't be fetched, the run finishes the rest, logs the skips, and exits with a non-zero status so you know the export is incomplete — re-run to fetch the missing books (already-saved books are skipped). A profile or shelf-listing failure stops the run early, since nothing else can proceed.
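The retry policy described above follows a common pattern: retry transient failures with a delay that doubles on each attempt, then record the item as skipped and keep going. The sketch below is a generic illustration, not the scraper's code. TransientError, fetch_with_backoff, scrape_all, the attempt count, and the base delay are all assumptions, and the random jitter is a standard refinement that this README does not mention.

```python
import functools
import random
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error a fetch raises on a timeout, HTTP 429, or HTTP 5xx."""


def is_transient_status(status: int) -> bool:
    """True for the HTTP statuses the README lists as retryable (429 and 5xx)."""
    return status == 429 or 500 <= status <= 599


def fetch_with_backoff(fetch: Callable[[], T], max_attempts: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying TransientError with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller record the skip
            # base_delay * 1, 2, 4, ... plus up to base_delay of random jitter
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise AssertionError("unreachable")


def scrape_all(book_ids: Iterable[str], fetch_book: Callable[[str], object],
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Fetch every book, skipping ones that keep failing; return an exit status."""
    skipped: List[str] = []
    for book_id in book_ids:
        try:
            fetch_with_backoff(functools.partial(fetch_book, book_id), sleep=sleep)
        except TransientError:
            skipped.append(book_id)
    for book_id in skipped:
        print(f"Skipped book {book_id}; re-run to fetch it")
    # A non-zero exit status signals an incomplete export
    return 1 if skipped else 0
```

In a command-line entry point, the value returned by scrape_all would typically be passed to sys.exit so that scripts and schedulers can detect an incomplete export.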

Can I export to a SQLite database (or another format)?

The scraper outputs JSON, which converts cleanly to other formats. For SQLite, sqlite-utils infers the schema and handles indexes and upserts. Combine the per-book files and load them into a table keyed on book_id:

goodreads-user-scraper --user_id <id> --cookie "<cookie>"
jq -s . goodreads-data/books/*.json | sqlite-utils upsert books.db books - --pk book_id
sqlite-utils create-index --if-not-exists books.db books book_title

Re-running the scraper fetches only new books, and upsert updates existing rows in place, so the pipeline is safe to rerun on a schedule. The nested author and shelves come through as JSON columns; query them with SQLite's JSON functions (json_extract, json_each). See #38 for context.
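Once books.db exists, the JSON columns can also be queried from Python's built-in sqlite3 module. A sketch, assuming the table layout produced by the pipeline above (a books table with author stored as a JSON object and shelves as a JSON array) and a SQLite build with the JSON functions enabled, which is the default in recent versions. books_by_author and books_on_shelf are hypothetical helper names.

```python
import sqlite3
from typing import List, Tuple


def books_by_author(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Count books per author by reading author_name out of the JSON author column."""
    return conn.execute(
        """
        SELECT json_extract(author, '$.author_name') AS author_name, COUNT(*) AS n
        FROM books
        GROUP BY author_name
        ORDER BY n DESC, author_name
        """
    ).fetchall()


def books_on_shelf(conn: sqlite3.Connection, shelf: str) -> List[str]:
    """Return titles on a shelf by expanding the JSON shelves array with json_each."""
    rows = conn.execute(
        """
        SELECT books.book_title
        FROM books, json_each(books.shelves)
        WHERE json_each.value = ?
        ORDER BY books.book_title
        """,
        (shelf,),
    )
    return [title for (title,) in rows]
```

For example, with conn = sqlite3.connect("books.db"), books_on_shelf(conn, "read") lists the titles on your read shelf. If you scraped with --skip_authors, the author column may not exist and books_by_author will fail.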

Contributing

Contributions are welcome! See CONTRIBUTING.md for local development setup. To report a bug or request a feature, open an issue; for usage questions, start a thread in Discussions.

Licensed under MIT.
