Gutenberg Offline

This scraper downloads the whole Project Gutenberg library (or a selected subset of it) and packages it into a ZIM file, a clean, user-friendly format for storing content for offline use.

The ZIM file includes a modern, responsive Vue.js interface with features like:

Browse books by title, author, or Library of Congress Classification (LCC) shelves
Advanced filtering by language, format, and more
Full-text search across all content
Multilingual support with automatic language detection
Responsive design that works on desktop and mobile devices
No-JavaScript fallback for maximum compatibility

Getting Started

The recommended way to use the Gutenberg scraper is with Docker, which includes all dependencies pre-installed.

Using Docker (Recommended)

Run the scraper:

docker run -v "$(pwd)/output:/output" ghcr.io/openzim/gutenberg gutenberg2zim

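The -v $(pwd)/output:/output option mounts the output folder in your current directory into the container at /output, so the generated ZIM file is saved on your machine. The quotes keep the mount working if the path contains spaces.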

Note: On Windows PowerShell, replace $(pwd) with ${PWD}. Alternatively, use the full path: -v C:\Users\YourName\output:/output

View available options:

docker run ghcr.io/openzim/gutenberg gutenberg2zim --help

Example with custom options:

docker run -v "$(pwd)/output:/output" ghcr.io/openzim/gutenberg \
  gutenberg2zim -l en,fr -f pdf --books 100-200 --lcc-shelves all

This builds a ZIM file containing the English and French books with IDs 100-200 in PDF format, plus pages for every LCC shelf.
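To get a rough idea of how many books a --books value covers before starting a long run, you can expand the spec yourself. The bash sketch below does this for a value such as 100-200 or 100-103,250. The helper expand_books is hypothetical and not part of gutenberg2zim, and it assumes that a dash range includes both endpoints; the scraper's own parsing may differ, and the final count also depends on the language and format filters.

```bash
# Hypothetical helper (not part of gutenberg2zim): expands a --books style
# spec into one book ID per line. ASSUMPTION: "a-b" includes both a and b.
expand_books() {
  local spec="$1" part
  local -a parts
  # Split the spec on commas into individual items.
  IFS=',' read -r -a parts <<< "$spec"
  for part in "${parts[@]}"; do
    if [[ "$part" == *-* ]]; then
      # A range: print every ID from the start to the end value.
      seq "${part%%-*}" "${part##*-}"
    else
      # A single ID: print it as-is.
      printf '%s\n' "$part"
    fi
  done
}

expand_books "100-103,250"   # prints 100, 101, 102, 103 and 250, one per line
```

Under the same assumption, expand_books "100-200" | wc -l reports 101 IDs for the example above.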

Using PyPI

Alternatively, install from PyPI:

pip install gutenberg2zim
gutenberg2zim --help

Note: You'll need to install system dependencies (zim-tools) separately. See CONTRIBUTING.md for details.

Command-Line Options

-h --help                            Display help message
-F --force                           Overwrite existing ZIM file

-l --languages=<list>                Comma-separated language codes (ISO 639-1 or ISO 639-3)
-f --formats=<list>                  Comma-separated formats (epub, html, pdf, all)

-z --zim-file=<file>                 ZIM file output path
--zim-name=<name>                    ZIM name (metadata)
-t --zim-title=<title>               ZIM title
-n --zim-desc=<description>          ZIM description
-L --zim-long-desc=<description>     ZIM long description
--zim-languages=<languages>          ZIM language metadata

-b --books=<ids>                     Specific book IDs (comma-separated IDs and/or dash ranges, e.g., 100-200)
-c --concurrency=<nb>                Number of concurrent workers (default: 16)

--no-index                           Skip full-text index creation
--lcc-shelves=<shelves>              LCC shelf codes (comma-separated or 'all')
--primary-color=<color>              Primary UI color (hex format, e.g., #1976D2)
--secondary-color=<color>            Secondary UI color (hex format, e.g., #424242)

--publisher=<publisher>              Custom publisher name (default: openZIM)
--mirror-url=<url>                   Custom Gutenberg mirror URL
--output=<folder>                    Output folder (default: ./output)
--debug                              Enable verbose output
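The color options take values that start with #. In POSIX shells such as bash, a word that begins with # starts a comment, so writing --primary-color #1976D2 silently drops the color. The bash sketch below uses a throwaway helper, show_args (hypothetical, not part of gutenberg2zim), to show what a program actually receives in each form.

```bash
# Hypothetical helper for illustration only: prints each argument it
# receives on its own line, so you can see what a command is really given.
show_args() { printf '%s\n' "$@"; }

# Unquoted and separated by a space: "#424242" begins a comment and is dropped.
show_args --secondary-color #424242

# Quoted: the color value is passed through intact.
show_args --secondary-color "#424242"

# Joined with "=": "#" is not at the start of a word, so it is kept.
show_args --secondary-color=#424242
```

In a real run, quote the value (or attach it with =), for example:

docker run -v "$(pwd)/output:/output" ghcr.io/openzim/gutenberg gutenberg2zim -l en --primary-color "#1976D2" --secondary-color "#424242"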

Features

User Interface

Modern Web Interface: Fast, responsive single-page application with smooth navigation
Multiple View Modes: Switch between grid and list views for books
Responsive Design: Optimized for desktop, tablet, and mobile devices
Dark/Light Theme: Automatic theme switching based on system preferences
Customizable Colors: Configure primary and secondary brand colors

Content Organization

Browse by Books: View all books with cover images, titles, and authors
Browse by Authors: Explore authors with their complete bibliographies
LCC Shelves: Browse books by Library of Congress Classification categories
Smart Pagination: Efficient navigation through large collections

Search & Discovery

Full-Text Search: Search across all books, authors, and shelves
Quick Filters: Find authors by name or shelves by code
Rich Search Results: Search results include descriptions and metadata

Filtering & Sorting

Language Filter: Filter books by language (supports all Gutenberg languages)
Format Filter: Filter by available formats (EPUB, HTML, PDF, TXT, MOBI)
Sort Options: Sort by popularity (download count) or title
Sort Order: Toggle between ascending and descending order

Book Details

Comprehensive Metadata: Title, subtitle, author, description, languages, license
Author Information: Author name with birth/death years and lifespan
Popularity Rating: Star rating based on download statistics
Download Counts: Formatted download statistics
LCC Classification: Link to Library of Congress Classification shelf
Multiple Formats: Download books in available formats (EPUB, HTML, PDF, etc.)
Cover Images: High-quality book cover images where available

Internationalization

Multiple Languages: Full UI translations for many languages
Automatic Detection: Detects browser language and sets UI accordingly
Language Switcher: Easy language selection from header menu
RTL Support: Right-to-left layout support for Arabic, Hebrew, etc.

Accessibility

No-JavaScript Fallback: Complete HTML-only version for browsers without JavaScript
Semantic HTML: Proper heading hierarchy and ARIA labels
Keyboard Navigation: Full keyboard accessibility
Screen Reader Support: ARIA labels and descriptions throughout
High Contrast: Readable text with proper color contrast ratios

Technical Features

ZIM Format: Compressed, indexed format for offline usage
Full-Text Indexing: Optional full-text search index within ZIM
Concurrent Processing: Multi-threaded book processing for faster scraping
Custom Mirrors: Support for custom Gutenberg mirror URLs
Docker Support: Pre-built Docker images with all dependencies

Contributing

We welcome contributions! For example, you might want to:

Add or improve UI translations
Fix bugs or add features
Improve documentation
Develop the Vue.js interface

Please see CONTRIBUTING.md for detailed guidelines on setting up the development environment, code style, testing, and the pull request process.

The main coding guidelines are those of the openZIM Wiki.

Screenshots

License

GPLv3 or later, see LICENSE for more details.
