PDFMathTranslate-next is a scientific PDF translation system that preserves mathematical formulas, tables, charts, and document structure during translation. This page introduces the system's purpose, key features, and high-level architecture.
PDFMathTranslate-next translates academic and technical PDF documents while maintaining complex formatting. The system serves as the official reference implementation for BabelDOC, a specialized PDF translation backend engine.
Key Capabilities:
- Interfaces: CLI (pdf2zh_next), WebUI (Gradio), Python API, Zotero plugin integration
- Distribution: PyPI package (pdf2zh-next), Docker images (multi-arch), Windows standalone executable
- License: AGPL v3 (provided "as is" with no warranties)
Sources: README.md37-38 README.md39-41 README.md45-49
| Feature Category | Implementation Details |
|---|---|
| Content Preservation | Formulas, charts, table of contents, annotations, page layout via BabelDOC backend |
| Translation Engines | 18+ services: OpenAI, DeepSeek, SiliconFlow (free tier), Google Translate, DeepL, Ollama (local LLMs) |
| Output Modes | Bilingual PDF (alternating pages), monolingual PDF, configurable page ordering |
| User Interfaces | CLI (pdf2zh_next command), WebUI (Gradio at localhost:7860), Python API (do_translate_async_stream), Zotero plugin |
| Deployment | PyPI/pip install, Docker (linux/amd64, linux/arm64), Windows standalone EXE (PyStand launcher) |
| Performance | SQLite translation cache, rate limiting (QPS/RPM), concurrent request pooling, subprocess isolation |
| Advanced Options | Glossary auto-extraction, partial page translation (--pages), custom system prompts, OCR workarounds |
| Internationalization | GUI: 11 languages (gui_translation.yaml), Documentation: Weblate-powered i18n |
Sources: README.md39-41 README.md71-81 pyproject.toml31-46
PDFMathTranslate is structured as a frontend wrapper around the BabelDOC backend, with a layered architecture separating user interfaces, translation orchestration, engine abstraction, and PDF processing.
Architecture Layers:
- CLI (main.py), WebUI (gui.py), Python API (high_level.py), Docker, Zotero plugin
- main.py (CLI argument parsing), gui.py (Gradio interface), high_level.py (translation orchestration)
- ConfigManager merges CLI args, environment variables, and TOML config files into SettingsModel
- get_translator() factory instantiates BaseTranslator subclasses with a shared TranslationCache and QPSRateLimiter

Sources: pdf2zh_next/main.py49-103 pdf2zh_next/gui.py pdf2zh_next/high_level.py pdf2zh_next/config/manager.py pdf2zh_next/translator/__init__.py
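
The shared rate limiter mentioned above can be pictured with a small sketch. This is not the project's QPSRateLimiter: the class name `SimpleQPSLimiter`, the helper `qps_from_rpm`, and the injectable `clock`/`sleep` parameters are assumptions made for illustration. It only shows the general idea of spacing calls at least 1/QPS seconds apart behind a `wait()` method.

```python
import threading
import time


def qps_from_rpm(rpm: float) -> float:
    """Convert a requests-per-minute budget into queries per second."""
    return rpm / 60.0


class SimpleQPSLimiter:
    """Hypothetical limiter: spaces calls at least 1/qps seconds apart."""

    def __init__(self, qps: float, clock=time.monotonic, sleep=time.sleep):
        if qps <= 0:
            raise ValueError("qps must be positive")
        self._interval = 1.0 / qps
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next slot is free; return the time slept."""
        with self._lock:
            now = self._clock()
            delay = max(0.0, self._next_allowed - now)
            # Reserve the following slot before releasing the lock,
            # so concurrent callers queue up behind each other.
            self._next_allowed = max(now, self._next_allowed) + self._interval
        if delay > 0:
            self._sleep(delay)
        return delay
```

A translator would call `limiter.wait()` before each outgoing request. Passing the clock and sleep functions in makes the limiter easy to exercise without real delays.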
The translation process flows through configuration loading, translator instantiation, PDF processing, and result generation.
Execution Stages:
- Configuration loading (ConfigManager.initialize_config): Merge CLI args, environment variables (PDF2ZH_*), and TOML files by priority
- Translator instantiation (get_translator): Factory instantiates the engine-specific BaseTranslator subclass with rate limiter and cache
- Result generation: output PDFs selected by the --no-dual/--no-mono flags

Sources: pdf2zh_next/high_level.py pdf2zh_next/config/manager.py pdf2zh_next/translator/__init__.py pdf2zh_next/translator/base.py pdf2zh_next/translator/cache.py
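
As a rough picture of the flow described above, the sketch below walks four named stages inside an async generator and yields one event per stage. It is a stand-in only: `translate_stream`, the `STAGES` names, and the event dictionaries are hypothetical and do not reproduce the signature or event schema of do_translate_async_stream.

```python
import asyncio
from typing import AsyncIterator

# Stage names follow the prose above; the identifiers are illustrative.
STAGES = ("load_config", "create_translator", "process_pdf", "write_outputs")


async def translate_stream(path: str) -> AsyncIterator[dict]:
    """Hypothetical stand-in: walk the stages and yield one event per stage."""
    for done, stage in enumerate(STAGES, start=1):
        await asyncio.sleep(0)  # placeholder for the stage's real work
        yield {"type": "progress", "stage": stage, "done": done, "total": len(STAGES)}
    yield {"type": "finish", "path": path}


async def collect(path: str) -> list[dict]:
    """Consume the stream the way a CLI or WebUI caller might."""
    events = []
    async for event in translate_stream(path):
        events.append(event)
    return events
```

Running `asyncio.run(collect("paper.pdf"))` returns the progress events in stage order, followed by a single finish event.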
The codebase is organized into the following top-level components:
| Directory/File | Purpose | Key Symbols |
|---|---|---|
| pdf2zh_next/ | Main package directory | - |
| ├── main.py | CLI entrypoint | run_translate, build_args_parser |
| ├── gui.py | WebUI implementation | translate_files, build_ui_inputs, do_translate_async_stream |
| ├── high_level.py | Python API and translation orchestration | do_translate_async_stream, babeldoc_translate, create_babeldoc_config |
| ├── const.py | Constants and metadata | TRANSLATION_ENGINE_METADATA, TERM_EXTRACTION_ENGINE_METADATA |
| ├── config/ | Configuration system | - |
| │ ├── manager.py | Configuration loading and merging | ConfigManager, merge_settings, parse_env_vars |
| │ ├── model.py | Pydantic models | SettingsModel, BasicSettings, PDFSettings, TranslationSettings |
| │ └── engine_settings.py | Translation engine configs | TranslateEngineSettings, OpenAISettings, DeepSeekSettings |
| ├── translator/ | Translation engine implementations | - |
| │ ├── __init__.py | Factory functions | get_translator, get_term_translator |
| │ ├── base.py | Abstract base classes | BaseTranslator, BaseRateLimiter |
| │ ├── cache.py | SQLite caching | TranslationCache |
| │ ├── ratelimiter.py | Rate limiting logic | calculate_qps_from_rpm |
| │ ├── openai.py | OpenAI translator | OpenAITranslator |
| │ ├── deepseek.py | DeepSeek translator (adapter pattern) | DeepSeekTranslator.transform |
| │ └── *.py | 18+ other translators | Various concrete implementations |
| ├── i18n.py | GUI internationalization | gettext, LANGUAGES |
| └── gui_translation.yaml | GUI translation strings | YAML key-value pairs for 11 languages |
| .github/workflows/ | CI/CD pipelines | - |
| ├── python-test.yml | Test matrix (Python 3.10-3.13) | - |
| ├── build-and-publish.yml | PyPI/Docker/EXE builds | - |
| ├── rebuild.yml | Manual rebuild trigger | - |
| └── docs.yml | MkDocs deployment | - |
| pyproject.toml | Project metadata and dependencies | [project], [project.scripts], [tool.uv] |
Sources: README.md83
PDFMathTranslate supports 18+ translation services through a factory pattern with metadata-driven instantiation.
Engine Tiers:
| Tier | Examples | File Locations | Support Level |
|---|---|---|---|
| Tier 1 (Official) | SiliconFlow Free, OpenAI, DeepSeek, Qwen-MT | translator_impl/siliconflow_free.py, translator_impl/openai.py, translator_impl/deepseek.py | Active development, bug fixes |
| Tier 2 (Community) | Ollama, Gemini, Groq, Azure OpenAI | translator_impl/ollama.py, translator_impl/gemini.py, etc. | Community-maintained |
| Deprecated | Google Translate, Bing Translator | translator_impl/google.py, translator_impl/bing.py | No longer maintained |
Shared Infrastructure:
- BaseTranslator: Abstract class defining the do_translate() and do_llm_translate() methods
- BaseRateLimiter: QPS/RPM enforcement with a wait() method
- TranslationCache: SQLite-backed cache keyed by (engine_type, parameters, text_hash)

Adding New Engines: Create a Pydantic settings class in config/translate_engine_model.py, add it to the TRANSLATION_ENGINE_SETTING_TYPE union, and implement a BaseTranslator subclass. See Adding Translation Engines.
Sources: pdf2zh_next/translator/__init__.py pdf2zh_next/translator/base.py pdf2zh_next/translator/cache.py pdf2zh_next/config/translate_engine_model.py pdf2zh_next/const.py
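
To make the cache key described above concrete, here is a minimal sketch of a SQLite-backed cache keyed by engine, parameters, and a hash of the source text. The class `SketchTranslationCache`, its table schema, and the choice of SHA-256 are assumptions for illustration; the project's TranslationCache may store and hash entries differently.

```python
import hashlib
import json
import sqlite3


class SketchTranslationCache:
    """Hypothetical cache keyed by (engine, parameters, text hash)."""

    def __init__(self, db_path: str = ":memory:"):
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            " engine TEXT, params TEXT, text_hash TEXT, translation TEXT,"
            " PRIMARY KEY (engine, params, text_hash))"
        )

    @staticmethod
    def _key(engine: str, params: dict, text: str) -> tuple:
        # Sort keys so equal parameter dicts serialize identically.
        params_json = json.dumps(params, sort_keys=True)
        text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return engine, params_json, text_hash

    def get(self, engine: str, params: dict, text: str):
        """Return the cached translation, or None on a miss."""
        row = self._conn.execute(
            "SELECT translation FROM cache"
            " WHERE engine = ? AND params = ? AND text_hash = ?",
            self._key(engine, params, text),
        ).fetchone()
        return row[0] if row else None

    def set(self, engine: str, params: dict, text: str, translation: str) -> None:
        """Store or overwrite the translation for this key."""
        self._conn.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?)",
            (*self._key(engine, params, text), translation),
        )
        self._conn.commit()
```

Including the engine parameters in the key means that changing, say, the model name produces a cache miss rather than returning a translation made under different settings.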
PDFMathTranslate uses a multi-source configuration system with strict priority ordering: CLI arguments override environment variables, which override TOML config files, which override Pydantic defaults.
| Priority | Source | Examples | Code Reference |
|---|---|---|---|
| 1 (Highest) | CLI arguments | --qps 5, --openai, --lang-out zh | main.py:main() → ConfigManager.initialize_config() |
| 2 | Environment variables | PDF2ZH_QPS=5, PDF2ZH_LANG_OUT=zh | config/manager.py:parse_env_vars() |
| 3 | User config file | --config-file custom.toml (explicit path) | config/manager.py:_read_toml_file() |
| 4 | Default config file | ~/.config/pdf2zh/config.v3.toml | const.py:DEFAULT_CONFIG_FILE |
| 5 (Lowest) | Pydantic defaults | SettingsModel.basic.gui = False | config/model.py:SettingsModel |
Configuration Processing:
- ConfigManager.initialize_config() parses CLI args, environment variables, and TOML files
- merge_settings() performs a deep merge of nested dictionaries by priority
- CLIEnvSettingsModel (CLI-friendly boolean flags) → to_settings_model() → SettingsModel (canonical)
- validate_settings() ensures engine selection, file paths, regex patterns, and numeric bounds are valid

TOML Config Format:
See Configuration System for detailed documentation of all settings.
Sources: pdf2zh_next/config/manager.py pdf2zh_next/config/model.py pdf2zh_next/const.py8-16
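
The priority ordering above amounts to a deep merge applied from the lowest-priority source to the highest. The sketch below illustrates that idea with plain dictionaries. The functions `deep_merge`, `parse_env`, and `resolve_settings`, as well as the nested `translation`/`basic` layout, are hypothetical and are not the project's merge_settings() or parse_env_vars().

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override wins, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def parse_env(environ: dict, prefix: str = "PDF2ZH_") -> dict:
    """Collect prefixed variables into a flat dict with lowercased names."""
    return {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


def resolve_settings(layers: list) -> dict:
    """Merge layers ordered from lowest to highest priority."""
    result: dict = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result
```

For example, with defaults `{"translation": {"qps": 4}}`, a file layer `{"translation": {"qps": 8}}`, and an environment layer built from `PDF2ZH_QPS=5`, the resolved `qps` is the string `"5"`. Environment values arrive as strings, so a real system would still need a typed validation step after merging.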
PDFMathTranslate is the official reference implementation for BabelDOC, a PDF translation backend engine that handles layout analysis, formula detection, and bilingual PDF generation.
Integration Architecture:
- high_level.py:create_babeldoc_config(): Converts SettingsModel → babeldoc.Config
- main.py:main(): Calls babeldoc.assets.warmup() to download fonts and models
- high_level.py:babeldoc_translate(): Wraps the BabelDOC translation loop and yields progress events

BabelDOC Responsibilities:
| Component | Purpose | Libraries Used |
|---|---|---|
| PDF Loading | Read PDF structure | PyMuPDF (pyproject.toml21) |
| Text Extraction | Extract text blocks | Pdfminer.six (BabelDOC dependency) |
| Layout Analysis | Detect formulas, tables, figures | DocLayout-YOLO (BabelDOC-Assets) |
| Font Management | Multilingual fonts (Noto Sans, etc.) | BabelDOC-Assets repository |
| PDF Reconstruction | Insert translations, generate output | PyMuPDF |
Asset Management:
BabelDOC requires offline assets (fonts, layout models). PDFMathTranslate calls babeldoc.assets.warmup() at startup (main.py88-89), which downloads assets to ~/.cache/babeldoc/ if missing. Windows EXE builds include pre-packaged assets to avoid download.
Output Modes:
- Bilingual PDF (disabled with the --no-dual flag)
- Monolingual PDF (disabled with the --no-mono flag)

See BabelDOC Integration for detailed asset management and Output Generation for PDF rendering options.
Sources: pdf2zh_next/high_level.py pdf2zh_next/main.py88-89 pyproject.toml36 README.md37-38 README.md109-122
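
The "download if missing" behavior described under Asset Management follows a common pattern, sketched below. This is not babeldoc.assets.warmup(): the function `ensure_assets`, its `fetch` callback, and the asset names in the usage note are assumptions chosen so the example runs without network access.

```python
from pathlib import Path
from typing import Callable


def ensure_assets(
    cache_dir: Path,
    names: list[str],
    fetch: Callable[[str], bytes],
) -> list[str]:
    """Fetch only the assets missing from cache_dir; return the names fetched."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in names:
        target = cache_dir / name
        if not target.exists():
            # Only missing files trigger a fetch, so repeat runs are cheap.
            target.write_bytes(fetch(name))
            fetched.append(name)
    return fetched
```

Calling `ensure_assets(cache_dir, ["font.ttf", "model.onnx"], fetch)` twice fetches both files the first time and nothing the second. A build that ships pre-packaged assets behaves like the second call from the start.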
PDFMathTranslate exposes multiple integration interfaces:
| Interface | Entry Point | Use Case | Documentation |
|---|---|---|---|
| CLI | pdf2zh_next command (installed by setuptools) | Terminal usage, scripting | Command-Line Interface |
| WebUI | pdf2zh_next --webui or gui.py:demo.launch() | Interactive browser-based usage | Web User Interface |
| Python API | from pdf2zh_next.high_level import do_translate_async_stream | Programmatic integration | Python API |
| HTTP API | (Currently not started in codebase) | Remote service calls | HTTP API |
| Zotero Plugin | Third-party plugin zotero-pdf2zh | PDF management integration | Zotero Plugin Integration |
The system is designed for extensibility:
- Translation engines: implement a BaseTranslator subclass and register it in TRANSLATION_ENGINE_METADATA (see Adding Translation Engines)
- PDF processing: adjust create_babeldoc_config to modify PDF processing behavior
- WebUI: edit gui.py and modify Gradio components

Sources: main.py gui.py high_level.py README.md80-81 README.md93
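
The engine extension point can be illustrated with a small registry-plus-factory sketch. Everything here is a stand-in: `SketchBaseTranslator`, `ENGINE_REGISTRY`, `register_engine`, and `get_sketch_translator` are hypothetical names, and the real BaseTranslator, TRANSLATION_ENGINE_METADATA, and get_translator() have their own signatures that this example does not reproduce.

```python
from abc import ABC, abstractmethod


class SketchBaseTranslator(ABC):
    """Stand-in for an abstract translator base class."""

    def __init__(self, lang_in: str, lang_out: str):
        self.lang_in = lang_in
        self.lang_out = lang_out

    @abstractmethod
    def do_translate(self, text: str) -> str:
        """Translate one text segment."""


# Hypothetical registry playing the role of engine metadata.
ENGINE_REGISTRY: dict[str, type[SketchBaseTranslator]] = {}


def register_engine(name: str):
    """Class decorator that records an engine under the given name."""
    def decorator(cls):
        ENGINE_REGISTRY[name] = cls
        return cls
    return decorator


@register_engine("upper")
class UppercaseTranslator(SketchBaseTranslator):
    """Toy engine that uppercases text instead of calling a service."""

    def do_translate(self, text: str) -> str:
        return text.upper()


def get_sketch_translator(name: str, **kwargs) -> SketchBaseTranslator:
    """Factory: look up the engine class by name and instantiate it."""
    try:
        cls = ENGINE_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown engine: {name}") from None
    return cls(**kwargs)
```

With this shape, adding an engine means writing one subclass and one registration line. Callers keep using the factory and never import the concrete class.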