Parse PDF bank statements into structured CSV, JSON, and Excel — locally, with no cloud services.
Lead Maintainer: J Long
| Package | PyPI | Description |
|---|---|---|
packages/parser-core/ |
bankstatements-core |
Shared parsing library — PDF extraction, services, templates |
packages/parser-free/ |
bankstatements-free |
Free-tier CLI — thin wrapper around bankstatements-core |
pip install bankstatements-free
# Place PDF statements in an input directory
bankstatements --input ./input --output ./outputOutput formats:
bankstatements --input ./input --output ./output --output-formats csv,json,excel- PDF extraction with configurable table boundaries
- CSV, JSON, and Excel export
- Batch processing with recursive directory scanning
- SHA-256 duplicate detection
- Transaction type classification (purchase, payment, refund, fee, transfer)
- Monthly transaction summaries and expense analysis
- IBAN extraction and grouping
- Multi-document type support (bank statements, credit cards, loans)
- GDPR-compliant local processing — no data leaves your machine
- Template-based statement detection (AIB Ireland, Revolut, and more)
Premium features (available in the private bankstatements-premium distribution):
- Credit card and loan statement support (no IBAN required)
- Process templates without IBAN patterns
| Bank | Template |
|---|---|
| AIB Ireland | aib_ireland.json |
| Revolut | revolut.json |
| Generic fallback | default.json |
Custom templates can be added — see docs/CUSTOM_TEMPLATES.md.
bankstatementprocessor/
├── packages/
│ ├── parser-core/ bankstatements-core (PyPI library)
│ │ ├── src/bankstatements_core/
│ │ └── tests/
│ └── parser-free/ bankstatements-free (free-tier CLI)
│ ├── src/bankstatements_free/
│ └── tests/
├── templates/ shared bank template JSON files
├── docs/ documentation
└── .github/workflows/
├── ci.yml lint + test both packages
├── boundary-check.yml enforce parser-free cannot import premium code
└── release-core.yml publish bankstatements-core to PyPI on core-v* tags
Free tier (this repo): a Dockerfile and docker-compose.yml are provided for local development. They build both bankstatements-core and bankstatements-free from source — no remote image is pulled and no license is required.
git clone https://github.com/longieirl/bankstatementprocessor.git
cd bankstatementprocessor
docker-compose up --buildPlace PDFs in ./input/ before running. Results are written to ./output/.
Premium tier: production Docker images are built and published from the private bankstatements-premium repository.
Setup:
# Install parser-core in editable mode
pip install -e packages/parser-core[dev,test]
# Install parser-free in editable mode (depends on parser-core)
pip install -e packages/parser-free[test]Common commands:
# Run tests (from packages/parser-core/)
pytest packages/parser-core/tests/ --cov=bankstatements_core --cov-fail-under=91
# Run tests (from packages/parser-free/)
pytest packages/parser-free/tests/
# Format + lint
black src tests
isort src tests
flake8 src testsCoverage requirement: 91% minimum on bankstatements-core.
Contributions are welcome!
- Fork the repository
- Create a branch:
git checkout -b feat/my-feature - Make changes and add tests (91%+ coverage required)
- Submit a PR — CI must pass (lint, tests, boundary check)
See CONTRIBUTING.md for full details.
Apache License 2.0 — free for commercial use with modifications allowed.