Powered by Streamlit + AWS Textract. Specifically, Streamlit handles the user interface and AWS Textract performs the OCR.
In a business document context, Textract is usually preferred over Rekognition. This example uses the Detect Document Text function to extract text from documents.
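As a rough illustration of the Detect Document Text call, here is a minimal sketch using boto3. The helper names `extract_lines` and `detect_lines` are hypothetical and not taken from this repository; the sketch assumes the image is sent as raw bytes and that only `LINE` blocks are of interest.

```python
from typing import Any, Dict, List


def extract_lines(response: Dict[str, Any]) -> List[str]:
    """Return the text of every LINE block, in the order Textract returned them."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def detect_lines(image_bytes: bytes, client: Any) -> List[str]:
    """Call Detect Document Text on raw image bytes and keep only the lines.

    `client` is expected to be a boto3 Textract client, for example
    boto3.client("textract"), with credentials and region taken from
    the environment.
    """
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return extract_lines(response)


# Possible usage (requires boto3 and valid AWS credentials):
#   import boto3
#   textract = boto3.client("textract")
#   with open("scan.png", "rb") as f:  # hypothetical local file
#       print(detect_lines(f.read(), textract))
```

Passing the client in as an argument keeps the parsing logic testable without AWS access, since any object with a `detect_document_text` method can stand in for it.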
It also provides simple folder and basename inputs, which together give the path in S3 where the uploaded image is saved.
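One way the folder and basename could be combined into an S3 object key is sketched below. The function `build_s3_key` is hypothetical, and keeping the uploaded file's extension (lowercased) is an assumption rather than confirmed behavior of this app.

```python
import posixpath


def build_s3_key(folder: str, basename: str, uploaded_name: str) -> str:
    """Join folder and basename into an S3 key, reusing the upload's extension."""
    _, ext = posixpath.splitext(uploaded_name)
    clean_folder = folder.strip().strip("/")
    clean_base = basename.strip().strip("/")
    if not clean_base:
        raise ValueError("basename must not be empty")
    filename = clean_base + ext.lower()
    # S3 keys always use forward slashes, so posixpath is used on every OS.
    return posixpath.join(clean_folder, filename) if clean_folder else filename


# Possible usage with a boto3 S3 client (bucket name is a placeholder):
#   s3.upload_fileobj(uploaded_file, "my-bucket", build_s3_key("invoices", "scan-01", uploaded_file.name))
```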
Detected text can be downloaded as the raw list of lines (json file), or edited and then downloaded as a paragraph (txt file).
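The two download formats could be produced along these lines. This is a sketch under assumptions: the helper names are hypothetical, and joining lines with single spaces is just one reasonable way to form the paragraph.

```python
import json
from typing import List


def lines_to_json(lines: List[str]) -> str:
    """Serialize the raw list of detected lines for the json download."""
    return json.dumps(lines, indent=2, ensure_ascii=False)


def lines_to_paragraph(lines: List[str]) -> str:
    """Join non-empty lines into one paragraph for editing and the txt download."""
    return " ".join(line.strip() for line in lines if line.strip())
```

In a Streamlit app, the paragraph would typically be shown in `st.text_area` for editing, and either string could be offered through `st.download_button`.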
In a business use case, this step would instead submit the entry to a system of record or pass it on to the next stage of the processing pipeline.
- Requires an AWS account with access to the Textract service and read/write access to an S3 bucket (AWS Tutorial)
- Copy or rename .env.example as .env.dev and fill in the AWS Access Key, Secret Key, Bucket Name, and Region for your AWS account
mv .env.example .env.dev
Requires docker-compose to be installed (this comes with Docker Desktop).
docker-compose up
# Open localhost:8501 in a browser
Use -d to detach from logs.
Use --build on subsequent runs to rebuild dependencies / docker image.
# Linting
docker-compose run streamlit-app nox.sh -s lint
# Unit Testing
docker-compose run streamlit-app nox.sh -s test
# Both
docker-compose run streamlit-app nox.sh
# As needed:
docker-compose build
# E2E Testing
docker-compose up -d --build
# Replace screenshots
docker-compose exec streamlit-app nox -s test -- -m e2e --visual-baseline
# Compare to visual baseline screenshots
docker-compose exec streamlit-app nox -s test -- -m e2e
# Turn off / tear down
docker-compose down
For code completion / linting / developing / etc.
python -m venv venv
. ./venv/bin/activate
# .\venv\Scripts\activate for Windows
python -m pip install -r ./streamlit_app/requirements.dev.txt
pre-commit install
# Linting / Static Checking / Unit Testing
python -m black streamlit_app
python -m isort --profile=black streamlit_app
python -m flake8 --config=./streamlit_app/.flake8 streamlit_app
- Containerization with Docker
- Dependency installation with Pip
- Test automation with Nox
- Linting with pre-commit and Flake8
- Code formatting with Black
- Testing with pytest
- Code coverage with Coverage.py