NakliTechie/ScanLocal

ScanLocal — Browser-Native OCR

🔍 Extract text from images entirely in your browser — no upload, no API keys, no privacy concerns.

🎙️ Try it now: https://naklitechie.github.io/ScanLocal/


Features

  • Five OCR engines in one tool:

    • 📜 Default (Tesseract) — Most reliable browser path, broad language support
    • 🚀 Fast (PaddleOCR) — Printed text path via Paddle models
    • ✍️ Handwriting (TrOCR) — SOTA for handwritten text, ~500 MB
    • 🧠 Smart OCR — Transformer fallback for printed text, no boxes
    • 🔭 Qwen2-VL (Experimental) — Best-effort visual document reader, very large download
  • 100% client-side — Your images never leave your device

  • No dependencies — Single HTML file, no npm, no bundler

  • Offline-ready — Models cached after first load

  • Privacy-first — No tracking, no analytics, no data collection


Use Cases

  • Receipts and invoices
  • Scanned documents
  • Business cards
  • Screenshots with text
  • Handwritten notes
  • Historical documents

How to Use

  1. Open the app in your browser
  2. Choose an engine:
    • Default for the safest first try in-browser
    • Fast for receipts, documents, screenshots
    • Handwriting for handwritten notes
    • Smart for an alternate transformer-based read on printed text
    • Qwen for experimental visual reading of complex documents
  3. Pick a Tesseract language if you're using the default engine
  4. Drop an image or click to upload
  5. Wait for text extraction
  6. Copy the extracted text
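
The engine guidance in step 2 can be sketched as a small lookup. This is only an illustration: the content-kind labels and engine keys (`fast`, `handwriting`, `smart`, `qwen`, `default`) are hypothetical names, not identifiers taken from `index.html`.

```javascript
// Hypothetical mapping from the kind of image to an engine key.
// The keys mirror the engine names in this README, not the app's real identifiers.
const ENGINE_FOR_CONTENT = new Map([
  ["receipt", "fast"],
  ["document", "fast"],
  ["screenshot", "fast"],
  ["handwriting", "handwriting"],
  ["printed-alt", "smart"], // alternate transformer-based read on printed text
  ["complex", "qwen"], // experimental visual reading of complex documents
]);

// Returns the suggested engine key, falling back to "default" (Tesseract),
// which the README describes as the safest first try.
function suggestEngine(contentKind) {
  return ENGINE_FOR_CONTENT.get(contentKind) ?? "default";
}
```

For example, `suggestEngine("receipt")` gives `"fast"`, while an unrecognised kind falls back to `"default"`.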

Running Locally

Option 1: Direct file open

# Just open index.html in your browser
# (`open` is the macOS command; use `xdg-open` on Linux or `start` on Windows)
open ScanLocal/index.html

Option 2: Local server (recommended)

cd ScanLocal
python3 -m http.server 8000
# Open http://localhost:8000

Option 3: GitHub Pages

The app is automatically deployed to:
https://naklitechie.github.io/ScanLocal/


Tech Stack

| Component | Technology |
| --- | --- |
| Framework | Vanilla JS (zero dependencies) |
| OCR Engine 1 | PaddleOCR (PP-OCRv5) via Paddle.js |
| OCR Engine 2 | TrOCR via Transformers.js |
| OCR Engine 3 | Tesseract.js v5 |
| OCR Engine 4 | Printed-text transformer OCR via Transformers.js |
| OCR Engine 5 | Qwen2-VL via Transformers.js / ONNX |
| UI | Custom CSS (NakliTechie design system) |
| Hosting | GitHub Pages |
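
To give a feel for the default engine, here is a minimal sketch of a Tesseract.js v5 call. It is not the code in `index.html`. The function name `recognizeText` is hypothetical, and the library object is passed in as a parameter (for instance the global `Tesseract` from a script tag) so the sketch makes no assumption about how the app loads it.

```javascript
// Minimal sketch of text recognition with Tesseract.js v5.
// `tesseract` is the Tesseract.js module object; `image` can be a URL, File, or canvas.
async function recognizeText(tesseract, image, lang = "eng") {
  // In v5, createWorker loads and initialises the given language.
  const worker = await tesseract.createWorker(lang);
  try {
    const { data } = await worker.recognize(image);
    return data.text.trim();
  } finally {
    // Always free the worker, even if recognition fails.
    await worker.terminate();
  }
}
```

In a page that has loaded Tesseract.js, a call would look like `recognizeText(Tesseract, file, "eng").then(console.log)`.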

Model Sizes & Performance

| Engine | Size | First Load | Cached | Best For |
| --- | --- | --- | --- | --- |
| Tesseract | ~25 MB/lang | 5-10 sec | Instant | Default browser OCR, broad language support |
| PaddleOCR | ~20 MB | 2-5 sec | Instant | Printed text |
| TrOCR | ~500 MB | 30-60 sec | Instant | Handwriting |
| Smart OCR | ~300-500 MB | 20-60 sec | Instant | Alternate printed-text extraction |
| Qwen2-VL | ~1.8 GB+ | Long / may time out | Cached after load | Complex visual reasoning and document reading |
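
As a rough illustration of what these sizes mean on a given connection, the sketch below estimates transfer time from the approximate sizes in the table. The engine keys, the function names, and the 100 MB confirmation threshold are all assumptions made for this example. The estimate covers network transfer only, not model initialisation, so it will not match the First Load column exactly.

```javascript
// Approximate download sizes in MB, taken from the table above.
// Engine keys are hypothetical. Smart OCR uses the upper end of its range,
// and Qwen2-VL uses its stated minimum ("~1.8 GB+").
const MODEL_SIZE_MB = {
  tesseract: 25, // per language
  paddle: 20,
  trocr: 500,
  smart: 500,
  qwen: 1800,
};

// Transfer time in seconds: megabytes * 8 bits per byte / megabits per second.
function estimateDownloadSeconds(engine, mbitPerSec) {
  const sizeMb = MODEL_SIZE_MB[engine];
  if (sizeMb === undefined) throw new Error(`Unknown engine: ${engine}`);
  if (!(mbitPerSec > 0)) throw new Error("Bandwidth must be positive");
  return (sizeMb * 8) / mbitPerSec;
}

// True when a download is large enough that a UI might ask before starting it.
function needsConfirmation(engine, thresholdMb = 100) {
  return MODEL_SIZE_MB[engine] > thresholdMb;
}
```

On a 50 Mbit/s connection, `estimateDownloadSeconds("trocr", 50)` comes to 80 seconds for the ~500 MB TrOCR model.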

Supported Formats

  • Images: PNG, JPG/JPEG, WebP
  • Max size: 10 MB per image
  • Languages: 100+ (varies by engine)
  • Tesseract languages in UI: English, Spanish, French, German, Italian, Portuguese, Russian, Hindi, Arabic, Japanese, Korean, Chinese (Simplified)
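
The limits above can be checked before any OCR runs. The sketch below is one way to do it, not the app's actual validation: `validateImage` and the `reason` strings are hypothetical, and "10 MB" is interpreted here as 10 × 1024 × 1024 bytes. The language table maps the UI names listed above to the standard Tesseract language codes.

```javascript
// MIME types for the formats listed above (PNG, JPG/JPEG, WebP).
const ALLOWED_TYPES = new Set(["image/png", "image/jpeg", "image/webp"]);

// Assumption: the 10 MB limit is counted in binary megabytes.
const MAX_BYTES = 10 * 1024 * 1024;

// Accepts any object with `type` and `size`, such as a browser File.
function validateImage(file) {
  if (!ALLOWED_TYPES.has(file.type)) {
    return { ok: false, reason: "unsupported-type" };
  }
  if (file.size > MAX_BYTES) {
    return { ok: false, reason: "too-large" };
  }
  return { ok: true };
}

// UI language names mapped to Tesseract traineddata codes.
const TESSERACT_LANG_CODES = {
  English: "eng",
  Spanish: "spa",
  French: "fra",
  German: "deu",
  Italian: "ita",
  Portuguese: "por",
  Russian: "rus",
  Hindi: "hin",
  Arabic: "ara",
  Japanese: "jpn",
  Korean: "kor",
  "Chinese (Simplified)": "chi_sim",
};
```

A drop handler could call `validateImage(file)` first and show a message based on `reason` instead of starting an engine on an unsupported file.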

Palette

Coloured with italy-02 · PERGAMENA — Renaissance manuscript parchment, Vatican library. Yellowed-cream body, ink-brown text, Vatican-cobalt action accent — the place where text is preserved and read, fitting for an OCR tool.

Palette pulled from Rangrez, the global colour-palette library that backs all NakliTechie projects.


Privacy

ScanLocal is part of the NakliTechie series — browser-native tools that keep your data on your device.

  • ❌ No server uploads
  • ❌ No API keys
  • ❌ No tracking
  • ❌ No analytics
  • ✅ All processing happens in your browser
  • ✅ Models cached locally after first load

License

MIT License — see LICENSE file.


Part of NakliTechie Series

ScanLocal follows the same blueprint as the other NakliTechie tools: all are single-file, zero-dependency, browser-native apps.


Development

Project Structure

ScanLocal/
├── index.html          # The entire app (shipped)
├── README.md           # This file (shipped)
├── LICENSE             # MIT license (shipped)
├── .gitignore          # Git ignore rules
└── PLAN.md             # Planning notes (not shipped)

Build Commands

None! Just edit index.html and refresh your browser.

Deploy to GitHub Pages

# Enable GitHub Pages on the repository
# Settings → Pages → Source: main branch → / (root)
# Push to main branch
git push origin main

Contributing

Issues and PRs welcome! Please read IDEAS.md for the series guidelines before contributing.


Author

Chirag Patnaik
NakliTechie


TODOs (Roadmap)

See PLAN.md for detailed implementation plan.

Phase 1 (Current): Core infrastructure ✅

  • Base HTML skeleton
  • Engine selector UI (4 engines)
  • Dropzone + preview
  • PaddleOCR integration
  • TrOCR integration
  • Tesseract.js integration
  • Smart OCR integration

Phase 2: Engine integration
Phase 3: UI polish
Phase 4: Testing
Phase 5: Launch
