Manukai AG
Bridging 2D and 3D: Generative AI for Manufacturing Intelligence
Inspiration
Modern manufacturing still depends heavily on 2D engineering drawings, even when 3D CAD models are available. Critical product and manufacturing information (PMI)—such as hole types, thread specifications, and tolerances—exists only as visual annotations in PDFs.
This creates a major bottleneck: engineers must manually interpret drawings and map them to 3D geometry. This process is slow, error-prone, and does not scale.
We were inspired to bridge this gap and move towards a true digital manufacturing pipeline, where design intent becomes directly machine-readable.
What it does
This project automatically:
- Extracts hole and thread annotations from PDF drawings
- Understands manufacturing intent (dimensions, tolerances, threads)
- Maps them to corresponding 3D features in STEP models
- Outputs structured, machine-readable JSON with confidence and traceability
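The structured output described above could be modelled along these lines. This is a sketch using Pydantic (which the project lists as a dependency); the field names and schema here are illustrative assumptions, not the project's actual output format:

```python
from typing import Optional
from pydantic import BaseModel, Field

class HoleAnnotation(BaseModel):
    """One extracted hole/thread callout, traceable back to the drawing."""
    diameter_mm: float = Field(gt=0)
    thread_spec: Optional[str] = None      # e.g. "M6x1.0" for a tapped hole
    depth_mm: Optional[float] = None       # blind-hole depth, if annotated
    count: int = 1                         # "4x"-style multiplicity
    page: int = 1                          # source PDF page, for traceability
    matched_step_hole_ids: list[str] = []  # correlated 3D features (may be empty)
    confidence: float = Field(ge=0.0, le=1.0)

ann = HoleAnnotation(diameter_mm=6.0, thread_spec="M6x1.0", page=2, confidence=0.92)
print(ann.thread_spec, ann.confidence)  # M6x1.0 0.92
```

Validation constraints (`gt=0`, bounded confidence) reject malformed extractions early instead of letting them propagate into the correlation stage.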
How we built it
We designed a four-layer hybrid pipeline that combines the strengths of traditional document intelligence with vision-language models and geometric reasoning:
- Azure Document Intelligence for OCR with precise bounding boxes
- LLM multi-pass vision extraction with OCR regex pre-scanning, verification sweeps, cropped-region analysis, and OCR-vs-LLM reconciliation
- Pure-Python STEP parser that resolves ISO 10303-21 entity chains to extract cylindrical surfaces and group them into logical holes
- Hybrid deterministic + LLM correlation that uses diameter matching, count filtering, and counterbore pairing for unambiguous cases, with targeted LLM disambiguation for edge cases
Every layer is designed to fail gracefully — if Azure DI is unavailable, the pipeline continues in vision-only mode; ambiguous matches are flagged with lower confidence rather than forced.
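The STEP layer can be illustrated with a minimal sketch. ISO 10303-21 files are lines of the form `#id=ENTITY_TYPE(args);`, so a first pass can index every entity and read the radius off each `CYLINDRICAL_SURFACE` record. A real parser must also follow `ADVANCED_FACE` surface references, handle multi-line records, and group surfaces into logical holes; the sample data below is illustrative:

```python
import re

# One entity per match: "#10=CYLINDRICAL_SURFACE('',#9,3.0);" etc.
# Non-greedy args match is a simplification; real STEP records can nest.
ENTITY_RE = re.compile(r"#(\d+)\s*=\s*(\w+)\s*\((.*?)\);", re.S)

def index_entities(step_text: str) -> dict[int, tuple[str, str]]:
    """Map entity id -> (entity type, raw argument string)."""
    return {int(m.group(1)): (m.group(2), m.group(3))
            for m in ENTITY_RE.finditer(step_text)}

def cylinder_diameters(entities: dict[int, tuple[str, str]]) -> dict[int, float]:
    """Diameter (2 * radius) of every CYLINDRICAL_SURFACE entity."""
    out = {}
    for eid, (etype, args) in entities.items():
        if etype == "CYLINDRICAL_SURFACE":
            radius = float(args.rsplit(",", 1)[-1])  # last argument is the radius
            out[eid] = 2.0 * radius
    return out

sample = "#10=AXIS2_PLACEMENT_3D('',#11,#12,#13);\n#20=CYLINDRICAL_SURFACE('',#10,3.0);"
print(cylinder_diameters(index_entities(sample)))  # {20: 6.0}
```

Working at this textual level is what makes a pure-Python parser possible: no CAD kernel is needed just to recover hole diameters.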
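The deterministic first pass of the correlation layer can be sketched as follows: pair each annotated diameter with STEP hole groups of the same size within a tolerance, accept only unambiguous pairs, and defer the rest to LLM disambiguation. The tolerance value, field names, and confidence score are assumptions for illustration:

```python
TOL_MM = 0.05  # assumed diameter-matching tolerance

def correlate(annotations: list[dict], step_holes: list[dict]):
    """Deterministic pass: accept unambiguous matches, flag the rest."""
    matched, ambiguous = [], []
    for ann in annotations:
        candidates = [h for h in step_holes
                      if abs(h["diameter"] - ann["diameter"]) <= TOL_MM]
        # Unambiguous: exactly one group whose hole count matches the "4x" callout.
        exact = [h for h in candidates if h["count"] == ann["count"]]
        if len(exact) == 1:
            matched.append((ann["label"], exact[0]["id"], 0.95))
        else:
            ambiguous.append(ann["label"])  # hand off to LLM disambiguation
    return matched, ambiguous

anns = [{"label": "M6 THRU", "diameter": 6.0, "count": 4},
        {"label": "D8 CBORE", "diameter": 8.0, "count": 2}]
holes = [{"id": "G1", "diameter": 6.0, "count": 4},
         {"id": "G2", "diameter": 8.0, "count": 2},
         {"id": "G3", "diameter": 8.0, "count": 2}]
m, a = correlate(anns, holes)
print(m)  # [('M6 THRU', 'G1', 0.95)]
print(a)  # ['D8 CBORE']
```

The second annotation is deliberately ambiguous (two same-diameter groups), showing why the pipeline lowers confidence and escalates rather than forcing a match.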
Challenges we ran into
- GD&T is not natural language. Engineering drawing annotations use specialised symbols (∅, ↧, feature control frames) that OCR engines frequently misread. We solved this with a regex pre-scan that creates a "ground-truth checklist" injected into every LLM prompt.
- Same diameter, different holes. Real drawings often have multiple groups of identically sized holes in different locations. Distinguishing them requires cross-referencing datum references, position tolerances, and spatial context.
- Deduplication across multi-pass extraction. Our multi-pass approach (per-page + verification + cropped-region + reconciliation) greatly improves recall but introduces duplicates. We implemented three-stage deduplication (within-page, cross-page, diameter-based) to balance recall and precision.
- STEP files without a CAD kernel. We built a pure-Python STEP parser to avoid heavy compiled dependencies, which required resolving complex entity reference chains.
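The "ground-truth checklist" idea from the first challenge can be sketched like this: a regex pass over the raw OCR text harvests diameter and thread callouts, and the resulting lists are injected into every LLM prompt as items the model must account for. The patterns here are illustrative, not the project's actual ones:

```python
import re

# Harvest diameter and thread callouts from raw OCR text. OCR often mangles
# the diameter symbol, so real patterns would accept more look-alikes.
DIAMETER = re.compile(r"(?:⌀|∅|Ø)\s*(\d+(?:[.,]\d+)?)")                        # ⌀6,5 / Ø12
THREAD = re.compile(r"\bM(\d+(?:[.,]\d+)?)(?:\s*[xX]\s*(\d+(?:[.,]\d+)?))?")   # M8x1.25

def prescan(ocr_text: str) -> dict[str, list[str]]:
    """Build the checklist injected into each extraction prompt."""
    return {
        "diameters": [m.group(1).replace(",", ".") for m in DIAMETER.finditer(ocr_text)],
        "threads": ["M" + m.group(1) + (f"x{m.group(2)}" if m.group(2) else "")
                    for m in THREAD.finditer(ocr_text)],
    }

print(prescan("4x ⌀6,5 THRU  2x M8x1.25"))  # {'diameters': ['6.5'], 'threads': ['M8x1.25']}
```

Because the checklist comes from deterministic pattern matching, the LLM can be asked to reconcile its visual extraction against it rather than trusted blindly.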
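The deduplication challenge can also be sketched. The idea in one stage: annotations from different passes that agree on a normalised key are collapsed, keeping the highest-confidence copy. Real code would run the further cross-page and diameter-based stages as well; the key and fields here are illustrative assumptions:

```python
def dedupe(annotations: list[dict]) -> list[dict]:
    """Collapse duplicates that agree on (page, diameter, thread), keeping the best."""
    best: dict[tuple, dict] = {}
    for ann in annotations:
        key = (ann["page"], round(ann["diameter"], 2), ann.get("thread"))
        if key not in best or ann["confidence"] > best[key]["confidence"]:
            best[key] = ann
    return list(best.values())

passes = [
    {"page": 1, "diameter": 6.0, "thread": "M6", "confidence": 0.80},   # per-page pass
    {"page": 1, "diameter": 6.0, "thread": "M6", "confidence": 0.95},   # verification pass
    {"page": 1, "diameter": 8.0, "thread": None, "confidence": 0.70},   # cropped-region pass
]
print(len(dedupe(passes)))  # 2
```

Merging by key rather than discarding later passes is what lets the multi-pass strategy raise recall without letting precision collapse.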
Accomplishments that we're proud of
- Built a full end-to-end pipeline (PDF → STEP → JSON)
- Achieved high recall using a multi-pass extraction strategy
- Designed a lightweight STEP parser without CAD libraries
- Enabled traceable, explainable AI outputs
- Balanced deterministic logic with LLM reasoning effectively
What we learned
- Hybrid AI systems outperform pure-LLM approaches on industrial problems
- Multi-pass + verification significantly improves extraction accuracy
- Geometry + language understanding is a powerful combination
- Explainability (confidence + evidence) is critical for trust
What's next for BreakThrough
- Custom Azure DI model trained on engineering drawings for direct GD&T symbol recognition
- Two-stage extraction (detect all diameters first, then interpret each one)
- Spatial clustering using 3D positions from STEP to disambiguate same-diameter hole groups
- Confidence-based filtering using STEP diameters as ground truth to prune false positive annotations
- View-aware extraction that processes each drawing view (SECTION A-A, DETAIL B) as an independent context
Detailed Architecture
Built With
- azuredocumentintelligence
- llm
- openai
- pydantic
- pymupdf
- python
- streamlit