Manukai AG
Bridging 2D and 3D: Generative AI for Manufacturing Intelligence
Inspiration
Modern manufacturing still depends heavily on 2D engineering drawings, even when 3D CAD models are available. Critical product and manufacturing information (PMI)—such as hole types, thread specifications, and tolerances—exists only as visual annotations in PDFs.
This creates a major bottleneck: engineers must manually interpret drawings and map them to 3D geometry. This process is slow, error-prone, and does not scale.
We were inspired to bridge this gap and move towards a true digital manufacturing pipeline, where design intent becomes directly machine-readable.
What it does
This project automatically:
- Extracts hole and thread annotations from PDF drawings
- Understands manufacturing intent (dimensions, tolerances, threads)
- Maps them to corresponding 3D features in STEP models
- Outputs structured, machine-readable JSON with confidence and traceability
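The structured output described above could be modelled along these lines. This is a sketch using Pydantic (which the project lists as a dependency); the field names and schema here are illustrative assumptions, not the project's actual output format:

```python
from typing import Optional
from pydantic import BaseModel, Field

class HoleAnnotation(BaseModel):
    """One extracted hole/thread callout, traceable back to the drawing."""
    diameter_mm: float = Field(gt=0)
    thread_spec: Optional[str] = None      # e.g. "M6x1.0" for a tapped hole
    depth_mm: Optional[float] = None       # blind-hole depth, if annotated
    count: int = 1                         # "4x"-style multiplicity
    page: int = 1                          # source PDF page, for traceability
    matched_step_hole_ids: list[str] = []  # correlated 3D features (may be empty)
    confidence: float = Field(ge=0.0, le=1.0)

ann = HoleAnnotation(diameter_mm=6.0, thread_spec="M6x1.0", page=2, confidence=0.92)
print(ann.thread_spec, ann.confidence)  # M6x1.0 0.92
```

Validation constraints (`gt=0`, bounded confidence) reject malformed extractions early instead of letting them propagate into the correlation stage.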
How we built it
We designed a four-layer hybrid pipeline that combines the strengths of traditional document intelligence with vision-language models and geometric reasoning:
- Azure Document Intelligence for OCR with precise bounding boxes
- LLM multi-pass vision extraction with OCR regex pre-scanning, verification sweeps, cropped-region analysis, and OCR-vs-LLM reconciliation
- Pure-Python STEP parser that resolves ISO 10303-21 entity chains to extract cylindrical surfaces and group them into logical holes
- Hybrid deterministic + LLM correlation that uses diameter matching, count filtering, and counterbore pairing for unambiguous cases, with targeted LLM disambiguation for edge cases
Every layer is designed to fail gracefully — if Azure DI is unavailable, the pipeline continues in vision-only mode; ambiguous matches are flagged with lower confidence rather than forced.
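The STEP layer can be illustrated with a minimal sketch. ISO 10303-21 files are lines of the form `#id=ENTITY_TYPE(args);`, so a first pass can index every entity and read the radius off each `CYLINDRICAL_SURFACE` record. A real parser must also follow `ADVANCED_FACE` surface references, handle multi-line records, and group surfaces into logical holes; the sample data below is illustrative:

```python
import re

# One entity per match: "#10=CYLINDRICAL_SURFACE('',#9,3.0);" etc.
# Non-greedy args match is a simplification; real STEP records can nest.
ENTITY_RE = re.compile(r"#(\d+)\s*=\s*(\w+)\s*\((.*?)\);", re.S)

def index_entities(step_text: str) -> dict[int, tuple[str, str]]:
    """Map entity id -> (entity type, raw argument string)."""
    return {int(m.group(1)): (m.group(2), m.group(3))
            for m in ENTITY_RE.finditer(step_text)}

def cylinder_diameters(entities: dict[int, tuple[str, str]]) -> dict[int, float]:
    """Diameter (2 * radius) of every CYLINDRICAL_SURFACE entity."""
    out = {}
    for eid, (etype, args) in entities.items():
        if etype == "CYLINDRICAL_SURFACE":
            radius = float(args.rsplit(",", 1)[-1])  # last argument is the radius
            out[eid] = 2.0 * radius
    return out

sample = "#10=AXIS2_PLACEMENT_3D('',#11,#12,#13);\n#20=CYLINDRICAL_SURFACE('',#10,3.0);"
print(cylinder_diameters(index_entities(sample)))  # {20: 6.0}
```

Working at this textual level is what makes a pure-Python parser possible: no CAD kernel is needed just to recover hole diameters.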
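The deterministic first pass of the correlation layer can be sketched as follows: pair each annotated diameter with STEP hole groups of the same size within a tolerance, accept only unambiguous pairs, and defer the rest to LLM disambiguation. The tolerance value, field names, and confidence score are assumptions for illustration:

```python
TOL_MM = 0.05  # assumed diameter-matching tolerance

def correlate(annotations: list[dict], step_holes: list[dict]):
    """Deterministic pass: accept unambiguous matches, flag the rest."""
    matched, ambiguous = [], []
    for ann in annotations:
        candidates = [h for h in step_holes
                      if abs(h["diameter"] - ann["diameter"]) <= TOL_MM]
        # Unambiguous: exactly one group whose hole count matches the "4x" callout.
        exact = [h for h in candidates if h["count"] == ann["count"]]
        if len(exact) == 1:
            matched.append((ann["label"], exact[0]["id"], 0.95))
        else:
            ambiguous.append(ann["label"])  # hand off to LLM disambiguation
    return matched, ambiguous

anns = [{"label": "M6 THRU", "diameter": 6.0, "count": 4},
        {"label": "D8 CBORE", "diameter": 8.0, "count": 2}]
holes = [{"id": "G1", "diameter": 6.0, "count": 4},
         {"id": "G2", "diameter": 8.0, "count": 2},
         {"id": "G3", "diameter": 8.0, "count": 2}]
m, a = correlate(anns, holes)
print(m)  # [('M6 THRU', 'G1', 0.95)]
print(a)  # ['D8 CBORE']
```

The second annotation is deliberately ambiguous (two same-diameter groups), showing why the pipeline lowers confidence and escalates rather than forcing a match.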
Challenges we ran into
- GD&T is not natural language. Engineering drawing annotations use specialised symbols (∅, ↧, feature control frames) that OCR engines frequently misread. We solved this with a regex pre-scan that creates a "ground-truth checklist" injected into every LLM prompt.
- Same diameter, different holes. Real drawings often have multiple groups of identically sized holes in different locations. Distinguishing them requires cross-referencing datum references, position tolerances, and spatial context.
- Deduplication across multi-pass extraction. Our multi-pass approach (per-page + verification + cropped-region + reconciliation) greatly improves recall but introduces duplicates. We implemented three-stage deduplication (within-page, cross-page, diameter-based) to balance recall and precision.
- STEP files without a CAD kernel. We built a pure-Python STEP parser to avoid heavy compiled dependencies, which required resolving complex entity reference chains.
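The "ground-truth checklist" idea from the first challenge can be sketched like this: a regex pass over the raw OCR text harvests diameter and thread callouts, and the resulting lists are injected into every LLM prompt as items the model must account for. The patterns here are illustrative, not the project's actual ones:

```python
import re

# Harvest diameter and thread callouts from raw OCR text. OCR often mangles
# the diameter symbol, so real patterns would accept more look-alikes.
DIAMETER = re.compile(r"(?:⌀|∅|Ø)\s*(\d+(?:[.,]\d+)?)")                        # ⌀6,5 / Ø12
THREAD = re.compile(r"\bM(\d+(?:[.,]\d+)?)(?:\s*[xX]\s*(\d+(?:[.,]\d+)?))?")   # M8x1.25

def prescan(ocr_text: str) -> dict[str, list[str]]:
    """Build the checklist injected into each extraction prompt."""
    return {
        "diameters": [m.group(1).replace(",", ".") for m in DIAMETER.finditer(ocr_text)],
        "threads": ["M" + m.group(1) + (f"x{m.group(2)}" if m.group(2) else "")
                    for m in THREAD.finditer(ocr_text)],
    }

print(prescan("4x ⌀6,5 THRU  2x M8x1.25"))  # {'diameters': ['6.5'], 'threads': ['M8x1.25']}
```

Because the checklist comes from deterministic pattern matching, the LLM can be asked to reconcile its visual extraction against it rather than trusted blindly.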
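The deduplication challenge can also be sketched. The idea in one stage: annotations from different passes that agree on a normalised key are collapsed, keeping the highest-confidence copy. Real code would run the further cross-page and diameter-based stages as well; the key and fields here are illustrative assumptions:

```python
def dedupe(annotations: list[dict]) -> list[dict]:
    """Collapse duplicates that agree on (page, diameter, thread), keeping the best."""
    best: dict[tuple, dict] = {}
    for ann in annotations:
        key = (ann["page"], round(ann["diameter"], 2), ann.get("thread"))
        if key not in best or ann["confidence"] > best[key]["confidence"]:
            best[key] = ann
    return list(best.values())

passes = [
    {"page": 1, "diameter": 6.0, "thread": "M6", "confidence": 0.80},   # per-page pass
    {"page": 1, "diameter": 6.0, "thread": "M6", "confidence": 0.95},   # verification pass
    {"page": 1, "diameter": 8.0, "thread": None, "confidence": 0.70},   # cropped-region pass
]
print(len(dedupe(passes)))  # 2
```

Merging by key rather than discarding later passes is what lets the multi-pass strategy raise recall without letting precision collapse.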
Accomplishments that we're proud of
- Built a full end-to-end pipeline (PDF → STEP → JSON)
- Achieved high recall using a multi-pass extraction strategy
- Designed a lightweight STEP parser without CAD libraries
- Enabled traceable, explainable AI outputs
- Balanced deterministic logic with LLM reasoning effectively
What we learned
- Hybrid AI systems outperform pure-LLM approaches on industrial problems
- Multi-pass + verification significantly improves extraction accuracy
- Geometry + language understanding is a powerful combination
- Explainability (confidence + evidence) is critical for trust
What's next for BreakThrough
- Custom Azure DI model trained on engineering drawings for direct GD&T symbol recognition
- Two-stage extraction (detect all diameters first, then interpret each one)
- Spatial clustering using 3D positions from STEP to disambiguate same-diameter hole groups
- Confidence-based filtering using STEP diameters as ground truth to prune false positive annotations
- View-aware extraction that processes each drawing view (SECTION A-A, DETAIL B) as an independent context
Detailed Architecture
Built With
- azuredocumentintelligence
- llm
- openai
- pydantic
- pymupdf
- python
- streamlit