Note
Industry: Automotive (EV batteries)
Role: Tech Lead & AI Engineer
Impact Metrics
- Analysts reallocated from manual tagging to higher-value trend interpretation
- -90% review/labeling time per weekly batch (from 5 days → 0.5 days)
- ≈ €490k/year cost savings (based on 10 IP engineers' time saved)
IP engineers were manually scanning and tagging multilingual patents, a process that was slow, inconsistent, and hard to replicate at scale. Multilingual content and unstructured abstracts made it difficult to compare filings and report consolidated trends to innovation stakeholders.
Built REST endpoints for language classification → translation → domain-specific embedding → unsupervised clustering → cluster summarization → trend tracking, exposed via a FastAPI service on Azure Databricks and backed by a vector index for semantic lookups. Each cluster receives a concise title and summary and can optionally be aligned to IPC categories for consistent reporting.
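The stage order above can be sketched as a plain function chain. This is a minimal illustration, not the production service: `run_batch` and its parameter names are hypothetical, each model (language detector, translator, embedder, clusterer, summarizer) is passed in as a callable so nothing about the real model APIs is assumed, and English as the translation target is an assumption.

```python
from typing import Callable, Dict, List, Sequence


def run_batch(
    abstracts: Sequence[str],
    *,
    detect_language: Callable[[str], str],
    translate: Callable[[str, str], str],
    embed: Callable[[str], List[float]],
    cluster: Callable[[List[List[float]]], List[int]],
    summarize: Callable[[List[str]], str],
    target_lang: str = "en",  # assumption: everything is standardized to English
) -> Dict[str, object]:
    """Run one batch of abstracts through the stages in the documented order."""
    # 1-2. Language classification, then translation of non-target-language text.
    standardized = []
    for text in abstracts:
        lang = detect_language(text)
        standardized.append(text if lang == target_lang else translate(text, lang))

    # 3. Domain-specific embedding: one vector per abstract.
    vectors = [embed(text) for text in standardized]

    # 4. Unsupervised clustering: one integer label per vector.
    labels = cluster(vectors)

    # 5. Cluster summarization: group texts by label, then title each group.
    groups: Dict[int, List[str]] = {}
    for text, label in zip(standardized, labels):
        groups.setdefault(label, []).append(text)
    summaries = {label: summarize(texts) for label, texts in groups.items()}

    # 6. Input for trend tracking: cluster sizes for this batch.
    sizes = {label: len(texts) for label, texts in groups.items()}
    return {"labels": labels, "summaries": summaries, "sizes": sizes}
```

Passing the models in as callables keeps the orchestration testable with simple stand-ins and lets each stage sit behind its own REST endpoint without changing the chain.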
Baseline EV-battery patent trend analytics high-level solution architecture
- Cloud: Microsoft Azure Cloud Infrastructure
- CI/CD: Azure DevOps Pipelines
- Containerization: Docker
- Data Platform: Azure Databricks (APIs for ingestion & jobs)
- Vector Index: Databricks Vector Search
- Backend: Python services with FastAPI (REST)
- Language Detection: XLM-RoBERTa
- Translation: mBART-large-50 (many-to-one)
- Embeddings: BERT for Patents (fine-tuned for EV-battery patents)
- LLM (summarization): Mistral 7B
- Multilingual filings: Used XLM-RoBERTa + mBART-50 to standardize language before embedding.
- Noisy abstracts & jargon: Fine-tuned domain embeddings (BERT for Patents) to boost semantic cohesion before clustering.
- Inconsistent labels: Auto-titled clusters with Mistral 7B; optionally mapped to IPC labels.
- Scalability & repeatability: Orchestrated Databricks jobs with containerized services and Azure DevOps CI/CD for stable weekly runs.
- Analyst adoption: Added concise summaries + trends dashboard, shifting effort from manual tagging to interpretation.
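One way to put a number on the "semantic cohesion" mentioned above is the mean pairwise cosine similarity of the embeddings inside a cluster. The source does not say which metric was used, so treat this as an illustrative sketch; `cosine_similarity` and `cluster_cohesion` are hypothetical helper names.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_cohesion(vectors: Sequence[Sequence[float]]) -> float:
    """Mean pairwise cosine similarity of a cluster's member embeddings.

    Returns 1.0 for clusters with fewer than two members, since there is
    no pair to disagree.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
```

Comparing this score for the same clusters before and after fine-tuning the embedding model would show whether domain adaptation actually pulled related abstracts closer together.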
- Earlier visibility of emerging battery-tech themes; analysts redeployed to higher-value analysis.
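The headline figures (5 days down to 0.5 days per weekly batch, and roughly €490k/year across 10 IP engineers) can be checked with a few lines of arithmetic. The per-engineer figure below assumes the savings are spread evenly across the 10 engineers, which the source does not state; `relative_reduction` is a hypothetical helper name.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrank, e.g. 0.9 for a 90% reduction."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Review/labeling time per weekly batch, in days (figures from the case study).
time_reduction = relative_reduction(before=5.0, after=0.5)
days_saved_per_batch = 5.0 - 0.5

# Annual savings in euros, split evenly across engineers (assumption).
annual_savings_eur = 490_000
engineers = 10
savings_per_engineer_eur = annual_savings_eur / engineers

print(f"Time reduction: {time_reduction:.0%}")
print(f"Days saved per weekly batch: {days_saved_per_batch}")
print(f"Implied savings per engineer: EUR {savings_per_engineer_eur:,.0f}/year")
```

The 90% figure follows directly from the before and after durations; the per-engineer value is only an implied average, not a reported number.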