pandego/patent-trend-analysis-using-llm

🔎 Automated Patent Trend Intelligence for EV Batteries


Note

Industry: Automotive (EV batteries)
Role: Tech Lead & AI Engineer

Impact Metrics

  • Analysts reallocated from manual tagging to higher-value trend interpretation
  • -90% review/labeling time per weekly batch (from 5 days → 0.5 day)
  • ≈ €490k/year cost savings (based on 10 IP engineers' time saved)

🧩 The Challenge

IP engineers were manually scanning and tagging multilingual patents, a process that was slow, inconsistent, and hard to replicate at scale. Mixed languages and unstructured abstracts made it difficult to compare filings and report consolidated trends to innovation stakeholders.

💡 The Solution

→ Implementation ⚙️

Built REST endpoints for language classification → translation → domain-specific embedding → unsupervised clustering → cluster summarization → trend tracking, exposed via a FastAPI service on Azure Databricks and backed by a vector index for semantic lookups. Clusters receive concise titles/summaries and can optionally align to IPC categories for consistent reporting.
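The stage ordering described above can be sketched as a chain of batch-in, batch-out functions. This is a minimal illustration, not the project's code: the `PatentRecord` schema, the function names, and every stage body are hypothetical placeholders standing in for the real models, and the summarization, trend-tracking, and REST layers are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PatentRecord:
    """One filing as it moves through the pipeline (hypothetical schema)."""
    patent_id: str
    abstract: str
    language: Optional[str] = None
    abstract_en: Optional[str] = None
    embedding: List[float] = field(default_factory=list)
    cluster_id: Optional[int] = None


Stage = Callable[[List[PatentRecord]], List[PatentRecord]]


def detect_language(batch: List[PatentRecord]) -> List[PatentRecord]:
    # Placeholder for the language classifier: ASCII-only text counts as English.
    for rec in batch:
        rec.language = "en" if rec.abstract.isascii() else "other"
    return batch


def translate(batch: List[PatentRecord]) -> List[PatentRecord]:
    # Placeholder for the translation model: English passes through unchanged.
    for rec in batch:
        if rec.language == "en":
            rec.abstract_en = rec.abstract
        else:
            rec.abstract_en = f"[translated] {rec.abstract}"
    return batch


def embed(batch: List[PatentRecord]) -> List[PatentRecord]:
    # Placeholder for the domain embedding model: two crude text features.
    for rec in batch:
        text = rec.abstract_en or ""
        rec.embedding = [float(len(text)), float(text.lower().count("electrolyte"))]
    return batch


def cluster(batch: List[PatentRecord]) -> List[PatentRecord]:
    # Placeholder for unsupervised clustering: split on a single feature.
    for rec in batch:
        rec.cluster_id = 1 if rec.embedding[1] > 0 else 0
    return batch


PIPELINE: List[Stage] = [detect_language, translate, embed, cluster]


def run_pipeline(batch: List[PatentRecord], stages: List[Stage] = PIPELINE) -> List[PatentRecord]:
    """Apply each stage in order; every stage takes and returns the whole batch."""
    for stage in stages:
        batch = stage(batch)
    return batch


batch = run_pipeline([
    PatentRecord("P1", "Solid electrolyte for lithium cells"),
    PatentRecord("P2", "Kühlsystem für Batteriemodule"),
])
for rec in batch:
    print(rec.patent_id, rec.language, rec.cluster_id)
```

Keeping every stage to the same signature means a stage can be swapped or re-run on its own, which is one plausible way to map stages onto separate endpoints or jobs.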

→ Solution Architecture 🏗️

Architecture Diagram

Baseline EV-battery patent trend analytics high-level solution architecture

→ Tech Stack 🧰

  • Cloud: Microsoft Azure Cloud Infrastructure
  • CI/CD: Azure DevOps Pipelines
  • Containerization: Docker
  • Data Platform: Azure Databricks (APIs for ingestion & jobs)
  • Vector Index: Databricks Vector Search
  • Backend: Python services with FastAPI (REST)
  • Language Detection: XLM-RoBERTa
  • Translation: mBART-large-50 (many-to-one)
  • Embeddings: BERT-for-patents (fine-tuned for EV-battery patents)
  • LLM (summarization): Mistral 7B
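The vector index in the stack above backs semantic lookups. The underlying idea, ranking stored embeddings by similarity to a query embedding, can be illustrated in plain Python. This is a conceptual sketch only: it does not call the Databricks Vector Search API, cosine similarity is an assumed metric, and the three-dimensional vectors and patent IDs are made up.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored IDs most similar to the query, best match first."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy index: patent ID -> embedding (illustrative values only).
index = {
    "P1": [0.9, 0.1, 0.0],
    "P2": [0.1, 0.9, 0.0],
    "P3": [0.8, 0.2, 0.1],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
print([pid for pid, _ in results])  # ['P1', 'P3']
```

A managed index replaces the brute-force scan with an approximate nearest-neighbour search, but the ranking principle is the same.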

📚 Key Learnings

  • Multilingual filings: Used XLM-RoBERTa + mBART-50 to standardize language before embedding.
  • Noisy abstracts & jargon: Fine-tuned domain embeddings (BERT for Patents) to boost semantic cohesion before clustering.
  • Inconsistent labels: Auto-titled clusters with Mistral-7B; optionally mapped to IPC labels.
  • Scalability & repeatability: Orchestrated Databricks jobs with containerized services and Azure DevOps CI/CD for stable weekly runs.
  • Analyst adoption: Added concise summaries + trends dashboard, shifting effort from manual tagging to interpretation.
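The README does not spell out how trends are tracked across the weekly runs. One simple reading, assumed here, is to count filings per cluster per week and report the week-over-week change. The function names, week keys, and cluster labels below are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def weekly_cluster_counts(assignments: Iterable[Tuple[str, str]]) -> DefaultDict[str, Counter]:
    """Turn (week, cluster_label) pairs into {week: Counter of cluster sizes}."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for week, cluster_label in assignments:
        counts[week][cluster_label] += 1
    return counts


def growth(counts: DefaultDict[str, Counter], previous_week: str, current_week: str) -> Dict[str, int]:
    """Change in filings per cluster between two weeks (missing clusters count as 0)."""
    prev, curr = counts[previous_week], counts[current_week]
    return {label: curr[label] - prev[label] for label in sorted(set(prev) | set(curr))}


# Illustrative assignments only; not data from the project.
assignments = [
    ("2024-W01", "solid-state"),
    ("2024-W01", "cooling"),
    ("2024-W02", "solid-state"),
    ("2024-W02", "solid-state"),
    ("2024-W02", "solid-state"),
]
delta = growth(weekly_cluster_counts(assignments), "2024-W01", "2024-W02")
print(delta)  # {'cooling': -1, 'solid-state': 2}
```

This only works if cluster labels stay stable from week to week, which is one reason to align clusters to fixed categories such as IPC.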

📊 Measurable Impact

  • Earlier visibility of emerging battery-tech themes
  • Analysts reallocated from manual tagging to higher-value trend interpretation
  • -90% review/labeling time per weekly batch (from 5 days → 0.5 day)
  • ≈ €490k/year cost savings (based on 10 IP engineers' time saved)
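The -90% figure follows directly from the before and after durations quoted above. A one-function check (the helper name is illustrative):

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) * 100 / before


print(percent_reduction(5, 0.5))  # 90.0
```

The cost-savings figure cannot be reproduced the same way, because the README does not state the engineer cost rate behind it.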

About

End-to-end AI pipeline for classifying, translating, clustering, and summarizing patent filings to surface EV-battery technology trends for IP and innovation teams.
