Data Scientist · Machine Learning · Decision Intelligence

Turning data into reliable models, clear insights, and decision-support tools.

I’m Amir Honardoust, a Data Scientist focused on explainable machine learning, forecasting, NLP, analytics, and practical AI systems people can understand and use.

View featured work · GitHub · LinkedIn

Risk ML · RAG Systems · Synthetic Data · Recommenders

amir.profile = {
  role: "Data Scientist",
  focus: [
    "risk modeling",
    "RAG systems",
    "synthetic data",
    "recommenders"
  ],
  output: "decisions"
}

ML: models with measurable evaluation

AI: RAG, NLP, and applied systems

BI: dashboards and decision tools

DX: reproducible, readable projects

What I do

Data science that moves from analysis to action.

My work connects statistical thinking, machine learning, product sense, and clear communication. I care about models that are evaluated, explainable, and useful beyond a notebook.

Risk Modeling & Decision Safety

Underwriting and fraud-risk workflows with calibration, threshold policies, abstention, validation, explainability, and review-focused reporting.

RAG, NLP & Recommendation Systems

Retrieval pipelines, knowledge-graph augmentation, text classification, recommender evaluation, and AI systems built for traceability.

Synthetic Data & Applied ML

Synthetic tabular-data evaluation, business prediction tools, reproducible model workflows, dashboard outputs, and portfolio-grade documentation.

Featured projects

Selected proof of work.

Explore the technical lab ↗

Risk ML · Calibration · Abstention

GitHub ↗

Underwriting Decision Safety Lab

Built a decision-safety workflow for underwriting-style model review with corrected calibration, abstention policies, data validation, policy variants, slice safety reporting, and CI.

Problem

High-stakes models need probability quality, review policies, and clear limits.

Method

Combined calibrated probabilities, confidence-based abstention, validation, and slice reporting.
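
As a rough illustration of the method above, here is a minimal Python sketch of a confidence-based abstention policy plus a simple binned calibration check. It is not the repository's code: the function names, the 0.3/0.7 review band, and the five-bin layout are assumptions chosen for the example.

```python
from typing import List, Optional


def decide(prob: float, low: float = 0.3, high: float = 0.7) -> Optional[int]:
    """Return 1 or 0 when the score is confident, None to abstain (route to review).

    The low/high band is a hypothetical policy, not a recommended setting.
    """
    if prob >= high:
        return 1
    if prob <= low:
        return 0
    return None


def coverage(probs: List[float], low: float = 0.3, high: float = 0.7) -> float:
    """Fraction of cases that receive an automatic decision."""
    decided = [p for p in probs if decide(p, low, high) is not None]
    return len(decided) / len(probs)


def expected_calibration_error(probs: List[float], labels: List[int], n_bins: int = 5) -> float:
    """Weighted gap between mean predicted probability and observed positive rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(mean_prob - positive_rate)
    return ece
```

Widening the band lowers coverage and sends more cases to review, which is the tradeoff a policy variant would have to report.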

Proves

Decision-safety thinking, evaluation discipline, and responsible ML communication.

Python · scikit-learn · Calibration · Streamlit

Fraud Risk · Explainability

GitHub ↗

Financial Fraud Risk Engine

Developed a fraud-risk workflow with validation, cost-sensitive threshold search, policy artifacts, reason codes, SHAP explainability, and a Streamlit review dashboard.

Problem

Fraud detection requires balancing missed fraud, false positives, and review workload.

Method

Used probability scoring, threshold policies, baselines, dashboard triage, and reason codes.
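
A cost-sensitive threshold search can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the project's implementation: the cost values (a missed fraud costing 10 times a false positive) and the function names are hypothetical.

```python
from typing import Iterable, List


def total_cost(probs: List[float], labels: List[int], threshold: float,
               cost_fn: float = 10.0, cost_fp: float = 1.0) -> float:
    """Sum the cost of missed fraud (false negatives) and false alarms (false positives)."""
    cost = 0.0
    for p, y in zip(probs, labels):
        flagged = p >= threshold
        if y == 1 and not flagged:
            cost += cost_fn  # fraud that slipped through
        elif y == 0 and flagged:
            cost += cost_fp  # legitimate case sent to review
    return cost


def best_threshold(probs: List[float], labels: List[int],
                   candidates: Iterable[float], **costs: float) -> float:
    """Pick the candidate threshold with the lowest total cost on the given data."""
    return min(candidates, key=lambda t: total_cost(probs, labels, t, **costs))


# Toy scores and labels, invented for the example.
scores = [0.1, 0.4, 0.6, 0.9]
truth = [0, 1, 0, 1]
chosen = best_threshold(scores, truth, [0.2, 0.5, 0.8])
```

With expensive misses the search favors a low threshold; swapping the two costs pushes it higher, which is why the cost assumptions belong in the policy artifact alongside the chosen threshold.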

Proves

Risk-modeling workflow design, explainability, and operational ML thinking.

Risk ML · SHAP · Thresholds

RAG · Knowledge Graphs

GitHub ↗

Graph-RAG Engine

Built a retrieval-augmented question-answering system that combines vector retrieval, graph expansion, citation-aware context, optional LLM answers, FastAPI, and Streamlit.

Problem

Simple RAG systems can miss connected context and produce weak traceability.

Method

Combined semantic search with graph relationships, evaluation tests, and API contracts.
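
To make the combination concrete, here is a small Python sketch of vector retrieval followed by one-hop graph expansion. It uses plain cosine similarity over toy vectors instead of a real index, and the `retrieve` function, document IDs, and graph layout are assumptions for illustration only.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: List[float], doc_vecs: Dict[str, List[float]],
             graph: Dict[str, List[str]], top_k: int = 1) -> List[str]:
    """Take the top_k most similar documents, then append their graph neighbors."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    seeds = ranked[:top_k]
    context = list(seeds)
    for doc_id in seeds:
        for neighbor in graph.get(doc_id, []):
            if neighbor not in context:
                context.append(neighbor)  # connected context a pure vector search would miss
    return context


# Toy data: "b" is dissimilar to the query but linked to "a" in the graph.
doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
graph = {"a": ["b"]}
context_ids = retrieve([1.0, 0.0], doc_vecs, graph, top_k=1)
```

Returning document IDs rather than raw text keeps each context chunk attributable, which is what makes citation-aware answers possible downstream.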

Proves

Applied AI architecture, retrieval evaluation, and explainable answer generation.

RAG · FAISS · FastAPI · Streamlit

Synthetic Data · Evaluation

GitHub ↗

Synthetic Data Artist

Compared synthetic tabular data generators using realism, distribution overlap, correlation preservation, privacy proxies, utility metrics, and visual diagnostics.

Problem

Synthetic data needs evidence of quality, not just generated rows.

Method

Evaluated Copula and VAE outputs with metrics, plots, reports, and validation checks.
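
Two of the checks named above, distribution overlap and correlation preservation, can be sketched in plain Python as follows. The metric definitions here (histogram intersection and an absolute Pearson gap) are one reasonable choice assumed for the example, not necessarily the ones the project reports.

```python
import math
from typing import List


def histogram_overlap(real: List[float], synth: List[float], n_bins: int = 10) -> float:
    """Histogram intersection on shared bins: 1.0 means identical binned distributions."""
    lo = min(min(real), min(synth))
    hi = max(max(real), max(synth))
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column

    def hist(values: List[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    return sum(min(r, s) for r, s in zip(hist(real), hist(synth)))


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation, returning 0.0 when either column is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def correlation_gap(real_x: List[float], real_y: List[float],
                    synth_x: List[float], synth_y: List[float]) -> float:
    """Absolute difference between real and synthetic pairwise correlation."""
    return abs(pearson(real_x, real_y) - pearson(synth_x, synth_y))
```

A generator can score well on per-column overlap while still breaking relationships between columns, so the two metrics are worth reading together.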

Proves

Statistical evaluation maturity, data-quality analysis, and reproducible ML tooling.

VAE · Copula · Evaluation

Recommendations · Evaluation

GitHub ↗

Movie Recommendation System

Built a recommender-system demo with content-based filtering, corrected SVD scoring, hybrid blending, baseline comparisons, alpha sweep, tests, and structured outputs.

Problem

Recommendation demos need honest baselines and interpretable outputs.

Method

Compared content, collaborative, hybrid, random, popularity, and Bayesian baselines.
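
The hybrid blend and alpha sweep can be illustrated with a short Python sketch. It assumes a simple linear blend of two score dictionaries and NDCG@k as the ranking metric; the function names and toy scores are hypothetical, and the project's actual scoring may differ.

```python
import math
from typing import Dict, Iterable, List


def blend(content: Dict[str, float], collab: Dict[str, float], alpha: float) -> Dict[str, float]:
    """Linear hybrid: alpha weights the content score, (1 - alpha) the collaborative score."""
    return {item: alpha * content[item] + (1 - alpha) * collab[item] for item in content}


def ndcg_at_k(ranked_items: List[str], relevance: Dict[str, float], k: int) -> float:
    """Normalized discounted cumulative gain over the top k ranked items."""
    dcg = sum(relevance.get(item, 0) / math.log2(i + 2)
              for i, item in enumerate(ranked_items[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg else 0.0


def alpha_sweep(content: Dict[str, float], collab: Dict[str, float],
                relevance: Dict[str, float], alphas: Iterable[float], k: int = 3) -> Dict[float, float]:
    """Score each blending weight by the NDCG@k of the ranking it produces."""
    results = {}
    for alpha in alphas:
        scores = blend(content, collab, alpha)
        ranked = sorted(scores, key=scores.get, reverse=True)
        results[alpha] = ndcg_at_k(ranked, relevance, k)
    return results


# Toy example: only item "a" is relevant, and only the content model ranks it first.
sweep = alpha_sweep({"a": 1.0, "b": 0.0}, {"a": 0.0, "b": 1.0},
                    {"a": 1, "b": 0}, alphas=[0.0, 1.0], k=2)
```

Running the same metric over random and popularity rankings gives the reference points that make a hybrid score meaningful.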

Proves

Recommendation evaluation, ranking metrics, reproducibility, and testing discipline.

Recommenders · SVD · NDCG

NLP · Classification

GitHub ↗

Fake News Detector

Built an NLP classification pipeline for detecting unreliable news text using preprocessing, vectorization, model comparison, evaluation reporting, and clean documentation.

Problem

Text classification needs transparent features and careful validation.

Method

Used NLP preprocessing, supervised learning, metric comparison, and reporting.
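
For readers unfamiliar with the vectorization step, here is a minimal pure-Python TF-IDF sketch. It uses whitespace tokenization and the plain, unsmoothed IDF formula as simplifying assumptions; libraries such as scikit-learn use a smoothed variant, so the weights will not match theirs exactly.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately simple preprocessing step)."""
    return text.lower().split()


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Toy headlines, invented for the example.
vectors = tfidf(["shocking secret cure", "official report cure"])
```

Terms that appear in every document get zero weight, so the features that survive are the ones that distinguish documents, and they remain readable as words.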

Proves

End-to-end ML workflow, NLP fundamentals, and evaluation discipline.

NLP · TF-IDF · Classification

Business ML · Forecasting

GitHub ↗

Coffee Shop Profit Predictor

Created a business-focused profit prediction project with validation, baseline comparison, model selection, candidate scoring, risk notes, tests, CI, and clear output artifacts.

Problem

Business prediction projects need realistic evaluation and decision-ready outputs.

Method

Combined regression modeling, baseline checks, cross-validation, and candidate ranking.
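
A baseline check under cross-validation can be sketched as below. This is an illustrative Python example, not the project's pipeline: it assumes a single numeric feature, a least-squares line as the model, a predict-the-training-mean baseline, and mean absolute error as the metric.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x if var_x else 0.0
    return slope, mean_y - slope * mean_x


def cv_mae(xs: List[float], ys: List[float], k: int = 3) -> Tuple[float, float]:
    """Return (model MAE, baseline MAE) from k-fold cross-validation."""
    n = len(xs)
    model_err = 0.0
    base_err = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th row is held out in this fold
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in test_idx]
        train_x, train_y = zip(*train)
        slope, intercept = fit_line(train_x, train_y)
        baseline = sum(train_y) / len(train_y)  # baseline: predict the training mean
        for i in test_idx:
            model_err += abs(ys[i] - (slope * xs[i] + intercept))
            base_err += abs(ys[i] - baseline)
    return model_err / n, base_err / n


# Toy data with an exact linear relationship, invented for the example.
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
targets = [2 * x + 1 for x in features]
model_mae, baseline_mae = cv_mae(features, targets, k=3)
```

If the model's out-of-fold error is not clearly below the baseline's, the extra complexity has not earned its place.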

Proves

Applied ML workflow, model evaluation, and business-oriented communication.

Regression · Business ML · CI

More experiments live on honardoust.codes

Technical notes, project breakdowns, reproducible workflows, and deeper implementation details.

Visit technical lab

About

A Data Scientist with a builder’s mindset.

I focus on practical data science: understanding the problem, shaping the data, building the right model, evaluating it honestly, and communicating the result clearly.

My strongest interests are risk modeling, retrieval-augmented generation, synthetic data evaluation, recommender systems, explainability, and analytics systems that help people make better decisions.

“Good data science is not just a model. It is a reliable path from messy evidence to a decision someone can trust.”

Skills

Tools and capabilities.

Data Science

Python, SQL, pandas, NumPy, statistics, exploratory analysis, feature engineering.

Machine Learning

Classification, regression, forecasting, validation, metrics, error analysis, interpretation.

AI / NLP

Text classification, transformers, RAG, semantic search, embeddings, grounded AI systems.

Visualization

Plotly, Streamlit, dashboards, storytelling, KPI reporting, decision-support interfaces.

Engineering

Git, APIs, FastAPI, reproducible pipelines, documentation, clean project structure.

Communication

Translating models, uncertainty, tradeoffs, and results for technical and business audiences.

Contact

Open to Data Science roles, collaborations, and applied ML projects.

The fastest way to reach me is through LinkedIn or GitHub. For technical details, visit my lab at honardoust.codes.

Connect on LinkedIn · View GitHub · Technical Lab