Pradeep Kalluri

About Me

I'm a Data Engineer with hands-on experience designing, developing, and deploying scalable data pipelines for analytics and business intelligence across enterprise organizations.

Currently at NatWest Bank, I work across modern data platforms building reliable data flows using Kafka, PySpark, Snowflake, and Airflow.

With experience spanning financial services and consulting, I've delivered data engineering solutions across cloud data platforms, real-time streaming systems, and advanced analytics environments.

☁️

Cloud Platforms

Azure Databricks, AWS, Snowflake, Microsoft Fabric

⚡

Data Pipelines

Kafka, PySpark, Airflow orchestration

🔧

ETL/ELT Solutions

Python, SQL, dbt, Azure Data Factory

📊

Analytics & BI

Tableau, Power BI, SAP BW integration

🎓 Education

MBA (Master of Business Administration)

York St John University, London, UK

2023 – 2024

Bachelor of Internet & Communication Technology

Tor Vergata University, Rome, Italy

2020 – 2023

📍 Location

London, United Kingdom

🚀 Currently

Building production data pipelines at NatWest Bank
Writing about data engineering on Medium (71K+ views)
Contributing to Apache Airflow & dbt-core
Speaking at data engineering meetups
Pursuing UK Global Talent Visa in Digital Technology

Experience

Data Engineer

NatWest Bank

Sep 2025 – Present

Greater London, UK

Retail Banking Data Quality and Pipeline Engineering. Building production data platforms processing millions of transactions daily.

Design and implement real-time data ingestion pipelines using Kafka and Amazon S3
Develop distributed data processing workflows with PySpark for transformation and validation
Build curated data models in Snowflake supporting downstream analytics and reporting
Orchestrate end-to-end pipelines using Apache Airflow DAGs with dependency management
Optimize SQL and PySpark workloads for performance and cost-efficiency

Kafka, PySpark, Amazon S3, Snowflake, Airflow, Tableau

Data Engineer

Accenture

Jul 2023 – Aug 2025

Enterprise Data Platform Modernisation. Delivered large-scale cloud data engineering solutions for Fortune 500 clients across multiple industries.

Designed and implemented scalable ETL/ELT pipelines using Azure Databricks, ADF, and Snowflake
Developed PySpark workflows for distributed data processing and transformation
Built reusable transformation layers with dbt ensuring consistent business logic
Automated pipeline deployments using CI/CD (GitHub Actions, Terraform)
Leveraged Microsoft Fabric for unified analytics workflows and lakehouse architecture

Azure Databricks, Snowflake, dbt, Microsoft Fabric, Terraform, PySpark

Data Engineer

Dpoint Group

May 2022 – Jun 2023

Barcelona, Spain

SAP BW to Azure Migration & Power BI Reporting Modernisation. Developed BI and analytics solutions for manufacturing and logistics operations.

Developed and maintained ETL processes using SSIS to extract data from SAP BW
Created interactive Power BI dashboards for executive insights and KPI monitoring
Automated recurring reporting workflows using Python and Excel VBA
Supported migration of on-premise ETL processes to Azure Data Factory

SSIS, SAP BW, Power BI, Azure Data Factory, Python

Technical Skills

Python

PySpark, Pandas

SQL

T-SQL, PL/SQL

Shell

Bash Scripting

AWS

S3, Glue, Lambda

Azure

Databricks, ADF

Fabric

Microsoft Fabric

Kafka

Streaming & Events

Airflow

DAG Orchestration

Snowflake

Cloud Data Warehouse

PySpark

Distributed Processing

dbt

Transform Layer

PostgreSQL

Relational DB

MySQL

Relational DB

Redshift

Azure SQL

Docker

Containerization

Terraform

IaC

CI/CD

GitHub Actions

Tableau

Data Visualization

Power BI

Dashboards & KPIs

Projects

Production

Real-Time Data Pipeline Platform

NatWest Bank

Building production-grade data pipelines processing millions of transactions daily with real-time streaming and automated quality frameworks.

10K+ events/sec

40+ DAGs

6h → 30min recovery

Kafka, PySpark, Snowflake, Airflow, S3


Production

Enterprise Cloud Data Platform

Accenture

Delivered large-scale cloud platforms for Fortune 500 clients using Azure Databricks, Snowflake, and Microsoft Fabric with data mesh architecture.

1000+ users

70% faster deploy

Azure Databricks, Snowflake, Fabric, dbt, Terraform


Production

Business Intelligence Platform

Dpoint Group

Developed BI and analytics solutions for manufacturing and logistics operations, automating 30+ manual reporting processes.

30+ reports automated

Reporting time: days → hours

SSIS, Power BI, SAP BW, Azure Data Factory


Open Source

Real-Time Data Quality Monitor

✅ Production-Ready

ML-powered real-time data quality monitoring system detecting anomalies in streaming data with sub-10ms latency using Isolation Forest.

332K+ orders

93% quality

<10ms latency

Kafka, Spark Streaming, scikit-learn, PostgreSQL


Open Source

Modern ETL / Data Platform

✅ Open Source

A full modern data engineering platform built from scratch. Cost-effective alternative to commercial tools, potentially saving companies £100K+ annually.

1K+ orders

£100K+ savings

Airflow, Kafka, dbt, Spark, Docker


Open Source

E-commerce Data Pipeline

End-to-end data pipeline demonstrating modern data engineering practices with PySpark, Airflow, dbt, and comprehensive testing.

PySpark, Airflow, dbt, Docker


Certifications

🏅

Microsoft Fabric Data Engineer Associate

Microsoft

January 2026

❄️

SnowPro Core (COF-C03)

Snowflake

Score: 923 / 1000

February 2026

📚

AWS Solutions Architect

AWS

In Progress

📚

Azure Data Engineer Associate

Microsoft

In Progress

Writing & Speaking

📝 Technical Writing

71,000+ views across platforms

HackerNoon ⭐

🎤 Speaking Engagements

✅ Completed

Oxford Microsoft Data Platform Group

Building Production Data Pipelines That Scale

February 2026

📋 Conference Proposals

13 proposals submitted to data engineering conferences across Europe

🧑‍🏫 Mentoring & Community

Active mentor on Topmate (Top 5% Mentor) and internal training lead at NatWest Group. Helping aspiring data engineers transition into the field.

Open Source Contributions

Apache Airflow

6+ Merged PRs + 15+ Reviews

✅ MERGED FastAPI: Lazy initialize flask_app in FAB #64908
✅ MERGED Spark: add post_submit_commands to SparkSubmitHook #64391
✅ MERGED Worker: ensure Celery tasks are registered at startup #63110
✅ MERGED Bug fix contribution: migration stability #61005
✅ MERGED Pool name validation fix #59938

dbt-core

5+ Merged PRs

✅ MERGED Fix: correct deprecation for top-level config #12618
✅ MERGED Selection: use tuple membership check #12562
✅ MERGED Fixed @requires.catalogs for compile #12388
✅ MERGED dbt init UX: add directory change instruction #12232
🟡 Under Review Debug compilation error fix #12502

Confluent Kafka Python

1 Submitted

🟢 Submitted SSL configuration enhancement

What People Say

★★★★★

"Pradeep delivered an exceptional presentation on production data pipelines. His deep technical knowledge and ability to explain complex concepts clearly made a lasting impression on our audience."

Microsoft Senior Cloud Solution Architect

Oxford Microsoft Data Platform Group

★★★★★

"An outstanding data engineer who combines strong technical skills with excellent communication. His article on data quality reached 71,000+ engineers for a reason — he writes with clarity and real-world experience that resonates with practitioners."

Data Engineering Community

Reddit r/dataengineering

★★★★★

"Pradeep's contributions to Apache Airflow demonstrate deep understanding of the codebase. His PRs are well-documented, thoroughly tested, and address real user pain points. A valuable contributor to the open source community."

Apache Airflow Maintainers

Open Source Community

★★★★★

"Pradeep's mentoring sessions on Topmate dramatically accelerated my career transition into data engineering. His practical approach and willingness to share real-world examples set him apart from other mentors."

Aspiring Data Engineer

Topmate Mentee


Get in Touch

Passionate about building reliable, scalable data platforms that empower data-driven decision making.

📧

Email

kalluripradeep99@gmail.com


💼

LinkedIn

GitHub

Medium

Dev.to

Topmate

📝 Technical Writing

Real-Time Data Quality Monitor: A Production ML Approach

The Weekend Our Pipeline Processed the Same Data 47 Times

How I Fixed a Silent Production Bug in Apache Airflow

A Beginner's Guide to Contributing to Apache Airflow
