imaad666/RealTime-Pipeline

Real-Time Data Processing Pipeline

A comprehensive real-time data processing pipeline built with Python, Apache Kafka, and Redis.

Features

  • Kafka Producer: Generates and ingests real-time data
  • Kafka Consumer: Processes and transforms incoming data
  • Data Aggregation: Real-time aggregation and analytics
  • Redis Caching: Fast caching of processed results
  • Web Dashboard: Real-time visualization of data flow and metrics
  • Monitoring: System health and performance monitoring

Project Structure

├── requirements.txt          # Python dependencies
├── config/
│   ├── kafka_config.py       # Kafka configuration
│   └── redis_config.py       # Redis configuration
├── models/
│   ├── data_models.py        # Pydantic data models
│   └── metrics.py            # Metrics and monitoring models
├── producers/
│   ├── data_producer.py      # Kafka producer for data ingestion
│   └── sensor_producer.py    # Simulated sensor data producer
├── consumers/
│   ├── data_consumer.py      # Kafka consumer for data processing
│   └── aggregator.py         # Data aggregation consumer
├── processors/
│   ├── data_processor.py     # Data transformation logic
│   └── aggregator.py         # Aggregation logic
├── cache/
│   └── redis_client.py       # Redis caching layer
├── dashboard/
│   ├── app.py                # Dash web application
│   ├── components.py         # Dashboard components
│   └── static/               # Static assets
├── monitoring/
│   ├── metrics.py            # Prometheus metrics
│   └── health_check.py       # Health monitoring
├── utils/
│   ├── logger.py             # Logging configuration
│   └── helpers.py            # Utility functions
└── scripts/
    ├── start_kafka.py        # Kafka setup script
    ├── start_redis.py        # Redis setup script
    └── run_pipeline.py       # Main pipeline runner

Quick Start

  1. Install Dependencies:

    pip install -r requirements.txt

  2. Start Kafka and Redis:

    python scripts/start_kafka.py
    python scripts/start_redis.py

  3. Run the Pipeline:

    python scripts/run_pipeline.py

  4. Access Dashboard: Open http://localhost:8050 in your browser

Components

Data Producer

  • Generates simulated sensor data (temperature, humidity, pressure)
  • Publishes to Kafka topics with configurable frequency
  • Supports multiple data types and formats
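The producer behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function names, the value ranges, and the JSON payload format are assumptions, and the Kafka client is replaced by an injected `send` callable so the sketch runs without a broker.

```python
import json
import random
import time
from typing import Callable, Optional


def make_reading(sensor_id: str, rng: Optional[random.Random] = None) -> dict:
    """Build one simulated reading; the value ranges are illustrative assumptions."""
    rng = rng or random.Random()
    return {
        "sensor_id": sensor_id,
        "timestamp": time.time(),
        "temperature": round(rng.uniform(15.0, 35.0), 2),   # degrees Celsius
        "humidity": round(rng.uniform(20.0, 90.0), 2),      # percent
        "pressure": round(rng.uniform(980.0, 1040.0), 2),   # hPa
    }


def encode(reading: dict) -> bytes:
    """Serialize a reading to UTF-8 JSON (assumed payload format)."""
    return json.dumps(reading).encode("utf-8")


def produce(send: Callable[[str, bytes], None], topic: str, sensor_ids: list,
            count: int, interval_s: float = 1.0) -> None:
    """Publish `count` rounds of readings, pausing `interval_s` between rounds.

    `send` is a hypothetical stand-in for a thin wrapper around a Kafka
    producer; it receives the topic name and the encoded payload.
    """
    for round_no in range(count):
        for sensor_id in sensor_ids:
            send(topic, encode(make_reading(sensor_id)))
        if round_no < count - 1:
            time.sleep(interval_s)  # configurable publish frequency
```

Passing `print` as `send` is enough to watch the payloads; in a real deployment the callable would forward to whichever Kafka client the project uses.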

Data Consumer

  • Consumes data from Kafka topics
  • Performs real-time data transformation
  • Calculates aggregations (min, max, avg, count)
  • Stores results in Redis cache
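The aggregations named above (min, max, avg, count) can be maintained incrementally, one message at a time, without keeping every value in memory. The class below is a sketch under that assumption; `RunningStats` is a hypothetical name, not taken from the repository.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Incrementally tracks count, min, max, and average for one metric."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        """Fold one new value into the running aggregates."""
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def snapshot(self) -> dict:
        """Return the current aggregates; all fields are None until a value arrives."""
        if self.count == 0:
            return {"count": 0, "min": None, "max": None, "avg": None}
        return {
            "count": self.count,
            "min": self.minimum,
            "max": self.maximum,
            "avg": self.total / self.count,
        }
```

A consumer loop would typically keep one `RunningStats` per sensor and metric, call `update` for each consumed message, and write `snapshot()` to the cache.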

Data Processor

  • Transforms raw data into structured format
  • Applies business logic and validation
  • Handles data quality checks
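As a rough illustration of the transformation and quality checks described above, the sketch below turns a raw dictionary into a typed record and rejects incomplete or implausible values. The project lists Pydantic for its models; this example uses a standard-library dataclass instead so that it runs without extra dependencies, and the field names and limits are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Plausibility limits per field; these bounds are illustrative assumptions.
LIMITS = {
    "temperature": (-50.0, 60.0),   # degrees Celsius
    "humidity": (0.0, 100.0),       # percent
    "pressure": (870.0, 1085.0),    # hPa
}


@dataclass(frozen=True)
class SensorReading:
    sensor_id: str
    timestamp: float
    temperature: float
    humidity: float
    pressure: float


def process(raw: dict) -> Optional[SensorReading]:
    """Return a structured reading, or None if the record fails a quality check."""
    try:
        reading = SensorReading(
            sensor_id=str(raw["sensor_id"]),
            timestamp=float(raw["timestamp"]),
            temperature=float(raw["temperature"]),
            humidity=float(raw["humidity"]),
            pressure=float(raw["pressure"]),
        )
    except (KeyError, TypeError, ValueError):
        return None  # missing field or non-numeric value
    for field_name, (low, high) in LIMITS.items():
        if not low <= getattr(reading, field_name) <= high:
            return None  # value outside the plausible range (also rejects NaN)
    return reading
```

Returning `None` keeps the sketch short; a fuller version might route rejected records to a separate topic or log them with the reason for rejection.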

Redis Cache

  • Stores processed results and aggregations
  • Provides fast access to historical data
  • Supports TTL for data expiration
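One way the caching with expiry described above might look is sketched below. It assumes a client object exposing `setex(name, time, value)` and `get(name)`, which is the shape of the redis-py client; the key naming scheme, the JSON encoding, and the 300-second default TTL are assumptions made for illustration.

```python
import json
from typing import Any, Optional


def cache_key(sensor_id: str, metric: str) -> str:
    """Build a namespaced key; this naming scheme is an assumption."""
    return f"agg:{sensor_id}:{metric}"


def store_aggregate(client: Any, sensor_id: str, metric: str,
                    aggregate: dict, ttl_seconds: int = 300) -> None:
    """Write an aggregate as JSON that expires after `ttl_seconds`."""
    client.setex(cache_key(sensor_id, metric), ttl_seconds, json.dumps(aggregate))


def load_aggregate(client: Any, sensor_id: str, metric: str) -> Optional[dict]:
    """Read an aggregate back; returns None on a miss or once the key has expired."""
    raw = client.get(cache_key(sensor_id, metric))
    return None if raw is None else json.loads(raw)
```

Because the client is passed in, the same functions work against a real Redis connection or an in-memory fake in tests.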

Web Dashboard

  • Real-time data visualization
  • System metrics and health monitoring
  • Interactive charts and graphs
  • Responsive design with Bootstrap

Configuration

Edit the configuration files in config/ to customize:

  • Kafka broker settings
  • Redis connection parameters
  • Data processing rules
  • Dashboard appearance
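The README does not show the contents of the files in config/, so the following is only a sketch of what such settings could look like. The class name, field names, topic name, TTL, and environment-variable overrides are all hypothetical; the ports are the Kafka and Redis defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    kafka_bootstrap_servers: str = "localhost:9092"  # Kafka's default port
    kafka_topic: str = "sensor-data"                 # hypothetical topic name
    redis_host: str = "localhost"
    redis_port: int = 6379                           # Redis's default port
    cache_ttl_seconds: int = 300                     # hypothetical default


def load_config(env=None) -> PipelineConfig:
    """Build a config from a mapping (os.environ by default), falling back to defaults."""
    env = os.environ if env is None else env
    d = PipelineConfig()
    return PipelineConfig(
        kafka_bootstrap_servers=env.get("KAFKA_BOOTSTRAP_SERVERS", d.kafka_bootstrap_servers),
        kafka_topic=env.get("KAFKA_TOPIC", d.kafka_topic),
        redis_host=env.get("REDIS_HOST", d.redis_host),
        redis_port=int(env.get("REDIS_PORT", d.redis_port)),
        cache_ttl_seconds=int(env.get("CACHE_TTL_SECONDS", d.cache_ttl_seconds)),
    )
```

Keeping the settings in one frozen dataclass makes them easy to pass to producers, consumers, and the cache layer without any component mutating them.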

Monitoring

The system includes comprehensive monitoring:

  • Prometheus metrics for system health
  • Real-time performance indicators
  • Error tracking and alerting
  • Resource usage monitoring
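A health check of the kind listed above could be structured as a set of named probes that are run and timed. This sketch is an assumption about the general approach, not the contents of monitoring/health_check.py; the probe names and the "ok"/"degraded" labels are hypothetical, and it does not cover the Prometheus metrics.

```python
import time
from typing import Callable, Dict


def run_health_checks(probes: Dict[str, Callable[[], bool]]) -> dict:
    """Run each probe, recording whether it passed, how long it took, and any error."""
    results = {}
    for name, probe in probes.items():
        started = time.perf_counter()
        try:
            healthy = bool(probe())
            error = None
        except Exception as exc:  # a failing probe must not crash the checker
            healthy = False
            error = str(exc)
        results[name] = {
            "healthy": healthy,
            "latency_ms": (time.perf_counter() - started) * 1000.0,
            "error": error,
        }
    all_healthy = all(check["healthy"] for check in results.values())
    return {"status": "ok" if all_healthy else "degraded", "checks": results}
```

A probe for Redis might wrap a ping call and a probe for Kafka might list topics; both are left to the caller so the checker stays independent of any client library.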

Technologies Used

  • Python 3.8+: Core programming language
  • Apache Kafka: Message streaming platform
  • Redis: In-memory data store and cache
  • FastAPI: Web framework for APIs
  • Dash: Interactive web dashboard
  • asyncio: Asynchronous programming
  • Pydantic: Data validation and serialization

About

🚀 RealTime-Pipeline: A powerful toolkit for building real-time data pipelines! 🛠️ Handles data ingestion, processing, and storage, lightning fast! 🌐 Ideal for streaming applications and big data workflows. 🔗 Easily integrate with modern data tools and scalable cloud platforms. 📊 Perfect for data engineers, analytics pros, and real-time dashboard
