Real-Time Data Processing Pipeline
A comprehensive real-time data processing pipeline built with Python, Apache Kafka, and Redis.
- Kafka Producer: Generates and ingests real-time data
- Kafka Consumer: Processes and transforms incoming data
- Data Aggregation: Real-time aggregation and analytics
- Redis Caching: Fast caching of processed results
- Web Dashboard: Real-time visualization of data flow and metrics
- Monitoring: System health and performance monitoring
├── requirements.txt         # Python dependencies
├── config/
│   ├── kafka_config.py      # Kafka configuration
│   └── redis_config.py      # Redis configuration
├── models/
│   ├── data_models.py       # Pydantic data models
│   └── metrics.py           # Metrics and monitoring models
├── producers/
│   ├── data_producer.py     # Kafka producer for data ingestion
│   └── sensor_producer.py   # Simulated sensor data producer
├── consumers/
│   ├── data_consumer.py     # Kafka consumer for data processing
│   └── aggregator.py        # Data aggregation consumer
├── processors/
│   ├── data_processor.py    # Data transformation logic
│   └── aggregator.py        # Aggregation logic
├── cache/
│   └── redis_client.py      # Redis caching layer
├── dashboard/
│   ├── app.py               # Dash web application
│   ├── components.py        # Dashboard components
│   └── static/              # Static assets
├── monitoring/
│   ├── metrics.py           # Prometheus metrics
│   └── health_check.py      # Health monitoring
├── utils/
│   ├── logger.py            # Logging configuration
│   └── helpers.py           # Utility functions
└── scripts/
    ├── start_kafka.py       # Kafka setup script
    ├── start_redis.py       # Redis setup script
    └── run_pipeline.py      # Main pipeline runner
1. Install dependencies:
   pip install -r requirements.txt
2. Start Kafka and Redis:
   python scripts/start_kafka.py
   python scripts/start_redis.py
3. Run the pipeline:
   python scripts/run_pipeline.py
4. Access the dashboard: open http://localhost:8050 in your browser.
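The runner can be sketched as a small asyncio supervisor that launches the components concurrently. Everything below is illustrative: the coroutine bodies are placeholders for the real producer and consumer logic in producers/ and consumers/.

```python
import asyncio
from typing import Optional

# Placeholder component loops; the real logic would come from
# the producers/ and consumers/ packages in the project layout.
async def produce() -> None:
    while True:
        await asyncio.sleep(1.0)  # stand-in for publishing a reading

async def consume() -> None:
    while True:
        await asyncio.sleep(1.0)  # stand-in for processing a message

async def main(duration: Optional[float] = None) -> None:
    """Run all components concurrently; stop after `duration` seconds if given."""
    tasks = [asyncio.create_task(produce()), asyncio.create_task(consume())]
    try:
        await asyncio.wait(tasks, timeout=duration)
    finally:
        for task in tasks:
            task.cancel()
        # Wait for cancellation to finish so no task outlives the event loop
        await asyncio.gather(*tasks, return_exceptions=True)

if __name__ == "__main__":
    asyncio.run(main())
```

With no `duration`, the supervisor runs until interrupted, which matches how a long-lived pipeline process behaves.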
Kafka Producer:
- Generates simulated sensor data (temperature, humidity, pressure)
- Publishes to Kafka topics at a configurable frequency
- Supports multiple data types and formats
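A minimal sketch of the simulated-sensor idea, assuming the kafka-python package; the broker address, topic name, and field ranges are illustrative assumptions, and the Kafka import is deferred so the simulation logic stands on its own:

```python
import json
import random
import time

def make_reading() -> dict:
    """Simulate one sensor reading (field names and ranges are illustrative)."""
    return {
        "sensor_id": random.randint(1, 10),
        "temperature": round(random.uniform(15.0, 35.0), 2),
        "humidity": round(random.uniform(30.0, 90.0), 2),
        "pressure": round(random.uniform(980.0, 1040.0), 2),
        "timestamp": time.time(),
    }

def run_producer(interval_seconds: float = 1.0) -> None:
    """Publish readings to Kafka at a configurable frequency."""
    # Deferred import: assumes the kafka-python package is installed
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # assumed broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    while True:
        producer.send("sensor-data", make_reading())  # hypothetical topic name
        time.sleep(interval_seconds)
```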
Kafka Consumer:
- Consumes data from Kafka topics
- Performs real-time data transformation
- Calculates aggregations (min, max, avg, count)
- Stores results in the Redis cache
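The running min/max/avg/count aggregation can be kept in a small value object that is updated once per message. The Kafka/Redis wiring below is a hypothetical sketch: the topic, Redis key, and addresses are assumptions, not the project's actual names.

```python
import json
from dataclasses import dataclass

@dataclass
class RunningStats:
    """Incremental min/max/avg/count over a stream of values."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0

def run_consumer() -> None:
    # Deferred imports: assumes kafka-python and redis-py are installed
    from kafka import KafkaConsumer
    import redis

    stats = RunningStats()
    cache = redis.Redis()  # assumed default localhost:6379
    consumer = KafkaConsumer(
        "sensor-data",  # hypothetical topic name
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        stats.update(message.value["temperature"])
        cache.hset("agg:temperature", mapping={  # hypothetical key name
            "min": stats.minimum, "max": stats.maximum,
            "avg": stats.average, "count": stats.count,
        })
```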
Data Processor:
- Transforms raw data into a structured format
- Applies business logic and validation
- Handles data quality checks
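Validation along these lines could use Pydantic; the model fields and allowed ranges below are assumptions for illustration, not the project's actual quality rules:

```python
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

class SensorReading(BaseModel):
    """Structured form of a raw message; the ranges are illustrative checks."""
    sensor_id: int
    temperature: float = Field(ge=-50, le=100)
    humidity: float = Field(ge=0, le=100)
    pressure: float = Field(ge=800, le=1200)

def process(raw: dict) -> Optional[SensorReading]:
    """Transform a raw dict into a validated model; drop records that fail."""
    try:
        return SensorReading(**raw)
    except ValidationError:
        return None
```

Returning None for invalid records lets the consumer skip bad messages without crashing the stream.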
Redis Cache:
- Stores processed results and aggregations
- Provides fast access to historical data
- Supports TTL-based data expiration
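The caching layer can be a thin wrapper over any Redis-like client (anything exposing get/set); the key prefix and default TTL here are assumptions:

```python
import json
from typing import Any, Optional

class ResultCache:
    """Thin caching layer over a Redis-like client.

    The key prefix and default TTL are illustrative assumptions.
    """

    def __init__(self, client: Any, prefix: str = "results:", ttl_seconds: int = 300):
        self.client = client
        self.prefix = prefix
        self.ttl = ttl_seconds

    def put(self, key: str, value: dict) -> None:
        # redis-py's set() accepts ex= for an expiration in seconds
        self.client.set(self.prefix + key, json.dumps(value), ex=self.ttl)

    def get(self, key: str) -> Optional[dict]:
        raw = self.client.get(self.prefix + key)
        if raw is None:
            return None
        return json.loads(raw)
```

Keeping the client duck-typed makes the layer easy to unit-test with an in-memory stub instead of a live Redis.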
Web Dashboard:
- Real-time data visualization
- System metrics and health monitoring
- Interactive charts and graphs
- Responsive design with Bootstrap
Edit the configuration files in config/ to customize:
- Kafka broker settings
- Redis connection parameters
- Data processing rules
- Dashboard appearance
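A config module in this style might expose settings as frozen dataclasses read from the environment; the variable names and defaults below are assumptions, not the project's actual values:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class KafkaConfig:
    """Sketch of what config/kafka_config.py could hold; all values assumed."""
    bootstrap_servers: str = os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")
    topic: str = os.environ.get("KAFKA_TOPIC", "sensor-data")
    consumer_group: str = os.environ.get("KAFKA_GROUP", "pipeline")

@dataclass(frozen=True)
class RedisConfig:
    """Sketch of what config/redis_config.py could hold; all values assumed."""
    host: str = os.environ.get("REDIS_HOST", "localhost")
    port: int = int(os.environ.get("REDIS_PORT", "6379"))
    ttl_seconds: int = int(os.environ.get("CACHE_TTL", "300"))
```

Environment variables with sane defaults keep local runs zero-config while allowing overrides in deployment.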
The system includes comprehensive monitoring:
- Prometheus metrics for system health
- Real-time performance indicators
- Error tracking and alerting
- Resource usage monitoring
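One simple way to back the health checks is a TCP liveness probe against each dependency; the host/port values below are hypothetical defaults:

```python
import socket
import time

def check_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def health_report() -> dict:
    """Probe each dependency; host/port values here are assumed defaults."""
    return {
        "kafka": check_tcp("localhost", 9092),
        "redis": check_tcp("localhost", 6379),
        "checked_at": time.time(),
    }
```

A report like this can feed both the dashboard's health panel and an alerting rule.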
- Python 3.8+: Core programming language
- Apache Kafka: Message streaming platform
- Redis: In-memory data store and cache
- FastAPI: Web framework for APIs
- Dash: Interactive web dashboard
- asyncio: Asynchronous programming
- Pydantic: Data validation and serialization