RMF Monitor III Data Simulator

License: MIT Python 3.11+ Docker Kubernetes FastAPI Prometheus Grafana

A production-ready z/OS mainframe metrics simulator that generates realistic IBM RMF (Resource Measurement Facility) Monitor III data with authentic workload patterns, comprehensive storage backends, and modern monitoring integration.

🚀 Key Features

Realistic Mainframe Simulation

  • Authentic z/OS Metrics: CPU (GP, zIIP, zAAP), memory (real/virtual/CSA), I/O, coupling facility, and network performance
  • Dynamic Workload Patterns: Peak hours, batch processing windows, seasonal variations, and workload-specific behaviors
  • Multi-LPAR Support: Simulates complete sysplex environments with different workload types (online, batch, mixed)
  • Proper Metric Relationships: Interdependent metrics that mirror real mainframe behavior

Multi-Storage Architecture

  • MySQL: Relational storage with optimized indexing for analytical queries
  • MongoDB: NoSQL document storage for flexible metric schemas and time-series data
  • S3-Compatible Storage: MinIO integration for long-term archival and data lake scenarios
  • Prometheus: Real-time metrics exposure with native integration

Production-Ready Deployment

  • Containerized: Docker and Kubernetes ready with comprehensive health checks
  • Scalable: Horizontal pod autoscaling and load balancing support
  • Monitoring Stack: Complete Prometheus + Grafana + Alertmanager integration
  • Security: Non-root containers, RBAC, network policies, and secure defaults

Enterprise Features

  • Comprehensive Alerting: Performance threshold monitoring with multi-channel notifications
  • Data Lifecycle Management: Automated cleanup, archival, and retention policies
  • Export Capabilities: CSV export, backup creation, and data migration tools
  • Dashboard Templates: Executive, operational, and troubleshooting views

📊 Supported Metrics

Metric Category | Components                         | Description
--------------- | ---------------------------------- | ---------------------------------------------------------------
CPU             | GP, zIIP, zAAP processors          | Utilization percentages with realistic specialty engine patterns
Memory          | Real storage, virtual storage, CSA | Memory consumption across different storage types
LDEV            | 3390, FlashCopy, Tape devices      | Storage device utilization and response times
CLPR            | Coupling facility links            | Service times and request rates for CF connectivity
MPB             | CICS, IMS, MQ, Batch queues        | Message processing rates and queue depths
Ports           | OSA, HiperSocket, FICON            | Network port utilization and throughput
Volumes         | SYSRES, WORK, USER, TEMP           | Volume utilization and IOPS metrics

🏗️ Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   FastAPI App   │    │   Prometheus    │    │     Grafana     │
│   (Simulator)   │◄──►│   (Metrics)     │◄──►│   (Dashboard)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     MySQL       │    │    MongoDB      │    │  S3 (MinIO)     │
│  (Relational)   │    │  (Document)     │    │  (Object)       │
└─────────────────┘    └─────────────────┘    └─────────────────┘

The simulator generates metrics every 15 seconds, storing them simultaneously across multiple storage backends while exposing real-time metrics via Prometheus endpoints.
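The fan-out described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the names store_everywhere and run_cycles are hypothetical, and each backend is assumed to be wrapped in an async writer function. A failing backend is recorded rather than allowed to block the others.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Sample = Dict[str, float]
Writer = Callable[[Sample], Awaitable[None]]  # hypothetical: one per backend


async def store_everywhere(sample: Sample, writers: List[Writer]) -> List[str]:
    """Send one sample to every backend concurrently; return error messages."""
    results = await asyncio.gather(
        *(write(sample) for write in writers), return_exceptions=True
    )
    return [repr(r) for r in results if isinstance(r, Exception)]


async def run_cycles(
    generate: Callable[[], Sample],
    writers: List[Writer],
    cycles: int,
    interval_seconds: float = 15.0,  # interval stated in the README
) -> List[str]:
    """Generate and store a sample `cycles` times, pausing between cycles."""
    errors: List[str] = []
    for _ in range(cycles):
        errors.extend(await store_everywhere(generate(), writers))
        await asyncio.sleep(interval_seconds)
    return errors
```

Using asyncio.gather with return_exceptions=True means a MySQL outage, for example, would not prevent the same sample from reaching MongoDB or S3.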

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose installed
  • 8GB+ RAM recommended for full stack
  • Ports: 3000 (Grafana), 8000 (Simulator), 9000 (MinIO), 9090 (Prometheus)

One-Command Deployment

# Clone the repository
git clone https://github.com/rohansen856/rmf-simulator.git
cd rmf-simulator

# Start the complete stack
docker-compose up -d

# Verify all services are running
docker-compose ps

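Access Points

  • Grafana: http://localhost:3000
  • Simulator: http://localhost:8000 (metrics at /metrics)
  • MinIO: http://localhost:9000
  • Prometheus: http://localhost:9090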

📈 Monitoring & Dashboards

Pre-built Dashboards

  • Executive Overview: High-level system performance and availability
  • Operational Dashboard: Real-time metrics for system operators
  • Troubleshooting View: Detailed metrics for performance analysis
  • Capacity Planning: Historical trends and growth projections

Sample Queries

# CPU utilization across all LPARs
avg(rmf_cpu_utilization_percent{cpu_type="general_purpose"}) by (lpar)

# Memory usage percentage (the divisor assumes 64 GB of real storage, as configured for PROD01)
(rmf_memory_usage_bytes{memory_type="real_storage"} / (64 * 1024 * 1024 * 1024)) * 100

# I/O response time 95th percentile
histogram_quantile(0.95, rate(rmf_ldev_response_time_seconds_bucket[5m]))
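The queries above can also be run from a script through Prometheus's HTTP API (the /api/v1/query instant-query endpoint). The sketch below assumes Prometheus is reachable on localhost:9090, as listed under Prerequisites; the helper names are illustrative only.

```python
import json
from typing import Dict
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # assumption: port from Prerequisites


def build_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build an instant-query URL for the given PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload: dict) -> Dict[str, float]:
    """Map each returned series' `lpar` label to its current value."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        series["metric"].get("lpar", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def query_by_lpar(promql: str) -> Dict[str, float]:
    """Run the query against Prometheus and return values keyed by LPAR."""
    with urlopen(build_query_url(promql), timeout=10) as resp:
        return parse_instant_vector(json.load(resp))
```

For example, query_by_lpar('avg(rmf_cpu_utilization_percent{cpu_type="general_purpose"}) by (lpar)') would return one value per LPAR once the stack is running.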

🔧 Configuration

Environment Variables

# Database Configuration
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_DATABASE=rmf_monitoring
MYSQL_USER=rmf_user
MYSQL_PASSWORD=rmf_password

# MongoDB Configuration
MONGO_HOST=localhost
MONGO_PORT=27017
MONGO_DATABASE=rmf_monitoring
MONGO_USERNAME=rmf_user
MONGO_PASSWORD=rmf_password

# S3/MinIO Configuration
S3_ENDPOINT_URL=http://localhost:9000
S3_ACCESS_KEY=rmf_user
S3_SECRET_KEY=rmf_password123
S3_BUCKET_NAME=rmf-metrics
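As an illustration of how these variables might be consumed, here is a small sketch that reads the MySQL settings into a typed object. MySQLSettings and load_mysql_settings are hypothetical names, not taken from the project; the defaults mirror the values listed above, and the password is deliberately required rather than defaulted.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MySQLSettings:
    host: str
    port: int
    database: str
    user: str
    password: str


def load_mysql_settings(env: Optional[Mapping[str, str]] = None) -> MySQLSettings:
    """Read MYSQL_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return MySQLSettings(
        host=env.get("MYSQL_HOST", "localhost"),
        port=int(env.get("MYSQL_PORT", "3306")),
        database=env.get("MYSQL_DATABASE", "rmf_monitoring"),
        user=env.get("MYSQL_USER", "rmf_user"),
        password=env["MYSQL_PASSWORD"],  # no default: fail fast if unset
    )
```

The same pattern would apply to the MONGO_* and S3_* groups.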

LPAR Configuration

Modify utils/config.py to customize your mainframe environment:

# Arguments: name, CPU capacity, memory (GB), workload type, peak hours (hour of day, 0-23)
LPAR_CONFIGS = [
    LPARConfig("PROD01", 16, 64, "online", [8, 9, 10, 14, 15, 16]),
    LPARConfig("PROD02", 12, 48, "online", [8, 9, 10, 14, 15, 16]),
    LPARConfig("BATCH01", 8, 32, "batch", [22, 23, 0, 1, 2, 3, 4, 5]),
    LPARConfig("TEST01", 4, 16, "mixed", [9, 10, 11, 15, 16, 17]),
]
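To make the role of the peak-hours list concrete, the following sketch shows one way such a configuration could drive a load factor. The field names are inferred from the example API response; the dataclass layout and the 1.4 and 1.0 multipliers are assumptions for illustration, not the project's actual values.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LPARConfig:
    name: str
    cpu_capacity: int
    memory_gb: int
    workload_type: str      # "online", "batch", or "mixed"
    peak_hours: List[int]   # hours of the day (0-23) treated as peak

    def is_peak_hour(self, hour: int) -> bool:
        """Return True when the given hour of day is a configured peak hour."""
        return hour in self.peak_hours

    def load_factor(self, hour: int, peak: float = 1.4, off_peak: float = 1.0) -> float:
        """Pick a multiplier for generated metrics (multipliers are assumed)."""
        return peak if self.is_peak_hour(hour) else off_peak
```

With the configuration above, BATCH01 would be at peak at 23:00 while PROD01 would not, which is the overnight batch window pattern the list encodes.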

📚 API Documentation

Health Endpoints

# Health check
GET /health

# Readiness probe
GET /ready

# Startup probe
GET /startup
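A script that depends on the simulator can poll the readiness probe before sending traffic. The sketch below is one possible approach using only the standard library; http_ok and wait_until_ready are illustrative names, and the URL in the usage note assumes the default port 8000.

```python
import time
from typing import Callable
from urllib.error import URLError
from urllib.request import urlopen


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or a non-2xx status raised as HTTPError.
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30, delay: float = 1.0) -> bool:
    """Call `probe` until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```

Usage: wait_until_ready(lambda: http_ok("http://localhost:8000/ready")) returns True once the readiness probe responds with 200.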

Metrics Endpoints

# Prometheus metrics
GET /metrics

# System information, including current LPAR status
GET /system-info

Example Response

{
  "sysplex": "SYSPLEX01",
  "timestamp": "2024-01-15T10:30:00Z",
  "lpars": [
    {
      "name": "PROD01",
      "workload_type": "online",
      "cpu_capacity": 16,
      "memory_gb": 64,
      "current_load_factor": 1.4,
      "is_peak_hour": true
    }
  ]
}
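A client might consume this response as follows. The sketch relies only on the fields shown in the example above; the function names are hypothetical.

```python
import json
from typing import List


def peak_lpars(system_info: dict) -> List[str]:
    """Return the names of LPARs currently flagged as being in a peak hour."""
    return [
        lpar["name"]
        for lpar in system_info.get("lpars", [])
        if lpar.get("is_peak_hour")
    ]


def total_cpu_capacity(system_info: dict) -> int:
    """Sum the configured CPU capacity across all reported LPARs."""
    return sum(lpar.get("cpu_capacity", 0) for lpar in system_info.get("lpars", []))


example = json.loads("""
{
  "sysplex": "SYSPLEX01",
  "timestamp": "2024-01-15T10:30:00Z",
  "lpars": [
    {"name": "PROD01", "workload_type": "online", "cpu_capacity": 16,
     "memory_gb": 64, "current_load_factor": 1.4, "is_peak_hour": true}
  ]
}
""")
print(peak_lpars(example), total_cpu_capacity(example))
```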

🔒 Security

Container Security

  • Non-root execution: All containers run as non-privileged users
  • Resource limits: CPU and memory constraints applied
  • Network isolation: Service-to-service communication only
  • Secret management: Environment variable configuration

Authentication

  • Database authentication: Separate service accounts for each storage backend
  • API security: Rate limiting and input validation
  • MinIO access: IAM-based bucket policies

📦 Deployment Options

Docker Compose (Development)

docker-compose up -d

Kubernetes (Production)

kubectl apply -f k8s/

Local Development

poetry install
poetry run uvicorn app.main:app --reload

📊 Performance & Scaling

Resource Requirements

  • Minimum: 4 CPU cores, 8GB RAM
  • Recommended: 8 CPU cores, 16GB RAM
  • Storage: 100GB+ for historical data

Scaling Characteristics

  • Horizontal scaling: Multiple simulator instances supported
  • Auto-scaling: Kubernetes HPA based on CPU/memory metrics
  • Database scaling: Read replicas and sharding support

🛠️ Troubleshooting

Common Issues

  1. Port conflicts: Check if ports 3000, 8000, 9000, 9090 are available
  2. Memory issues: Ensure adequate RAM for all services
  3. Storage permissions: Verify Docker volume permissions

Debug Commands

# Check service logs
docker-compose logs rmf-simulator

# Verify metrics generation
curl http://localhost:8000/metrics

# Test database connectivity
docker-compose exec rmf-simulator python -c "from storage.mysql.service import DatabaseService; print(DatabaseService().get_connection_status())"

📈 Monitoring & Alerting

Built-in Alerts

  • High CPU utilization (>85% for 5 minutes)
  • Memory pressure (>90% for 2 minutes)
  • I/O response time (>50ms for 3 minutes)
  • Service unavailability (health check failures)
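Each threshold above pairs a limit with a hold duration. The sketch below illustrates that logic on a plain list of samples taken at the simulator's 15-second interval. It is a simplified stand-in for how an alerting rule with a hold duration behaves, not the project's alert definitions, and breaches_for is a hypothetical name.

```python
import math
from typing import Iterable


def breaches_for(
    samples: Iterable[float],
    threshold: float,
    for_seconds: int,
    interval_seconds: int = 15,
) -> bool:
    """True if consecutive samples above `threshold` span at least `for_seconds`."""
    # N consecutive samples span (N - 1) intervals, hence the + 1.
    needed = math.ceil(for_seconds / interval_seconds) + 1
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= needed:
            return True
    return False
```

For the CPU rule (>85% for 5 minutes), 21 consecutive samples above 85 are required, since 21 samples at 15-second spacing cover 300 seconds. A single sample at or below the threshold resets the count.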

Custom Metrics

The simulator supports custom metric definitions and can be extended to simulate additional mainframe components.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details on:

  • Setting up the development environment
  • Running tests
  • Submitting pull requests
  • Code style guidelines

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

🙏 Acknowledgments

  • IBM z/OS RMF documentation and metrics specifications
  • Prometheus and Grafana communities
  • FastAPI and modern Python ecosystem contributors

Note: This simulator is for educational and testing purposes. It generates synthetic data that resembles real mainframe metrics but should not be used for actual capacity planning or performance analysis of production systems.
