Production-style reference architecture for robotic surgery telemetry: ingest high-frequency events, process streams, build a lakehouse, and serve low-latency safety insights.
This project is intentionally designed to demonstrate the exact engineering surface area common in modern healthcare data platforms: Python + Java microservices, Kafka streaming, Spark + Iceberg, Cassandra/MongoDB/Elasticsearch storage patterns, AWS IaC, and testing discipline.
- Models a realistic operating-room telemetry workflow rather than a toy CRUD app.
- Uses a polyglot architecture: Python services for ingestion/indexing and a Java service for query APIs.
- Shows both batch-lakehouse thinking (Spark/Iceberg) and online serving (Cassandra/Elasticsearch/MongoDB).
- Includes practical engineering workflows: unit/integration tests, Docker, Kubernetes manifests, Terraform starter, and Prometheus metrics.
- Idempotent ingest path: replay-safe event handling keyed by `event_id`.
- Batch ingest endpoint: supports up to 1000 events per call with per-batch accounting.
- Stateful safety engine: composite risk scoring, high-force/high-latency/error triggers, and alert feeds.
- Risk analytics APIs: patient and procedure summaries (avg risk, p95 latency, critical count).
- Operational metrics: Prometheus counters for deduplication, alert emission, and query load.
- Analytical transforms: Spark-side safety scoring, force spike detection, and KPI aggregation helpers.
- Java analytical API: top critical alerts and patient summary endpoints for low-latency consumers.
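The idempotent ingest and composite risk scoring described above can be sketched in a few lines. This is an illustrative model, not the service's actual API: the names (`Gateway`, `ingest_batch`, `risk_score`) and the trigger thresholds are assumptions for the sketch.

```python
# Sketch of the replay-safe ingest path: deduplicate by event_id,
# then apply a composite risk score with high-force / high-latency /
# error triggers. Weights and thresholds are illustrative.
from dataclasses import dataclass, field

MAX_BATCH = 1000  # per-batch limit from the batch ingest endpoint


@dataclass
class Gateway:
    seen_ids: set = field(default_factory=set)
    deduped: int = 0  # would back a Prometheus dedup counter
    accepted: list = field(default_factory=list)

    def risk_score(self, ev: dict) -> float:
        score = 0.0
        if ev.get("force_newtons", 0.0) > 15.0:
            score += 0.5  # high-force trigger
        if ev.get("latency_ms", 0) > 100:
            score += 0.3  # high-latency trigger
        if ev.get("error"):
            score += 0.2  # error trigger
        return min(score, 1.0)

    def ingest_batch(self, events: list) -> dict:
        if len(events) > MAX_BATCH:
            raise ValueError("batch exceeds 1000 events")
        accepted = 0
        for ev in events:
            if ev["event_id"] in self.seen_ids:  # replay-safe: skip duplicates
                self.deduped += 1
                continue
            self.seen_ids.add(ev["event_id"])
            ev["risk"] = self.risk_score(ev)
            self.accepted.append(ev)
            accepted += 1
        return {"accepted": accepted, "deduplicated": self.deduped}
```

Because duplicates are dropped by key rather than rejected with an error, replaying a Kafka partition or retrying a failed batch leaves the serving stores unchanged.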
| Decision | Why this choice | Tradeoff |
|---|---|---|
| Kafka for raw telemetry ingress | Durable replay, partition-key ordering per patient/procedure, easy fan-out to stream + serving paths | More infra and operational overhead than direct service-to-service ingestion |
| Spark + Iceberg for the lakehouse path | Open table format, incremental append model, strong interoperability with warehouse tooling | Higher startup/runtime cost than lightweight batch jobs |
| Polyglot serving stores (Cassandra + Elasticsearch + MongoDB) | Matches access patterns: timeline reads, text/error search, and flexible case summaries | More components to operate and maintain |
| Python for ingest/indexing + Java for query APIs | Fast iteration for data services plus JVM ecosystem for typed API services | Cross-language contracts require stronger API discipline |
| In-memory fallback backends in dev mode | Makes local demos and CI deterministic without requiring external clusters | Less representative than managed prod backends for performance profiling |
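The "partition-key ordering" tradeoff row deserves a one-screen illustration: all events with the same key hash to the same partition, and Kafka guarantees order only within a partition. Kafka's default producer partitioner uses murmur2; the `crc32` below is a dependency-free stand-in for the sketch, and the partition count is illustrative.

```python
# Why keying by patient/procedure preserves per-patient ordering:
# a deterministic hash of the key picks the partition, so every
# event for one patient lands on the same partition and is consumed
# in production order. (Kafka itself uses murmur2, not crc32.)
import zlib

NUM_PARTITIONS = 6  # illustrative count for surgical.telemetry.raw


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Stable: repeated sends for pat-1 always target one partition.
assert partition_for("pat-1") == partition_for("pat-1")
```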
flowchart LR
A["Robotic telemetry\n(force, velocity, latency, step)"] --> B["Telemetry Gateway\nPython/FastAPI"]
B --> C["Kafka topic\nsurgical.telemetry.raw"]
C --> D["Spark Structured Streaming\nparse/enrich/risk scoring"]
D --> E["Iceberg tables\nS3 warehouse"]
B --> F["Safety Indexer\nPython/FastAPI"]
F --> G["Cassandra\npatient timeline"]
F --> H["Elasticsearch\nerror/step search"]
F --> I["MongoDB\ncase summary docs"]
G --> J["Java Query Service\nSpring Boot API"]
H --> J
I --> J
B --> K["Prometheus\n/metrics scraping"]
F --> K
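The serving path above ultimately answers the risk-summary queries (avg risk, p95 latency, critical count). A minimal sketch of that aggregation, assuming each event already carries a risk score; the 0.8 "critical" cutoff and nearest-rank p95 are assumptions of the sketch, not the service's configured behavior.

```python
# Aggregate scored events into the summary shape the risk APIs return.
import math


def risk_summary(events: list) -> dict:
    risks = [e["risk"] for e in events]
    latencies = sorted(e["latency_ms"] for e in events)
    # Nearest-rank p95: the value at rank ceil(0.95 * n), 1-indexed.
    idx = max(math.ceil(0.95 * len(latencies)) - 1, 0)
    return {
        "avg_risk": sum(risks) / len(risks),
        "p95_latency_ms": latencies[idx],
        "critical_count": sum(1 for r in risks if r >= 0.8),
    }
```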
- Python: telemetry gateway + safety indexer + Spark jobs
- Java: Spring Boot query service
- AWS: Terraform starter for S3/ECR/CloudWatch + AWS-ready deployment model
- Kubernetes: deployable manifests for all microservices
- Airflow: DAG included for Spark -> dbt -> data tests orchestration
- dbt: staging/mart models included with Snowflake profile template
- Kafka: raw telemetry stream transport
- Flink/Spark: Spark Structured Streaming implemented + Flink windowed alert reference module
- Snowflake/Iceberg: Iceberg lakehouse table write path implemented (Snowflake is the downstream warehouse target)
- Cassandra: timeline backend interface and deployment profile
- MongoDB: document summary backend
- Elasticsearch: search backend
- TDD + tests: service-level unit/integration tests with CI
- IaC: Terraform + Kubernetes config
- Metrics/Monitoring: Prometheus endpoints and scrape config
- Microservices: ingest, indexer, and query services with independent deploy units
services/
ingest-service/ # Python FastAPI telemetry gateway -> Kafka
indexer-service/ # Python FastAPI indexer -> Cassandra/Elastic/Mongo
query-service-java/ # Java Spring Boot read/query API
jobs/
spark/ # Spark Structured Streaming -> Iceberg
flink/ # Flink windowed alert reference flow
orchestration/
airflow/dags/ # Airflow DAG for end-to-end scheduling
analytics/
dbt/ # dbt project with staging + marts
infra/
terraform/ # AWS IaC starter (S3/ECR/CloudWatch)
k8s/ # Kubernetes manifests
monitoring/
prometheus.yml # Metrics scraping
scripts/
demo_load.py # Synthetic event generator
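Conceptually, `scripts/demo_load.py` emits synthetic telemetry shaped like the batch-ingest payload. A sketch of that shape, with illustrative value ranges (the actual generator's distributions may differ):

```python
# Generate synthetic telemetry events matching the ingest schema.
import random
import uuid
from datetime import datetime, timezone

STEPS = ["incision", "dissection", "suturing", "closure"]


def make_event(patient_id: str = "pat-1", procedure_id: str = "proc-1") -> dict:
    return {
        "event_id": f"evt-{uuid.uuid4()}",
        "patient_id": patient_id,
        "procedure_id": procedure_id,
        "robot_arm": random.choice(["arm_1", "arm_2"]),
        "step": random.choice(STEPS),
        "force_newtons": round(random.uniform(0.5, 25.0), 1),
        "velocity_mm_s": round(random.uniform(1.0, 20.0), 1),
        "latency_ms": random.randint(5, 250),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "attributes": {"or_room": "OR-1"},
    }


events = [make_event() for _ in range(75)]  # mirrors --count 75
```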
docker compose up --build

In a second shell:
python scripts/demo_load.py --count 75
curl "http://localhost:8080/events/batch" -H "Content-Type: application/json" -d '{"events":[{"event_id":"evt-demo-100","patient_id":"pat-1","procedure_id":"proc-1","robot_arm":"arm_1","step":"suturing","force_newtons":18.2,"velocity_mm_s":11.0,"latency_ms":42,"timestamp":"2026-02-13T18:20:00Z","attributes":{"or_room":"OR-1"}}]}'
curl "http://localhost:8081/search?q=sutur"
curl "http://localhost:8081/patients/pat-1/timeline"
curl "http://localhost:8081/alerts/recent?min_level=high&limit=5"
curl "http://localhost:8081/patients/pat-1/risk-summary"
curl "http://localhost:8081/procedures/proc-1/risk-summary"
curl "http://localhost:8082/api/health"
curl "http://localhost:8082/api/patients/pat-1/summary"
curl "http://localhost:8082/api/alerts/top?limit=3"

cd services/ingest-service && pytest -q
cd services/indexer-service && pytest -q
cd jobs && pytest -q
cd services/query-service-java && mvn test

- The service defaults to in-memory stores for local reliability; backend toggles allow Cassandra/Elasticsearch/MongoDB profiles without changing API contracts.
- Spark job writes are configured in canonical Iceberg style and designed for checkpointed, effectively exactly-once processing semantics.
- Terraform intentionally starts compact, so infra can grow iteratively (MSK/EMR/EKS modules can be layered next without rework).
- Ingest service now supports deduplicated and batch workflows while preserving single-event low-latency paths.
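The in-memory fallback mentioned in the first note above can be sketched as a store that satisfies the same interface the Cassandra timeline backend would, so swapping profiles never changes the API contract. Class and method names here are illustrative assumptions.

```python
# Dev-mode timeline backend: same read/write shape as the Cassandra
# profile, but backed by process memory for local demos and CI.
from collections import defaultdict


class InMemoryTimeline:
    def __init__(self):
        self._by_patient = defaultdict(list)

    def append(self, patient_id: str, event: dict) -> None:
        self._by_patient[patient_id].append(event)

    def timeline(self, patient_id: str, limit: int = 100) -> list:
        # Most recent first, mimicking a timestamp-DESC clustering read.
        rows = sorted(self._by_patient[patient_id],
                      key=lambda e: e["timestamp"], reverse=True)
        return rows[:limit]
```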
