End-to-end FastAPI application for managing cardiac CT accessions, preprocessing uploads, and serving neural-network segmentations through NVIDIA Triton. The codebase demonstrates how to combine Python microservices, GPU inference, and AWS-style infrastructure patterns in a single repo.
- Clinical data ingestion – Users upload raw DICOM slices through the `internal/frontend_service` PyQt client; the FastAPI backend persists metadata in Postgres and writes the binaries to object storage (AWS S3 in production, configurable locally).
- Asynchronous ML pipeline – A Celery + Redis queue drives heavy preprocessing and inference work so API latency never spikes. Tasks normalize/resize medical volumes with OpenCV/pydicom and then call Triton via `tritonclient`.
- GPU-first serving – `deploy/compose/docker-compose.yaml` stands up Triton with a TorchScript MONAI UNet (`internal/inference/triton_models/img_seg_model`) so the same model artifact can run locally or on managed GPU fleets (e.g., Amazon ECS/EKS with GPU nodes).
- Modern backend tooling – FastAPI + SQLAlchemy + Alembic power the REST surface and migrations (`internal/api_service`). Pydantic models, dependency-injected sessions, and typed routers keep the codebase maintainable.
- Infrastructure knowledge – The Docker multi-service stack (Postgres, Redis, Triton, FastAPI, Celery) mirrors a realistic AWS deployment pipeline, emphasizing containerization, environment-based configuration, and secret management.
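The normalize/resize step of the pipeline can be sketched roughly as follows. The HU clipping window, function name, and nearest-neighbour resize are illustrative assumptions (the repo's actual tasks use OpenCV/pydicom); this is a minimal NumPy-only sketch, not the project's code.

```python
import numpy as np

# Hypothetical HU clipping window; the real bounds live in the repo's task code.
HU_MIN, HU_MAX = -1000.0, 400.0
TARGET = 96  # slices are resized to 96x96 before inference

def preprocess_slice(slice_hu: np.ndarray) -> np.ndarray:
    """Clip to an HU window, scale to [0, 1], and resize to TARGET x TARGET.

    The project resizes with OpenCV; nearest-neighbour index selection is
    used here so the sketch needs only NumPy.
    """
    clipped = np.clip(slice_hu.astype(np.float32), HU_MIN, HU_MAX)
    normed = (clipped - HU_MIN) / (HU_MAX - HU_MIN)
    rows = (np.arange(TARGET) * normed.shape[0] / TARGET).astype(int)
    cols = (np.arange(TARGET) * normed.shape[1] / TARGET).astype(int)
    return normed[np.ix_(rows, cols)]

out = preprocess_slice(np.random.randint(-2000, 3000, size=(512, 512)).astype(np.float32))
print(out.shape)  # (96, 96)
```

Clipping before normalization keeps extreme values (air, metal artifacts) from compressing the useful soft-tissue range into a narrow band.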
These touchpoints map directly to the skills AWS recruiters screen for in SWE/ML engineer roles: distributed systems, MLOps, inference serving, and cloud-native design.
```
PyQt Client ──▶ FastAPI (internal/api_service)
   │              ├── Uploads metadata to Postgres
   │              └── Streams raw DICOMs to S3
   ▼
Redis + Celery (task_queue)
   │              ├── Downloads S3 objects
   │              ├── Applies MONAI-style preprocessing
   │              └── Calls Triton via gRPC
   ▼
Triton Server (deploy/compose service)
   │              ├── Hosts TorchScript UNet
   │              └── Returns masks
   ▼
S3 + Postgres (results, metadata)
```
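Before the gRPC call in the diagram above, the worker has to pack slices into the layout the model expects. A sketch of that batching step, assuming the single-channel NCHW shape implied by the 1-channel UNet (the tensor name and client calls in the comments depend on the model's `config.pbtxt` and are assumptions):

```python
import numpy as np

def to_triton_batch(slices: list[np.ndarray]) -> np.ndarray:
    """Stack preprocessed 96x96 slices into a float32 NCHW batch
    (N, 1, 96, 96) for the single-channel UNet."""
    batch = np.stack(slices).astype(np.float32)  # (N, 96, 96)
    return batch[:, np.newaxis, :, :]            # (N, 1, 96, 96)

batch = to_triton_batch([np.zeros((96, 96)) for _ in range(4)])
print(batch.shape)  # (4, 1, 96, 96)

# The worker would then send it via tritonclient's gRPC API, roughly:
#   from tritonclient import grpc as grpcclient
#   client = grpcclient.InferenceServerClient("triton:8001")
#   inp = grpcclient.InferInput("INPUT__0", batch.shape, "FP32")
#   inp.set_data_from_numpy(batch)
#   result = client.infer("img_seg_model", inputs=[inp])
```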
| Layer | Highlights |
|---|---|
| FastAPI service (`internal/api_service`) | Auth, user sessions, `/user/new_accession` uploads, S3 presigned URLs, SQLAlchemy models/migrations, dependency-injected services. |
| Task queue (`internal/api_service/task_queue`) | Redis-backed Celery app; preprocessing pipeline using pydicom, opencv-python-headless, and NumPy; calls Triton via gRPC to obtain segmentation masks. |
| Inference assets (`internal/inference`) | MONAI training notebook (`segmentation.ipynb`), TorchScript export (`model.pt`), Triton model repo (`triton_models/img_seg_model`). |
| Container stack (`deploy/compose/docker-compose.yaml`) | Postgres 15, Redis 7, Triton 24.04, FastAPI, and the Celery worker. Exposes ports for the API (8010) and Triton (8000/8001/8002). |
| Desktop viewer (`internal/frontend_service`) | PyQt GUI for selecting sessions, launching viewers, and performing uploads/downloads. |
- Clone & configure

  ```bash
  git clone https://github.com/atharva789/xray_analysis.git
  cd xray_analysis
  cp deploy/.env.backend.local.example deploy/.env.backend.local  # set DB, S3, auth secrets
  ```

  Provide AWS credentials (or MinIO equivalents) in your shell so the API and Celery worker can talk to object storage.

- Build and run the stack

  ```bash
  cd deploy/compose
  docker compose up --build
  ```

  This launches Postgres, Redis, Triton, FastAPI, and the Celery worker. Triton automatically loads the TorchScript model from `internal/inference/triton_models`.

- Interact

  - Use the PyQt frontend (`internal/frontend_service`) or call the FastAPI endpoints (`/user/new_accession`, `/user/sessions`) to upload CT series.
  - Celery downloads the raw files from S3, clips and intensity-normalizes the pixel data, resizes each slice to 96×96, uploads the processed tensor, and finally invokes Triton for segmentation. The resulting masks are written back next to the accession prefix.
  - Retrieve presigned URLs via `/user/get-session/{id}` to download slices, processed data, or masks.
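Since masks are written back "next to the accession prefix", one plausible way to organize the bucket is a per-accession prefix with raw/processed/mask sub-keys. Every prefix and filename below is a hypothetical illustration, not the repo's actual convention:

```python
def accession_keys(user_id: str, accession_id: str, n_slices: int) -> dict:
    """Build an illustrative S3 key layout for one accession.

    All names here are assumptions; the real layout is defined by the
    API service and Celery tasks.
    """
    prefix = f"{user_id}/{accession_id}"
    return {
        "raw": [f"{prefix}/raw/slice_{i:04d}.dcm" for i in range(n_slices)],
        "processed": f"{prefix}/processed/volume.npy",
        "mask": f"{prefix}/mask/volume_mask.npy",
    }

keys = accession_keys("user-42", "acc-001", 3)
print(keys["raw"][0])  # user-42/acc-001/raw/slice_0000.dcm
```

Grouping every artifact under one prefix makes lifecycle rules and presigned-URL generation straightforward: a single prefix listing returns everything tied to an accession.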
- Training notebook – `internal/inference/segmentation.ipynb` documents the MONAI UNet setup (2D `spatial_dims`, 1-channel input). The notebook also exports the TorchScript artifact and matching Triton config.
- Task queue – `internal/api_service/task_queue/tasks.py` is the best place to extend preprocessing (e.g., additional augmentations, stats) or change Triton model names via environment variables (`TRITON_URL`, `TRITON_MODEL_NAME`, etc.).
- Database models – Update `internal/api_service/db_service/models` and run Alembic migrations (`alembic upgrade head`) whenever schema changes are required.
- Secrets – `.gitignore` excludes training data, model repos, and `.env*` files. Use AWS Secrets Manager or SSM Parameter Store in production.
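Reading those environment variables is a plain `os.environ` lookup; the default values shown here are illustrative assumptions, not necessarily the worker's real fallbacks:

```python
import os

# Defaults are assumptions; the Celery worker's actual fallbacks may differ.
TRITON_URL = os.environ.get("TRITON_URL", "localhost:8001")  # gRPC port
TRITON_MODEL_NAME = os.environ.get("TRITON_MODEL_NAME", "img_seg_model")

print(TRITON_URL, TRITON_MODEL_NAME)
```

Keeping these as environment variables (rather than hardcoded strings) is what lets the same worker image point at a local Triton container or a remote GPU fleet without a rebuild.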
- Cloud-native ML delivery: storing datasets/masks in S3, orchestrating preprocessing through Celery, and serving inference through Triton positions the project squarely in AWS MLOps territory.
- Infrastructure as Code: Docker Compose defines all runtime dependencies, mirroring a multi-container deployment that could be lifted into ECS/EKS easily.
- Systems design: decoupled frontend, API, task queue, and inference services highlight asynchronous design patterns, fault tolerance (Celery retries), and horizontal scalability.
- Medical imaging domain: handling DICOM metadata, spacing/orientation transforms, and MONAI data loaders shows familiarity with healthcare data compliance and preprocessing nuances.
- Automate model retraining/export pipelines.
- Add observability (Prometheus/Grafana) for queue depth and inference latency.
- Integrate IAM roles / STS tokens for S3 access when deploying to AWS.