Vinicius Carvalho Pires
$37/hr or $75,000/yr


Member since Mar 2026

Site Reliability Engineer

Available for hire
Years of experience: 4+ years
Experience level: Senior
Available for: Full-time

Site Reliability Engineer with 4+ years of experience managing high-scale, multi-cluster Kubernetes environments (100+ nodes) serving 1M+ concurrent users. Proven track record in cloud cost optimization, generating $140k+ in annual savings through FinOps and infrastructure right-sizing. Experienced in GitOps, CI/CD automation, and enterprise-grade observability (OpenTelemetry, Datadog) to maintain high availability and cut MTTR in distributed systems (in one engagement, from ~20 minutes to under 2 minutes).

Employment History

Site Reliability Engineer at Kaizen Gaming 2025 - Now
- Engineered high-performance GitLab CI/CD pipelines, cutting lead time for changes by 60% and implementing automated canary deployments for zero-downtime releases.
- Orchestrated multi-region OpenShift clusters (100+ nodes) across on-premises and cloud (Azure) environments, supporting a high-traffic gaming platform with 1M+ concurrent users and maintaining 99.9% availability for high-memory workloads.
- Migrated Helm-based deployments to ArgoCD (GitOps), establishing a centralized and auditable multi-cluster deployment model.
- Contributed to cloud cost optimization initiatives, right-sizing overprovisioned instances and reducing monthly infrastructure spend from ~€32k to an estimated ~€20k (~€144k annual savings) for a single Kubernetes cluster.
- Designed and operated full-stack observability platforms (OpenTelemetry, Prometheus, Grafana, VictoriaMetrics, OpenSearch), reducing troubleshooting time and improving operational visibility across distributed systems.
- Developed Python automation to map GitLab source code to live Helm/ArgoCD deployments on OpenShift, eliminating "orphan" applications and reducing cross-team troubleshooting time by 40%.
Observability Engineer at Appoena (allocated at MARS Inc.) 2024 - 2025
- Contributed to an enterprise-wide observability consolidation initiative, migrating multiple legacy monitoring stacks into a unified Datadog platform and establishing a single source of truth across services and infrastructure.
- Optimized Datadog usage (log pipelines, retention strategy, and metric cardinality), reducing annual platform costs by approximately $20,000 without compromising visibility.
- Identified and eliminated unused Azure ExpressRoute circuits, avoiding over $100k in unnecessary annual cloud expenses.
- Redesigned alerting and observability workflows, reducing troubleshooting time (MTTR) from ~20 minutes to under 2 minutes by improving signal quality and setting standards for actionable monitoring.
DevOps Engineer at Jack Experts 2023 - 2024
- Drove Kubernetes (EKS) cost optimization efforts across multi-cloud environments (AWS, Azure, GCP), contributing to a ~20% reduction in infrastructure costs through workload tuning and resource right-sizing.
- Designed an automated incident management workflow by integrating Zabbix, Rundeck, and an ITSM tool, reducing manual intervention for recurring production alerts by 40% and shortening MTTR.
- Designed and implemented standardized CI/CD pipelines (GitLab CI, GitHub Actions), improving deployment reliability and release safety.
- Applied FinOps practices to improve cost visibility and eliminate inefficient resource allocation, increasing cloud spending predictability.
- Implemented Infrastructure as Code (Terraform, Ansible) and operated production-grade Kubernetes platforms with full-stack observability (Prometheus, Grafana, Zabbix, Loki, and Promtail).
Support Engineer at Acelera Litoral 2022 - 2023
Provided Tier 2 technical support for multiple clients, stabilizing network infrastructure and reducing ticket resolution time by improving monitoring alerts and documentation.

Education

Bachelor of Computer Science at Universidade São Judas Tadeu 2022 - 2025