Careem
Dubai, UAE
Staff Site Reliability Engineer · Tech Lead
Jan 2023 — Present
Technical Lead and SRE Architect mentoring a team focused on infrastructure provisioning and developer experience.
- Designed and built a self-service AI Agent Platform with MCP tool-calling, multi-agent orchestration, and human-in-the-loop (HITL) approval flows for internal infrastructure-support workflows, serving ~200K req/day with ~800 registered tools and ~60 agents in production. Engineering teams contribute their own tools and agents.
- Slashed EKS cluster costs by 40% via Karpenter consolidation, migration to ARM and Spot nodes, VPA, and custom scheduling.
- Consolidated 12 legacy API gateways onto a single Kong gateway handling ~10K RPS, with 99.99% uptime during cutover; built a GitOps-driven control plane for API route management.
- Designed and deployed a fully automated, GitOps-based cloud-resource provisioning platform (Terraform + Terragrunt), cutting provisioning time for new services from days to minutes.