Senior AI Platform Engineer

About the Role

You will design, deploy, and operate internal AI platforms across Kubernetes (EKS), AWS Serverless, and local developer environments. You will implement observability and FinOps for LLM usage, build human-in-the-loop automation to reduce operational toil, create custom tooling and API integrations, design cost and usage dashboards, ensure reliability and security, and produce documentation and golden paths to enable product teams.

Requirements

  • 7+ years of hands-on technical experience with large-scale production environments and infrastructure
  • In-depth knowledge of AWS architecture, including Serverless, Lambda, and EKS, and the ability to manage diverse environments, including local developer setups
  • Strong grasp of Kubernetes, microservice architecture, and CI/CD principles (GitHub Actions)
  • Practical experience setting up infrastructure to run, monitor, and scale AI-driven applications or internal developer tooling
  • Proven ability to learn and master new technologies quickly
  • Solid understanding of performance monitoring tools and troubleshooting complex production environments
  • Proactive approach to upskilling teams and integrating cutting-edge solutions

Responsibilities

  • Architect, deploy, and manage internal automation platforms and AI orchestration tools across Kubernetes (EKS), AWS Serverless, and local deployment configurations
  • Implement scalable logging and monitoring for AI model usage to provide visibility into LLM expenditures and token budgets
  • Build human-in-the-loop processes that use AI and automation to streamline operational workflows, including infrastructure patching and maintenance
  • Leverage interface protocols such as the Model Context Protocol (MCP) to build custom internal tools and API integrations
  • Design and maintain dashboards that track operational costs and provide data-driven insights to leadership
  • Execute service capacity planning and system tuning for internal AI tools to ensure high availability
  • Ensure internal AI tools adhere to security standards and maintain a minimal vulnerability window in partnership with SecOps
  • Create golden paths and technical documentation to democratize access to AI tooling and upskill product engineering teams

Benefits

  • Flexible, fully remote work
  • On-call rotations designed to maintain a healthy work-life balance