Urgently Hiring
Senior Infrastructure Engineer
About the Role
You will design and build internal platform capabilities that let product teams ship software quickly and reliably. You will own core platform components, including the TypeScript Pulumi codebase, the Kubernetes runtime environments, and the monitoring stack. You will architect scalable, fault-tolerant systems, enforce engineering standards through code, and build developer tooling and self-service primitives that reduce cognitive load for application teams. You will also continuously improve observability, alerting, and operational runbooks, and participate in production maintenance and on-call rotations as needed.
Requirements
- 5+ years of professional software engineering experience, including work in infrastructure and cloud domains
- Strong programming skills in TypeScript or another statically typed language
- Strong understanding of distributed systems fundamentals including availability, consistency, observability, and fault tolerance
- Experience designing and maintaining infrastructure-as-code (IaC) systems
- Deep understanding of AWS architecture
- Experience designing, operating, and scaling Kubernetes workloads
- Proven ability to build reusable systems and abstractions rather than one-off scripts
- Experience defining SLIs/SLOs, error budgets, and incident tooling (preferred)
- Experience improving observability practices, alerts, and operational dashboards (preferred)
- Familiarity with GitOps patterns and deployment automation technologies (preferred)
- Expertise in developer experience or developer productivity, including building internal developer platforms (preferred)
- Experience with production maintenance, operations, and on-call rotations (preferred)
Responsibilities
- Design and build internal platform components and developer tooling
- Build reusable infrastructure abstractions and internal services
- Own and maintain the TypeScript Pulumi codebase
- Design, operate, and scale Kubernetes-based runtime environments
- Architect scalable, fault-tolerant distributed systems
- Implement and maintain monitoring and observability stacks
- Enforce engineering standards through code and automation
- Continuously improve developer experience and developer productivity
- Participate in production maintenance and on-call rotations
