Site Reliability Engineer — DevOps & Production Operations
About the Role
You will operate, monitor, and improve production systems that power trading and settlement. You will own CI/CD pipelines, build infrastructure-as-code with Ansible and Terraform, and manage hybrid AWS and colocation environments. You will enhance observability with Prometheus and Grafana, secure containers and remediate CVEs, participate in on-call incident response and post-mortems, and collaborate with engineers deploying services in Go, C++ and TypeScript.
Requirements
- 0-5 years of experience in DevOps, SRE, or infrastructure engineering
- Strong Linux administration (RHEL/Rocky, Ubuntu or similar)
- Experience with infrastructure-as-code tools such as Ansible and Terraform
- Hands-on experience with containerization (Docker or Podman) and container orchestration
- Familiarity with CI/CD systems, particularly GitHub Actions
- Understanding of networking fundamentals including DNS, VPNs, firewalls and load balancers
- Experience participating in and leading on-call rotations, incident response and post-mortems
- Experience with metrics-based monitoring (Prometheus, Grafana, Alertmanager) (preferred)
- Experience with AWS, colocation environments, or high-frequency trading infrastructure (preferred)
- Familiarity with Go, C++ or TypeScript codebases (preferred)
Responsibilities
- Own and maintain CI/CD pipelines
- Build and maintain infrastructure automation using Ansible and Terraform
- Manage hybrid infrastructure spanning AWS and colocation environments
- Develop and improve observability, monitoring, and alerting using Prometheus and Grafana
- Ensure container security, patching, and CVE remediation across Linux systems
- Support high-performance trading infrastructure with a focus on latency and reliability
- Collaborate with engineering teams to deploy and scale services built in Go, C++ and TypeScript
- Maintain container registries, deployment tooling, and release processes
- Participate in the on-call rotation to respond to production incidents
- Contribute to security compliance initiatives, including SOC 2
Benefits
- Equity
- Full healthcare
- Flexible remote work
