Site Reliability Engineer — DevOps & Production Operations

Posted 23 hours ago | Senior | Salary: 125K - 175K | New York, USA | Hybrid | DevOps | Jobs by TrueX

About the Role

You will operate, monitor, and improve production systems that power trading and settlement. You will own CI/CD pipelines, build infrastructure-as-code with Ansible and Terraform, and manage hybrid AWS and colocation environments. You will enhance observability with Prometheus and Grafana, secure containers and remediate CVEs, participate in on-call incident response and post-mortems, and collaborate with engineers deploying services in Go, C++ and TypeScript.

Requirements

  • 0-5 years of experience in DevOps, SRE, or infrastructure engineering
  • Strong Linux administration (RHEL/Rocky, Ubuntu or similar)
  • Experience with infrastructure-as-code tools such as Ansible and Terraform
  • Hands-on experience with containerization (Docker or Podman) and container orchestration
  • Familiarity with CI/CD systems, particularly GitHub Actions
  • Understanding of networking fundamentals including DNS, VPNs, firewalls and load balancers
  • Experience participating in and leading on-call rotations, incident response and post-mortems
  • Experience with metrics-based monitoring (Prometheus, Grafana, Alertmanager) (preferred)
  • Experience with AWS, colocation environments, or high-frequency trading infrastructure (preferred)
  • Familiarity with Go, C++ or TypeScript codebases (preferred)

Responsibilities

  • Own and maintain CI/CD pipelines
  • Build and maintain infrastructure automation using Ansible and Terraform
  • Manage hybrid infrastructure spanning AWS and colocation environments
  • Develop and improve observability, monitoring, and alerting using Prometheus and Grafana
  • Ensure container security, patching, and CVE remediation across Linux systems
  • Support high-performance trading infrastructure with a focus on latency and reliability
  • Collaborate with engineering teams to deploy and scale services built in Go, C++ and TypeScript
  • Maintain container registries, deployment tooling, and release processes
  • Participate in the on-call rotation to respond to production incidents
  • Contribute to security compliance initiatives including SOC2

Benefits

  • Equity
  • Full healthcare
  • Flexible remote work