Site Reliability Engineer

2 months ago · DevOps Jobs by Manifold Labs (Targon)

Skills

Virtualization, Service Mesh, Linear, Systems Engineering, Loki, Discord, Metric, CI, Logging, Alerting, Notion, Infrastructure, Observability, Grafana, Prometheus, Confidential Virtual Machine, Go, CD, Kubernetes, GitHub, Benchmarking

About the Role

You will ensure services stay online and performant around the clock. You will optimize Kubernetes clusters including service mesh, metrics, and logging. You will benchmark services and identify infrastructure bottlenecks. You will improve observability and alerting to catch issues before they impact users, scale services to minimize downtime under load, and develop CI/CD pipelines for new and existing services.

Requirements

Hands-on experience with Kubernetes in production environments
Proficiency with Go (Golang) for systems and infrastructure tooling
Familiarity with confidential virtual machines (CVMs)
Experience with Prometheus, Loki, and Grafana for monitoring and observability

Responsibilities

Ensure services stay online and performant, including during off hours
Optimize Kubernetes clusters, including service mesh, metrics, and logging
Benchmark services and identify infrastructure bottlenecks
Improve observability and alerting systems to catch issues before they impact users
Scale services to minimize downtime under load
Develop CI/CD pipelines for new and existing services
