Site Reliability Engineer

About the Role

You will design, implement, and maintain highly scalable and resilient cloud infrastructure on AWS using Infrastructure-as-Code. You will manage and optimize Kubernetes clusters, implement CI/CD pipelines and automation, and develop monitoring and observability solutions. You will apply cloud security and compliance best practices, troubleshoot incidents, perform root cause analysis, and optimize system performance. You will design disaster recovery and failover strategies and collaborate with engineering, architecture, and security teams to promote DevOps practices.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • 5+ years of experience in cloud infrastructure, SRE, or DevOps roles
  • Strong expertise in AWS (EC2, S3, Lambda, RDS, VPC, IAM)
  • Hands-on experience with Kubernetes (EKS, K3s, or self-managed clusters)
  • Proficiency in scripting and automation using Python, Bash, or similar
  • Experience with Infrastructure as Code (Terraform, CloudFormation, or Ansible)
  • Familiarity with monitoring, logging, and observability tools (Prometheus, Grafana, Datadog, or the Grafana LGTM stack: Loki, Grafana, Tempo, Mimir)
  • Strong understanding of networking concepts (VPC, load balancers, DNS, firewalls)
  • Experience with DevOps methodologies, CI/CD pipelines, and GitOps practices
  • Experience with high-performance and low-latency systems
  • Familiarity with serverless architectures and event-driven computing
  • Familiarity with Rust compilation processes and techniques
  • Exposure to cloud cost optimization and FinOps strategies
  • Previous exposure to crypto, traditional finance, or trading is desirable but not essential
  • AWS Certified SysOps Administrator - Associate (desired)

Responsibilities

  • Design, deploy, and maintain scalable and resilient infrastructure on AWS using Infrastructure-as-Code
  • Manage and optimize Kubernetes clusters for containerized applications
  • Implement and manage CI/CD pipelines for deployment, testing, and monitoring
  • Develop and maintain monitoring and observability solutions
  • Apply cloud security best practices and manage IAM and compliance controls
  • Troubleshoot incidents, perform root cause analysis, and optimize performance
  • Automate infrastructure provisioning and configuration management using IaC tools
  • Collaborate with software engineering, architecture, and security teams
  • Design disaster recovery and failover strategies to ensure business continuity

Benefits

  • Flexible working hours
  • Remote work
  • Autonomy in time management
  • Continuing professional development plan with learning and certification path