Site Reliability Engineer

About the Role

You will operate and support production infrastructure that powers large-scale blockchain networks. You will monitor systems, respond to incidents, follow and improve runbooks, and perform routine operational tasks such as restarts, upgrades, and configuration changes. You will help maintain and improve monitoring, logging, and alerting systems, participate in post-incident reviews, and continuously build knowledge of distributed systems and networking.

Requirements

  • Foundational understanding of Linux systems, processes, and basic networking concepts
  • Familiarity with at least one scripting or programming language such as Python, Bash, or Go
  • Interest in site reliability, monitoring, and operating production infrastructure
  • Clear written and verbal communication skills and willingness to learn
  • Ability to remain calm, methodical, and responsive during incidents or operational events
  • Exposure to cloud platforms such as AWS or GCP
  • Familiarity with containerization or orchestration technologies including Docker or Kubernetes
  • Basic understanding of blockchain or Web3 concepts such as nodes, RPC services, or validators
  • Experience with monitoring and observability tools such as Grafana, Prometheus, Datadog, or ELK-based stacks

Responsibilities

  • Monitor production systems, alerts, dashboards, and logs across networks including PoS and the Agglayer
  • Assist with incident detection, triage, escalation, and resolution under guidance
  • Support on-call and operational coverage through structured rotations
  • Follow, maintain, and improve runbooks and standard operating procedures
  • Perform routine operational tasks such as service restarts, upgrades, and configuration changes
  • Maintain and improve monitoring, logging, and alerting systems including dashboards for network health, RPC performance, and node metrics
  • Improve alert signal quality and reduce operational noise
  • Support cloud-based and containerized infrastructure, including nodes, RPC endpoints, and supporting services
  • Collaborate with protocol, product, and cross-functional teams to understand production issues and user impact
  • Participate in post-incident reviews and contribute to root-cause analysis documentation
  • Continuously build knowledge of blockchain fundamentals, distributed systems, and networking

Benefits

  • Remote-first global workforce
  • Medical, dental, and vision health insurance
  • 401(k) with 3% company match
  • $1,500 home office setup allowance (lifetime maximum)
  • $75 monthly internet or phone reimbursement
  • Flexible Time Off
  • Company-issued laptop
  • Egg freezing, mental health, and employee wellness benefits