Job for Web3 Beginners
Site Reliability Engineer
About the Role
You ensure the reliability and performance of the platform by operating, improving, and scaling its infrastructure. You own the tooling, processes, and operational practices that keep the platform running smoothly. You participate in on-call rotations during Asian business hours and respond to incidents by investigating and resolving them quickly. You design observability dashboards and automation to reduce toil and improve reliability across production systems.
Requirements
- Strong experience with Linux and cloud infrastructure
- Experience operating and supporting production systems
- Experience with Docker and containerized environments
- Experience with observability and incident-management tools such as Grafana, Prometheus, or PagerDuty
- Ability to automate workflows using Rust, Python, Bash, or similar languages
- Strong troubleshooting and debugging skills
- A high degree of ownership and the ability to make sound decisions independently
- Experience with distributed systems
- Experience operating high-availability, low-latency services
- Experience with CI/CD systems and deployment automation
- Experience designing secure operational workflows and access controls
- No prior blockchain or cryptocurrency experience is required
Responsibilities
- Monitor the health and performance of the platform
- Respond to production incidents and drive them through to resolution
- Investigate failures, identify root causes, and coordinate fixes
- Ensure issues are detected, understood, and addressed quickly
- Identify recurring operational pain points and eliminate them
- Improve software deployment processes and operational workflows
- Participate in incident reviews and drive preventative improvements
- Contribute reliability-focused changes directly to production systems
- Design and maintain dashboards, metrics, alerting, and monitoring systems
- Improve signal quality while reducing alert fatigue
- Build automation and internal tools that make the platform easier to operate
- Help establish reliability best practices across the engineering organization
Benefits
- Token and equity allocation
- Ownership and responsibility from day one
- Work from anywhere within the target timezone range
- Occasional travel to Europe and elsewhere for meetups
