Network Operations Engineer

10 hours ago | Latin America | Remote | Full Time | DevOps | Polygon

About the Role

You will serve as the front line of reliability for production infrastructure. You will detect and respond to incidents, triage alerts, coordinate incident response, and document decisions and outcomes in real time. You will also improve observability, refine alerting, build dashboards, and create runbooks to scale operational coverage. This is a shift-based role where you will validate system and user-facing functionality and support ecosystem participants during incidents.

Requirements

Foundational experience with Linux systems, including filesystem navigation, log reading, and process awareness
Understanding of core networking concepts such as DNS, HTTP, and TCP/IP and ability to troubleshoot connectivity issues
Basic scripting ability (Python, Bash) to automate tasks and analyze system data
Exposure to monitoring and observability tools such as Datadog, Grafana, or Prometheus
Strong written communication skills for producing clear incident documentation and procedures under pressure
Willingness to work shift-based, follow-the-sun schedules, and a structured approach to troubleshooting
Familiarity with blockchain infrastructure, including node operation or EVM-based systems (preferred)
Experience with Datadog or similar observability platforms in production (preferred)
Exposure to infrastructure-as-code tools such as Terraform or configuration management tools like Ansible (preferred)
Previous experience in a network operations center, incident response team, or on-call rotation (preferred)
Experience working in a remote, globally distributed team (preferred)

Responsibilities

Monitor the health and performance of blockchain networks, bridges, RPC services, staking systems, and user-facing products
Track third-party dependencies and identify degradation that may impact the ecosystem
Validate and triage alerts by distinguishing signal from noise, assessing severity, and determining impact
Escalate confirmed issues to the appropriate SRE or engineering teams with clear, structured context
Coordinate incident response by engaging stakeholders, maintaining timelines, and ensuring consistent communication
Document incidents in real time, including decisions, actions, and outcomes
Build and improve dashboards, alerting systems, and monitoring coverage to enhance visibility
Create and maintain runbooks for common failure modes and triage workflows
Support validators and infrastructure providers when issues intersect with systems
Validate user-facing product functionality during incidents

Benefits

Remote-first global workforce
Medical insurance
Dental insurance
Vision insurance
401(k) with 3% company match (United States employees only)
$1,500 Home Office Setup Allowance (lifetime max)
$200 Annual AI Allowance
$75 Monthly internet or phone reimbursement
Flexible Time Off
Company-issued laptop
Egg freezing benefits
Mental health and employee wellness benefits