Urgently Hiring

Senior DevOps / SRE Engineer

2 months ago · Senior · Full Time · DevOps · Senpi

About the Role

You will build and maintain the infrastructure that runs concurrent AI trading agents, including their cron schedules, state files, and trailing-stop processes. You will deploy and manage agent environments with persistent workspaces, design and operate CI/CD pipelines, and carry out zero-downtime deployments. You will build monitoring, alerting, and observability across metrics, logs, and traces; operate and scale Kubernetes/EKS clusters and containerized workloads; and manage Redis, Postgres/RDS, ClickHouse, Kafka, and blockchain node infrastructure. You will own logging, security, backups, and disaster readiness, lead on-call practices, run incident response and postmortems, and drive long-term reliability improvements for production trading workloads.

Requirements

Professional DevOps, SRE, or infrastructure engineering experience
Strong Kubernetes experience, ideally on AWS EKS
Hands-on experience with Docker and Helm
Proficiency with infrastructure-as-code tools such as Terraform or Ansible
Experience with CI/CD and deployment automation (GitHub Actions, ArgoCD, or similar)
Strong AWS infrastructure experience; multi-cloud is a plus
Experience operating Redis, Postgres/RDS, ClickHouse, and Kafka in production
Observability experience with Prometheus, Grafana, Datadog, Loki, ELK/OpenSearch/Kibana, or OpenTelemetry
Ability to build dashboards, alerts, and operational visibility
Ability to debug across Python, Node.js, and Go
Experience with access management, secrets handling, production hardening, and operational controls
Experience with incident management, on-call operations, and backup/recovery planning
Understanding of real-time systems and low-latency reliability requirements for trading
Familiarity with blockchain node infrastructure, exchange APIs, wallet operations, and on-chain monitoring
Experience with or willingness to learn MCP server deployment and auth management
Hyperliquid experience is a plus
OpenClaw and multi-agent orchestration experience is strongly preferred

Responsibilities

Build and maintain infrastructure for concurrent AI trading agents
Deploy and manage OpenClaw agent environments with workspace persistence and cron orchestration
Design and operate CI/CD pipelines for production agent updates
Define and execute zero-downtime deployment and safe rollback strategies
Ensure active positions remain protected through infrastructure changes
Build monitoring, alerting, and observability across metrics, logs, traces, and dashboards
Manage cloud infrastructure using infrastructure as code
Operate and scale Kubernetes/EKS clusters and containerized workloads
Operate and maintain Redis, Postgres/RDS, ClickHouse, and Kafka
Operate blockchain node infrastructure and ensure reliable exchange API and wallet connectivity
Own logging, security, and incident response across the full stack
Lead incident response, on-call practices, debugging, mitigation, and postmortems
Own backup, recovery, and disaster-readiness for critical infrastructure
