Senior MLOps Engineer

About the Role

You will deploy and maintain production ML infrastructure, optimize GPU utilization, and serve large and small language models. You will build CI/CD pipelines, create Helm templates for Kubernetes deployments, implement model optimization and serving workflows, and set up monitoring, logging, and automated workflows to ensure reliable model delivery.

Requirements

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field
  • Proficiency in Kubernetes, Helm, and containerization technologies
  • Experience with GPU optimization, including MIG and NOS
  • Experience with cloud platforms such as AWS, GCP, and Azure
  • Knowledge of monitoring tools such as Grafana and Prometheus
  • Proficiency in scripting with Python and Bash
  • Hands-on experience with CI/CD tools and workflow management systems
  • Familiarity with Triton Inference Server, ONNX, and TensorRT

Responsibilities

  • Deploy scalable production-ready ML services with optimized infrastructure
  • Manage and autoscale Kubernetes clusters
  • Optimize GPU resources using MIG and NOS
  • Manage cloud storage to ensure high availability and performance
  • Integrate LoRA and model merging workflows
  • Adapt and deploy state-of-the-art ML codebases
  • Deploy and manage LLMs, SLMs, and LMMs
  • Serve models using Triton Inference Server and other serving frameworks
  • Leverage vLLM and TGI for model serving
  • Optimize models with ONNX and TensorRT
  • Develop Retrieval-Augmented Generation systems
  • Set up monitoring and logging with Grafana, Prometheus, Loki, Elasticsearch, and OpenSearch
  • Write and maintain CI/CD pipelines using GitHub Actions
  • Create Helm templates for rapid Kubernetes node deployment
  • Automate workflows using cron jobs and Airflow DAGs