Staff MLOps Engineer, LLMOps

About the Role

You will build and operate production-grade infrastructure for large language models and agentic workflows: CI/CD pipelines for training, evaluation, and deployment; automated model versioning and approval workflows; and model serving and monitoring systems. You will integrate vector databases, feature stores, and model registries; instrument observability and cost monitoring; and provide reproducible sandboxes and human-in-the-loop evaluation for researchers. You will also continuously evaluate and adopt state-of-the-art LLM tooling while ensuring the reliability, compliance, and performance of our AI systems.

Requirements

  • Strong software engineering skills, primarily in Python
  • Experience with containerization and orchestration (Docker, Kubernetes)
  • Experience with infrastructure-as-code and CI/CD (Terraform, GitHub Actions or similar)
  • Experience with monitoring and logging frameworks (Datadog, Prometheus, OpenTelemetry)
  • Knowledge of MLOps best practices including model versioning, rollback, and automated evaluation
  • Experience with scalable model serving (vLLM, Triton, BentoML or similar)
  • Experience integrating vector databases, feature stores, and model registries
  • Experience with experiment tracking and evaluation frameworks (MLflow or similar)
  • Ability to optimize prompt and response flows and monitor cost, latency, and performance

Responsibilities

  • Build reusable CI/CD workflows for model training, evaluation, and deployment
  • Automate model versioning, approval workflows, and compliance checks
  • Design and implement a modular, scalable AI infrastructure stack including vector databases and feature stores
  • Integrate and maintain model registries and experiment tracking
  • Partner with engineering and data science to embed models and agents into applications
  • Evaluate and integrate state-of-the-art LLM tools and frameworks
  • Drive AI reliability and governance, including monitoring, testing, and drift detection
  • Deploy infrastructure for offline and online evaluation including regression testing and human-in-the-loop workflows
  • Provide sandboxes, dashboards, and reproducible environments to enable rapid iteration

Benefits

  • Equity plan eligibility