Senior MLOps Engineer
Skills
OpenSearch, ONNX, Loki, MIG, Job Scheduler, TensorRT, NOS, Bash, Airflow, GPU, RAG, LoRA, Distributed Training, Fine-Tuning, S3, GitHub Actions, Containerization, Elasticsearch, Monitoring, AWS, CI/CD, Azure, GCP, MLOps, Grafana, Prometheus, LLM, vLLM, Triton, Helm, Python, Go, Rust, Kubernetes, Model Orchestration, Model Merging, SLM, LMM, TGI, Cron, Model Serving, GPU Optimization, Model Fine-Tuning
About the Role
You will deploy and maintain production ML infrastructure, optimize GPU utilization, and serve large and small language models. You will build CI/CD pipelines, create Helm templates for Kubernetes deployments, implement model optimization and serving workflows, and set up monitoring, logging, and automated workflows to ensure reliable model delivery.
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- Proficiency in Kubernetes, Helm, and containerization technologies
- Experience with GPU optimization including MIG and NOS
- Experience with cloud platforms such as AWS, GCP, and Azure
- Knowledge of monitoring tools such as Grafana and Prometheus
- Proficiency in scripting languages such as Python and Bash
- Hands-on experience with CI/CD tools and workflow management systems
- Familiarity with Triton Inference Server, ONNX, and TensorRT
Responsibilities
- Deploy scalable production-ready ML services with optimized infrastructure
- Manage and autoscale Kubernetes clusters
- Optimize GPU resources using MIG and NOS
- Manage cloud storage to ensure high availability and performance
- Integrate LoRA and model merging workflows
- Adapt and deploy state-of-the-art ML codebases
- Deploy and manage large language models (LLMs), small language models (SLMs), and large multimodal models (LMMs)
- Serve models using Triton Inference Server and other serving frameworks
- Use vLLM and Text Generation Inference (TGI) for model serving
- Optimize models with ONNX and TensorRT
- Develop retrieval-augmented generation (RAG) systems
- Set up monitoring and logging with Grafana, Prometheus, Loki, Elasticsearch, and OpenSearch
- Write and maintain CI/CD pipelines using GitHub Actions
- Create Helm templates for rapid Kubernetes deployments
- Automate workflows using cron jobs and Airflow DAGs
