AI Inference Engineer
About the Role
You will optimize the latency and throughput of model inference, design and build reliable production serving systems, and accelerate research on scaling test-time compute. You will implement batching, caching, load balancing, and model parallelism; develop low-level GPU kernels and code-generation tooling; apply algorithmic optimizations such as quantization, distillation, and speculative decoding; and test, benchmark, and improve inference reliability for large-scale, high-concurrency deployments.
Requirements
- Experience with system optimizations for model serving, including batching, caching, load balancing, and model parallelism
- Experience with low-level inference optimizations such as GPU kernels and code generation
- Experience with algorithmic inference optimizations such as quantization, distillation, and speculative decoding
- Experience with large-scale, high-concurrency production serving
- Experience with testing and benchmarking inference services and improving their reliability
Responsibilities
- Optimize model inference latency and throughput
- Build reliable production serving systems
- Accelerate research on scaling test-time compute
- Implement batching, caching, and load balancing for model serving
- Develop model parallelism and low-level GPU kernel optimizations
- Implement code generation for inference
- Apply algorithmic optimizations such as quantization, distillation, and speculative decoding
- Test, benchmark, and improve inference service reliability for high-concurrency deployments
