For engineers
Serves large models fast and cheap, from kernels to clusters. Here is what the role is, what it pays, and what is open now.
An inference engineer makes models serve fast and cheap at scale. They work across the whole stack, from CUDA kernels up to cluster scheduling, and know which lever matters for the workload in front of them.
At scale every millisecond and every cent shows up in the bill. The job is cutting latency and cost without giving up quality, and knowing which of those the product actually needs.
In the US, total compensation for strong inference engineers usually lands around $220k to $340k, higher at frontier labs and with equity at startups. We do not post a role we would not take ourselves.
They serve large models fast and cheaply, optimizing everything from GPU kernels to batching and cluster scheduling.
Comfort from CUDA up to Kubernetes, and the judgment to tell which optimizations are worth the complexity.
In the US, total compensation typically runs $220k to $340k, higher for kernel-level work at labs.