For engineers

Inference Engineer jobs.

Inference engineers serve large models fast and cheaply, from kernels to clusters. Here is what the role is, what it pays, and what is open now.


01 / Definition

What is an inference engineer?

An inference engineer makes models serve fast and cheap at scale. They work across the whole stack, from CUDA kernels up to cluster scheduling, and know which lever matters for the workload in front of them.

At scale every millisecond and every cent shows up in the bill. The job is cutting latency and cost without giving up quality, and knowing which of those the product actually needs.

What they do

Profile and cut latency across the serving stack, kernel to cluster.
Bring cost per token down without giving up quality.
Choose the right batching, caching, and quantization per workload.
Keep the system fast when traffic is not polite.

02 / Pay

What it pays

In the US, total compensation for strong inference engineers usually lands around $220k to $340k. It runs higher at frontier labs, and at startups once equity is counted. We do not post a role we would not take ourselves.

03 / Open now

Open inference engineer roles


Inference Engineer, Frontier lab partner

$250k to $340k, San Francisco, CA

FAQ

Common questions

What does an inference engineer do?

They serve large models fast and cheaply, optimizing everything from GPU kernels to batching and cluster scheduling.

What skills does an inference engineer need?

Comfort from CUDA up to Kubernetes, and the judgment to tell which optimizations are worth the complexity.

What should an inference engineer be paid?

In the US, total compensation typically runs $220k to $340k, higher for kernel-level work at labs.

We only hire the best AI-native engineers.
