Inglenook Work with us

Inference Engineer

Frontier lab partner

Every millisecond and every cent matters at this scale. The work is making large models serve fast and cheap, and knowing which of those two the workload in front of you actually needs.

What you will do

  • Profile and cut latency across the serving stack, from kernel to cluster.
  • Bring cost per token down without giving up quality.
  • Choose the right batching, caching, and quantization for the workload.
  • Keep the system fast when traffic is bursty and unpredictable.

What we look for

  • You have shipped inference at scale, not just benchmarked it.
  • You are comfortable from CUDA up to Kubernetes.
  • You can tell which optimizations are worth the complexity and which are not.

What stays open

The shape of the engagement is a conversation. Apply and we will figure out what fits.