Hiring Playbook

How to hire an AI engineer without getting burned

The market is full of candidates who call themselves AI engineers. Few have shipped anything to production. Here is what to screen for, what to ignore, and where most hires go wrong.

Hiring an AI engineer in 2026 is harder than it was two years ago, not easier. The candidate pool has exploded. LinkedIn has 400,000 people with "AI engineer" in their title. Maybe 30,000 of them have actually shipped AI into production at any meaningful scale. The other 370,000 have taken a course, built a demo, or added a system prompt to an existing app. Telling the two apart is the entire job.

This page is a practical playbook written from the experience of closing AI engineer searches at Engineers in AI. Founder Tony Kochhar spent 20 years in engineering before moving into recruiting, and the questions below are the ones that actually predict production performance.

Define the role before you screen

The single most common reason an AI engineer hire fails is that the role was never pinned down. "We need an AI engineer" can mean any of the following: someone who fine-tunes models, someone who builds retrieval pipelines, someone who wires LLM APIs into a product, someone who runs evals, someone who owns inference infrastructure. A candidate strong in one of those variants can be a bad hire for another.

Before the first interview, write down the three most important things the hire will ship in their first 90 days. If you cannot, you are not ready to interview yet. When we start a search, the first call is spent pulling apart the role until those three things are concrete.

The questions that actually predict production performance

The screening questions below work because they force the candidate to talk about real systems in specific terms. Vague answers do not survive them.

  • Walk me through the last AI system you shipped to production. Real candidates talk about specific services, latencies, failure modes, and teammates. Demo candidates fall back on framework names.
  • What broke first when that system hit real traffic? Every production AI system has broken in a specific way. If the answer is "nothing really broke," they have not actually shipped it.
  • How did you measure whether the system was working? Strong candidates talk about offline evals, online metrics, and the correlation between the two. Weak candidates talk about demos and vibes.
  • What was the cost per request, and how did you bring it down? This filters out candidates who have never had to care about inference economics.
  • Tell me about a wrong answer your system gave in production. How did you find out, and what did you change? Candidates who have owned a system have hallucination war stories. Candidates who have not, do not.

Signals that matter

The signals that predict production AI performance are not the ones most interviewers look for. They are operational, not academic.

  • Has the candidate been paged for an AI system? If yes, they owned it. If no, they handed it off.
  • Can they describe the input and output distribution of their model, in words, without referring to slides?
  • Do they talk about evals before they talk about models? That is a good sign.
  • Have they rejected a model, prompt, or pattern in the last 6 months and moved to something better? A candidate whose stack has not changed in a year is a flag.

Red flags to catch early

  • Resume lists 15 tools but no shipped systems. Every AI engineer in the job market has heard of LangChain. Few have shipped with it.
  • Answers in frameworks rather than tradeoffs. "We used vector DB X" is not an answer. "We used vector DB X because the recall on our eval set was 12% higher than Y at the same latency budget" is.
  • Cannot name a specific failure mode of the model they worked on. This is the single highest-signal filter we use.
  • Describes all experiments as successful. Real production AI is a parade of failures and recoveries. A resume with no scars is a suspicious resume.

Where most AI engineer hires go wrong

Two common patterns. The first: hiring the most credentialed resume in the pile. A candidate from a name-brand AI lab can be extraordinary, but they can also be a research engineer who has never touched production code. The second: hiring the most enthusiastic generalist. Someone who will build you an impressive demo in week two and then discover in month three that they cannot debug a production inference latency spike.

The way to avoid both is to screen on shipped, owned, production systems. Not credentials. Not demos. Not hours on Twitter. The candidates who have actually done the work answer production questions in production terms.

When a recruiter is worth using

If you have the internal depth to screen candidates on the above, you do not strictly need a recruiter for AI hiring. If you do not, the cost of a bad AI engineer hire, at a fully loaded $400K+ per year, pays back a flat 20% placement fee many times over. That is where Engineers in AI fits: as an engineering-native screening layer that filters the 370K noise candidates out before you spend time on them.

We have closed over 1,000 technical placements across 20 years, including AI and ML hires for teams like Agoda, Hearst, Con Edison, and Trilogy. Flat 20% fee, no retainer, no exclusivity, and a 90-day replacement guarantee if the hire does not stick. If you are starting an AI engineer search and want an engineering-led read on the market, book a hiring call and we will spend 45 minutes on your role.

Hire an AI engineer who has actually shipped

Engineering-led screening, no credential theater. Flat 20% fee. 90-day replacement. No retainer.