LLM & GenAI Recruiting

An LLM engineer recruiter whose screens evolve as fast as the stack

Generative AI hiring changes every quarter. The model that was cutting edge six months ago is table stakes today. Engineers in AI keeps the screen current so your candidate bar does too.

A generative AI recruiter writing a job description in January is often describing a role that no longer exists in April. The context window doubles. A new open-weights model ships. The retrieval pattern everyone used last quarter becomes the thing senior engineers now mock on Twitter. If the screening bar does not keep up, the candidates you hire end up out of date by their start date.

Engineers in AI is an LLM engineer recruiter that treats the moving target as the job. We update our screening framework at the start of every quarter, in conversation with the LLM engineers we have placed and the hiring managers we are closest to. Whatever we were asking in the technical screen in January looks different by April.

What LLM engineering actually means right now

The LLM engineer role today is a braid of four disciplines: retrieval, prompting and structured output, evals, and infrastructure. The candidates we place need to be credible in at least three of those four. Pure prompt engineers who cannot reason about latency budgets or retrieval quality are a weak fit for production teams. Infrastructure engineers who have never debugged a prompt are rarely the hire either.

  • Retrieval: embedding choice, chunking strategy, reranking, query rewriting, handling long documents
  • Prompting and structured output: JSON mode, function calling, grammar-constrained decoding, handling refusals
  • Evals: offline eval sets, live A/B on quality, regression detection, human-in-the-loop feedback
  • Infrastructure: inference serving, caching, streaming, cost per request, provider fallback

The current LLM screen

Tony Kochhar runs the first technical conversation on every senior LLM search. He will pull on any claim in the resume and ask for specifics.

  • Walk me through the retrieval stack you shipped. What did you measure, and what did you change as a result?
  • How did you evaluate your system end to end? What was the offline-to-online correlation?
  • When did prompt engineering stop being enough, and what did you move to?
  • Tell me about a hallucination incident in production. How did you detect it, and what did you change?
  • What is the latency budget of your critical path, and where is your P95 trending?

Candidates who have actually shipped generative AI systems answer those questions in concrete numbers. Candidates who have only prototyped answer in frameworks and vendor names. We do not forward the second group.

Why quarterly screen updates matter

Most recruiters write a screening script once and reuse it for two years. In ordinary software that works fine. In LLM engineering it guarantees you end up hiring last year's consultant. The questions that were sharp in early 2024 now let through candidates who have done nothing meaningful since. The questions that look sharp in Q2 of 2026 will look dated by Q4.

Our job is to keep the bar current. We talk to our placed LLM engineers quarterly, ask what they are seeing in the interviews they sit as candidates, and ask what internal debates are happening on their teams. That input shapes the next quarter's screen.

Why most generative AI candidate pools look the same

If you have interviewed LLM engineers in the last year, you have probably noticed that resumes start to look suspiciously similar. Same open-source project name-drops, same vendor logos, same three side projects. The reason is that the public inbound pool for generative AI roles is saturated with candidates who have completed two or three courses and built a demo but have not shipped anything to paying users.

Our sourcing on LLM searches is deliberately weighted away from that inbound pool. We lean on referrals from LLM engineers we have already placed, direct outreach to engineers who have written or spoken credibly about retrieval or agents, and the NYC engineer network that Tony has built over 20 years. It is slower up front than posting a JD and waiting, and it is a meaningfully different pool by the time submittals land.

LLM engineering roles we have closed

  • Founding LLM engineers for seed-stage NYC AI startups
  • Senior LLM engineers owning production retrieval and agent systems at mid-stage teams
  • Applied research engineers who translate papers into product
  • LLM infrastructure engineers owning inference, caching, and cost
  • Eval and quality engineers for teams where correctness is regulated or revenue-critical

Engagement terms

Engineers in AI is a boutique NYC firm with a flat 20% placement fee. No retainer, no exclusivity, no minimum commitment. If a candidate leaves in the first 90 days, we refill at no additional cost. More than 1,000 placements over 20 years, with engagements across Agoda, Hearst, Con Edison, and Trilogy. We do not run every search we are offered. We only take the ones where the scope and comp are real.

Start a generative AI search

If you are hiring LLM or generative AI engineers and you want a recruiter whose screening bar keeps pace with the stack, book a hiring call. We will spend 45 minutes on your role, tell you where the market currently is on comp and scope, and be honest about whether a boutique firm is the right fit for the hire.

Hire an LLM engineer without the stale screen

Generative AI recruiting that updates every quarter. Flat 20% fee. 90-day replacement. No retainer.