RAG Engineering

A RAG engineer recruiter who screens for retrieval, not for vendor names

Retrieval-augmented generation is a specific discipline. Embeddings, chunking, reranking, evals, latency. Engineers in AI screens for the craft, not the library list.

Retrieval-augmented generation is often described as "just plug in a vector database and an LLM." That description has launched a lot of demos and almost no shipped systems. The teams that actually ship production RAG know that the hard work sits between those two boxes, and a RAG engineer hire who cannot reason about that middle is not going to ship anything that survives real traffic.

Engineers in AI is a RAG engineer recruiter with a retrieval-specific screen. We do not accept "I built a RAG system with LangChain and Pinecone" as evidence. We ask what was in the system, how it was evaluated, and what broke when real users hit it.

What RAG engineering actually involves

A production RAG system has seven moving parts, and a real RAG engineer can reason about all of them.

  • Document processing: parsing, cleaning, metadata extraction, handling non-text content
  • Chunking strategy: fixed, semantic, hierarchical, and why the right answer depends on the corpus
  • Embedding choice: model selection, dimension tradeoffs, domain adaptation, cost
  • Vector store and indexing: hybrid search, filters, HNSW tuning, cold vs warm performance
  • Query rewriting and expansion: multi-query, HyDE, and when each actually helps
  • Reranking: cross-encoders, LLM-as-reranker, latency budgets
  • Eval: offline retrieval metrics, end-to-end quality, hallucination detection, human-in-the-loop
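To make the middle of that list concrete: fixed-size chunking, the baseline any candidate should be able to improve on, fits in a few lines. A minimal character-based sketch, with assumed size and overlap values; a production chunker would count tokens and respect sentence and section boundaries:

    def chunk_fixed(text: str, size: int = 512, overlap: int = 64) -> list[str]:
        # Character-based for simplicity; real systems chunk on tokens.
        # size and overlap are illustrative defaults, not recommendations.
        step = size - overlap
        return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

The interview question is not whether a candidate can write this. It is whether they can explain what it gets wrong on their corpus.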

A candidate strong in three of those seven areas can be hired for a team where other engineers own the rest. A candidate strong in none of them should not be hired as a RAG engineer. The screen below is about finding that difference.

The retrieval-specific screen

Tony Kochhar runs the first technical conversation on every senior RAG engineer search. The screen below is written to avoid the LangChain-tutorial trap: candidates who have wired libraries together but cannot reason about the tradeoffs.

  • Walk me through the retrieval stack you last shipped. For each layer, what did you measure, and what did you change as a result?
  • What does your eval set look like, how did you construct it, and how did you avoid leakage?
  • When did you move beyond pure vector search, and what did you add? Keyword, reranker, query rewriting?
  • Tell me about a retrieval failure in production. How did you detect it, and what was the root cause?
  • What is the latency budget of your retrieval stack, and which stage is eating your P95 right now?
  • What did you change when the corpus doubled? What broke when documents got longer than your chunk size?

Strong candidates answer in specific numbers and specific tradeoffs. Weak candidates answer in library names.
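
When we say specific numbers, we mean the candidate can sketch their offline harness on a whiteboard. Something like the following, where eval_set and retrieve are assumed shapes for illustration, not any framework's API:

    def recall_at_k(eval_set, retrieve, k: int = 10) -> float:
        # eval_set: list of (query, relevant_doc_ids) pairs, built by hand.
        # retrieve: callable (query, k) -> ranked list of doc ids.
        total = 0.0
        for query, relevant in eval_set:
            retrieved = set(retrieve(query, k))
            total += len(retrieved & set(relevant)) / len(relevant)
        return total / len(eval_set)

    def mrr_at_k(eval_set, retrieve, k: int = 10) -> float:
        # Mean reciprocal rank of the first relevant hit in the top k.
        total = 0.0
        for query, relevant in eval_set:
            for rank, doc_id in enumerate(retrieve(query, k), start=1):
                if doc_id in relevant:
                    total += 1.0 / rank
                    break
        return total / len(eval_set)

A candidate who has run numbers like these can tell you what recall@10 was before and after their last chunking change. A candidate who has not will change the subject.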

Signals worth weighting heavily

  • The candidate has an eval set they built themselves, not one they copy-pasted from a benchmark.
  • The candidate has deliberately rejected a retrieval pattern that was trendy, and can explain why.
  • The candidate can describe the input distribution of their RAG system, not just the output quality.
  • The candidate talks about reranking before you ask about it. That is a shipping signal.
  • The candidate has a cost-per-query number in their head.
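
That last signal is back-of-envelope arithmetic, and a strong candidate can do it live. A sketch with loudly assumed unit prices; every rate below is a placeholder to be replaced with your actual vendor pricing:

    # All rates are illustrative assumptions, not real vendor prices.
    EMBED_COST_PER_1K_TOKENS = 0.0001   # assumed embedding API rate
    RERANK_COST_PER_QUERY = 0.0005      # assumed cross-encoder rerank rate
    LLM_COST_PER_1K_TOKENS = 0.002      # assumed blended generation rate

    def cost_per_query(query_tokens=50, context_tokens=3000, answer_tokens=300):
        embed = (query_tokens / 1000) * EMBED_COST_PER_1K_TOKENS
        generate = ((context_tokens + answer_tokens) / 1000) * LLM_COST_PER_1K_TOKENS
        return embed + RERANK_COST_PER_QUERY + generate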

Red flags we watch for

  • Resume lists five vendor names but no evaluation approach. Every RAG engineer in the market has used Pinecone or Weaviate. Fewer have actually evaluated what came out of the other end.
  • "We used LangChain for retrieval." Not a flag on its own, but if the candidate cannot explain what the framework is doing under the hood, they are renting capability rather than owning it.
  • No discussion of hybrid search. Pure vector-only retrieval is rarely the right production answer, and a candidate who has never moved past it has not shipped to a demanding corpus (a minimal fusion sketch follows this list).
  • Confidence with no failure stories. Every real RAG system has stumbled somewhere. A candidate who has only seen success has not run their system long enough.
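
On the hybrid search flag: the usual production move is to run keyword and vector retrieval in parallel and fuse the two rankings. Reciprocal rank fusion is the standard baseline; a minimal sketch, assuming each retriever returns a ranked list of document ids:

    def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
        # Reciprocal rank fusion over N ranked lists, e.g. [bm25_ids, vector_ids].
        # k=60 is the conventional smoothing constant; higher k flattens scores.
        scores: dict[str, float] = {}
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

A candidate who has shipped hybrid search can tell you why they chose rank fusion over a single blended score, or why they did the opposite.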

How RAG roles vary across teams

A RAG engineer at an enterprise search company owns reranking and eval on a corpus measured in hundreds of millions of documents. A RAG engineer at a Series A AI startup often owns the full stack, from parsing to serving, on a corpus measured in tens of thousands. A RAG engineer at a regulated company spends half their time on source attribution and guardrails because the cost of a wrong citation is a compliance incident. Those are three different hires.

Before the first submittal we push to pin down which variant your team actually needs. The candidate who will flourish in one of those environments will stall in another. A specialist recruiter's job is to know that, and to calibrate the outreach before the first candidate ever lands in your inbox.

How we run a RAG search

The first call is 45 minutes on the role. We pull apart what the hire will actually ship in 90 days: is this a retrieval-specific seat on an existing LLM team, or is this the first RAG engineer shaping the whole stack? Those are different candidate profiles. We target accordingly, and our first submittal is usually in your inbox within two weeks.

Engineers in AI is a boutique NYC recruiting firm with a flat 20% placement fee, no retainer, no exclusivity, and a 90-day replacement guarantee. Over 1,000 placements in 20 years, including engagements with Agoda, Hearst, Con Edison, and Trilogy. We take the searches we can deliver on, and we are honest on the first call about the ones where we cannot.

Start a RAG engineer search

If you are hiring a retrieval-augmented generation engineer and you want a recruiter who can screen on retrieval quality, not just library familiarity, book a hiring call. 45 minutes on the role, a real read on the market, and an honest answer on whether a boutique firm is the right fit for your hire.
