AI in Hiring Assessments: Watch the Work, Not the Tool

A new version of the same headline lands every week now. AI has broken hiring. The resume you’re reading was polished by a model. The candidate on the video call has an assistant feeding them answers in real time. The screening step you trusted for years suddenly tells you nothing.

Shraddha Sunil and Mudit Saraf made the case in Harvard Business Review this month. One line stuck with me. Performing well in an interview, they wrote, is becoming infinitely scalable and practically free. When a strong application costs almost nothing to produce, it stops telling you anything about the person who sent it.

So the instinct kicks in. You lock it down with an AI detector, a required camera, a clean desk, and a no-AI rule that only the honest candidates will follow.

Sit with that instinct, because it aims at the wrong target. The tool didn’t break your hiring. It exposed something that was already true.

AI didn’t break hiring. It ran a broken process faster.

Hiring has leaned on proxies for a long time. A resume stands in for capability. An interview stands in for judgment. A take-home test stands in for the work itself. Each one is a guess, and each one was already shaky before a model could produce it for you.

What AI changed is the cost of faking the proxy. A strong resume used to take effort, and a confident interview used to take practice. Now both take seconds, so the proxies got noisier all at once and the noise became impossible to ignore.

The process didn’t suddenly start rewarding the wrong thing. It always rewarded the wrong thing, and AI just made the reward easier to claim. Apply a fast tool to a broken process and you get more of the broken result, faster. We’ve drawn the line before on which parts of hiring AI should touch and which it shouldn’t.

Hiring’s signals were weak long before AI

Think about what a resume actually shows. It shows what someone chose to write about jobs you can’t see, verified by no one. It was a thin signal ten years ago. It’s a thinner one now.

The interview isn’t much sturdier. It mostly measures who’s good at being interviewed: who’s warm, who tells a clean story, who stays composed for forty minutes. None of that is the work.

The deeper trouble is the incentive. An applicant-tracking system reads for keywords, so candidates write for keywords. Recruiters add an AI filter, so candidates add an AI assistant. Each side automates against the other, and the exchange drifts further from the truth it was meant to surface. The candidate isn’t cheating a fair game. The game was built to reward gaming.

Your real problem is the test nobody watched

The AI panic skips the part that matters. The most common assessments fail with or without AI, because nobody is in the room.

A take-home you only grade after the fact is compromised by design, and not because the work is worthless. The problem is that you get a result with no provenance. You send a prompt and you get back a polished document. You can’t see who did the work, how they got there, or what they would do when the brief shifted. A static credential, like a degree or a certificate, has the same flaw. It tells you something happened once, somewhere, under conditions you can’t see.

AI didn’t open that gap. It just walked through a door that was already open. The fix isn’t a better lock on the door. The fix is to stop running tests you can’t watch.

Stop policing the tool. Watch the work.

Sunil and Saraf land in a sensible place. They argue for hiring built around real reasoning and judgment, not static credentials. We agree on the diagnosis, but we go further on the method. The answer isn’t a better-designed screen. It’s a watched one.

That means watching how someone actually works, not how they talk about it. The approach is older than AI, and it doesn’t try to keep AI out. It puts the work in plain view, AI use included, and defending that work live is the one thing a model can’t do for the candidate.

We call it a Work Simulation, and it comes in two parts. First the candidate does a real task from the job itself, in the real tools, with AI and anything else they would normally use. Then we sit with them in a live, screen-shared session and put that work under pressure: a curveball, a follow-up, a decision to defend in real time. That session is the part that can’t be faked. If the task was carried by a tool the candidate can’t actually drive, it shows the moment they have to think on their feet.

You don’t run this on everyone who applies. You run it on the few who clear sourcing and client review, which is exactly where the old final-round interview used to sit. The difference is what happens in the room. You watch the work instead of listening to a story about it.

When the work is observable, faking it stops being possible and starts being beside the point. You’re no longer reading a document and guessing at the person behind it. You’re watching the thing you’re actually trying to buy.

The screen you can’t see (compromised by design)

A resume, verified by no one
A rehearsed interview answer
An unwatched take-home test
A keyword match in an ATS
A credential earned out of sight

The Work Simulation (observable, repeatable)

The real work of the job
Watched live, with any tool allowed
How they think, not how they present
Judged against what this seat needs
Evidence you can see for yourself

What a candidate’s AI use tells you

Once AI is allowed and the work is visible, how a candidate uses it becomes one of the most useful signals you have.

Watch for the specifics. Did they read and check the model’s output before they shipped it, or paste it through untouched? When you changed the brief, could they explain why their approach still held? Do they know the one part of the task where the tool was the wrong move? Those are observable, and they separate the person who thinks with AI from the person who hides behind it.

Figure: Same tool, two candidates. One passes the output through untouched. The other pulls it apart, checks it, and rebuilds.

That’s not cheating to be caught. That’s the job. Most of these roles will use AI every day. Watching someone use it well, in context, tells you more than any rule that pretends they won’t.

We map the real job before we test anyone

A Work Simulation only works when it’s built for the specific seat. A generic case study can be gamed. The work of your role, under your constraints, can’t.

The work starts long before the candidate. First we map the actual job in a Job Map: what the person will really own, the work they’ll do week to week, and what good looks like in the first few months. Then we build a profile of who thrives in that seat, scored across the 32 Work Drivers. Those are the functional, social, and emotional things people need from their work. We measure the seat against those drivers, and every candidate against them too.

That’s what lets the simulation be real instead of generic. What the candidate actually does is pulled straight from the Job Map. What we judge them on comes from the profile of who fits. By the time someone reaches the live session, we already know the job and we already know the person, so the session just has to confirm it.

The whole model is one move. We define the role first, then test the work directly. We back the result with a 120-day guarantee. We can offer it because we’ve already watched the work, not because we’re betting on a resume.

So stop asking whether a candidate used AI. Assume they did, the way they will on the job, and design a test that still tells you the truth. If you want to see what that looks like, read how a Work Simulation is built, or how we work, from mapping the role to the live session. Put them in the seat, hand them every tool, and watch the work. The tool was never the threat. A test you can’t watch is.


