Assessing skills with AI without hallucinations: a guide for recruitment consultancies (2026)

May 15, 2026

Competencies · AI

Updated: May 2026 · By Rafa Torres García, CTO and co-founder of Voicit

Soft skills analysis is one of the most complex aspects of any selection process. Unlike technical skills, which can be validated with tests or certifications, behavioral skills require qualitative evidence: accounts of experiences, concrete examples of behavior, and demonstrable results.

How do you assess skills with AI without hallucinations? By applying a behavioral methodology (BEI, Critical Incident Technique, STAR), a competency dictionary with well-defined levels, and a process divided into three phases (extraction, evaluation, and synthesis), with every conclusion traceable to a specific moment in the conversation.

With the explosion of AI tools in HR, many recruitment consultancies are experimenting with ChatGPT, Claude, or other models to assess competencies from interviews. The problem: most of them get superficial, inconsistent, or outright fabricated results.

In this article we explain why AI hallucinates when analyzing skills, what you need to obtain reliable assessments, and how Voicit solves this problem automatically for consulting firms and selection teams.

In this article

Why AI gets confused when assessing skills
Real example of a hallucination
The 4 keys to reliably assessing skills with AI
The problem with implementing this manually
How Voicit solves it automatically
What makes Voicit different
Summary: What to ask an AI to assess skills
Frequently Asked Questions

⏱ If you only have 30 seconds

• Why AI fails: Without methodology, it searches for keywords, not behaviors. It fills in the gaps with assumptions.

• What you need to get it right: complete critical incidents (situation + action + result), a competency dictionary with levels, and a phased process.

• What Voicit contributes: the three pillars applied automatically, with a timestamp for each incident.

• Who is it for: recruitment consultancies and HR teams that interview several candidates per week.

🤖 Why AI gets confused when assessing skills

Language models like GPT or Claude are incredibly good at generating coherent text. But that doesn't mean they can assess professional skills.

When you ask an LLM to "assess the leadership" of a candidate based on a transcript, the model:

Searches for keywords related to leadership (team, project, coordination).
Interprets any vague mention as evidence ("I worked with the team" = leadership).
Fills in the gaps with assumptions based on statistical patterns from its training.
Generates evaluations that sound reasonable but are not supported by solid behavioral evidence.

The result: reports that look professional but don't stand up to critical analysis. For a recruitment consultancy, that's a double risk—you compromise the quality of the report you deliver to the client and your own professional reputation.

🔍 Real example of a hallucination

Candidate transcript
"In my previous job, I coordinated with the marketing team to launch campaigns."

Output of an unstructured AI
"The candidate demonstrates advanced-level leadership competence by coordinating multidisciplinary teams and managing high-impact projects."

Problems detected:

We don't know if the candidate led or simply coordinated.
There is no evidence of "high impact".
We do not know the outcome of the campaigns.
"Advanced level" is assigned without clear criteria.

This isn't the model's fault. It's the fault of how we ask it to work.

🧭 The 4 keys to reliably assessing skills with AI

Based on our experience building Voicit's competency assessment system, these are the four key factors that make the difference between a superficial analysis and one that is truly useful for recruitment consultancies.

Methodology

1. Use behavioral methodology, not keywords

Competency assessment is not a search for terms. It is a behavioral analysis based on proven methodologies.

Critical Incident Interviews (Behavioral Event Interview — BEI).
Flanagan's Critical Incident Technique (1954).
STAR / SAR Model (Situation, Task, Action, Result).

AI must seek complete critical incidents: accounts of specific situations where the candidate took concrete actions that produced measurable results.

Not valid evidence
"I have leadership experience." "I'm good at teamwork." "I've managed complex projects."

Valid evidence
"When Project X was delayed two weeks (situation), I reorganized the sprint and redistributed tasks among three developers (action). This allowed us to deliver with only a three-day delay and retain the customer (result)."

Extraction

2. Structure the information extraction

LLMs need clear guidelines on what to extract and how to classify it. It's not enough to simply ask them to "analyze this competency."

A good analysis system should extract from each critical incident:

Full context: situation and task.
Specific behavior: what exactly the candidate did.
Observable result: what happened as a consequence.
Impact: positive or negative for the assessed competency.
Intensity: weak, moderate, or strong.
Time references: where the incident appears in the conversation.

This structure forces the model to look for real evidence instead of making assumptions.
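
The fields listed above can be turned into a schema that the model's output has to pass before it is used. What follows is a minimal sketch, not Voicit's implementation: it assumes the model returns one JSON object per incident, and the class, field, and function names (`ExtractedIncident`, `parse_incident`, `start_seconds`) are hypothetical, as are the sample values.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


class Intensity(Enum):
    WEAK = "weak"
    MODERATE = "moderate"
    STRONG = "strong"


@dataclass(frozen=True)
class ExtractedIncident:
    context: str          # situation and task
    behavior: str         # what exactly the candidate did
    result: str           # what happened as a consequence
    impact: Impact
    intensity: Intensity
    start_seconds: float  # where the incident starts in the recording


def parse_incident(raw_json: str) -> ExtractedIncident:
    """Build a typed incident from the model's JSON.

    Raises ValueError when a required field is missing, empty, or outside
    the allowed values, so incomplete incidents never reach the evaluation.
    """
    data = json.loads(raw_json)
    for key in ("context", "behavior", "result"):
        if not str(data.get(key) or "").strip():
            raise ValueError(f"missing or empty field: {key}")
    try:
        return ExtractedIncident(
            context=data["context"],
            behavior=data["behavior"],
            result=data["result"],
            impact=Impact(data["impact"]),
            intensity=Intensity(data["intensity"]),
            start_seconds=float(data["start_seconds"]),
        )
    except KeyError as exc:
        raise ValueError(f"missing field: {exc}") from exc


# Illustrative values only; the timestamp and intensity are made up.
raw = json.dumps({
    "context": "Project X was delayed two weeks",
    "behavior": "Reorganized the sprint and redistributed tasks among three developers",
    "result": "Delivered with a three-day delay and retained the customer",
    "impact": "positive",
    "intensity": "strong",
    "start_seconds": 754.0,
})
incident = parse_incident(raw)
```

A vague statement such as "I worked with the team" has no behavior or result to fill in, so it is rejected at this step instead of being scored.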

Levels

3. Define clear competency levels

One of the most common mistakes: asking the model to evaluate a competency without giving it evaluation criteria.

Bad approach
"Evaluate the candidate's leadership."

Good approach
Provide a dictionary of skills that defines what each competency means, what levels exist (Basic, Intermediate, Advanced, Expert) and what behaviors characterize each level.

Example of a well-defined level — Leadership, Intermediate Level:

"Leads small teams (3-5 people) on projects with clear objectives. Delegates tasks, follows up, and resolves basic conflicts. Results depend on occasional supervision from a senior leader."

With this definition, the model can compare the evidence gathered against objective criteria.
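
As a rough illustration, a dictionary entry can be stored as data and rendered into the rubric the model receives. This sketch assumes a hypothetical `Competency` class; only the Intermediate descriptor comes from the example above, and the other levels are deliberately left undefined rather than invented. A real entry would also carry the competency's definition.

```python
from dataclasses import dataclass, field

LEVELS = ("Basic", "Intermediate", "Advanced", "Expert")


@dataclass(frozen=True)
class Competency:
    """One hypothetical dictionary entry: a name plus a descriptor per level."""
    name: str
    descriptors: dict = field(default_factory=dict)  # level name -> behaviors

    def __post_init__(self) -> None:
        # Reject level names that are not part of the agreed scale.
        unknown = set(self.descriptors) - set(LEVELS)
        if unknown:
            raise ValueError(f"unknown levels: {sorted(unknown)}")

    def undefined_levels(self) -> list:
        """Levels that cannot be assigned yet because no descriptor exists."""
        return [level for level in LEVELS if level not in self.descriptors]

    def rubric(self) -> str:
        """Render the defined levels, lowest first, for inclusion in a prompt."""
        lines = [f"Competency: {self.name}"]
        lines += [
            f"- {level}: {self.descriptors[level]}"
            for level in LEVELS
            if level in self.descriptors
        ]
        return "\n".join(lines)


leadership = Competency(
    name="Leadership",
    descriptors={
        "Intermediate": (
            "Leads small teams (3-5 people) on projects with clear objectives. "
            "Delegates tasks, follows up, and resolves basic conflicts. "
            "Results depend on occasional supervision from a senior leader."
        )
    },
)
print(leadership.undefined_levels())  # ['Basic', 'Advanced', 'Expert']
```

Listing the undefined levels makes the gap visible: until those descriptors are written, an evaluation has no objective basis for assigning them.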

Process

4. Separate extraction, evaluation, and synthesis

The best results don't come from asking the model to do everything in one step. It's better to divide the process into three phases:

Phase 1 (Extraction): Identify all critical incidents related to the competency.
Phase 2 (Evaluation): Analyze the incidents against the competency dictionary and assign a level.
Phase 3 (Synthesis): Generate an interpretive summary with justification, detected patterns, and gaps.

This separation allows:

Greater precision in each phase.
Complete traceability (each conclusion linked to specific evidence).
The ability to audit and improve each step.
A clear view of which aspects need to be explored in more depth during the interview.
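
The three phases can be sketched as three separate functions, each receiving only the previous phase's output. This is an illustrative outline under stated assumptions, not a production pipeline: `LLM` stands for any function that takes a prompt and returns text, the prompts are placeholders, and `scripted_llm` is a test double that replays canned answers instead of calling a real model.

```python
import json
from typing import Callable, Optional

# Hypothetical model interface: takes a prompt, returns the model's raw text.
LLM = Callable[[str], str]


def extract(llm: LLM, transcript: str, competency: str) -> list:
    """Phase 1: list critical incidents only, with no levels or judgments."""
    prompt = (
        f"List every critical incident related to {competency} as a JSON array "
        "of objects with the keys situation, action, result, and timestamp. "
        "Return [] if there are none.\n\nTranscript:\n" + transcript
    )
    return json.loads(llm(prompt))


def evaluate(llm: LLM, incidents: list, rubric: str) -> Optional[str]:
    """Phase 2: assign a level from the incidents and the rubric alone."""
    if not incidents:
        return None  # no evidence, so no level is assigned
    prompt = (
        "Using only the incidents and the rubric, answer with one level name.\n\n"
        f"Rubric:\n{rubric}\n\nIncidents:\n{json.dumps(incidents)}"
    )
    return llm(prompt).strip()


def synthesize(llm: LLM, incidents: list, level: Optional[str]) -> str:
    """Phase 3: summary with justification, patterns, and gaps."""
    if level is None:
        return "No critical incidents found; explore this competency in the interview."
    prompt = (
        f"Justify the level '{level}' by citing incident timestamps, then list "
        f"patterns and gaps.\n\nIncidents:\n{json.dumps(incidents)}"
    )
    return llm(prompt)


def assess(llm: LLM, transcript: str, competency: str, rubric: str) -> dict:
    """Run the three phases in order and keep every intermediate output."""
    incidents = extract(llm, transcript, competency)
    level = evaluate(llm, incidents, rubric)
    summary = synthesize(llm, incidents, level)
    return {"incidents": incidents, "level": level, "summary": summary}


def scripted_llm(responses: list) -> LLM:
    """Test double that replays canned answers in order instead of calling a model."""
    remaining = iter(responses)
    return lambda prompt: next(remaining)


# Canned answers standing in for a real model; the timestamp is made up.
incident = {
    "situation": "Project X was delayed two weeks",
    "action": "Reorganized the sprint and redistributed tasks among three developers",
    "result": "Delivered with a three-day delay and retained the customer",
    "timestamp": "12:40",
}
fake = scripted_llm([json.dumps([incident]), "Intermediate", "Summary text"])
report = assess(fake, "(transcript)", "Leadership", "(rubric)")
empty_report = assess(scripted_llm(["[]"]), "(transcript)", "Leadership", "(rubric)")
```

Because the evaluation step never sees the transcript, it cannot pull in impressions that were not extracted as incidents, and an empty extraction yields no level instead of a guessed one.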

📚

Free resource

Dictionary of 26 soft skills

Download the complete list with definitions, levels, and observable behavioral indicators for each competency. Ready to use in your BEI assessment rubrics and interviews.

Download the free dictionary →

⚙️ The problem with implementing this manually

Now that you know the theory, the practical reality is: implementing such a system requires time, technical expertise, and many iterations.

You would need to:

Design complex prompts for each phase of the analysis.
Create and maintain your competency dictionary with well-defined levels.
Integrate with AI APIs and manage token limits, costs, and latency.
Structure the data to maintain traceability.
Iterate constantly to improve accuracy.
Adapt the system to each type of interview and position.

For a recruitment team or a consulting firm, this is unfeasible. There's no point in building technology when you should be focused on finding the best talent.

✅ How Voicit solves it automatically

At Voicit, we have built this entire system so that recruitment consultancies can generate reliable skills assessments without having to think about technology.

This is how it works in practice:

You conduct the interview normally. Voicit automatically transcribes the conversation (face-to-face, online, telephone).
You select the skills to be assessed. From the skills dictionary or by creating custom skills for your team.
You generate the report. The system analyzes the conversation using the three-phase methodology described above.
You receive a structured assessment with the detected level, a justification based on critical incidents, specific evidence with time references, and recommendations on what to investigate further.

Try Voicit for free →

🧩 What makes Voicit different

Complete traceability

Each assessment is linked to specific moments in the conversation. You can verify the evidence and compare it with your own professional judgment—without having to listen to the entire recording.
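
One simple way to check this kind of traceability yourself is to confirm that every quote a report cites actually appears in the transcript. The sketch below is a generic illustration and not a description of how Voicit works internally: `Segment`, `locate_quote`, and `unsupported_claims` are hypothetical names, and the timestamp is made up. It reuses the hallucination example from earlier in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One transcript segment with its start time in seconds."""
    start_seconds: float
    text: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not matter."""
    return " ".join(text.lower().split())


def locate_quote(quote: str, segments: list) -> Optional[float]:
    """Return the start time of the first segment containing the quote, else None."""
    needle = normalize(quote)
    if not needle:
        return None
    for segment in segments:
        if needle in normalize(segment.text):
            return segment.start_seconds
    return None


def unsupported_claims(quotes: list, segments: list) -> list:
    """Quotes cited by a report that cannot be found anywhere in the transcript."""
    return [quote for quote in quotes if locate_quote(quote, segments) is None]


segments = [
    Segment(
        754.0,
        "In my previous job, I coordinated with the marketing team "
        "to launch campaigns.",
    ),
]
cited = ["coordinated with the marketing team", "managing high-impact projects"]
print(unsupported_claims(cited, segments))  # ['managing high-impact projects']
```

Exact matching only catches verbatim quotes. A report that paraphrases the candidate would need a looser comparison, which is one reason to ask the model for literal quotes in the extraction phase.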

Proven methodology

We don't use AI haphazardly. We apply behavioral assessment frameworks (BEI, Critical Incident Technique) with decades of academic and business validation.

Complement your professional judgment

Voicit doesn't replace the consultant. It gives you structured evidence that you can compare with your own conclusions, combine with formal test results, and use to identify what you need to explore further in the next interview.

Team customization

Each consulting firm has its own way of assessing skills. Voicit allows you to create shared skills dictionaries for your team, tailored to your methodology, sector, or client.

📋 Summary: What to ask an AI to assess skills

If you're going to use AI to assess skills in your recruitment processes, make sure the system meets these six minimum requirements. If it meets fewer than four of the six, you're very likely making hiring decisions based on well-written hallucinations.

Pillar	What to do	Risk if it fails
Behavioral methodology	Apply BEI / STAR / Critical Incidents	Confuses words with evidence
Competency dictionary	Define clear levels (Basic → Expert)	Assigns levels without objective criteria
Structured extraction	Capture situation, action, and result	Fills in the gaps with assumptions
Phased process	Extraction → Evaluation → Synthesis	Mixes facts with interpretation
Traceability	Link each conclusion to a timestamp	You cannot audit the decision
Human judgment	Complement, never replace, the consultant	You lose context and accountability
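
The table can double as a quick audit of any tool you are considering. This is a small sketch with hypothetical requirement keys. The threshold of four comes from the rule of thumb stated above, and the sample answers are illustrative.

```python
REQUIREMENTS = (
    "behavioral_methodology",
    "competency_dictionary",
    "structured_extraction",
    "phased_process",
    "traceability",
    "human_judgment",
)
MINIMUM_MET = 4  # rule of thumb from this summary, not an industry standard


def audit(system: dict) -> tuple:
    """Return (passes, missing): whether enough requirements are met, and which are not."""
    missing = [name for name in REQUIREMENTS if not system.get(name, False)]
    met = len(REQUIREMENTS) - len(missing)
    return met >= MINIMUM_MET, missing


# Illustrative answers for a generic chat prompt reviewed by a consultant.
generic_prompt = {"human_judgment": True}
passes, missing = audit(generic_prompt)
print(passes)   # False
print(missing)  # the five requirements that are not covered
```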

💬 Frequently Asked Questions

Does Voicit replace traditional competency-based interviews?

No. Voicit enhances your current process. You can conduct interviews as you always have and obtain a structured analysis that complements your professional judgment.

How accurate is the competency analysis?

Accuracy depends on the quality of the conversation. If the candidate provides complete critical incidents (situation, action, outcome), the analysis is highly reliable. If the conversation is vague or superficial, the system detects this and indicates which aspects need further exploration.

Can I use my own skills and levels?

Yes. Voicit allows you to create custom competency dictionaries that you can share with your team. Many consulting firms adapt our base dictionary to their own methodology or their client's industry.

Can AI replace my judgment as a recruitment consultant?

No, and it shouldn't. Well-applied AI structures the evidence and reduces the noise so you can make better decisions, faster. Judgment about cultural fit, intuition, and responsibility for the recommendation remain yours.

Is it safe to share interviews with an AI?

It depends on the tool. At Voicit, the data is encrypted, not used to train models, and the system complies with GDPR. If you're using a generic LLM, be sure to review the data usage policy before uploading candidate recordings.

Last updated: May 2026. This article describes how to build or evaluate an AI system for competency analysis and reflects the methodology we use at Voicit. For formal hiring decisions, always combine the automated assessment with the professional judgment of the responsible consultant.

Rafa Torres García
CTO and co-founder of Voicit. He designs AI-powered competency assessment systems used by recruitment consultancies and HR teams to generate more accurate reports in less time.