Updated: May 2026 · By Rafa Torres García, CTO and co-founder of Voicit
Soft skills analysis is one of the most complex aspects of any selection process. Unlike technical skills, which can be validated with tests or certifications, behavioral skills require qualitative evidence: accounts of experiences, concrete examples of behavior, and demonstrable results.
How do you assess skills with AI without hallucinations? By applying a behavioral methodology (BEI, Critical Incident Technique, STAR), a competency dictionary with well-defined levels, and a process divided into three phases (extraction, evaluation, and synthesis), with every conclusion traceable to a specific moment in the conversation.
With the explosion of AI tools in HR, many recruitment consultancies are experimenting with ChatGPT, Claude, or other models to assess competencies from interviews. The problem: most of them obtain superficial, inconsistent, or outright fabricated results.
In this article we explain why AI hallucinates when analyzing skills, what you need to obtain reliable assessments, and how Voicit automatically solves this problem for consulting firms and selection teams.
⏱ If you only have 30 seconds
• Why AI fails: Without methodology, it searches for keywords, not behaviors. It fills in the gaps with assumptions.
• What you need to get it right: complete critical incidents (situation + action + result), dictionary with levels and phased process.
• What Voicit contributes: the three pillars applied automatically, with temporal traceability for each incident.
• Who it is for: recruitment consultancies and HR teams that interview several candidates per week.
🤖 Why AI gets confused when assessing skills
Language models like GPT or Claude are incredibly good at generating coherent text. But that doesn't mean they can assess professional skills.
When you ask an LLM to "assess the leadership" of a candidate based on a transcript, the model:
- Searches for keywords related to leadership (team, project, coordination).
- Interprets any vague mention as evidence ("I worked with the team" = leadership).
- Fills in the gaps with assumptions based on statistical patterns from its training.
- Generates evaluations that sound reasonable but are not supported by solid behavioral evidence.
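To make the keyword problem concrete, here is a deliberately naive sketch. It is only an analogy: a language model does not literally run a keyword matcher, and the keyword list and function name below are hypothetical. It shows how matching terms rewards a vague statement and misses a concrete behavior.

```python
import re

# Hypothetical keyword list, chosen only for illustration.
LEADERSHIP_KEYWORDS = {"team", "project", "coordinated", "led"}


def keyword_hits(statement: str) -> set[str]:
    """Return the leadership keywords that appear in a statement."""
    words = set(re.findall(r"[a-z]+", statement.lower()))
    return words & LEADERSHIP_KEYWORDS


vague = "I worked with the team on a project."
specific = "I reorganized the sprint and redistributed tasks among three developers."

print(keyword_hits(vague))     # the vague statement matches two keywords
print(keyword_hits(specific))  # the concrete behavior matches none
```

The statement with real behavioral content scores lower than the vague one, which is the opposite of what a competency assessment needs.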
The result: reports that look professional but don't stand up to critical analysis. For a recruitment consultancy, that's a double risk—you compromise the quality of the report you deliver to the client and your own professional reputation.
🔍 Real example of a hallucination
What the candidate said:
"In my previous job, I coordinated with the marketing team to launch campaigns."
What the AI concluded:
"The candidate demonstrates advanced-level leadership competence by coordinating multidisciplinary teams and managing high-impact projects."
Problems detected:
- We don't know if the candidate led or simply coordinated.
- There is no evidence of "high impact".
- We do not know the outcome of the campaigns.
- "Advanced level" is assigned without clear criteria.
This isn't the model's fault. It's the fault of how we ask it to work.
🧭 The 4 keys to reliably assessing skills with AI
Based on our experience building Voicit's competency assessment system, these are the four key factors that make the difference between a superficial analysis and one that is truly useful for recruitment consultancies.
1. Use behavioral methodology, not keywords
Competency assessment is not a search for terms. It is a behavioral analysis based on proven methodologies.
- Critical Incident Interviews (Behavioral Event Interview — BEI).
- Flanagan's Critical Incident Technique (1954).
- STAR / SAR Model (Situation, Task, Action, Result).
AI must seek complete critical incidents: accounts of specific situations where the candidate took concrete actions that produced measurable results.
Vague statements (not valid evidence): "I have leadership experience." "I'm good at teamwork." "I've managed complex projects."
Complete critical incident: "When Project X was delayed two weeks (situation), I reorganized the sprint and redistributed tasks among three developers (action). This allowed us to deliver with only a three-day delay and retain the customer (result)."
2. Structure the information extraction
LLMs need clear guidelines on what to extract and how to classify it. It's not enough to simply ask them to "analyze this competency."
A good analysis system should extract from each critical incident:
- Full context — situation and task.
- Specific behavior — what exactly did the candidate do?
- Observable result — what happened as a consequence.
- Impact — positive or negative for the assessed competence.
- Intensity — weak, moderate, strong.
- Time references — Where is this in the conversation?
This structure forces the model to look for real evidence instead of making assumptions.
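One way to enforce this structure is to validate whatever the model returns against a fixed schema before using it. The sketch below is a minimal illustration, not Voicit's implementation: the class names, field names, and the sample timestamps are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


class Intensity(Enum):
    WEAK = "weak"
    MODERATE = "moderate"
    STRONG = "strong"


@dataclass(frozen=True)
class CriticalIncident:
    context: str          # situation and task
    behavior: str         # what exactly the candidate did
    result: str           # what happened as a consequence
    impact: Impact        # positive or negative for the assessed competency
    intensity: Intensity  # weak, moderate, or strong
    start_seconds: float  # time reference: where the incident starts
    end_seconds: float    # time reference: where the incident ends

    def __post_init__(self) -> None:
        # An incident without situation, action, or result is not evidence.
        for name in ("context", "behavior", "result"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        if not 0 <= self.start_seconds <= self.end_seconds:
            raise ValueError("time reference must satisfy 0 <= start <= end")


def incident_from_dict(raw: dict) -> CriticalIncident:
    """Build an incident from parsed model output, rejecting unknown labels."""
    return CriticalIncident(
        context=raw["context"],
        behavior=raw["behavior"],
        result=raw["result"],
        impact=Impact(raw["impact"]),           # ValueError on unknown value
        intensity=Intensity(raw["intensity"]),  # ValueError on unknown value
        start_seconds=float(raw["start_seconds"]),
        end_seconds=float(raw["end_seconds"]),
    )


# Illustrative record; the timestamps are made up for the example.
raw = {
    "context": "Project X was delayed two weeks",
    "behavior": "Reorganized the sprint and redistributed tasks among three developers",
    "result": "Delivered with only a three-day delay and retained the customer",
    "impact": "positive",
    "intensity": "strong",
    "start_seconds": 754,
    "end_seconds": 812,
}
incident = incident_from_dict(raw)
print(incident.intensity.value, incident.start_seconds)
```

Rejecting incomplete or mislabeled records at this point means a vague mention never reaches the evaluation step disguised as evidence.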
3. Define clear competency levels
One of the most common mistakes is asking the model to evaluate a competency without giving it evaluation criteria, as in this prompt:
"Evaluate the candidate's leadership."
Instead, provide a competency dictionary that defines what each competency means, what levels exist (Basic, Intermediate, Advanced, Expert), and what behaviors characterize each level.
Example of a well-defined level — Leadership, Intermediate Level:
With this definition, the model can compare the evidence gathered against objective criteria.
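A competency dictionary can be kept as plain data and rendered into the evaluation prompt. The sketch below assumes a hypothetical structure and function name, and the descriptors are placeholders to be replaced with your own definitions. They are not the contents of Voicit's dictionary.

```python
LEVELS = ("Basic", "Intermediate", "Advanced", "Expert")

# Placeholder descriptors: replace them with the behaviors your dictionary defines.
COMPETENCY_DICTIONARY = {
    "Leadership": {
        "definition": "<what leadership means in your organization>",
        "levels": {
            level: [f"<observable behavior for {level} level>"] for level in LEVELS
        },
    }
}


def rubric_text(competency: str) -> str:
    """Render one competency as text that can be placed in an evaluation prompt."""
    entry = COMPETENCY_DICTIONARY[competency]
    missing = [lvl for lvl in LEVELS if not entry["levels"].get(lvl)]
    if missing:
        # A level without behaviors cannot be assigned on objective grounds.
        raise ValueError(f"{competency} has no behaviors for: {missing}")
    lines = [f"Competency: {competency}", f"Definition: {entry['definition']}"]
    for lvl in LEVELS:
        lines.append(f"{lvl}:")
        lines.extend(f"  - {behavior}" for behavior in entry["levels"][lvl])
    return "\n".join(lines)


print(rubric_text("Leadership"))
```

Refusing to render a competency with an undefined level keeps the model from being asked to assign a level that has no criteria behind it.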
4. Separate extraction, evaluation, and synthesis
The best results don't come from asking the model to do everything in one step. It's better to divide the process into three phases:
- Phase 1 (Extraction): Identify all critical incidents related to the competency.
- Phase 2 (Evaluation): Compare the incidents against the competency dictionary and assign a level.
- Phase 3 (Synthesis): Generate an interpretive summary with justification, detected patterns, and gaps.
This separation provides:
- Greater precision in each phase.
- Complete traceability (each conclusion linked to specific evidence).
- The ability to audit and improve each step.
- A clear view of which aspects need to be explored in more depth during the interview.
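The three phases can be sketched as a small pipeline in which each phase is a separate, replaceable function. This is an illustration under stated assumptions, not Voicit's implementation: the function names, the dictionary shapes, and the stub phases are hypothetical, and in practice each phase would call a language model with its own prompt.

```python
from typing import Callable

Incident = dict  # assumed shape: {"id": str, "text": str, "start_seconds": float}


def run_pipeline(
    transcript: str,
    competency: str,
    extract: Callable[[str, str], list[Incident]],
    evaluate: Callable[[list[Incident], str], dict],
    synthesize: Callable[[dict, list[Incident]], str],
) -> dict:
    """Run the three phases separately and keep every intermediate output."""
    incidents = extract(transcript, competency)  # Phase 1: extraction
    if not incidents:
        # No evidence: report the gap instead of letting a later phase guess.
        return {
            "incidents": [],
            "evaluation": None,
            "summary": "No critical incidents found; explore further in the interview.",
        }
    evaluation = evaluate(incidents, competency)  # Phase 2: evaluation
    known_ids = {inc["id"] for inc in incidents}
    cited = set(evaluation["evidence_ids"])
    if not cited <= known_ids:
        # Traceability check: every cited incident must exist in Phase 1 output.
        raise ValueError(f"evaluation cites unknown incidents: {sorted(cited - known_ids)}")
    summary = synthesize(evaluation, incidents)  # Phase 3: synthesis
    return {"incidents": incidents, "evaluation": evaluation, "summary": summary}


# Stub phases standing in for model calls, so the sketch runs on its own.
def fake_extract(transcript: str, competency: str) -> list[Incident]:
    return [{"id": "inc-1", "text": transcript, "start_seconds": 754.0}]


def fake_evaluate(incidents: list[Incident], competency: str) -> dict:
    return {"level": "Intermediate", "evidence_ids": [inc["id"] for inc in incidents]}


def fake_synthesize(evaluation: dict, incidents: list[Incident]) -> str:
    count = len(evaluation["evidence_ids"])
    return f"{evaluation['level']} level, supported by {count} incident(s)."


report = run_pipeline(
    "I reorganized the sprint and redistributed tasks among three developers.",
    "Leadership",
    fake_extract,
    fake_evaluate,
    fake_synthesize,
)
print(report["summary"])
```

Because every phase returns its output, a reviewer can inspect the incidents, the level, and the summary independently, and swap out one phase without touching the others.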
⚙️ The problem with implementing this manually
Now that you know the theory, the practical reality is: implementing such a system requires time, technical expertise, and many iterations.
You would need to:
- Design complex prompts for each phase of the analysis.
- Create and maintain your dictionary of competencies with well-defined levels.
- Integrate with AI APIs and manage token limits, costs, and latency.
- Structure the data to maintain traceability.
- Iterate constantly to improve accuracy.
- Adapt the system to each type of interview and position.
For a recruitment team or a consulting firm, this is unfeasible. There's no point in building technology when you should be focused on finding the best talent.
✅ How Voicit solves it automatically
At Voicit, we have built this entire system so that recruitment consultancies can generate reliable skills assessments without having to think about technology.
This is how it works in practice:
- You conduct the interview normally. Voicit automatically transcribes the conversation (face-to-face, online, telephone).
- You select the skills to be assessed. From the skills dictionary or by creating custom skills for your team.
- You generate the report. The system analyzes the conversation using the three-phase methodology described above.
- You receive a structured assessment with the detected level, a justification based on critical incidents, specific evidence with time references, and recommendations on what to investigate further.
🧩 What makes Voicit different
Complete traceability
Each assessment is linked to specific moments in the conversation. You can verify the evidence and compare it with your own professional judgment—without having to listen to the entire recording.
Proven methodology
We don't use AI haphazardly. We apply behavioral assessment frameworks (BEI, Critical Incident Technique) with decades of academic and business validation.
Complement your professional judgment
Voicit doesn't replace the consultant. It gives you structured evidence that you can compare with your own conclusions, combine with formal test results, and use to identify what you need to explore further in the next interview.
Team customization
Each consulting firm has its own way of assessing skills. Voicit allows you to create shared skills dictionaries for your team, tailored to your methodology, sector, or client.
📋 Summary: What to ask an AI to assess skills
If you're going to use AI to assess skills in your recruitment processes, make sure the system meets these six minimum requirements. If it fails to meet at least four of them, you are very likely making hiring decisions based on well-written hallucinations.
| Pillar | What to do | Risk if it fails |
| --- | --- | --- |
| Behavioral methodology | Apply BEI / STAR / Critical Incidents | Confuses words with evidence |
| Competency dictionary | Define clear levels (Basic → Expert) | Assigns levels without objective criteria |
| Structured extraction | Capture situation, action, and result | Fills in the gaps with assumptions |
| Phased process | Extraction → Evaluation → Synthesis | Mixes facts with interpretation |
| Traceability | Link each conclusion to a timestamp | The decision cannot be audited |
| Human judgment | Complement, never replace, the consultant | Context and accountability are lost |
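The checklist can also be applied mechanically when comparing tools. The sketch below is a hypothetical helper: the requirement keys and the function name are made up for the example, and the threshold of four comes from the summary above.

```python
REQUIREMENTS = (
    "behavioral_methodology",
    "competency_dictionary",
    "structured_extraction",
    "phased_process",
    "traceability",
    "human_judgment",
)
MIN_REQUIREMENTS_MET = 4  # threshold stated in the summary above


def audit_tool(answers: dict[str, bool]) -> tuple[int, list[str]]:
    """Return how many requirements a tool meets and which ones it fails."""
    unknown = set(answers) - set(REQUIREMENTS)
    if unknown:
        raise ValueError(f"unknown requirements: {sorted(unknown)}")
    # A requirement that was not answered counts as not met.
    failed = [req for req in REQUIREMENTS if not answers.get(req, False)]
    return len(REQUIREMENTS) - len(failed), failed


# Example: a generic chat assistant used with a one-line prompt.
met, failed = audit_tool({"human_judgment": True})
print(f"{met}/6 requirements met; reliable enough: {met >= MIN_REQUIREMENTS_MET}")
print("Missing:", ", ".join(failed))
```

Treating unanswered requirements as failures keeps the audit conservative: a tool has to demonstrate each pillar rather than be given the benefit of the doubt.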
💬 Frequently Asked Questions
Does Voicit replace traditional competency-based interviews?
No. Voicit enhances your current process. You can conduct interviews as you always have and obtain a structured analysis that complements your professional judgment.
How accurate is the competency analysis?
Accuracy depends on the quality of the conversation. If the candidate provides complete critical incidents (situation, action, outcome), the analysis is highly reliable. If the conversation is vague or superficial, the system detects this and indicates which aspects need further exploration.
Can I use my own skills and levels?
Yes. Voicit allows you to create custom competency dictionaries that you can share with your team. Many consulting firms adapt our base dictionary to their own methodology or their client's industry.
Can AI replace my judgment as a recruitment consultant?
No, and it shouldn't. Well-applied AI structures the evidence and reduces the noise so you can make better decisions, faster. The judgment regarding cultural fit, intuition, and responsibility for the recommendation remains yours.
Is it safe to share interview recordings with an AI?
It depends on the tool. At Voicit, the data is encrypted, not used to train models, and the system complies with GDPR. If you're using a generic LLM, be sure to review the data usage policy before uploading candidate recordings.
Last updated: May 2026. This article describes how to build or evaluate an AI system for competency analysis and reflects the methodology we use at Voicit. For formal hiring decisions, always combine the automated assessment with the professional judgment of the responsible consultant.
Rafa Torres García is CTO and co-founder of Voicit. He designs AI-powered competency assessment systems used by recruitment consultancies and HR teams to generate more accurate reports in less time.
