Updated: May 2026 · By Rafa Torres García, CTO and co-founder of Voicit
A well-built skills assessment engine (with BEI methodology, defined levels, and a phased analysis process) already produces reliable assessments. That's the foundation, and we've covered it in the previous article. What we describe here is the next step: how two specific elements of the Voicit engine take these assessments from reliable to auditable, reproducible, and much more accurate when assigning a level.
How can you improve the accuracy of an AI-powered skills assessment? By adding two elements to the engine: BEI questions associated with the competency (which guide the search for evidence in the conversation) and behavioral indicators by level with a system of roles (critical, complementary, and blocking) that gives the AI objective criteria for assigning a level and leaves each decision linked to observable behaviors.
At Voicit, we've redesigned our skills assessment engine, incorporating these two elements on top of our existing foundation. The previous engine already generated robust reports; what the BEI questions and behavioral indicators add is more coverage of the evidence that appears in the interview and more rigor in how that evidence translates into a level. In this article, we explain what each one is, why they exist separately in the product architecture, and why the two together make the difference between a report that looks good and one you can convincingly defend in front of your client.
In this article
- The problem that these two elements improve
- BEI questions: the input that expands coverage
- Behavioral indicators: the criteria that refine the verdict
- The three roles that change the evaluation: critical, complementary, and blocking
- The four-step algorithm
- Instrument vs. criteria: why they live separately
- How Voicit applies it in each evaluation
- Comparative summary
- Frequently Asked Questions
⏱ If you only have 30 seconds
• BEI questions: they help the AI locate critical incidents in the conversation so that no useful evidence is missed. Additive, not exclusive. They don't count towards the score.
• Behavioral indicators: These are the criteria anchored to each level (1-4). They make the verdict more precise and traceable.
• Three roles per indicator: critical (necessary condition), complementary (booster), blocker (negative disqualifier).
• The verdict becomes auditable: It ceases to be just the conclusion of the model and becomes an algorithm of discard → validate → confirm → ascend.
🧱 The problem that these two elements improve
Imagine you ask an AI with BEI methodology and defined levels: "Evaluate this candidate's leadership level." With a good model and a decent transcript, you receive a reasonable assessment, something like "Advanced level: manages multidisciplinary teams with a strategic vision."
Reasonable, but there are three questions that are difficult to answer confidently with just that:
- Where exactly in the conversation is the evidence of "strategic vision"?
- What would the model have needed to see to not assign the advanced level?
- If we run another similar interview through the same engine tomorrow, will the scoring be the same?
As long as the AI works only with the textual definition of each level, it has to decide on its own what counts as what. That introduces variability into assessments that should be reproducible and reduces the traceability of each decision, and any recruitment consultancy needs to be able to defend both to its client.
The BEI questions and the behavioral indicators address two different problems. The former improve what evidence is included in the analysis. The latter improve how that evidence translates into a level. They are orthogonal and, therefore, combinable.
❓ BEI questions: the input that expands coverage
BEI questions are interview scripts for critical incidents associated with a competency. A BEI question is not "Do you know how to lead?". It is an invitation to recount a specific incident: "Tell me about a situation where your team wasn't reaching its goal and you had to make a difficult decision."
What do BEI questions do when you pass them to the engine?
When the engine analyzes the interview transcript, the BEI questions function as locators. They tell the engine to "also look for episodes of type X." The effect is that subtle incidents, mentioned in passing by the candidate and easy for a generic analysis to overlook, come to the surface.
Three important properties:
- Additive, not exclusive. If the candidate recounts a relevant episode that doesn't fit any of the questions, it still counts. The questions broaden the search; they don't limit it.
- They do not score or assign a level. They are purely an input. The level is determined by the behavioral indicators, not the questions.
- They do not appear in the final report. They are an internal cue for the engine; the consultant can also use them as a script during the interview, if desired.
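As a minimal sketch of the "additive" property, assuming incidents have already been extracted from the transcript: the snippet below only attaches question ids to incidents and never drops one. The `Incident` class, the `tag_incidents` function and the keyword matching are hypothetical stand-ins for illustration; the article does not describe how the Voicit engine actually matches incidents to questions.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    text: str
    # Ids of the BEI questions this incident relates to (may stay empty).
    matched_questions: list[str] = field(default_factory=list)


def tag_incidents(incidents, question_keywords):
    """Attach question ids to incidents. Nothing is filtered out or scored."""
    for incident in incidents:
        lowered = incident.text.lower()
        for question_id, keywords in question_keywords.items():
            # Naive keyword match, used here only as a placeholder locator.
            if any(keyword in lowered for keyword in keywords):
                incident.matched_questions.append(question_id)
    return incidents  # same incidents as the input: a locator, not a filter


incidents = [
    Incident("We were missing the quarterly goal, so I cut the scope."),
    Incident("I mentored a new colleague during onboarding."),
]
questions = {"goal-at-risk": ["goal", "target"]}
tagged = tag_incidents(incidents, questions)
```

The second incident matches no question and is still kept, which mirrors the rule that questions broaden the search without limiting it.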
Tagged by job profile
A single, well-formulated question can reveal any level of mastery. If you ask "Tell me about a situation in which you detected something ethically questionable in your work":
- A level 1 candidate will report that they followed the protocol.
- A level 3 candidate will explain how they intervened and changed the team's dynamics.
- A level 4 candidate will explain how they redesigned the company's policy.
That's why the questions are labeled by job profile (operational, middle management, executive), not by competency level. A question for an executive addresses governance because that's their day-to-day work, but a mediocre executive could answer it and demonstrate only level 2 competence. What classifies that answer is not the question: it's the behavioral indicators.
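One way to picture this labeling is a question record that carries a job profile and deliberately has no level field. This is an illustrative sketch only: `JobProfile`, `BeiQuestion` and `questions_for` are hypothetical names, and assigning the sample question to the middle-management profile is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class JobProfile(Enum):
    OPERATIONAL = "operational"
    MIDDLE_MANAGEMENT = "middle_management"
    EXECUTIVE = "executive"


@dataclass(frozen=True)
class BeiQuestion:
    competency: str
    profile: JobProfile
    prompt: str
    # No `level` field on purpose: the answer gets classified, not the question.


def questions_for(questions, competency, profile):
    """Select the script for one competency and one job profile."""
    return [q for q in questions if q.competency == competency and q.profile == profile]


bank = [
    BeiQuestion(
        "Leadership",
        JobProfile.MIDDLE_MANAGEMENT,  # assumed profile, for illustration only
        "Tell me about a situation where your team wasn't reaching its goal "
        "and you had to make a difficult decision.",
    ),
]
script = questions_for(bank, "Leadership", JobProfile.MIDDLE_MANAGEMENT)
```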
📐 Behavioral indicators: the criteria that refine the verdict
A behavioral indicator is an observable behavior that demonstrates the candidate's mastery of the competency at a specific level. It is not a property of the competency in the abstract; it is a property of a specific level of that competency.
This link between level and behavior is key. “Meets agreed deadlines” and “Establishes organizational codes of conduct” are both indicators of Professionalism, but they belong to completely different maturity levels. Without this level assignment, the AI has to infer the hierarchy on its own. With it, it compares what it has gleaned from the conversation against objective criteria.
Where does this approach come from?
Industry competency frameworks have structured behavioral indicators by proficiency level for decades:
- Korn Ferry / Lominger: Leadership Architect, with indicators by level and "stallers and stoppers" (factors that derail a career).
- SHL Universal Competency Framework: distinguishes between "essential" (critical) and "desirable" (complementary) indicators.
- Hay Group: competency models with differentiated weightings per indicator.
- Hogan Assessments: derailment factors applicable as counter-indicators.
- Center for Creative Leadership (CCL): derailment factors as career warning signs.
What professional assessment centers call "essential indicators", "desirable indicators" and "counter-indicators" is exactly the three-role system that we will see below.
🎯 The three roles that change the evaluation
Each behavioral indicator has a role that defines its function. This transforms a simple list of behaviors into an evaluation framework that AI—or any human evaluator—can apply reproducibly.
Critical — necessary condition
Phrased positively. If it is not observed, the candidate does not reach that level, regardless of how many other indicators are observed. It is the anchor of the level.
This is equivalent to SHL's "essential" indicators and the most heavily weighted indicators in Korn Ferry. In professional assessment centers, it is standard practice for certain behaviors to be a prerequisite.
"Delegates tasks with associated responsibility and monitors progress."
Complementary — reinforcement, not a requirement
Phrased positively. Its presence strengthens the evaluation; its absence, in itself, does not disqualify the candidate. It corresponds to SHL's "desirable" indicators.
In practice, they help to distinguish between "this candidate meets the level" and "this candidate more than meets it."
"Facilitates difficult conversations within the team without resorting to hierarchical command."
Blocker — negative warning signal
Phrased negatively. If it is observed, it prevents assigning that level or any higher one. It formalizes Lominger's concept of "stallers and stoppers", Hogan's "derailment factors", and the CCL derailment factors.
In structured interview guides they are also known as "counter-indicators" or "red flags".
"Blames the team's mistakes on specific individuals instead of taking responsibility as a leader."
Recommended limits per level: to keep the evaluation effective and avoid dilution, define 2-3 critical indicators, 3-4 complementary indicators, and 1-2 blocking indicators.
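The recommended limits can be checked mechanically when a rubric is written. The sketch below is a hypothetical validator, not part of the Voicit product: `Role`, `Indicator`, `RECOMMENDED` and `limit_warnings` are illustrative names, and placing the two sample indicators at level 3 is an assumption, since the article does not say which level they belong to.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    CRITICAL = "critical"
    COMPLEMENTARY = "complementary"
    BLOCKER = "blocker"


@dataclass(frozen=True)
class Indicator:
    level: int
    role: Role
    text: str


# Recommended (min, max) number of indicators per role and level, from the text above.
RECOMMENDED = {
    Role.CRITICAL: (2, 3),
    Role.COMPLEMENTARY: (3, 4),
    Role.BLOCKER: (1, 2),
}


def limit_warnings(indicators, level):
    """Return one warning per role whose count at `level` is outside the range."""
    counts = Counter(i.role for i in indicators if i.level == level)
    warnings = []
    for role, (low, high) in RECOMMENDED.items():
        n = counts[role]
        if not low <= n <= high:
            warnings.append(
                f"level {level}: {n} {role.value} indicator(s), recommended {low}-{high}"
            )
    return warnings


rubric = [
    Indicator(3, Role.CRITICAL,
              "Delegates tasks with associated responsibility and monitors progress."),
    Indicator(3, Role.BLOCKER,
              "Blames the team's mistakes on specific individuals."),
]
warnings = limit_warnings(rubric, level=3)
```

With this two-indicator rubric, the check reports that level 3 has too few critical and complementary indicators, while the single blocker is within range.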
🧮 The algorithm in four steps
The role system defines an algorithm that the engine applies consistently—the same one in every evaluation, regardless of the candidate or consultant. This is what transforms tacit expert judgment into an explicit and auditable process.
The result: every decision is linked to specific evidence and an explicit rule. You can audit the evaluation incident by incident, and if you disagree with the verdict, you know exactly where to point the finger.
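The sequence discard → validate → confirm → ascend, combined with the three roles, can be read as a loop over levels. What follows is one possible reading, not the engine's actual implementation: all names (`assign_level`, `Verdict`, the indicator ids) are hypothetical, and two behaviors are assumptions: the walk stops at the first level that fails, and a level with no critical indicators cannot be validated.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    CRITICAL = "critical"
    COMPLEMENTARY = "complementary"
    BLOCKER = "blocker"


@dataclass(frozen=True)
class Indicator:
    id: str
    level: int
    role: Role


@dataclass
class Verdict:
    level: int = 0               # 0 means no level was reached
    complementary_hits: int = 0  # complementary indicators observed at `level`
    trace: list[str] = field(default_factory=list)  # one entry per rule applied


def assign_level(indicators, observed, max_level=4):
    """Walk levels upward; `observed` is the set of indicator ids seen in the interview."""
    verdict = Verdict()
    for level in range(1, max_level + 1):
        at_level = [i for i in indicators if i.level == level]

        # 1. Discard: an observed blocker rules out this level and every higher one.
        blockers = [i.id for i in at_level if i.role is Role.BLOCKER and i.id in observed]
        if blockers:
            verdict.trace.append(f"level {level} discarded: blocker {blockers} observed")
            break

        # 2. Validate: every critical indicator of the level must be observed.
        # Assumption: a level with no critical indicators cannot be validated.
        critical = [i.id for i in at_level if i.role is Role.CRITICAL]
        missing = [c for c in critical if c not in observed]
        if not critical or missing:
            verdict.trace.append(f"level {level} not validated: missing critical {missing}")
            break

        # 3. Confirm: complementary indicators strengthen the verdict, never gate it.
        hits = sum(1 for i in at_level if i.role is Role.COMPLEMENTARY and i.id in observed)
        verdict.level, verdict.complementary_hits = level, hits
        verdict.trace.append(f"level {level} confirmed: {hits} complementary observed")

        # 4. Ascend: the loop moves on to the next level.
    return verdict


indicators = [
    Indicator("c1", 1, Role.CRITICAL),
    Indicator("c2", 2, Role.CRITICAL),
    Indicator("p2", 2, Role.COMPLEMENTARY),
    Indicator("c3", 3, Role.CRITICAL),
    Indicator("b3", 3, Role.BLOCKER),
]
verdict = assign_level(indicators, observed={"c1", "c2", "p2", "c3", "b3"})
```

In this example the candidate shows the level 3 critical behavior but also the level 3 blocker, so the verdict stays at level 2, and the trace records which rule stopped the ascent.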
🧩 Instrument vs. criteria: why they live separately
One of the most important decisions we made when redesigning the engine was to keep the BEI questions and behavioral indicators as separate elements in the product architecture.
- The BEI questions are the instrument that elicits the evidence. You ask them so that the candidate recounts incidents.
- The behavioral indicators are the criteria that classify that evidence. You apply them to decide what level the reported incidents demonstrate.
Mixing both elements in the same structure (for example, "level 3 questions") is a common mistake. It leads to the misconception that certain questions produce answers of a specific level — when in reality the question only invites narration, and it is the content of the narration (contrasted with the indicators) that classifies the level.
Separating them gives you two independent levers:
- Improve the questions so that more useful evidence emerges in each interview.
- Refine the indicators so that the verdict fits the candidate's actual level more closely.
⚡ How Voicit applies it in each evaluation
At Voicit, all of this comes pre-configured and ready to use.
The consultant goes from "reading" a report produced by AI to auditing an assessment that the AI has supported with evidence. It's a fundamental difference, not a superficial one.
📋 Comparative summary
| Element | Function | Effect | When to configure it |
| --- | --- | --- | --- |
| BEI questions | Locate critical incidents in the conversation | ↑ Coverage: reveals evidence that could be overlooked | When you want to ensure that specific scenarios are explored |
| Critical indicators | Necessary condition for the level (phrased positively) | ↑ Accuracy: without the critical indicators, there is no level | For the essential behaviors at the level |
| Complementary indicators | Reinforcement of the level (phrased positively) | ↑ Solidity: lends confidence to the verdict | To distinguish a level that is barely met from one met with ample margin |
| Blocking indicators | Warning sign (phrased negatively) | ↓ Automatic level disqualification | For risk behaviors or derailment factors |
Without BEI questions, the engine can overlook relevant evidence. Without behavioral indicators, it lacks the assessment framework that grounds the methodology. With both, AI competency assessment goes from a report that looks good to a reproducible process that you can defend to your client with concrete arguments.
💬 Frequently Asked Questions
Do I need technical knowledge to set up BEI questions and behavioral indicators?
No. Voicit offers a dictionary of 26 competencies with pre-written questions and indicators based on the Korn Ferry, SHL, and Hay frameworks. Most consulting firms simply need to adapt them to their specific industry or client. If your methodology is highly specialized, you can create competencies from scratch using a visual editor.
What is the real difference between BEI questions and regular interview questions?
Behavioral Event Interview (BEI) questions always ask for a specific incident: "Tell me about a situation in which...". Normal questions (Do you know how to lead? How do you work in a team?) encourage hypothetical or self-evaluative answers, which are easy to fabricate. BEI questions force the candidate to recount something that happened and, above all, make it much easier for the engine to find real behavioral evidence in the conversation.
Can I assess competencies without defining behavioral indicators?
Yes, and the result will remain reliable if the competency has well-defined levels and the engine applies the BEI methodology with its phased process. However, you lose the auditable assessment framework: the engine will still generate an assessment, but it will be more interpretive and less reproducible. Indicators transform the definition of a level into something that can be verified point by point.
What happens if the conversation doesn't touch on any of the configured BEI questions?
Nothing problematic. The questions are additive: they guide the search, but don't require each one to have its own incident. The engine extracts all relevant incidents that appear in the conversation, whether or not they fit a specific question. The only thing you lose is the opportunity to explore that particular scenario—something the system flags as a gap for you to delve into in the next interview.
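As a rough illustration of how such a gap could be computed, assuming each extracted incident carries the ids of the questions it matched: the function name `coverage_gaps` and the question ids are hypothetical, and this is not a description of Voicit's internals.

```python
def coverage_gaps(question_ids, incident_matches):
    """Return the configured question ids that no extracted incident was matched to."""
    covered = {qid for matches in incident_matches for qid in matches}
    return [qid for qid in question_ids if qid not in covered]


configured = ["goal-at-risk", "ethical-dilemma"]
# One list of matched question ids per incident; the second incident fits no question.
matches_per_incident = [["goal-at-risk"], []]
gaps = coverage_gaps(configured, matches_per_incident)
```

Both incidents stay in the analysis; only the unexplored "ethical-dilemma" scenario is reported as a gap.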
Does an observed blocker disqualify the entire candidate or just their level?
Only the level where it's defined and the levels above it. A level 3 blocker prevents assigning level 3 or 4, but the candidate remains eligible for evaluation at level 1 or 2. It's a level disqualification, not a candidate disqualification. That's why it's important to define them at the level where they make sense—a level 1 blocker, for example, does disqualify the entire competency.
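The rule reduces to a ceiling: the lowest level at which a blocker was observed, minus one. A minimal sketch, with a hypothetical function name and the 1-4 scale used in this article:

```python
def highest_assignable_level(observed_blocker_levels, max_level=4):
    """Ceiling imposed by observed blockers; 0 means the competency is disqualified."""
    if not observed_blocker_levels:
        return max_level  # no blocker observed, so no ceiling below the top level
    return min(observed_blocker_levels) - 1


ceiling_l3 = highest_assignable_level([3])  # blocker at level 3: levels 1-2 remain
ceiling_l1 = highest_assignable_level([1])  # blocker at level 1: nothing remains
```

This is only a ceiling: the candidate still has to show the critical indicators of a level to be assigned it.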
Does Voicit automatically detect the skills to be assessed?
Yes. If you upload the job description, Voicit detects which competencies apply to that profile from a dictionary of 26 skills—based on the functions, responsibilities, and requirements described—and configures the assessment accordingly. You can adjust the selection before processing the interview. If you prefer to choose them manually, you can also do so in manual mode.
Can AI replace my judgment as a recruitment consultant?
No, and it shouldn't. BEI questions and behavioral indicators make the assessment more rigorous and auditable, but the client context, cultural fit, and responsibility for the recommendation remain yours. What you gain is time and traceability, which frees you up to do better the part only you can do.
Last updated: May 2026. This article describes the methodological approach used by Voicit's competency assessment engine, based on frameworks such as Korn Ferry/Lominger, SHL UCF, Hay Group, Hogan, and CCL. For formal hiring decisions, always combine automated assessment with the professional judgment of the responsible consultant.
Rafa Torres García is CTO and co-founder of Voicit. He designs AI-powered competency assessment systems used by recruitment consultancies and HR teams to generate more accurate reports in less time.
