AI Fluency Assessment.
Measured Through Performance

Most AI fluency tools rely on self-report. Candidates and employees rate themselves 1 to 5, and the data is unreliable. Bryq measures AI fluency through scenario performance, whether you are hiring AI-fluent talent or identifying the genuine experts on your existing team. Real tasks. Real AI tools. Scored outputs.

Why self-reported AI fluency does not work

Most "AI fluency assessment" content on the market today is self-administered. Candidates check boxes about which AI tools they have used, rate their confidence, and the system aggregates the scores. It is fast. It is also nearly worthless for hiring.

Self-rated skill correlates weakly with measured skill, especially in novel domains. The original Kruger & Dunning study (Journal of Personality and Social Psychology, 1999) has since been replicated across domains, and AI is the textbook case. People who have used ChatGPT three times often rate themselves higher than people who have built and deployed production AI workflows for two years. This matters for hiring decisions. It matters even more inside your existing team, where the loudest self-rated experts are often not the ones doing the strongest work.

For hiring decisions, this introduces unacceptable variance. A self-report score does not distinguish the confident-but-incompetent candidate from the genuinely fluent one. The same problem shows up inside an organisation when you are mapping internal AI capability. Worse, self-rating introduces bias. Research consistently shows demographic patterns in self-assessment confidence that have nothing to do with actual capability. Performance-based measurement eliminates both problems, whether you are hiring or baselining your existing team.

What "performance-based AI fluency" means

Anthropic's AI Fluency Framework defines fluency as a set of trainable behaviours. Bryq complements that with the measurement layer. Where Anthropic's framework describes what fluency looks like, Bryq's assessment tells you who has it. The candidate or employee is given a scenario, a set of AI tools, and clear constraints. They perform the task within a time window and submit the output. Bryq scores the output against multiple dimensions: quality, completeness, error detection, ethical handling, time efficiency.


What the candidate cannot do: bluff. The output is the data. If they say they are fluent and produce a hallucinated mess, they are not fluent. If they say they are a beginner and produce strong, evaluated work in 12 minutes, they are fluent. The measurement matches the reality.

The five dimensions of AI fluency Bryq measures

Dimension: What it measures

AI Task Strategy: Picks the right approach quickly. Knows when to use AI vs. when not to. Sets clear hand-off boundaries.

Prompting & Interaction: Designs effective prompts on first or second attempt. Iterates productively. Gets useful output efficiently.

Critical Evaluation: Spots hallucinations, bias, and weak output. Verifies before submitting. Knows what "good" looks like.

Ethical & Responsible Use: Handles sensitive data correctly. Maintains transparency. Escalates appropriately. No corners cut.

Workflow Integration: Embeds AI naturally into the work. No friction. Quality checks built in. Time-efficient.

Fluency is the manifestation of competency across all five dimensions. A person can be competent in any single dimension and not fluent overall; fluency requires the dimensions to operate together as a natural extension of the work.
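
The distinction between single-dimension competence and overall fluency can be illustrated with a minimal sketch. It does not reflect Bryq's actual scoring model, which this page does not describe. The 0-100 scale, the COMPETENCY_THRESHOLD value, and the function names are all assumptions made for illustration; only the five dimension names come from the list above.

```python
# Illustrative sketch only: the scale, threshold, and function names are
# hypothetical and are not taken from Bryq's product.

DIMENSIONS = (
    "AI Task Strategy",
    "Prompting & Interaction",
    "Critical Evaluation",
    "Ethical & Responsible Use",
    "Workflow Integration",
)

# Assumed cut-off on an assumed 0-100 scale.
COMPETENCY_THRESHOLD = 70


def competent_dimensions(scores: dict[str, float]) -> list[str]:
    """Return the dimensions whose score meets the assumed threshold.

    Raises KeyError if a dimension is missing from `scores`.
    """
    return [d for d in DIMENSIONS if scores[d] >= COMPETENCY_THRESHOLD]


def is_fluent(scores: dict[str, float]) -> bool:
    """Fluent only when every dimension is at or above the threshold."""
    return len(competent_dimensions(scores)) == len(DIMENSIONS)


def weakest_dimension(scores: dict[str, float]) -> str:
    """Name the lowest-scoring dimension, e.g. to focus development."""
    return min(DIMENSIONS, key=lambda d: scores[d])


if __name__ == "__main__":
    # A strong prompter who does not verify output: competent in places,
    # but not fluent overall.
    person = {
        "AI Task Strategy": 82,
        "Prompting & Interaction": 95,
        "Critical Evaluation": 40,
        "Ethical & Responsible Use": 78,
        "Workflow Integration": 74,
    }
    print(is_fluent(person))
    print(weakest_dimension(person))
```

Requiring every dimension to clear the bar, rather than averaging, is one way to encode the idea that a single high score cannot compensate for a gap elsewhere.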

Sample tasks: what candidates actually do

Three anonymised examples of the kinds of scenarios candidates and employees work through. The same tasks run for hiring and for internal baselining, with role-relative scoring. Specific items rotate to maintain assessment integrity; the structure stays the same.

Task 1: Customer reply with constraints

Given an inbound customer complaint with mixed factual and emotional content, the candidate uses an AI assistant to draft a reply. Constraints: must address the complaint fully, must not commit to refunds beyond policy, must maintain professional tone, must complete in under 8 minutes.


Scoring captures: prompt quality, output quality, constraint adherence, error detection (was the AI's first draft factually correct?), final-output suitability.

Task 2: Noisy data extraction

Given a messy dataset (CSV with formatting errors, missing values, ambiguous labels), the candidate uses an AI tool to extract structured insights. Constraints: must identify at least three places where the AI output is likely wrong; must rank the reliability of each insight; must complete in under 10 minutes.


Scoring captures: tool-use efficiency, accuracy of error identification, ranking quality, judgement about what the AI got right vs. wrong.

Task 3: Ethical-use review

Given an AI-generated business plan that contains a subtle ethical issue (e.g., a marketing approach that crosses GDPR consent boundaries), the candidate must identify the issue and rewrite the section. Constraints: must explain the specific ethical concern; must propose a defensible alternative; must complete in under 12 minutes.


Scoring captures: ethical-issue detection, depth of reasoning, quality of the alternative, defensibility of the proposed solution.

AI fluency vs AI proficiency vs AI competency

Three terms that overlap heavily. The disambiguation:

AI competency

the framework

Structured model of skills, knowledge, and behaviours. Use this term when designing an L&D programme or capability map.

AI proficiency

the practical performance

What a person can do with AI in their actual work. Use this term in hiring decisions.

AI fluency

the natural-extension quality

How smoothly the person operates AI as part of how they work. Use this term for senior roles and leadership pipeline.

Bryq's framework measures the same five dimensions for all three. The output framing changes; the measurement does not. For the full mapping see the disambiguation guide.

Use cases for hiring and for your current team

For hiring: senior and leadership-pipeline roles

Fluency is the right term when the question is depth, not basic competence. Senior individual contributors and managers who lead AI-augmented teams need to operate AI as an extension of how they work. The assessment differentiates fluent candidates from competent ones at the offer stage.

For hiring: AI-intensive technical roles

Engineers, ML specialists, and AI-product roles benefit from performance-based fluency measurement that goes beyond technical interviews. The framework captures the ethical, evaluation, and workflow-integration dimensions that pure technical interviews often miss.

For your current team: internal capability differentiation

Where a workforce has uneven AI capability and you need to identify the high-fluency individuals (for mentoring, internal advocacy, or programme design), the performance-based assessment surfaces who is genuinely fluent versus who self-reports fluency they do not have. The same assessment that screens candidates baselines current employees with role-relative scoring.

For your current team: re-baselining as tools change

AI tooling evolves quickly. The fluency that mattered in mid-2025 is not the fluency that matters in 2027. Run the assessment annually across the workforce to track real capability change over time. The longitudinal data is far more useful than year-on-year self-report surveys, which mostly track confidence inflation.

Customer evidence

Some customers use Bryq for technical-role hiring where AI fluency matters in real time. Others, AI-native companies, run it to hire genuinely AI-fluent talent. Across the 140+ teams using Bryq globally: 3x improvement in quality of hire, 47% lower attrition, 2x faster hiring.


Results measured across Bryq customer engagements. Individual outcomes vary by role, industry, and baseline hiring maturity.

Bryq measures AI fluency the way it should be measured

Through the work, not through the self-assessment. The data matches the reality. The hiring decisions improve. The internal capability baseline becomes something you can act on.

Ready to Measure AI Proficiency?

Book a 30-minute demo. We’ll build your first AI Proficiency profile on the call, for a role you're hiring or a team you want to assess.

Start hiring based on real data.

FAQ

Find answers to the most frequently asked questions

What is the difference between AI fluency and AI proficiency?
The terms overlap significantly. Fluency emphasises ease and naturalness, operating AI as a comfortable extension of one's work, the way a fluent speaker operates a language. Proficiency emphasises practical workplace performance. Both describe the same underlying capability; Bryq measures both with the same five-dimension framework. The choice of term is mostly about audience and emphasis.

Why does self-reported AI fluency not work?
Self-rated skill correlates weakly with measured skill, especially in novel domains. People with less skill often overestimate their ability; people with more skill often underestimate. The pattern (often called Dunning–Kruger) is well-documented. For hiring decisions, self-report introduces unacceptable variance and bias. Performance-based measurement eliminates both.

What do the assessment tasks look like?
Each task gives the candidate a realistic work scenario, a set of AI tools, and clear constraints. The candidate performs the task and submits the output. Examples include drafting a customer reply using an AI assistant under specified constraints, extracting structured insight from a noisy dataset using an AI tool and flagging likely errors, and reviewing an AI-generated document for ethical-use violations. Each task takes 8–12 minutes.

Can AI fluency be developed?
Yes, and the assessment data shows you exactly where to focus practice. Dimension-level scoring reveals whether a candidate is weak on prompting, on critical evaluation, on ethical use, or elsewhere. The same data also serves as a baseline for measuring improvement after development.

How long does the assessment take?
Around 15 minutes per person, on average. The format is scenario-based and tool-agnostic, designed to measure practical decision-making rather than knowledge of terminology.

Is the assessment tied to a specific AI tool?
No. Bryq measures fluency for AI-augmented work in general. The five dimensions hold across tools. The candidate or employee uses tools available in the assessment environment; the scoring focuses on practical decision-making and output quality, not on knowledge of any single product.

Can the assessment be used for existing employees as well as candidates?
Yes. The same assessment runs for hiring, internal capability baselining, leadership-pipeline evaluation, and re-baselining as your AI tooling evolves. Role-relative scoring means an engineer, a marketer, and a customer-success rep are each scored against what fluency means in their specific role. Outputs feed into L&D, succession planning, and internal mobility.

Is this the same as Bryq's AI Proficiency Assessment?
Yes. Bryq publishes the product under the name AI Proficiency Assessment. "AI fluency assessment" is one of the terms buyers search when looking for the same underlying measurement. See our disambiguation guide for the full mapping.

Who uses Bryq?
140+ teams globally, with named case studies including Roadrunner, Granicus, Trendsetter Homes, Metro Pacific Tollways, Global BPO, Hawkeye Innovations, AccountingProse, and Persado.