OECD’s AI Benchmark Is the Message
What the OECD's AI Capability Indicators Mean for Business Leaders
By Tommy Cooke, powered by coffee and Blue Jays baseball
Jun 27, 2025

Key Points:
The OECD’s AI Capability Indicators mark a critical shift from AI hype to measurable, human-centered benchmarking
Even today’s most advanced AI systems perform unevenly, often failing to match basic human adaptability, judgment, and emotional nuance
For business leaders, these benchmarks are not just technical assessments — they’re a practical guide to knowing when to automate and when to elevate human talent
A quiet evolution is underway. After years of loud debates fraught with both hype and doubt surrounding AI's risks, promises, and potential, the Organisation for Economic Co-operation and Development (OECD) has done something refreshingly uncontroversial: it is measuring AI.
This is not measurement in the abstract. It is a concrete framework that evaluates how even the most advanced AI systems stack up against human capabilities.
The release of the OECD AI Capability Indicators earlier this month offers business leaders something valuable: a way to separate performance from perception.
Moreover, business leaders ought to pay attention because the OECD has set a benchmarking precedent, one that encourages people to focus squarely on the differences between AI-generated and human-generated outputs. AI is learning, but so too are humans.
From Possibility to Performance: OECD’s AI Capability Indicators at a Glance
For years, AI discourse has been dominated by extremes. From utopian visions of self-aware machines to dystopian warnings of job-stealing automation, the conversation has swung back and forth for a long time now. The trouble is that between the excitement and the fear, there is little space to comfortably find and occupy a middle ground.
That middle ground is important. Why? It’s a space where we can more pragmatically and objectively reflect upon what AI is currently capable of.
The OECD’s new benchmark provides important scaffolding for that middle ground. The framework identifies nine human-relevant capabilities, including language, vision, reasoning, creativity, and social interaction. Each capability is rated on a five-level scale, and the goal of the benchmark is to measure and compare how well AI systems perform like humans.
What makes the framework so significant isn’t just the scoring—it’s the mindset shift. For the first time, governments, researchers, and businesses are using a shared scale to evaluate AI performance not against hype, but against actual, measurable human capability.
This benchmark is significant because it reinforces a growing recognition that intelligence isn’t binary. It cannot be reduced to linear algebra. It’s gradual, contextual, and measurable. AI can be scored not just on whether it talks like a human, but on whether it sees, manipulates, critically reflects, remembers, solves problems, and learns like one.
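To make the scoring model concrete, here is a minimal sketch in Python of how a team might record a system’s level on each scale. The domain names are the ones this article mentions (the full OECD framework defines nine), and the structure itself is an illustrative assumption, not an official OECD schema.

```python
from dataclasses import dataclass

# Illustrative only: the domains below are those named in this article;
# the OECD framework defines nine scales in total, each with five levels.
DOMAINS = {"language", "vision", "reasoning", "creativity", "social_interaction"}

@dataclass(frozen=True)
class CapabilityScore:
    domain: str
    level: int  # 1 (lowest) through 5 (highest) on the OECD-style scale

    def __post_init__(self):
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain}")
        if not 1 <= self.level <= 5:
            raise ValueError("levels run from 1 to 5")

def profile(scores: list[CapabilityScore]) -> dict[str, int]:
    """Collapse individual scores into a domain -> level profile."""
    return {s.domain: s.level for s in scores}

# Hypothetical profile echoing the rough levels discussed later in this piece.
frontier_system = [
    CapabilityScore("language", 3),
    CapabilityScore("vision", 3),
    CapabilityScore("social_interaction", 2),
]
print(profile(frontier_system))  # {'language': 3, 'vision': 3, 'social_interaction': 2}
```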
The Verdict: What OECD’s AI Capability Indicators Reveal About the State of AI
So, how is AI performing against these indicators? The verdict is not encouraging. Even state-of-the-art systems like GPT-4 and Claude fall between Level 2 and Level 3. They exhibit some general human skills, but not consistently, and they are neither as robust nor as adaptive as humans.
This is a sobering truth. AI is strong, but against this benchmark its strength is not very compelling. Let’s look at a few key indicators and see how today’s AI is doing:
Language. At level 3, AI generates coherent summaries, translates text, and mimics tone and even emotions like empathy. But it still hallucinates, misses nuance, and consistently struggles with factual accuracy
Problem-solving and Knowledge Retention. These capabilities are important: can AI find solutions and retain them consistently? This matters for structured tasks like drafting reports, generating legal summaries, or conducting market analyses. AI performs moderately here, around level 3, because it still struggles to synthesize creatively and to weigh criteria the way humans do
Visual Understanding. Image recognition and labeling are reasonably advanced at level 3. However, AI struggles to interpret diagrams that combine imagery and text (also called multi-modal coordination). Most AI that generates images also fails to consistently learn from the images users provide
Social and Emotional Capability. This is perhaps where AI lags the most. Averaging level 1 or 2, its efforts merely emulate social and emotional intelligence. AI mimics politeness, but it cannot understand or respond to real human emotion. Empathy is still beyond the machine, contrary to what some engineers believe
The OECD’s AI Benchmark Is the Message
The most important thing that the OECD has done here isn’t simply rating AI. It’s changing how we talk about it.
For far too long, AI has been treated as both mysterious and magical. It’s been treated as a black box that fires magic bullets. By establishing shared indicators, the OECD has invited the world to measure AI’s strengths and weaknesses like any other technology. It is, in effect, demystifying AI for us.
As importantly, the benchmark reminds us of something that is far too easy to forget: human standards matter. These indicators don’t merely reveal what AI can do; they also reveal what AI cannot yet do.
This is more than semantics, especially if you are a business leader. When reflecting on the OECD’s benchmark, I encourage you to start exploring how your own AI performs. At the very least, that exercise should reveal not only where your people are important, but also why they are so important. Consider doing the following (a small sketch after this list illustrates the first two steps):
First, map your AI efforts. What are your core business processes, and which of them sit at level 1 to 2? Those are likely repetitive, structured, and predictable, and they are the tasks that are prime for automation. Think customer service scripts, monthly reporting, content generation, invoice verification, and so on
Second, identify your human talent. This is your competitive edge, after all. Where are you relying on social insight, flexible judgment, ethical nuance, and cultural awareness? These capabilities are not just hard to automate; they are also where your people offer the most value
Third, design integration with purpose. Don’t just deploy AI because your competitors are doing it. Deploy it because you understand where it fits, and it should fit your people and your organization like a glove—not like a raincoat
Lastly, build a feedback loop. The OECD indicators will evolve, and your business should too. Treat them as a maturity benchmark made of living metrics: they will change and mature just like your human talent. Revisit them often, and use them to evaluate vendors, assess risks, and communicate clearly with stakeholders
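To ground the first two steps, here is a small, hedged sketch in Python. Every process name, level, and the level-2 cutoff below is a hypothetical placeholder rather than OECD guidance; the point is the triage logic: tasks whose capability demands sit at low levels are automation candidates, while higher-level demands mark where human talent is the edge.

```python
# Hypothetical triage sketch: all process names, levels, and the cutoff
# are illustrative assumptions, not OECD guidance.
AUTOMATION_CUTOFF = 2  # assumed: level 1-2 tasks are automation candidates

# process -> highest OECD-style capability level the task seems to demand
processes = {
    "customer service scripts": 2,
    "monthly reporting": 1,
    "invoice verification": 1,
    "contract negotiation": 4,
    "crisis communications": 5,
}

automate = sorted(p for p, lvl in processes.items() if lvl <= AUTOMATION_CUTOFF)
elevate = sorted(p for p, lvl in processes.items() if lvl > AUTOMATION_CUTOFF)

print("Prime for automation:", automate)
print("Where people are the edge:", elevate)
```

Running the sketch splits the hypothetical portfolio into the two buckets the list above describes: structured, predictable work to automate, and judgment-heavy work where people carry the value.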
There’s a quiet elegance to what the OECD has done. In a world obsessed with what’s next, they’ve grounded us in what’s now.