Transcript
Um, so imagine hiring a brilliant mathematician, right? Like someone who can write flawless software code in seconds, but then they completely fail a basic reading comprehension test. Yeah, it sounds absurd, but that is actually exactly what the modern workforce is doing right now with artificial intelligence. Exactly. I mean, the prevailing anxiety for anyone navigating the current economy is really just one big question: will AI take my job or just change it completely? Right. And we hear endless speculation about that. But, um, very little of it is actually grounded in how these models function on a granular level. Yeah, we hear a lot of noise, but not a lot of hard numbers. So today, for this deep dive, we've got our hands on a brand new study. It's from April 2026, out of Poon University. Right. It's called "The AI Skill Shift." Yes. And the whole mission here is to map the hard empirical data directly to your career, because the timing of this data, I mean, it couldn't be more critical. Oh, absolutely, especially since the people actually running the global economy cannot seem to agree on what is happening. Like, we are seeing this very public, highly fractured debate among industry leaders. The consensus is totally non-existent right now. I mean, there are three really prominent viewpoints dominating finance and tech. Right. Starting with the really aggressive side. Yeah. You have JPMorgan's Jamie Dimon. He was at the Hill and Valley Forum back in March 2026, and he basically said AI displacement is an immediate structural shift. Like, it's happening today and we need massive retraining efforts right now. But then you have David Solomon at Goldman Sachs, who is in a much more moderate camp. Exactly. He explicitly said he isn't predicting some sort of job apocalypse. He thinks the economy is nimble enough to absorb the tech, though he does admit the short-term disruption is going to be pretty turbulent. Right. And then, jumping to the furthest end of the spectrum, you have Dario Amodei. The CEO of Anthropic. Right. And his warning was just stark. He said up to 50% of entry-level white-collar jobs could just vanish within the next five years, which is a massive number. So you literally have the top minds in finance and tech entirely split. I mean, it's either a complete hollowing out of the workforce, a manageable transition, or a retraining emergency. Yeah. But here's the thing. Dimon and Solomon, they're debating from this, like, 10,000-foot macro view. Exactly. And what's fascinating here is that despite these massive CEO predictions, very few people are actually testing AI against the standardized skills that make up real jobs. Because an occupation isn't just one big monolithic thing, right? It's a bundle of really specific tasks. Exactly. And the researchers at Poon University realized that. So instead of asking, can AI do this job, they benchmarked four frontier large language models against the 35 specific skills defined by the US Department of Labor's O*NET taxonomy. They basically gave the AI a standardized test for every core component of the American workforce. Right. And they codified all this into what they call the Skill Automation Feasibility Index, or SAFI score. Okay. Let's unpack this, because the setup for this test was pretty intense, right? Very intense. The methodology is incredibly robust. They threw 263 distinct text-based tasks at four massive frontier models. And these are the big ones.
Llama 3.3 70B, Mistral Large, Qwen 2.5 72B, and Gemini 2.5 Flash. Yep. Across all those tasks and models, the researchers made 1,052 total model calls. And what immediately stands out, just from a reliability standpoint, is the 100% completion rate. Wait, 100%? Like, no crashes at all? Zero system failures, zero timeouts. The models executed every single command they were given. Wow. Okay, but executing a command and actually doing it well are two different things. Oh, completely. The quality of execution varied wildly depending on the skill. Right. So the SAFI scores are on a zero-to-100 scale, and when the results came in, mathematics and programming just absolutely dominated. Yeah, mathematics scored a 73.2 and programming came in at 71.8. But then you look at the absolute bottom of the index. Active listening scored a 42.2, and reading comprehension sat at just 45.5. Which, you know, sounds a bit crazy at first. It is crazy. I find it so counterintuitive. I mean, we are talking about large language models. Their entire architecture is literally built on processing text. Right. So how does a text-based machine fail at reading comprehension? It's like finding out a calculator is a genius at calculus but can't figure out a restaurant tip. Why is it failing at the social, communicative stuff? Well, it really comes down to the fundamental difference between structured quantitative reasoning and nuanced human communication. Break that down for me. So LLMs, at their core, are just incredibly sophisticated next-token prediction engines. They calculate the probabilistic sequence of words based on their training data. Right. They're guessing what comes next. Exactly. So they excel at math and coding because those are bounded, deterministic systems. They have strict rules. If a mathematical equation balances, it balances. Yeah, programming syntax is absolute. There's no gray area. Right. But active listening and reading comprehension, those require an internal state of mind that the model just simply does not possess. Oh, I see. So using an AI for active listening is kind of like using a GPS to determine if your passenger is enjoying the road trip. That is a perfect analogy. Yeah. Like, the GPS can perfectly map the route, the structured data, but it has absolutely no sensors for the human experience happening inside the car. Exactly. Because active listening in a real professional environment involves inferring unspoken subtext. It's interpreting social dynamics and emotional shifts. And the model can't read between the lines because it has no lived experience to draw any context from. That is an incredibly precise way to look at it. It can process the syntax of the conversation, but it totally misses the intent. Wow. And this mechanistic limitation actually leads directly to another huge finding in the study, which they call model convergence. Right. Because they tested models from all over the place. Yeah. Entirely different architectures, different training data, different countries. Llama and Gemini are from the US, Mistral's from France, and Qwen is from China. And yet didn't they all score basically the same? Pretty much. All four models scored within a remarkably tight 3.6-point spread across the board. Mistral Large led slightly with a 60.0 average, but their actual capability profiles were virtually identical. So for anyone listening right now who has to make workforce or software purchasing decisions, this is huge. It really is. Because it means text-based automation depends entirely on the inherent nature of the task, not on which specific vendor you buy from.
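To make that convergence finding concrete, here is a minimal sketch, in Python, of how per-model averages and the cross-model spread could be computed from a scores table like the study's. The table values and model keys below are illustrative stand-ins, not the paper's raw data.

```python
# Illustrative sketch: per-model SAFI averages and the cross-model spread,
# computed from a {model: {skill: score}} table.
# All numbers below are stand-ins, not the study's raw data.

scores = {
    "llama-3.3-70b":    {"mathematics": 72.8, "programming": 71.5, "active_listening": 41.9},
    "mistral-large":    {"mathematics": 74.0, "programming": 72.3, "active_listening": 43.0},
    "qwen-2.5-72b":     {"mathematics": 73.1, "programming": 71.2, "active_listening": 42.1},
    "gemini-2.5-flash": {"mathematics": 72.9, "programming": 72.2, "active_listening": 41.8},
}

# Average SAFI score per model across all tested skills.
averages = {
    model: sum(by_skill.values()) / len(by_skill)
    for model, by_skill in scores.items()
}

# "Convergence" here = the gap between the best and worst model average.
spread = max(averages.values()) - min(averages.values())

for model, avg in sorted(averages.items(), key=lambda kv: -kv[1]):
    print(f"{model:18s} {avg:5.1f}")
print(f"cross-model spread: {spread:.1f} points")
```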
Exactly. You don't need to completely overhaul your workflow every time a new model drops. Right. Because at that 70-billion-parameter frontier, which, just as a reminder, is a measure of the model's neural network size, they're all hitting the exact same ceiling when it comes to unstructured human nuance. The models are standardizing, but the real insight comes when we look at how those test scores translate to actual offices and factories right now. Yeah, because the researchers cross-referenced their SAFI scores with the Anthropic Economic Index, right? They did. And that index tracked real-world AI adoption across 756 different occupations. And here's where it gets really interesting, because when I was reading through that Anthropic data, the paradox was just glaring. The capability-demand inversion. Yes. The paradox is that the occupations with the highest AI exposure scores, so the jobs where workers are using AI constantly throughout the day, are heavily concentrated in the exact same skills the AI scored the absolute lowest on. It's wild to see it in the data. Right. Reading comprehension, writing, and active listening are the dominant skills in these highly AI-exposed jobs. Wait, if AI is so terrible at active listening and reading comprehension, why are the jobs that require those skills the ones using AI the most? It seems like a contradiction, right? Totally. Isn't that like hiring an intern who can't read to run your book club? Well, if we connect this to the bigger picture, it actually makes perfect sense when you break down the workflow. Let's look at customer service representatives. They have a massive AI exposure score of 0.701. And market research analysts are up there too, at 0.648. Exactly. And the core value of those roles relies heavily on the exact nuanced human communication we just said AI is terrible at. A customer service rep has to de-escalate an angry client. Yeah. And a market research analyst has to figure out human buying psychology. Right. But the key here is that the AI isn't doing the de-escalation. The AI is a productivity tool handling the structured data retrieval that used to slow the human down. Oh, I see. So while the customer is yelling on the phone, the AI is instantly pulling up their purchase history, cross-referencing warranties, and, like, formatting a refund table. Exactly. It does the heavy lifting on the structured, deterministic side. That frees up the human's cognitive bandwidth to actually practice active listening and manage the emotional side of the call. Okay, that makes so much sense. So the human and the AI are dividing the labor based on what their architectures are actually good at. The AI does the syntax and the human does the sentiment. Precisely. And to see if this holds up at scale, the researchers analyzed 3,364 specific real-world task interactions. And what did they find? Is it mostly automation, or are humans still involved? The data strongly refutes the narrative of total, immediate automation. Nearly 80%, 78.7% to be exact, of AI interactions are what they call augmentation. Wow. So only 21.3% are true automation. Yep. The vast majority are collaborative feedback loops. The human asks for a draft. The AI generates it. The human corrects the tone. The AI revises it. It's interactive. Completely. It is rarely directive, end-to-end task completion where you just push a button and walk away.
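A minimal sketch of how that augmentation-versus-automation split could be tallied, assuming each interaction record carries a simple mode label. The records and label names here are hypothetical, not the study's actual schema.

```python
# Illustrative sketch: tallying augmentation vs. automation from
# interaction records labeled the way the study describes.
# The records and the "mode" label are hypothetical.

interactions = [
    {"task": "draft refund email",   "mode": "augmentation"},  # human revises the output
    {"task": "reformat data table",  "mode": "automation"},    # end-to-end, no feedback loop
    {"task": "summarize call notes", "mode": "augmentation"},
    # ... the study analyzed 3,364 such real-world task interactions
]

total = len(interactions)
augmented = sum(1 for i in interactions if i["mode"] == "augmentation")

print(f"augmentation share: {augmented / total:.1%}")          # the study found 78.7%
print(f"automation share:   {(total - augmented) / total:.1%}")  # and 21.3%
```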
Okay. That brings us right back to David Solomon's example at Goldman Sachs. The IPO prospectus example. Yes. Right. So he pointed out that AI can now draft 95% of an initial public offering prospectus, the S-1 filing, in just minutes. Which is incredible, because historically that document required a six-person team working grueling hours for two solid weeks. Yeah. But he noted that the last 5% is the human judgment, and he says that is now the most critical part. Well, wait, let me push back on this a bit. Yeah. If an AI is doing 95% of the drafting that a six-person team used to do, aren't 95% of those six people out of a job? Is this augmentation just a fancy corporate word for "you're fired, but your boss has a new toy"? That is the natural instinct, but it misses a crucial nuance about white-collar work. Drafting the text of an S-1 filing is only a fraction of the actual job. Because of the liability, right? Exactly. An IPO prospectus carries massive legal liability. An AI can aggregate the financial tables and spit out boilerplate risk factors, but an AI cannot absorb legal risk. Right. Because the SEC audits the filing, and you can't exactly send a language model to jail. Exactly. If the framing misleads shareholders, the human executives go to federal prison. So for those entry-level workers Dario Amodei was so worried about, the shift isn't necessarily from employment to unemployment. It's a shift in what they actually do all day. Yes. It moves from routine creation to verification and liability absorption. You cannot submit an AI-generated S-1 without exhaustive human auditing. So the workers who thrive aren't the ones manually typing tables the fastest anymore. No. The ones who thrive are the ones who learn to direct, evaluate, and refine the AI outputs. They spot the hallucinations and align the document with legal boundaries. So the job stays, but the required skill set shifts drastically. And to help workers figure out exactly where they stand in all this, the researchers created this really useful tool called the AI Impact Matrix. Right. It's a four-quadrant map. It plots a skill's AI capability, the SAFI score, against its real-world AI exposure from that Anthropic data. Let's walk everyone through this so they can figure out where they sit. What's quadrant one? Quadrant one is high displacement risk. These are roles with high AI capability and high real-world exposure. Like junior programming. Exactly. The AI scored a 71.8 in programming, and tech industry exposure is massive. So for developers in this quadrant, just knowing basic syntax is functionally obsolete. Because the AI is already a master of syntax. Right. To survive quadrant one, developers have to level up to system architecture and cross-functional design. They need to understand business logic, not just code compilation. Okay, what about quadrant two? That's the AI-augmented zone. This is our capability-demand inversion area. So, high real-world exposure, but low inherent AI capability. Exactly. This is your financial analysts, educators, customer service roles. The core value relies on complex judgment and empathy. So the AI acts as a high-powered co-pilot for the data, but it can't replicate the human-to-human transactions. Spot on. Then we move to quadrant three, which is the upskilling window. This one is fascinating because it totally flips the conventional wisdom around blue-collar work. It really does.
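Here is a minimal sketch of the matrix as a function, assuming simple threshold cutoffs on both axes; the study's actual cutoffs aren't given in this discussion, so the values below are assumptions, and the fourth quadrant's label comes from the part of the conversation that follows.

```python
# Illustrative sketch of the AI Impact Matrix as a function.
# capability = a skill's SAFI score (0-100); exposure = its real-world
# AI exposure index (0-1). The cutoff values are assumptions, not the
# study's published thresholds.

CAPABILITY_CUTOFF = 55.0   # assumed midpoint on the 0-100 SAFI scale
EXPOSURE_CUTOFF = 0.40     # assumed midpoint on the exposure index

def quadrant(capability: float, exposure: float) -> str:
    if capability >= CAPABILITY_CUTOFF and exposure >= EXPOSURE_CUTOFF:
        return "Q1: high displacement risk"    # e.g., junior programming
    if capability < CAPABILITY_CUTOFF and exposure >= EXPOSURE_CUTOFF:
        return "Q2: AI-augmented zone"         # e.g., customer service
    if capability >= CAPABILITY_CUTOFF and exposure < EXPOSURE_CUTOFF:
        return "Q3: upskilling window"         # e.g., diagnostics-heavy trades
    return "Q4: lower displacement risk"       # e.g., trial lawyers

print(quadrant(71.8, 0.80))  # programming-like profile -> Q1
print(quadrant(42.2, 0.70))  # active-listening-heavy role -> Q2
```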
This quadrant features moderate to high AI capability on the test, but currently has very low real-world exposure. We're talking about physical, technical trades, right? HVAC technicians, manufacturing operators, field mechanics. But wait, the study said AI only does text-based tasks. Right. How can AI have high capability for an HVAC technician? Is the AI going to text my broken air conditioner? That's the perfect question, and it's a great nuance. Obviously, a language model isn't physically turning a wrench, but the high capability score proves that these physical trades contain a massive layer of cognitive and diagnostic work. Oh, I see. So the physical execution and the cognitive diagnosis are being decoupled. Exactly. If a mechanic feeds sensor data, pressure readings, and audio diagnostics into a frontier model, the AI can cross-reference that against every service manual ever published and give you the exact repair steps instantly. So, right now, just the sheer physical presence required to access the machinery is what insulates these workers. They have negligible AI exposure. Currently, yes. But as diagnostic sensor platforms bridge the gap between the physical world and the digital network, that insulation will evaporate. Which is why they call it the upskilling window. There is a narrow time frame right now for physical trades people to adapt before the transition accelerates. Exactly. The mechanic who learns to integrate AI diagnostics into their workflow will multiply their value massively. Makes sense. Okay, finally, quadrant four. This is lower displacement risk. Right. Low AI capability and low real-world exposure. We are talking about jobs requiring deeply embodied human judgment and high-stakes physical or emotional intervention. Specialized surgeons, physical therapists, trial lawyers in a courtroom. Exactly. The value here is intrinsically tied to human-to-human stakes. A trial lawyer isn't just reading case law. They're reading the jury. Right. They're feeling the emotional temperature of the room, and LLMs lack physical embodiment and emotional state, so they just can't execute these roles. And the workforce isn't even attempting to expose these roles to automation. Okay. So what does this all mean? When we look at the SAFI testing, the Anthropic data, and this matrix, what is the ultimate takeaway for you, the listener? The main takeaway is that the binary narrative of "will AI take my job" is just too simple. AI is an exceptional engine for rule-bound reasoning. It will consume structured data and syntax. But it is measurably weak at the connective tissue of the modern economy. Exactly. Listening, reading, speaking, social perception, those maintain their premium. In almost every high-value workflow, you, the human, are responsible for the last mile of judgment. And remember, this data represents a snapshot of early 2026 at the 70-billion-parameter frontier. The window to proactively prepare is open right now. It is. You don't need to fear the AI. You need to learn to manage it. Invest in human-AI workflow design and critical evaluation.
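As one concrete instance of that human-AI workflow design, here is a minimal sketch of the diagnostic decoupling described in the trades discussion above: packaging machine telemetry into a structured prompt while the physical work stays with the technician. The query_model function and the readings are hypothetical stand-ins, not any real client library or dataset.

```python
# Illustrative sketch: decoupling cognitive diagnosis from physical
# execution by packaging machine telemetry into a diagnostic prompt.
# query_model() is a hypothetical stand-in, not a real client library.

def query_model(prompt: str) -> str:
    raise NotImplementedError("swap in your actual model client here")

readings = {
    "suction_pressure_psi": 58,
    "discharge_pressure_psi": 412,   # high-side pressure looks elevated
    "compressor_amps": 31.5,
    "symptom": "unit short-cycles every 4-5 minutes",
}

prompt = (
    "You are assisting an HVAC technician. Given these readings, list the "
    "most likely faults and the repair steps to verify each one:\n"
    + "\n".join(f"- {k}: {v}" for k, v in readings.items())
)
print(prompt)

# The human still turns the wrench; the model only proposes hypotheses.
# diagnosis = query_model(prompt)
```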
And you know, this raises an important question that the researchers brought up at the very end of the study. Oh, about the government definitions? Yeah. We've been talking about the O*NET taxonomy of 35 skills, but that list was created before LLMs even existed. So the framework we're using to measure the workforce is fundamentally outdated. Exactly. The researchers point out that entirely new skills are emerging right now. Skills like multi-agent orchestration, which is like setting up multiple AI models with different roles to manage a complex project without a human handholding every step. Right. Or prompt engineering, the precise linguistic framing to get the best reasoning out of a model. These are vital skills for the augmented workflow, but they aren't even on the official labor list yet. They literally don't have an official name. And that is the ultimate challenge for the listener right now. If the most vital career skill you'll need in five years doesn't even have a name today, how will you start practicing it tomorrow? You can't wait for a corporate training program. You have to actively build that intuition yourself. Exactly. Think about that mathematician who can write code but can't read a room. The machine provides the calculation. You provide the comprehension. Look at your own daily workflow and ask yourself: where does the calculation end, and where does your last mile of human judgment begin?
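As a closing illustration of the multi-agent orchestration idea the hosts mention above, here is a minimal sketch of the pattern: role-scoped agents sharing one model client, with a coordinator routing work between them. The call_model function and the role prompts are hypothetical stand-ins, not any particular framework's API.

```python
# Illustrative sketch of multi-agent orchestration: several role-scoped
# "agents" share one hypothetical model client, and a coordinator routes
# a task through them in sequence. call_model() is a stand-in.

def call_model(system_role: str, message: str) -> str:
    # Stand-in for a real model API call.
    return f"[{system_role} output for: {message[:40]}...]"

ROLES = {
    "planner":  "Break the project into ordered, concrete steps.",
    "drafter":  "Produce a first draft for the current step.",
    "reviewer": "Critique the draft and flag anything a human must verify.",
}

def orchestrate(task: str) -> str:
    plan = call_model(ROLES["planner"], task)
    draft = call_model(ROLES["drafter"], plan)
    review = call_model(ROLES["reviewer"], draft)
    # The human's "last mile": final judgment stays outside this loop.
    return review

print(orchestrate("prepare a market-research summary for a new product"))
```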