
    Are AI agents ready for the workplace? A new benchmark raises doubts.

    By admin | January 22, 2026 | 4 Mins Read
    [Image: conceptual illustration of human workers standing alongside translucent counterparts marked with a glowing “AI” symbol]

    It’s been nearly two years since Microsoft CEO Satya Nadella predicted AI would replace knowledge work: the white-collar jobs held by lawyers, investment bankers, librarians, accountants, IT workers, and others.

    But despite the huge progress made by foundation models, the change in knowledge work has been slow to arrive. Models have mastered in-depth research and agentic planning, but for whatever reason, most white-collar work has been relatively unaffected.

    It’s one of the biggest mysteries in AI — and thanks to new research from the training-data giant Mercor, we’re finally getting some answers.

    The new research looks at how leading AI models hold up doing actual white-collar work tasks, drawn from consulting, investment banking, and law. The result is a new benchmark called Apex-Agents — and so far, every AI lab is getting a failing grade. Faced with queries from real professionals, even the best models struggled to get more than a quarter of the questions right. The vast majority of the time, the model came back with a wrong answer or no answer at all.

    According to Mercor CEO Brendan Foody, who worked on the paper, the models’ biggest stumbling block was tracking down information across multiple domains, something that’s integral to most of the knowledge work performed by humans.

    “One of the big changes in this benchmark is that we built out the entire environment, modeled after real professional services,” Foody told TechCrunch. “The way we do our jobs isn’t with one individual giving us all the context in one place. In real life, you’re operating across Slack and Google Drive and all these other tools.” For many agentic AI models, that kind of multi-domain reasoning is still hit or miss.


    The scenarios were all drawn from actual professionals on Mercor’s expert marketplace, who both laid out the queries and set the standard for a successful response. Looking through the questions, which are posted publicly on Hugging Face, gives a sense of how complex the tasks can get. 

    One question in the “Law” section reads: 

    During the first 48 minutes of the EU production outage, Northstar’s engineering team exported one or two bundled sets of EU production event logs containing personal data to the U.S. analytics vendor….Under Northstar’s own policies, it can reasonably treat the one or two log exports as consistent with Article 49?

    The correct answer is yes, but getting there requires an in-depth assessment of the company’s own policies as well as the relevant EU privacy laws.

    That might stump even a well-informed human, but the researchers were trying to model the work done by professionals in the field. If an LLM can reliably answer these questions, it could effectively replace many of the lawyers working today. “I think this is probably the most important topic in the economy,” Foody told TechCrunch. “The benchmark is very reflective of the real work that these people do.”

    OpenAI also attempted to measure professional skills with its GDPval benchmark, but the Apex-Agents test differs in important ways. Where GDPval tests general knowledge across a wide range of professions, the Apex-Agents benchmark measures the system’s ability to perform sustained tasks in a narrow set of high-value professions. The result is more difficult for models, but also more closely tied to whether these jobs can be automated.

    While none of the models proved ready to take over as investment bankers, some were clearly closer to the mark. Gemini 3 Flash performed the best of the group with 24% one-shot accuracy, followed closely by GPT-5.2 with 23%. Below that, Opus 4.5, Gemini 3 Pro and GPT-5 all scored roughly 18%.

    While the initial results fall short, the AI field has a history of blowing through challenging benchmarks. Now that the Apex test is public, it’s an open challenge for AI labs that believe they can do better, something Foody fully expects in the months to come.

    “It’s improving really quickly,” he told TechCrunch. “Right now it’s fair to say it’s like an intern that gets it right a quarter of the time, but last year it was the intern that gets it right five or ten percent of the time. That kind of improvement year after year can have an impact so quickly.”
