AI Matches or Exceeds Human Performance in 82% of Complex Tasks, Signaling New Era of Work

Wharton Professor Ethan Mollick, a leading AI researcher, asserts that artificial intelligence has entered a transformative new phase, shifting from human-AI "co-intelligence" to an era of "managing AIs." This transition is fueled by exponential improvements in AI capabilities, with the best systems now matching or surpassing human performance in complex tasks 82% of the time, according to the GDPval benchmark. Mollick's analysis, published on his "One Useful Thing" Substack, highlights a future where AI agents autonomously perform extensive human work.

The rapid advancement of AI is evident across domains, from sophisticated image and video generation, as demonstrated by Mollick's "Otter Test," to high scores on academic and problem-solving benchmarks. For instance, the best AIs score 94% on the Graduate-Level Google-Proof Q&A (GPQA) benchmark, a test on which graduate students using Google score significantly lower. Despite these gains, Mollick notes that AI remains "jagged," excelling in some areas while still struggling in others.

Organizations are already experimenting with radical new operating models to take advantage of these capabilities. StrongDM, a security software company, introduced a "Software Factory" in which AI agents autonomously write, test, and ship production software. The factory operates under two principles, "Code must not be written by humans" and "Code must not be reviewed by humans," and its engineers allocate significant budgets to AI tokens.

This acceleration in AI capability is creating a "rolling and unpredictable environment" for markets, jobs, and government. Mollick points to a single week in February as illustrative: a fictional scenario published by Citrini Research moved the stock market, financial services company Block announced 40% layoffs (with AI implied as a factor), and a public conflict arose between the Pentagon and AI firm Anthropic over AI usage rules.

Looking ahead, Mollick emphasizes the concept of Recursive Self-Improvement (RSI), in which AI systems are increasingly used to build better AI systems, potentially accelerating the current exponential growth. Leaders at companies such as Anthropic and OpenAI acknowledge this as a key development, and engineers at these firms already rely on AI to write much of their code.

While this future presents significant instability, Mollick argues that it also offers a critical window for individuals and organizations to influence AI's trajectory. The choices made now in how AI is integrated into work, education, and governance will set precedents for its broader societal impact.