AI Performance Breakthroughs Signal Approaching Human-Level General Capabilities
OpenAI's GPT-5.4 scored 75% on OSWorld-Verified, surpassing human performance on that benchmark, while Google's Gemini 3.1 Ultra scored 94.3% on GPQA Diamond. Together, the two results mark significant gains in real-world computer task completion and graduate-level scientific reasoning within a single development cycle.
These gains suggest that AI systems are rapidly approaching human-level capabilities across diverse domains, and they are shortening expected timelines for AGI deployment.
agi
benchmarks
gpt-5
gemini
performance