Scoopfeeds — Intelligent news, curated.
Why Weibo’s tiny VibeThinker-3B has the AI world arguing over benchmarks again
ai

Why Weibo’s tiny VibeThinker-3B has the AI world arguing over benchmarks again

VentureBeat AI · Jun 17, 2026, 12:32 AM

Why this matters: a development in AI with implications for how people work, create, and decide.

On Sunday, a team of nine researchers at Sina Weibo — the Chinese social media giant better known for its microblogging platform than for cutting-edge artificial intelligence — quietly posted a 14-page technical report to ar Xiv that sent shockwaves through the AI research community. Their claim: a language model with just 3 billion parameters can match or exceed the reasoning performance of flagship systems from Google Deep Mind, Open AI, Anthropic, and Deep Seek that are hundreds of times larger.The model, called VibeThinker-3B, scored 94.3 on AIME 2026 — the American Invitational Mathematics Examination, one of the most demanding standardized math competitions in the world. That figure places it alongside DeepSeek V3.2, a model with 671 billion parameters, and ahead of Gemini 3 Pro, Google's high-performance flagship reasoning system, which scored 91.7. With a test-time scaling technique the team calls Claim-Level Reliability Assessment, the score climbs to 97.1, edging past virtually every system in the public record.Within hours of publication, the paper had drawn 62 upvotes on Hugging Face's daily papers feed, the model repository had accumulated 130 likes, and the GitHub repository had reached 685 stars. But the reaction on social media was not uniformly celebratory. It was, in many cases, deeply skeptical."WHAT THE HELL is happening in AI?" wrote the user @orcus108 on X, in a post that accumulated over 161,000 views. "A 3B parameter model just put up coding benchmark scores in the same league as Claude Opus 4.5… I genuinely don't know if this is a breakthrough or if the benchmarks are broken."That tension — between genuine scientific advancement and the growing suspicion that AI benchmarks have become gameable to the point of meaninglessness — sits at the heart of the VibeThinker-3B story. And the answer matters enormously, not just for academic bragging rights, but for the multibillion-dollar question of whether the AI industry's relentless

Article preview — originally published by VentureBeat AI. Full story at the source.
Read full story on VentureBeat AI → More top stories
Aggregated and edited by the Scoop newsroom. We surface news from VentureBeat AI alongside other reporting so you can compare coverage in one place. Editorial policy · Corrections · About Scoop