Google's latest DiffusionGemma open AI model comes with a 4x speed boost

Ars Technica · Jun 10, 2026, 7:29 PM · Also reported by 2 other sources

Another day, another AI model from Google. This time, Google DeepMind has released a new member of the Gemma 4 open model family, but it's fundamentally different from the rest of the lineup. DiffusionGemma doesn't generate outputs linearly like most AI models. Instead, it can produce an entire block of text in parallel. Google says this makes it faster and more efficient when running on local hardware like an Nvidia DGX or a humble gaming GPU.

Most AI models are designed to be autoregressive: they generate text left to right, one token at a time. DiffusionGemma has more in common with image generation models, which start with static and then denoise it to create the desired content. This model starts with a field of placeholder tokens and runs over that canvas multiple times, generating likely tokens and using those to improve its estimates of the others. At the end of the process, the model finalizes its token outputs in one large block, the "denoised" text canvas.

DiffusionGemma is fairly large in the realm of Google's open models. It's a Mixture of Experts (MoE) model with a total of 26 billion parameters, but only 3.8 billion are activated during inference. That means it should fit in the 18GB RAM allotment of a high-end GPU. In testing with an RTX 5090, DiffusionGemma spits out around 700 tokens per second. With a single Nvidia H100 AI accelerator, DiffusionGemma can produce 1,000+ tokens per second. That's about four times the output of the similarly sized autoregressive Gemma models.
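For readers who want a feel for the denoising process described above, here is a toy Python sketch of parallel, multi-pass decoding over a masked text canvas. It is an illustration only, not Google's implementation: the `toy_predict` function, the confidence formula, and the `per_pass` commit schedule are all hypothetical stand-ins, and the "model" simply looks up the answer from a known target sentence.

```python
MASK = "<mask>"


def toy_predict(canvas, target):
    """Hypothetical stand-in for the model.

    Proposes a token and a confidence score for every masked slot.
    A real model would predict tokens; this toy reads them from `target`.
    """
    proposals = {}
    for i, tok in enumerate(canvas):
        if tok != MASK:
            continue
        # Assumed heuristic: confidence rises with the number of
        # already-committed neighbors, mimicking how committed tokens
        # improve the estimates of the remaining ones.
        neighbors = [canvas[j] for j in (i - 1, i + 1) if 0 <= j < len(canvas)]
        known = sum(1 for n in neighbors if n != MASK)
        proposals[i] = (target[i], 0.5 + 0.25 * known)
    return proposals


def denoise(target, per_pass=4):
    """Fill a fully masked canvas over several passes.

    Each pass scores all masked slots at once and commits the
    `per_pass` most confident ones; the rest stay masked.
    """
    canvas = [MASK] * len(target)
    passes = 0
    while MASK in canvas:
        proposals = toy_predict(canvas, target)
        best = sorted(proposals, key=lambda i: (-proposals[i][1], i))[:per_pass]
        for i in best:
            canvas[i] = proposals[i][0]
        passes += 1
    return canvas, passes


target = "the quick brown fox jumps over the lazy dog today".split()
text, passes = denoise(target, per_pass=4)
print(" ".join(text))
print("passes:", passes, "| one-token-at-a-time steps:", len(target))
```

The point of the sketch is the step count: the canvas is completed in fewer passes than it has tokens, whereas a left-to-right decoder needs one step per token. How many tokens a real diffusion model commits per pass, and how it scores them, is not described in the article.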

Article preview — originally published by Ars Technica. Full story at the source.

Also covered by

Decrypt: "Google's DiffusionGemma AI Hits 1,000 Tokens Per Second—And It's Free"
Investing.com: "Google launches DiffusionGemma with 4x faster text generation"

Aggregated and edited by the Scoop newsroom. We surface news from Ars Technica alongside other reporting so you can compare coverage in one place.
