
Gemma 4 — Here’s What It Means for Owllo Users


Hey, Team Owllo here.

On April 3rd, Google released Gemma 4, a new open-weight AI model built on the same research behind Gemini 3. It’s free to use, modify, and redistribute, including commercially. For anyone running local AI, this is worth paying attention to. Here’s a breakdown of each model and how to figure out which one fits your machine.

Four models, four different use cases

Gemma 4 comes in four sizes: E2B, E4B, 26B MoE, and 31B Dense. All of them handle text and image input, and the two smaller models, E2B and E4B, also support audio.

The E2B is the lightest of the bunch. It’ll run on an 8GB RAM laptop, even a Raspberry Pi. Don’t expect deep reasoning, but for simple Q&A and lightweight tasks, it holds up surprisingly well. Our team was genuinely impressed by what it delivers at this size, and Korean language performance was better than expected.

The E4B is where things get interesting. It runs on just 6GB of VRAM while reportedly outperforming Gemma 3 27B on benchmarks. Text, images, and audio input are all supported, and the context window is solid. For most users, this is the sweet spot. We’re planning to offer it as a default model during the beta period alongside our own.

From here, we’re getting into hardware that most everyday computers won’t handle comfortably.

The 26B MoE is probably the most technically fascinating model in this release. It has 26 billion parameters, but only 3 billion are active at any given time during inference. That architecture makes it much faster and leaner than its size suggests. Officially it’s rated for around 18GB of memory at 4-bit quantization, but in our testing, you really want at least 24GB to get it running properly.
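
To make “only 3 billion active” concrete, here’s a minimal top-k mixture-of-experts layer in PyTorch. This is a generic sketch of the technique, not Gemma 4’s actual architecture: the expert count, top-k value, and dimensions below are invented for illustration.

```python
# A minimal top-k mixture-of-experts layer in PyTorch. Generic illustration
# only: the expert count, top-k value, and dimensions are made up here and
# are not Gemma 4's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router scores every expert for every token.
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                  # x: (tokens, dim)
        scores = self.router(x)                            # (tokens, num_experts)
        weights, picked = scores.topk(self.top_k, dim=-1)  # keep only the best k
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the picked experts actually run. The rest of the weights sit
        # idle, which is why an MoE's "active" parameter count per token is
        # far below its total parameter count.
        for slot in range(self.top_k):
            for e in picked[:, slot].unique().tolist():
                mask = picked[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

print(TinyMoE()(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```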

The 31B Dense is currently ranked third among all open-weight models. Quality-wise it’s the top of this lineup, but it needs around 20GB even at 4-bit, and memory usage climbs steeply with longer context. Realistically, you’re looking at an RTX 4090 or an Apple Silicon Mac with 32GB or more. Even then, in our experience, you’ll want to be on the higher end of that range.
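
Where do figures like 18GB and 20GB come from, when the 4-bit weights alone are smaller? A rough rule of thumb, sketched below: about half a byte per parameter for the weights, plus whatever the KV cache, activations, and runtime need on top. These are our own back-of-the-envelope numbers, not official figures.

```python
# Back-of-the-envelope: at 4-bit quantization, weights cost roughly half a
# byte per parameter. Everything above that in the quoted figures goes to
# quantization metadata, KV cache, activations, and runtime overhead.
GIB = 1024 ** 3

for name, params in [("26B MoE", 26e9), ("31B Dense", 31e9)]:
    weights_gib = params * 0.5 / GIB
    print(f"{name}: ~{weights_gib:.1f} GiB for the weights alone")

# 26B MoE: ~12.1 GiB for the weights alone   -> rated ~18GB with overhead
# 31B Dense: ~14.4 GiB for the weights alone -> ~20GB in practice, more with
#                                                long context (the KV cache grows)
```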

What to run based on your setup

  • Under 8GB RAM: Start with E2B, or the 4-bit version of E4B.
  • 16 to 20GB: E4B at 8-bit is comfortable, and 26B MoE at 4-bit is worth a try.
  • 24GB GPU or more: 31B Dense is on the table.
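
To put those ranges into practice, here’s a minimal sketch of loading a model in 4-bit with Hugging Face transformers and bitsandbytes. The model id below is a placeholder we invented, not a confirmed repo name; check the Owllo model library or the official release page for the real one.

```python
# Minimal 4-bit load via Hugging Face transformers + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-4-e4b"  # placeholder id, not a confirmed repo name

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # store weights in 4-bit, compute in bf16
)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",  # place layers on GPU first, spill to CPU if needed
)

prompt = "Summarize what a mixture-of-experts model is in one sentence."
inputs = tok(prompt, return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=48)[0], skip_special_tokens=True))
```

One caveat: bitsandbytes needs an NVIDIA GPU. On Apple Silicon, you’d reach for a GGUF build through llama.cpp (an example follows below) or an MLX port instead.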

Apple Silicon Mac users have a natural advantage here. Because the CPU and GPU share the same memory pool, you get more usable headroom than a Windows machine with the same spec on paper. Both E2B and the 4-bit E4B run fine on an 8GB MacBook Air.

CPU-only is technically possible, but we wouldn’t recommend it for daily use. Text generation drops to roughly 2 to 3 characters per second, and your machine runs hot the whole time. Fine for a one-time test, not great for actually getting things done. Picture waiting for each word to trickle out while your laptop turns into a hand warmer.
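
That said, if you do want to run the one-time test, a GGUF build through the llama-cpp-python bindings is the easiest route. The file name below is a placeholder for whichever quantized build you actually download.

```python
# One-off CPU test with llama-cpp-python (Python bindings for llama.cpp).
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-4-e2b-q4_0.gguf",  # placeholder file name
    n_ctx=2048,     # a modest context keeps RAM usage down
    n_threads=4,    # set to your physical core count
)

out = llm("Name three things a Raspberry Pi is good for.", max_tokens=64)
print(out["choices"][0]["text"])  # expect this to trickle out slowly on CPU
```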

What this means for Owllo

Gemma 4 is a good sign for the local AI ecosystem. A model like E4B outperforming a previous-generation large model at a fraction of the size shows that the ceiling for on-device AI keeps rising. All Gemma 4 models are available through the Owllo model library, and we’ll most likely make one of them the default during beta. Stay tuned.

