Gemma 4, Optimized for Real Hardware

Run powerful AI on your CPU, mobile device, or edge hardware. Inference is 51% faster than stock Gemma 4, models are 57% smaller, and everything stays fully local and private.

51% Faster Inference

57% Size Reduction

100% Local & Private


🦙 Ollama Hub · 🤗 Hugging Face · 📊 Kaggle Submission · 🔬 Research Article · 👤 Developer Blog

Choose Your Model Family

From multimodal powerhouses to ultra-lightweight mobile models, optimized for every use case.

MULTIMODAL

gemma4-turbo

Full-featured AI with vision and audio support

Speed: +51% faster
Quantization: IQ4_XS
Model sizes: 4.3-18 GB
Downloads: 17K+

Vision and audio capabilities
51% faster than stock Gemma 4
4 model sizes (e2b, e4b, 26b, 31b)
Tool calling and function support
Windows-optimized for CPU inference


ULTRA-LIGHTWEIGHT

gemma4-nano

Mobile-first AI that actually works on real devices

Size: 57% smaller
Quantization: Q3_K_S
Model sizes: 3.1-14 GB
RAM usage: <1 GB

Text-only, optimized for mobile/edge
Sub-1GB RAM usage (891.7 MB total)
Stays cool on 8GB RAM phones
Full 4.5B params, 128K context
13% faster than turbo on CPU


COMING SOON

More Variants

Pushing the boundaries of AI optimization

🚀 In Progress
∞ Possibilities

Experimental quantization methods
Specialized task-optimized variants
Even smaller mobile models
Performance tuning research
Community-driven development

Follow on GitHub

Get Started in Seconds

One command to run powerful local AI on your hardware

Option 1: Ollama (Recommended)

# Install Ollama from https://ollama.com

# For multimodal AI with vision:
ollama run ssfdre38/gemma4-turbo:e4b

# For ultra-lightweight mobile/edge:
ollama run ssfdre38/gemma4-nano:e2b

# That's it! Start chatting with local AI 🚀
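
Both commands above use the same `ssfdre38/gemma4-<family>:<size>` pattern. The sketch below builds that reference and rejects unknown names before anything is downloaded. `model_ref` is a hypothetical helper name, and it assumes every size in the Model Size Guide is published as a tag for both families, which this page only shows for `turbo:e4b` and `nano:e2b`.

```shell
# Hypothetical helper: build an Ollama model reference from family and size.
# Assumption: tags follow ssfdre38/gemma4-<family>:<size> for every listed size.
model_ref() (
  family=$1
  size=$2
  case "$family" in
    turbo|nano) ;;
    *) echo "unknown family: $family" >&2; exit 1 ;;
  esac
  case "$size" in
    e2b|e4b|26b|31b) ;;
    *) echo "unknown size: $size" >&2; exit 1 ;;
  esac
  printf 'ssfdre38/gemma4-%s:%s\n' "$family" "$size"
)
```

A possible use is `ollama run "$(model_ref nano e2b)"`, which expands to the second command shown above.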

Option 2: Direct GGUF Download

# Download from Hugging Face:
wget https://huggingface.co/ssfdre38/gemma4-turbo-gguf/resolve/main/gemma4-e4b-iq4xs-turbo.gguf

# Use with llama.cpp, Ollama, or any GGUF-compatible tool
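
A download that fails partway, or that returns an error page instead of the model, can be caught before it is loaded. GGUF files begin with the four ASCII bytes `GGUF`, so a quick check of the header is possible. This is a minimal sketch: `is_gguf` is a hypothetical helper name, and it only catches obviously wrong files. It does not verify that the file is complete or uncorrupted.

```shell
# Hypothetical helper: check that a file starts with the 4-byte GGUF magic.
# This is a sanity check only, not an integrity or completeness check.
is_gguf() (
  file=$1
  [ -f "$file" ] || exit 1
  # Read the first 4 bytes; strip NUL bytes so the shell can hold the value.
  magic=$(dd if="$file" bs=4 count=1 2>/dev/null | tr -d '\000')
  [ "$magic" = "GGUF" ]
)
```

For example, `is_gguf gemma4-e4b-iq4xs-turbo.gguf && echo "looks like a GGUF file"` could be run after the `wget` step.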

Model Size Guide

# Turbo (multimodal):
e2b   → 4.3 GB   # Smallest with vision
e4b   → 6.1 GB   # Recommended default
26b   → 15 GB    # High capability
31b   → 18 GB    # Maximum performance

# Nano (text-only):
e2b   → 3.1 GB   # Mobile-ready
e4b   → 4.7 GB   # Balanced
26b   → 12 GB    # High quality
31b   → 14 GB    # Best performance
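
The guide can also be read as a lookup: given a memory budget, take the largest variant whose file still fits. The sketch below does this for the Turbo sizes listed above. `pick_turbo_tag` is a hypothetical helper name, and the 2 GB headroom is an assumption, not a measured requirement. File size is only a rough lower bound on memory needed at runtime.

```shell
# Hypothetical helper: pick the largest Turbo tag that fits a memory budget.
# Sizes are from the Model Size Guide, in tenths of a GB.
# Assumption: reserve 2 GB of headroom on top of the model file size.
pick_turbo_tag() (
  budget_gb=$1                 # whole GB you are willing to give the model
  headroom_tenths=20           # assumed 2 GB reserve
  avail=$(( budget_gb * 10 - headroom_tenths ))
  choice=""
  # Entries are ordered smallest to largest, so the last fit wins.
  for entry in e2b:43 e4b:61 26b:150 31b:180; do
    tag=${entry%%:*}
    size=${entry##*:}
    if [ "$size" -le "$avail" ]; then
      choice=$tag
    fi
  done
  [ -n "$choice" ] || exit 1   # nothing fits the budget
  printf '%s\n' "$choice"
)
```

With these assumptions, `pick_turbo_tag 16` prints `e4b` and `pick_turbo_tag 32` prints `31b`. A budget of 4 GB returns a non-zero status because no Turbo variant fits.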
