Gemma 4, Optimized for Real Hardware

Run powerful AI on your CPU, mobile device, or edge hardware: up to 51% faster inference, models up to 57% smaller, fully local and private.

51% Faster Inference
57% Size Reduction
100% Local & Private

Choose Your Model Family

From multimodal powerhouses to ultra-lightweight mobile models, optimized for every use case.

MULTIMODAL

gemma4-turbo

Full-featured AI with vision and audio support

+51% Faster
IQ4_XS Quantization
4.3-18 GB Model Sizes
20.5K+ Downloads
  • Vision and audio capabilities
  • 51% faster than stock Gemma 4
  • 5 model sizes (e2b, e4b, 12b, 26b, 31b)
  • Tool calling and function support
  • Optimized for CPU inference on Windows
ULTRA-LIGHTWEIGHT

gemma4-nano

Mobile-first AI that actually works on real devices

57% Smaller
Q3_K_S Quantization
3.1-14 GB Model Sizes
2.7K+ Downloads
  • Text-only, optimized for mobile/edge
  • Sub-1 GB RAM usage (891.7 MB total)
  • Stays cool on phones with 8 GB RAM
  • Full 4.5B params, 128K context
  • 13% faster than turbo on CPU
COMING SOON

More Variants

Pushing the boundaries of AI optimization

🚀 In Progress
Possibilities
  • Experimental quantization methods
  • Specialized task-optimized variants
  • Even smaller mobile models
  • Performance tuning research
  • Community-driven development

*Note: The download counts shown on Hugging Face model cards cover a rolling 30-day window, not all-time totals. The counters on this page record a baseline and add new downloads to it, so they show cumulative community downloads.

Get Started in Seconds

One command to run powerful local AI on your hardware

Option 1: Ollama (Recommended)

# Install Ollama from https://ollama.com

# For multimodal AI with vision:
ollama run ssfdre38/gemma4-turbo:e4b

# For ultra-lightweight mobile/edge:
ollama run ssfdre38/gemma4-nano:e2b

# That's it! Start chatting with local AI 🚀
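
Once a model has been pulled, Ollama also serves a local HTTP API (port 11434 by default), which is useful for scripting instead of interactive chat. The following is a minimal sketch, assuming the Ollama server is running locally and `curl` is installed. The helper names `build_payload` and `ask_local` are illustrative and not part of Ollama.

```shell
# Hypothetical helpers for scripting against a local Ollama server.
# Assumes Ollama is listening on its default address, localhost:11434.

# Build the JSON body for Ollama's /api/generate endpoint.
# Note: the prompt is inserted verbatim, so it must not contain
# double quotes or backslashes (use a JSON tool such as jq for that).
build_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Send a single non-streaming prompt and print the raw JSON reply.
ask_local() {
  curl -s http://localhost:11434/api/generate -d "$(build_payload "$1" "$2")"
}

# Example (requires the model to be pulled first):
# ask_local ssfdre38/gemma4-turbo:e4b "Summarize GGUF in one sentence."
```

With `"stream": false` the server returns one JSON object rather than a stream of partial responses, which is simpler to handle in a shell script.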

Option 2: Direct GGUF Download

# Download from Hugging Face:
wget https://huggingface.co/ssfdre38/gemma4-turbo-gguf/resolve/main/gemma4-e4b-iq4xs-turbo.gguf

# Use with llama.cpp, Ollama, or any GGUF-compatible tool
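
For the llama.cpp route, a downloaded file can be run roughly as sketched below. This assumes the `llama-cli` binary from a recent llama.cpp build is on your PATH (older builds named it `main`), and that the file sits in the current directory. The `run_gguf` wrapper is a hypothetical helper, and exact CLI behavior may vary between llama.cpp versions.

```shell
# Sketch: run a downloaded GGUF file with llama.cpp's llama-cli.
# Assumes llama-cli is on PATH; older llama.cpp builds named the binary "main".
MODEL="gemma4-e4b-iq4xs-turbo.gguf"   # file name from the wget command above

run_gguf() {
  if [ ! -f "$MODEL" ]; then
    echo "Model file not found: $MODEL"
    return 1
  fi
  if ! command -v llama-cli >/dev/null 2>&1; then
    echo "llama-cli not found on PATH"
    return 1
  fi
  # -m: model path, -p: prompt, -n: maximum number of tokens to generate
  llama-cli -m "$MODEL" -p "$1" -n 128
}

# Example:
# run_gguf "Explain quantization in two sentences."
```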

Model Size Guide

# Turbo (multimodal):
e2b   → 4.3 GB   # Smallest with vision
e4b   → 6.1 GB   # Recommended default
12b   → 6.9 GB   # Balanced multimodal
26b   → 15 GB    # High capability
31b   → 18 GB    # Maximum performance

# Nano (text-only):
e2b   → 3.1 GB   # Mobile-ready
e4b   → 4.7 GB   # Balanced
12b   → 5.7 GB   # Balanced reasoning
26b   → 12 GB    # High quality
31b   → 14 GB    # Best performance