Use Gemma open models

This guide shows you how to use Gemma open models with Google Cloud.

Gemma is a family of lightweight, generative AI open models based on Gemini. You can run Gemma models in your applications, on your hardware, or on hosted services.

Because Gemma models have open weights, you can customize them with fine-tuning to improve their performance on specific tasks. You can tune any Gemma model with the AI framework of your choice and the Vertex AI SDK. To get started, open a fine-tuning notebook example from the Gemma model card in Model Garden.

Available Gemma models

The following Gemma models are available to use with Vertex AI. To learn more about and test the Gemma models, see their Model Garden model cards.

| Model name | Use cases | Model Garden model card |
| --- | --- | --- |
| Gemma 3n | Accepts multimodal input (text, image, video, and audio) and generates text output. | Go to the Gemma 3n model card |
| Gemma 3 | Best for text generation and image understanding tasks, including question answering, summarization, and reasoning. | Go to the Gemma 3 model card |
| Gemma 2 | Best for text generation, summarization, and extraction. | Go to the Gemma 2 model card |
| Gemma | Best for text generation, summarization, and extraction. | Go to the Gemma model card |
| CodeGemma | Best for code generation and completion. | Go to the CodeGemma model card |
| PaliGemma 2 | Best for image captioning and visual question answering tasks. | Go to the PaliGemma 2 model card |
| PaliGemma | Best for image captioning and visual question answering tasks. | Go to the PaliGemma model card |
| ShieldGemma 2 | Checks the safety of synthetic and natural images to help you build robust datasets and models. | Go to the ShieldGemma 2 model card |
| TxGemma | Best for therapeutic prediction tasks, including classification, regression, generation, and reasoning. | Go to the TxGemma model card |
| MedGemma | Gemma 3 variants trained for performance on medical text and image comprehension. | Go to the MedGemma model card |
| MedSigLIP | SigLIP variant trained to encode medical images and text into a common embedding space. | Go to the MedSigLIP model card |
| T5Gemma | Well-suited for a variety of generative tasks, including question answering, summarization, and reasoning. | Go to the T5Gemma model card |

Options for using Gemma models

You can use Gemma models in various environments, including Vertex AI, Google Kubernetes Engine, Dataflow, and Colaboratory.

Use Gemma with Vertex AI

Vertex AI offers a managed platform to build and scale machine learning projects without requiring in-house MLOps expertise. We recommend this option if you want to access end-to-end MLOps capabilities, value-added ML features, and a serverless experience for streamlined development.

You can use Vertex AI as the downstream application that serves the Gemma models. For example, you can port weights from a Keras implementation of Gemma and then use Vertex AI to serve that version to get predictions.
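Once a Gemma model is deployed to a Vertex AI endpoint, you request predictions through the Vertex AI SDK. The sketch below is illustrative, not an official recipe: the project, region, endpoint ID, and the instance fields (`prompt`, `max_tokens`, `temperature`) are placeholders, because the exact request schema depends on the serving container you deploy.

```python
# Sketch: build a prediction request for a Gemma model deployed to a
# Vertex AI endpoint. The instance schema below is an assumption; check
# the schema of your serving container before using it.

def build_gemma_instances(prompt: str, max_tokens: int = 256,
                          temperature: float = 0.7) -> list[dict]:
    """Build the `instances` payload for Endpoint.predict (schema assumed)."""
    return [{
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }]

# With google-cloud-aiplatform installed and credentials configured,
# the request would be sent roughly like this (placeholders throughout):
#
#   from google.cloud import aiplatform
#   aiplatform.init(project="my-project", location="us-central1")
#   endpoint = aiplatform.Endpoint("projects/PROJECT/locations/us-central1/endpoints/ENDPOINT_ID")
#   response = endpoint.predict(instances=build_gemma_instances("Why is the sky blue?"))
#   print(response.predictions[0])
```

Keeping the payload builder separate from the SDK call makes it easy to unit-test the request shape without a live endpoint.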

To get started, see the following notebook examples:

  • Serve models
  • Fine-tune models
  • Run local inference

Use Gemma with GKE

Google Kubernetes Engine (GKE) is the Google Cloud solution for managed Kubernetes that provides scalability, security, resilience, and cost-effectiveness. We recommend this option if you have existing Kubernetes investments, have in-house MLOps expertise, or need granular control over complex AI/ML workloads with unique security, data pipeline, and resource management requirements. To learn more, see the Gemma tutorials in the GKE documentation.

Use Gemma with Dataflow

You can use Gemma models with Dataflow to run inference pipelines for tasks like sentiment analysis. To learn more, see Run inference pipelines with Gemma open models.

Use Gemma with Colab

You can use Gemma with Colaboratory and framework options such as PyTorch and JAX.

Gemma model sizes and capabilities

Gemma models are available in several sizes so you can build generative AI solutions based on your available computing resources, the capabilities you need, and where you want to run them. Each model is available in several versions:

  • Pretrained: This version of the model has not been trained on any specific tasks or instructions beyond the Gemma core data training set. We recommend that you fine-tune this model before you use it.
  • Instruction-tuned: This version of the model was trained with human language interactions to participate in a conversation, similar to a basic chatbot.
  • Mix fine-tuned: This version of the model is fine-tuned on a mixture of academic datasets and accepts natural language prompts.
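Instruction-tuned Gemma variants expect conversation turns wrapped in control tokens rather than raw text. The sketch below builds a prompt in Gemma's published turn format (`<start_of_turn>` / `<end_of_turn>`); verify it against the chat template of your serving framework before relying on it, since templates can differ between versions.

```python
# Sketch: format a conversation for an instruction-tuned Gemma variant
# using Gemma's turn-based control tokens.

def format_gemma_chat(messages: list[dict]) -> str:
    """messages: [{"role": "user" | "model", "content": str}, ...]"""
    parts = []
    for m in messages:
        parts.append(f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>\n")
    # Open a model turn so the model knows it should respond next.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)

prompt = format_gemma_chat([
    {"role": "user", "content": "Why is the sky blue?"},
])
```

A pretrained variant, by contrast, takes plain text with no turn markup, which is why fine-tuning is recommended before using it for chat-style tasks.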

Lower parameter counts mean lower resource requirements and more deployment flexibility.
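A quick way to reason about the platform recommendations in the table below is a back-of-the-envelope estimate: the memory needed just to hold the weights is roughly the parameter count times the bytes per parameter, plus overhead for activations and the KV cache. This is a rule of thumb, not an official sizing guide.

```python
# Rough rule of thumb for weight memory (not an official sizing guide):
# parameter_count * bytes_per_parameter, ignoring activation/KV-cache overhead.

def weight_memory_gib(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Estimate weight memory in GiB; 2 bytes/param corresponds to bfloat16."""
    return params_billions * 1e9 * bytes_per_param / 2**30

# A 4-billion-parameter model in bfloat16 needs about 7.5 GiB for weights
# alone, which fits a laptop GPU; a 27B model at ~50 GiB needs server-class
# accelerators. Quantizing to 1 byte/param (int8) roughly halves the estimate.
```

This is why the 1B-4B models target mobile devices and laptops while the 27B models target large servers or server clusters.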

| Model name | Parameter size | Input | Output | Tuned versions | Intended platforms |
| --- | --- | --- | --- | --- | --- |
| **Gemma 3n** | | | | | |
| Gemma 3n E4B | 4 billion (effective) | Text, image, and audio | Text | Pretrained, Instruction-tuned | Mobile devices and laptops |
| Gemma 3n E2B | 2 billion (effective) | Text, image, and audio | Text | Pretrained, Instruction-tuned | Mobile devices and laptops |
| **Gemma 3** | | | | | |
| Gemma 27B | 27 billion | Text and image | Text | Pretrained, Instruction-tuned | Large servers or server clusters |
| Gemma 12B | 12 billion | Text and image | Text | Pretrained, Instruction-tuned | Higher-end desktop computers and servers |
| Gemma 4B | 4 billion | Text and image | Text | Pretrained, Instruction-tuned | Desktop computers and small servers |
| Gemma 1B | 1 billion | Text | Text | Pretrained, Instruction-tuned | Mobile devices and laptops |
| **Gemma 2** | | | | | |
| Gemma 27B | 27 billion | Text | Text | Pretrained, Instruction-tuned | Large servers or server clusters |
| Gemma 9B | 9 billion | Text | Text | Pretrained, Instruction-tuned | Higher-end desktop computers and servers |
| Gemma 2B | 2 billion | Text | Text | Pretrained, Instruction-tuned | Mobile devices and laptops |
| **Gemma** | | | | | |
| Gemma 7B | 7 billion | Text | Text | Pretrained, Instruction-tuned | Desktop computers and small servers |
| Gemma 2B | 2.2 billion | Text | Text | Pretrained, Instruction-tuned | Mobile devices and laptops |
| **CodeGemma** | | | | | |
| CodeGemma 7B | 7 billion | Text | Text | Pretrained, Instruction-tuned | Desktop computers and small servers |
| CodeGemma 2B | 2 billion | Text | Text | Pretrained | Desktop computers and small servers |
| **PaliGemma 2** | | | | | |
| PaliGemma 28B | 28 billion | Text and image | Text | Pretrained, Mix fine-tuned | Large servers or server clusters |
| PaliGemma 10B | 10 billion | Text and image | Text | Pretrained, Mix fine-tuned | Higher-end desktop computers and servers |
| PaliGemma 3B | 3 billion | Text and image | Text | Pretrained, Mix fine-tuned | Desktop computers and small servers |
| **PaliGemma** | | | | | |
| PaliGemma 3B | 3 billion | Text and image | Text | Pretrained, Mix fine-tuned | Desktop computers and small servers |
| **ShieldGemma 2** | | | | | |
| ShieldGemma 2 | 4 billion | Text and image | Text | Fine-tuned | Desktop computers and small servers |
| **TxGemma** | | | | | |
| TxGemma 27B | 27 billion | Text | Text | Pretrained, Instruction-tuned | Large servers or server clusters |
| TxGemma 9B | 9 billion | Text | Text | Pretrained, Instruction-tuned | Higher-end desktop computers and servers |
| TxGemma 2B | 2 billion | Text | Text | Pretrained | Mobile devices and laptops |
| **MedGemma** | | | | | |
| MedGemma 27B | 27 billion | Text and image | Text | Text-only instruction-tuned, Instruction-tuned | Large servers or server clusters |
| MedGemma 4B | 4 billion | Text and image | Text | Pretrained, Instruction-tuned | Desktop computers and small servers |
| **MedSigLIP** | | | | | |
| MedSigLIP | 800 million | Text and image | Embedding | Fine-tuned | Mobile devices and laptops |
| **T5Gemma** | | | | | |
| T5Gemma 9B-9B | 18 billion | Text | Text | PrefixLM (pretrained, instruction-tuned), UL2 (pretrained, instruction-tuned) | Mobile devices and laptops |
| T5Gemma 9B-2B | 11 billion | Text | Text | PrefixLM (pretrained, instruction-tuned), UL2 (pretrained, instruction-tuned) | Mobile devices and laptops |
| T5Gemma 2B-2B | 4 billion | Text | Text | PrefixLM (pretrained, instruction-tuned), UL2 (pretrained, instruction-tuned) | Mobile devices and laptops |
| T5Gemma XL-XL | 4 billion | Text | Text | PrefixLM (pretrained, instruction-tuned), UL2 (pretrained, instruction-tuned) | Mobile devices and laptops |
| T5Gemma M-L | 2 billion | Text | Text | PrefixLM (pretrained, instruction-tuned), UL2 (pretrained, instruction-tuned) | Mobile devices and laptops |
| T5Gemma L-L | 1 billion | Text | Text | PrefixLM (pretrained, instruction-tuned), UL2 (pretrained, instruction-tuned) | Mobile devices and laptops |
| T5Gemma B-B | 0.6 billion | Text | Text | PrefixLM (pretrained, instruction-tuned), UL2 (pretrained, instruction-tuned) | Mobile devices and laptops |
| T5Gemma S-S | 0.3 billion | Text | Text | PrefixLM (pretrained, instruction-tuned), UL2 (pretrained, instruction-tuned) | Mobile devices and laptops |

Gemma is tested on Google's purpose-built v5e TPU hardware and NVIDIA's L4 (G2 Standard), A100 (A2 Standard), and H100 (A3 High) GPU hardware.

What's next