Use a context cache

This guide shows you how to use a context cache in a generative AI application.

You can use the REST API or an SDK to reference content stored in a context cache in a generative AI application. Before you can use a context cache, you must first create it.

Context cache properties

The context cache object that you use in your code includes the following properties:

  • name: The full resource name of the context cache. You must use this name when you reference the cache in a request. The name is returned in the response when you create the context cache.

    • Format: projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID
    • Example request body:

      "cached_content": "projects/123456789012/locations/us-central1/123456789012345678" 
  • model: The resource name of the model that was used to create the cache.

    • Format: projects/PROJECT_NUMBER/locations/LOCATION/publishers/PUBLISHER_NAME/models/MODEL_ID
  • createTime: A Timestamp that specifies when the context cache was created.

  • updateTime: A Timestamp that specifies the most recent update time of the context cache. Before a cache is updated, its createTime and updateTime are the same.

  • expireTime: A Timestamp that specifies when the context cache expires. The default expiration time is 60 minutes after createTime. You can update the cache with a new expiration time. After a cache expires, it is marked for deletion and cannot be used or updated. To use an expired cache, you must recreate it.
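As a quick illustration of these properties, the following sketch uses a hypothetical helper (not part of any SDK) to validate a cache resource name against the documented format and to compute the default expiration time, which is 60 minutes after createTime:

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper: validate a context cache resource name and split it
# into its components. The pattern follows the documented format:
# projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID
CACHE_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/cachedContents/(?P<cache_id>[^/]+)$"
)

def parse_cache_name(name: str) -> dict:
    match = CACHE_NAME_RE.match(name)
    if not match:
        raise ValueError(f"not a context cache resource name: {name!r}")
    return match.groupdict()

def default_expire_time(create_time: datetime) -> datetime:
    # By default, a context cache expires 60 minutes after createTime.
    return create_time + timedelta(minutes=60)

parts = parse_cache_name(
    "projects/123456789012/locations/us-central1/cachedContents/123456789012345678"
)
print(parts["location"])  # us-central1

created = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(default_expire_time(created).isoformat())  # 2025-01-01T13:00:00+00:00
```

If you need the real values, read them from the cache object that the service returns rather than computing them client-side; a cache's actual expireTime can differ from the default if you update it.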

Context cache use restrictions

You can specify the following fields when you create a context cache. Don't specify them again in subsequent requests that use the cache:

  • GenerativeModel.system_instructions: Specifies instructions for the model to use before it receives instructions from a user. For more information, see System instructions.

  • GenerativeModel.tool_config: Specifies tools for the Gemini model to use, such as a tool for the function calling feature. For more information, see the tool_config reference.

  • GenerativeModel.tools: Specifies functions to create a function calling application. For more information, see Function calling.
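This restriction can be sketched as a client-side check. The following helper is hypothetical (it is not an SDK feature) and simply rejects request configurations that re-specify fields already baked into the cache:

```python
# Hypothetical client-side guard: when a request references a context cache,
# fields that were set when the cache was created (system instructions,
# tools, and tool config) must not be specified again in the request.
CACHED_ONLY_FIELDS = ("system_instruction", "tools", "tool_config")

def check_request_config(config: dict) -> None:
    if not config.get("cached_content"):
        return  # no cache referenced, so any field is allowed
    conflicts = [f for f in CACHED_ONLY_FIELDS if f in config]
    if conflicts:
        raise ValueError(
            "these fields were set when the cache was created and "
            f"must not be repeated: {conflicts}"
        )

# Allowed: the request references the cache without repeating cached fields.
check_request_config(
    {"cached_content": "projects/p/locations/l/cachedContents/c"}
)

# Rejected: the request repeats a field that belongs to the cache.
try:
    check_request_config({
        "cached_content": "projects/p/locations/l/cachedContents/c",
        "tools": [{"function_declarations": []}],
    })
except ValueError as err:
    print(err)
```

In practice the service enforces this for you; the sketch only makes the rule concrete.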

Use a context cache sample

The following code samples demonstrate how to use a context cache in a request.

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
from google import genai
from google.genai.types import GenerateContentConfig, HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Use content cache to generate text response
# E.g. cache_name = 'projects/111111111111/locations/us-central1/cachedContents/1111111111111111111'
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize the pdfs",
    config=GenerateContentConfig(
        cached_content=cache_name,
    ),
)
print(response.text)
# Example response
#   The Gemini family of multimodal models from Google DeepMind demonstrates remarkable capabilities across various
#   modalities, including image, audio, video, and text....

Go

Learn how to install or update the Go SDK.

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
import (
	"context"
	"fmt"
	"io"

	genai "google.golang.org/genai"
)

// useContentCacheWithTxt shows how to use content cache to generate text content.
func useContentCacheWithTxt(w io.Writer, cacheName string) error {
	ctx := context.Background()

	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		HTTPOptions: genai.HTTPOptions{APIVersion: "v1"},
	})
	if err != nil {
		return fmt.Errorf("failed to create genai client: %w", err)
	}

	resp, err := client.Models.GenerateContent(ctx,
		"gemini-2.5-flash",
		genai.Text("Summarize the pdfs"),
		&genai.GenerateContentConfig{
			CachedContent: cacheName,
		},
	)
	if err != nil {
		return fmt.Errorf("failed to use content cache to generate content: %w", err)
	}

	respText := resp.Text()

	fmt.Fprintln(w, respText)

	// Example response:
	// The provided research paper introduces Gemini 1.5 Pro, a multimodal model capable of recalling
	// and reasoning over information from very long contexts (up to 10 million tokens). Key findings include:
	//
	// * **Long Context Performance:**
	// ...

	return nil
}

REST

To use a context cache with a prompt through REST, use the Vertex AI API to send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

  • LOCATION: The region where the context cache was created and where the request is processed.
  • PROJECT_ID: Your project ID.
  • PROJECT_NUMBER: Your project number.
  • CACHE_ID: The ID of the context cache. The ID is returned in the response when you create the context cache.
  • PROMPT_TEXT: The text of your prompt.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001:generateContent

Request JSON body:

{
  "cachedContent": "projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID",
  "contents": [
    {"role": "user", "parts": [{"text": "PROMPT_TEXT"}]}
  ],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "temperature": 1,
    "topP": 0.95
  },
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001:generateContent"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001:generateContent" | Select-Object -Expand Content

If the request is successful, you receive a JSON response that contains the generated content.
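The generated text is nested inside the response JSON under candidates → content → parts. Assuming that standard generateContent response shape, a minimal extraction looks like this (the response body below is a trimmed, illustrative example, not actual model output):

```python
import json

# A trimmed, illustrative generateContent response body.
response_json = """
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [{"text": "Regular exercise improves cardiovascular health."}]
      },
      "finishReason": "STOP"
    }
  ]
}
"""

response = json.loads(response_json)

# Concatenate the text parts of the first candidate.
text = "".join(
    part.get("text", "")
    for part in response["candidates"][0]["content"]["parts"]
)
print(text)  # Regular exercise improves cardiovascular health.
```

Joining all parts, rather than reading only the first, handles responses where the model splits its answer across several text parts.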

Example curl command

LOCATION="us-central1" MODEL_ID="gemini-2.0-flash-001" PROJECT_ID="test-project"  curl -X POST \ -H "Authorization: Bearer $(gcloud auth print-access-token)" \ -H "Content-Type: application/json" \ "https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent" -d \ '{   "cachedContent": "projects/${PROJECT_NUMBER}/locations/${LOCATION}/cachedContents/${CACHE_ID}",   "contents": [       {"role":"user","parts":[{"text":"What are the benefits of exercise?"}]}   ],   "generationConfig": {       "maxOutputTokens": 8192,       "temperature": 1,       "topP": 0.95,   },   "safetySettings": [     {       "category": "HARM_CATEGORY_HATE_SPEECH",       "threshold": "BLOCK_MEDIUM_AND_ABOVE"     },     {       "category": "HARM_CATEGORY_DANGEROUS_CONTENT",       "threshold": "BLOCK_MEDIUM_AND_ABOVE"     },     {       "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",       "threshold": "BLOCK_MEDIUM_AND_ABOVE"     },     {       "category": "HARM_CATEGORY_HARASSMENT",       "threshold": "BLOCK_MEDIUM_AND_ABOVE"     }   ], }'