- Context cache properties: Describes the key properties of a context cache object, such as its name, model, and expiration time.
- Context cache use restrictions: Explains the features that you cannot specify in a request when you use a context cache.
- Use a context cache sample: Provides code samples that demonstrate how to use a context cache with the REST API and Python SDK.
You can use REST APIs or the Python SDK to reference content stored in a context cache in a generative AI application. Before you can use a context cache, you must create it.
Context cache properties
The context cache object that you use in your code includes the following properties:
name
: The full resource name of the context cache. You must use this name when you reference the cache in a request. The name is returned in the response when you create the context cache.
- Format:
  projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID
- Example request body:
  "cached_content": "projects/123456789012/locations/us-central1/cachedContents/123456789012345678"
model
: The resource name of the model that was used to create the cache.
- Format:
  projects/PROJECT_NUMBER/locations/LOCATION/publishers/PUBLISHER_NAME/models/MODEL_ID
createTime
: A Timestamp that specifies when the context cache was created.
updateTime
: A Timestamp that specifies the most recent update time of the context cache. Before a cache is updated, its createTime and updateTime are the same.
expireTime
: A Timestamp that specifies when the context cache expires. The default expiration time is 60 minutes after createTime. You can update the cache with a new expiration time. After a cache expires, it is marked for deletion and cannot be used or updated. To use an expired cache, you must recreate it.
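A few of these properties lend themselves to small client-side helpers. The following Python sketch is hypothetical (none of these functions belong to a Google SDK): parse_cache_name checks a name against the documented name format, and the timestamp helpers model only the default 60-minute expiration. It assumes timestamps arrive as RFC 3339 UTC strings, which is how a Timestamp is rendered in REST JSON.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Documented name format:
# projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID
_CACHE_NAME_RE = re.compile(
    r"projects/(?P<project_number>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/cachedContents/(?P<cache_id>[^/]+)"
)

# Default expiration described above: 60 minutes after createTime.
DEFAULT_CACHE_TTL = timedelta(minutes=60)


def parse_cache_name(name: str) -> dict:
    """Hypothetical helper: split a cache resource name into its parts.

    Raises ValueError if the name does not match the documented format,
    for example when the cachedContents/ segment is missing.
    """
    match = _CACHE_NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a context cache resource name: {name!r}")
    return match.groupdict()


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as '2025-01-01T12:00:00Z'.

    Assumes any fractional seconds have 3 or 6 digits, which is what
    datetime.fromisoformat accepts on older Python versions.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"  # older fromisoformat rejects a bare 'Z'
    return datetime.fromisoformat(value)


def default_expire_time(create_time: datetime) -> datetime:
    """expireTime of a cache created without an explicit expiration."""
    return create_time + DEFAULT_CACHE_TTL


def is_expired(expire_time: datetime, now: Optional[datetime] = None) -> bool:
    """True once the current time has reached expireTime (boundary counts as expired)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expire_time


if __name__ == "__main__":
    print(parse_cache_name(
        "projects/123456789012/locations/us-central1/cachedContents/123456789012345678"
    ))
    created = parse_timestamp("2025-01-01T12:00:00Z")
    expires = default_expire_time(created)
    print(expires.isoformat())  # 2025-01-01T13:00:00+00:00
    print(is_expired(expires, now=created + timedelta(minutes=30)))  # False
```

Because an expired cache cannot be used or updated, a check like is_expired is only a local hint; the service remains the authority on whether a cache still exists.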
Context cache use restrictions
When you create a context cache, you can specify the following features. You should not specify these features again in subsequent requests that use the cache:
GenerativeModel.system_instructions
: Specifies instructions for the model to use before it receives instructions from a user. For more information, see System instructions.
GenerativeModel.tool_config
: Specifies how the Gemini model uses its tools, such as the configuration for the function calling feature. For more information, see the tool_config reference.
GenerativeModel.tools
: Specifies functions to create a function calling application. For more information, see Function calling.
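One way to catch these conflicts before sending a request is a small client-side check. The sketch below is hypothetical and not part of any SDK: it inspects a REST request body and assumes that the JSON fields systemInstruction, tools, and toolConfig (or their snake_case spellings) correspond to the three features listed above.

```python
# Assumed mapping from REST field names (camelCase) to their snake_case
# spellings for the features that are fixed when the cache is created.
_FIXED_AT_CACHE_CREATION = {
    "systemInstruction": "system_instruction",
    "tools": "tools",
    "toolConfig": "tool_config",
}


def conflicting_fields(request_body: dict) -> list:
    """Hypothetical check: list cache-fixed fields that a cached request still sets.

    Requests that do not reference a cache are not checked.
    """
    if "cachedContent" not in request_body and "cached_content" not in request_body:
        return []
    return [
        camel
        for camel, snake in _FIXED_AT_CACHE_CREATION.items()
        if camel in request_body or snake in request_body
    ]


if __name__ == "__main__":
    body = {
        "cachedContent": "projects/123456789012/locations/us-central1/cachedContents/123456789012345678",
        "contents": [{"role": "user", "parts": [{"text": "PROMPT_TEXT"}]}],
        "tools": [],
    }
    print(conflicting_fields(body))  # ['tools']
```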
Use a context cache sample
The following code samples demonstrate how to use a context cache in a request.
Python
Install
pip install --upgrade google-genai
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
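As a stdlib-only sketch of how these environment variables map onto the REST endpoint used in the REST sample, the hypothetical helper generate_content_url below reads the same variables and assembles the generateContent URL. It does not call the Gen AI SDK, and its default model ID is borrowed from the REST example.

```python
import os


def generate_content_url(model_id: str = "gemini-2.0-flash-001") -> str:
    """Hypothetical helper: build the generateContent endpoint from the environment."""
    project = os.environ["GOOGLE_CLOUD_PROJECT"]    # raises KeyError if unset
    location = os.environ["GOOGLE_CLOUD_LOCATION"]  # for example, us-central1
    return (
        f"https://{location}-aiplatform.googleapis.com/v1"
        f"/projects/{project}/locations/{location}"
        f"/publishers/google/models/{model_id}:generateContent"
    )
```

Reading the variables with os.environ[...] rather than a default makes a missing export fail loudly instead of silently targeting the wrong project or region.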
Go
Learn how to install or update the Go SDK.
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
REST
To use a context cache with a prompt through REST, send a POST request to the publisher model endpoint of the Vertex AI API.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region where the request to create the context cache was processed.
- PROJECT_NUMBER: Your project number.
- CACHE_ID: The ID of the context cache.
- PROMPT_TEXT: The text prompt to submit to the model.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001:generateContent
Request JSON body:
{
  "cachedContent": "projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID",
  "contents": [
    {"role":"user","parts":[{"text":"PROMPT_TEXT"}]}
  ],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "temperature": 1,
    "topP": 0.95
  },
  "safetySettings": [
    { "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE" },
    { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE" },
    { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE" },
    { "category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE" }
  ]
}
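Generating the body programmatically avoids hand-editing JSON and the syntax errors that come with it. A minimal Python sketch, assuming the same generation and safety settings as the body above; build_request_body is a hypothetical name.

```python
import json

# Harm categories from the request body above, each blocked at BLOCK_MEDIUM_AND_ABOVE.
SAFETY_CATEGORIES = (
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_HARASSMENT",
)


def build_request_body(cache_name: str, prompt: str) -> str:
    """Hypothetical helper: serialize a generateContent body that references a cache.

    json.dumps never emits trailing commas, so the output is always valid JSON.
    """
    body = {
        "cachedContent": cache_name,
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {"maxOutputTokens": 8192, "temperature": 1, "topP": 0.95},
        "safetySettings": [
            {"category": category, "threshold": "BLOCK_MEDIUM_AND_ABOVE"}
            for category in SAFETY_CATEGORIES
        ],
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    print(build_request_body(
        "projects/123456789012/locations/us-central1/cachedContents/123456789012345678",
        "What are the benefits of exercise?",
    ))
```

Writing the returned string to request.json produces the file that the curl and PowerShell commands expect.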
To send your request, choose one of these options:
curl
Save the request body in a file named request.json and run the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001:generateContent"
PowerShell
Save the request body in a file named request.json and run the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001:generateContent" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Example curl command
# Example values; replace them with your own.
LOCATION="us-central1"
MODEL_ID="gemini-2.0-flash-001"
PROJECT_ID="test-project"
PROJECT_NUMBER="123456789012"
CACHE_ID="123456789012345678"
# The JSON body is single-quoted, so the quotes are closed around each
# variable in cachedContent to let the shell expand it.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent" -d \
  '{
    "cachedContent": "projects/'"${PROJECT_NUMBER}"'/locations/'"${LOCATION}"'/cachedContents/'"${CACHE_ID}"'",
    "contents": [
      {"role":"user","parts":[{"text":"What are the benefits of exercise?"}]}
    ],
    "generationConfig": {
      "maxOutputTokens": 8192,
      "temperature": 1,
      "topP": 0.95
    },
    "safetySettings": [
      { "category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE" },
      { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE" },
      { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE" },
      { "category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE" }
    ]
  }'
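For callers that prefer Python over curl, the same POST can be expressed with the standard library. This is a sketch with a hypothetical helper name; it only constructs the request, and the access token is assumed to come from gcloud auth print-access-token as in the curl command.

```python
import json
import urllib.request


def build_generate_content_request(url: str, access_token: str, body: dict) -> urllib.request.Request:
    """Hypothetical helper: the urllib equivalent of the curl command above.

    The request is only constructed here; pass it to urllib.request.urlopen
    to actually send it.
    """
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


if __name__ == "__main__":
    req = build_generate_content_request(
        "https://us-central1-aiplatform.googleapis.com/v1/projects/test-project"
        "/locations/us-central1/publishers/google/models/gemini-2.0-flash-001:generateContent",
        access_token="ACCESS_TOKEN",  # placeholder for a real OAuth access token
        body={
            "cachedContent": "projects/123456789012/locations/us-central1/cachedContents/123456789012345678",
            "contents": [{"role": "user", "parts": [{"text": "What are the benefits of exercise?"}]}],
        },
    )
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the request easy to inspect or log before any network call is made.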
- Learn how to update the expiration time of a context cache.
- Learn how to create a new context cache.
- Learn how to get information about all context caches associated with a Google Cloud project.
- Learn how to delete a context cache.