This guide shows you how to ground Gemini models on your data in Elasticsearch.

This guide covers the following topics:

The following diagram summarizes the overall workflow:

Grounding uses public and private datasets to provide context and facts to ground Large Language Model (LLM) responses. By grounding with Elasticsearch, you can use your existing Elasticsearch indexes to enhance the quality and reliability of Gemini's output, reduce hallucinations, and help ensure responses are relevant to your data. This feature lets you build powerful RAG applications such as the following:

You can ground an answer on up to 10 data sources at one time. You can combine grounding with Elasticsearch with Grounding with Google Search to connect the model with world knowledge, a wide range of topics, or up-to-date information on the internet.

The following models support grounding with Elasticsearch with text input only:

For the best grounding responses, follow these principles when you create a search template:

You can use your own search templates. The following generic kNN search template is recommended for Elasticsearch grounding. For additional search templates, see the GitHub repository. The following example shows a generic kNN search template for semantic search with Vertex AI.

You can generate grounded responses using the Google Cloud console for quick tests or the REST API for integration into your applications.

To ground with Elasticsearch in the Google Cloud console, follow these steps:

- Go to the Create prompt page in Vertex AI Studio.
- In the Settings panel, to ground your data, click the Grounding: Your data toggle.
- In the Customize Grounding pane, select Elasticsearch.
- In the Elasticsearch endpoint field, enter the endpoint.
- In the Elasticsearch API Key field, enter the API key.
- In the Elasticsearch index field, enter the index.
- In the Elasticsearch search template field, enter the search template.
- To adjust the number of hits, use the Number of hits slider.
- Click Save.
- Enter your prompt.
- Click Submit.

If your model prompt successfully grounds to Elasticsearch data stores using Vertex AI Studio or the API, then the model's responses include metadata with citations and source content. If low source relevance or incomplete information occurs within the model's response, then metadata might not be provided, and the prompt response won't be grounded.

This section explains how you use the Vertex AI API to ground your LLM responses.

Before you can ground LLM responses with Elasticsearch, you must complete the following:

- Activate the Vertex AI API: Ensure that the Vertex AI API is enabled for your Google Cloud project.
- Install and sign in to the Google Cloud CLI: Install and initialize the gcloud command-line tool.
- Elasticsearch setup: Use an existing Elasticsearch cluster and index that you want to use for grounding. Obtain the following information from your Elasticsearch setup:
- Create an Elasticsearch search template: Use an Elasticsearch data source that uses a reference template that returns result data for grounding.

Use the following instructions to ground Gemini with your Elasticsearch data source using the Vertex AI API.

To send a text prompt and ground it with Elasticsearch, send a POST request to the Vertex AI API. At a minimum, you must provide the request body. Make sure to do the following replacements:

For more information on other API fields such as system instructions and multi-turn chats, see Generative AI beginner's guide.

You can save the request body in a file named request.json. You should receive a JSON response similar to the following:

The response from both APIs includes the LLM-generated text, which is called a candidate. If your model prompt successfully grounds to your Elasticsearch data source, then the responses include grounding metadata, which identifies the parts of the response that were derived from your Elasticsearch data.

However, there are several reasons this metadata might not be provided, in which case the prompt response won't be grounded. These reasons include low source relevance or incomplete information within the model's response. The following is a breakdown of the output data:
Overview of grounding with Elasticsearch
Supported models
Set up a search template in Elasticsearch
Best practices
Sample templates
PUT _scripts/google-template-knn-multioutput
{
  "script": {
    "lang": "mustache",
    "source": {
      "_source": {
        "excludes": ["title_embedding", "description_embedding", "images"]
      },
      "size": "{{num_hits}}",
      "knn": [
        {
          "field": "description_embedding",
          "k": 5,
          "num_candidates": 10,
          "query_vector_builder": {
            "text_embedding": {
              "model_id": "googlevertexai_embeddings_004",
              "model_text": "{{query}}"
            }
          },
          "boost": 0.4
        },
        {
          "field": "title_embedding",
          "k": 5,
          "num_candidates": 10,
          "query_vector_builder": {
            "text_embedding": {
              "model_id": "googlevertexai_embeddings_004",
              "model_text": "{{query}}"
            }
          },
          "boost": 0.6
        }
      ]
    }
  }
}
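As a rough illustration, the same stored-script body could be assembled programmatically before sending it to `PUT _scripts/<template-name>`. The following Python sketch is not part of the original guide: the helper name `build_knn_template` and its arguments are hypothetical, and it assumes the template parameters are named `query` and `num_hits` and are referenced with mustache double braces. Unlike the template above, it excludes only the embedding fields from `_source`.

```python
import json


def build_knn_template(field_boosts, model_id, k=5, num_candidates=10):
    """Build a stored-script body with one kNN clause per embedding field.

    field_boosts maps an embedding field name to its boost (hypothetical helper).
    """
    knn_clauses = [
        {
            "field": field,
            "k": k,
            "num_candidates": num_candidates,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": model_id,
                    # Assumed mustache placeholder, filled in at search time.
                    "model_text": "{{query}}",
                }
            },
            "boost": boost,
        }
        for field, boost in field_boosts.items()
    ]
    return {
        "script": {
            "lang": "mustache",
            "source": {
                # Keep the embedding vectors out of the returned hits.
                "_source": {"excludes": list(field_boosts)},
                # Assumed mustache placeholder for the number of hits.
                "size": "{{num_hits}}",
                "knn": knn_clauses,
            },
        }
    }


template = build_knn_template(
    {"description_embedding": 0.4, "title_embedding": 0.6},
    model_id="googlevertexai_embeddings_004",
)
# JSON text that could be used as the body of the PUT _scripts request.
body = json.dumps(template, indent=2)
```

Keeping the boosts in one mapping makes it easier to experiment with the relative weight of each embedding field without editing the JSON by hand.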
Generate grounded responses with Elasticsearch
| Method | Description | Use Case |
| --- | --- | --- |
| Google Cloud console | A graphical user interface within the Google Cloud console for interactively building and testing prompts. | Best for quick experiments, prototyping, and users who prefer a visual interface without writing code. |
| REST API | A programmatic interface to integrate grounding capabilities directly into your applications. | Ideal for automating workflows, building production applications, and integrating with existing systems. |

Console
Understand your response
REST
Prerequisites
API access
Prepare a grounded generation request
Request JSON body:
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "QUERY"
        }
      ]
    }
  ],
  "tools": [
    {
      "retrieval": {
        "externalApi": {
          "api_spec": "ELASTIC_SEARCH",
          "endpoint": "ELASTIC_SEARCH_ENDPOINT",
          "apiAuth": {
            "apiKeyConfig": {
              "apiKeyString": "ApiKey ELASTIC_SEARCH_API_KEY"
            }
          },
          "elasticSearchParams": {
            "index": "INDEX_NAME",
            "searchTemplate": "SEARCH_TEMPLATE_NAME",
            "numHits": "NUM_HITS"
          }
        }
      }
    }
  ]
}
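If you generate the request body in code rather than editing the JSON by hand, a small builder can fill in the placeholders. The following Python sketch is illustrative only: the function name `build_grounded_request`, the default of 5 hits, and the example endpoint, index, and template values are hypothetical, and it assumes the API accepts `numHits` as an integer.

```python
import json


def build_grounded_request(query, endpoint, api_key, index, search_template, num_hits=5):
    """Assemble a request body that grounds a text prompt on Elasticsearch."""
    if num_hits < 1:
        raise ValueError("num_hits must be at least 1")
    return {
        "contents": [{"role": "user", "parts": [{"text": query}]}],
        "tools": [
            {
                "retrieval": {
                    "externalApi": {
                        "api_spec": "ELASTIC_SEARCH",
                        "endpoint": endpoint,
                        "apiAuth": {
                            "apiKeyConfig": {
                                # The key carries the "ApiKey " prefix, as in the JSON body above.
                                "apiKeyString": f"ApiKey {api_key}"
                            }
                        },
                        "elasticSearchParams": {
                            "index": index,
                            "searchTemplate": search_template,
                            # Assumption: an integer is accepted here.
                            "numHits": num_hits,
                        },
                    }
                }
            }
        ],
    }


request_body = build_grounded_request(
    query="What hiking boots do you carry?",   # hypothetical prompt
    endpoint="https://es.example.com:9243",    # hypothetical endpoint
    api_key="ELASTIC_SEARCH_API_KEY",          # placeholder, not a real key
    index="products",                          # hypothetical index name
    search_template="google-template-knn-multioutput",
)
request_json = json.dumps(request_body, indent=2)
```

The resulting `request_json` string can be written to a file and passed to the request step.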
Send the API request
Save the request body in a file named request.json. Then execute the POST API request, replacing LOCATION, PROJECT_ID, and MODEL_ID with your own values:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent"
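For applications that call the endpoint from code, the same request can be prepared with Python's standard library. This is a minimal sketch under stated assumptions: the helper name `build_generate_content_request` and the project and region values are hypothetical, and the access token is passed in as a string (for example, the output of `gcloud auth print-access-token`). The sketch builds the request without sending it.

```python
import json
import urllib.request


def build_generate_content_request(project_id, location, model_id, access_token, body):
    """Create (but do not send) the POST request for generateContent."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{location}/"
        f"publishers/google/models/{model_id}:generateContent"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


request = build_generate_content_request(
    project_id="my-project",      # hypothetical project ID
    location="us-central1",       # hypothetical region
    model_id="MODEL_ID",          # placeholder, as in the curl command
    access_token="ACCESS_TOKEN",  # placeholder for a real access token
    body={"contents": [{"role": "user", "parts": [{"text": "QUERY"}]}]},
)
# To send the request: urllib.request.urlopen(request)
```

Separating request construction from sending makes the URL and headers easy to inspect or log before any network call is made.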
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Based on the information ..."
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [ "..." ],
      "groundingMetadata": {
        "groundingChunks": [
          {
            "retrievedContext": {
              "text": "ipsum lorem ..."
            }
          },
          {...},
          {...}
        ],
        "groundingSupports": [
          {
            "segment": {
              "startIndex": 25,
              "endIndex": 147,
              "text": "ipsum lorem ..."
            },
            "groundingChunkIndices": [1,2],
            "confidenceScores": [0.6626542, 0.82018316]
          }
        ]
      }
    }
  ]
}
Understand the response
The response from the Vertex AI Studio or the API includes the LLM-generated text, which is called a candidate. If your model prompt successfully grounds to your Elasticsearch data source, the response includes grounding metadata that identifies the parts of the response derived from your data.
However, if there is low source relevance or incomplete information, metadata might not be provided, and the response won't be grounded.
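Because grounding metadata can be absent, client code may want to check for it before rendering citations. The following Python sketch assumes the response has already been parsed into a dictionary with the field names shown in the sample response; treating a non-empty `groundingChunks` list as the signal that a response is grounded is an assumption of this example, and `is_grounded` is a hypothetical helper name.

```python
def is_grounded(response):
    """Return True if any candidate carries a non-empty list of grounding chunks."""
    for candidate in response.get("candidates", []):
        # groundingMetadata may be missing or empty when the response is not grounded.
        metadata = candidate.get("groundingMetadata") or {}
        if metadata.get("groundingChunks"):
            return True
    return False


grounded = {
    "candidates": [
        {"groundingMetadata": {"groundingChunks": [{"retrievedContext": {"text": "..."}}]}}
    ]
}
ungrounded = {"candidates": [{"content": {"role": "model", "parts": [{"text": "..."}]}}]}

grounded_result = is_grounded(grounded)
ungrounded_result = is_grounded(ungrounded)
```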
The API response contains the following details:
- Role: Indicates the sender of the answer. For a model-generated response, the role is model.
- Text: The answer generated by the LLM.
- Grounding metadata: Information about the grounding source, which contains the following elements:
  - Grounding chunks: A list of results from your Elasticsearch index that support the answer.
  - Grounding supports: Information about a specific claim within the answer that can be used to show citations:
    - Segment: The part of the model's answer that is substantiated by a grounding chunk.
    - Grounding chunk indices: The indices of the entries in the grounding chunks list that correspond to this claim.
    - Confidence scores: A number from 0 to 1 that indicates how grounded the claim is in the provided set of grounding chunks. Not available for Gemini 2.5 Pro, Gemini 2.5 Flash, and later models.
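To show how these elements fit together, the following Python sketch pairs each supported segment with the chunks that back it. It assumes a candidate parsed into a dictionary with the field names from the sample response; the helper name `extract_citations` and the output shape are hypothetical. Because confidence scores can be absent for some models, missing scores are filled with `None`.

```python
def extract_citations(candidate):
    """Pair each supported segment with the text of the chunks that back it."""
    metadata = candidate.get("groundingMetadata", {})
    chunks = metadata.get("groundingChunks", [])
    citations = []
    for support in metadata.get("groundingSupports", []):
        indices = support.get("groundingChunkIndices", [])
        # Pad missing confidence scores with None so they align with the indices.
        scores = list(support.get("confidenceScores", []))
        scores += [None] * (len(indices) - len(scores))
        sources = [
            {
                "chunk_index": i,
                "text": chunks[i].get("retrievedContext", {}).get("text", ""),
                "confidence": score,
            }
            for i, score in zip(indices, scores)
            if 0 <= i < len(chunks)  # skip indices that point outside the chunk list
        ]
        citations.append(
            {"segment": support.get("segment", {}).get("text", ""), "sources": sources}
        )
    return citations


sample_candidate = {
    "groundingMetadata": {
        "groundingChunks": [
            {"retrievedContext": {"text": "chunk zero"}},
            {"retrievedContext": {"text": "chunk one"}},
            {"retrievedContext": {"text": "chunk two"}},
        ],
        "groundingSupports": [
            {
                "segment": {"startIndex": 25, "endIndex": 147, "text": "a supported claim"},
                "groundingChunkIndices": [1, 2],
                "confidenceScores": [0.6626542, 0.82018316],
            }
        ],
    }
}
citations = extract_citations(sample_candidate)
```

Each entry in `citations` holds the claim text and the list of source chunks, which is a convenient shape for rendering footnote-style citations next to the answer.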
What's next
- To learn how to send chat prompt requests, see Multiturn chat.
- To learn about responsible AI best practices and Vertex AI's safety filters, see Safety best practices.