Grounding with Elasticsearch

This guide shows you how to ground Gemini model responses on your data in Elasticsearch.

Overview of grounding with Elasticsearch

Grounding uses public and private datasets to provide context and facts to ground Large Language Model (LLM) responses. By grounding with Elasticsearch, you can use your existing Elasticsearch indexes to enhance the quality and reliability of Gemini's output, reduce hallucinations, and help ensure responses are relevant to your data.

This feature lets you build powerful RAG applications such as the following:

  • Generative search summaries
  • Question-and-answer chatbots with enterprise data
  • Agents grounded in your data

You can ground an answer on up to 10 data sources at a time. You can combine Elasticsearch grounding with Grounding with Google Search to connect the model with world knowledge, a wide range of topics, and up-to-date information from the internet.

Supported models

The following models support grounding with Elasticsearch, using text input only:

Set up a search template in Elasticsearch

Best practices

For the best grounding responses, follow these principles when you create a search template:

  • Include only relevant and useful data. For example, in a product catalog, specifying an image URL might not help the LLM answer prompts about product properties unless the prompt specifically asks for a URL. Similarly, avoid outputting embedding vectors.
  • Return enough results. Grounding removes Elasticsearch results with low relevance to your prompt, so return a generous number of results to make sure all relevant context is captured.
  • Structure your data effectively. Results data can be in one field or spread across multiple fields.

Sample templates

You can use your own search template, but the following generic kNN search template is recommended for Elasticsearch grounding. It performs semantic search over multiple embedding fields with Vertex AI embeddings. For additional search templates, see the GitHub repository.

    PUT _scripts/google-template-knn-multioutput
    {
      "script": {
        "lang": "mustache",
        "source": {
          "_source": {
            "excludes": [ "title_embedding", "description_embedding", "images" ]
          },
          "size": "{{num_hits}}",
          "knn": [
            {
              "field": "description_embedding",
              "k": 5,
              "num_candidates": 10,
              "query_vector_builder": {
                "text_embedding": {
                  "model_id": "googlevertexai_embeddings_004",
                  "model_text": "{{query}}"
                }
              },
              "boost": 0.4
            },
            {
              "field": "title_embedding",
              "k": 5,
              "num_candidates": 10,
              "query_vector_builder": {
                "text_embedding": {
                  "model_id": "googlevertexai_embeddings_004",
                  "model_text": "{{query}}"
                }
              },
              "boost": 0.6
            }
          ]
        }
      }
    }
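If you manage search templates in code rather than pasting raw JSON, you can generate the same script body programmatically. The following Python sketch uses a hypothetical `build_knn_template` helper (not part of any Google or Elastic SDK) to construct the body for the `PUT _scripts/...` request above:

```python
def build_knn_template(fields_with_boosts,
                       model_id="googlevertexai_embeddings_004",
                       k=5, num_candidates=10, excludes=None):
    """Build the script body for a mustache kNN search template.

    fields_with_boosts: list of (embedding_field, boost) pairs, one
    kNN clause is emitted per pair. The mustache variables {{query}}
    and {{num_hits}} are filled in by Elasticsearch at search time.
    """
    knn_clauses = [
        {
            "field": field,
            "k": k,
            "num_candidates": num_candidates,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": model_id,
                    "model_text": "{{query}}",
                }
            },
            "boost": boost,
        }
        for field, boost in fields_with_boosts
    ]
    return {
        "script": {
            "lang": "mustache",
            "source": {
                # Exclude large fields that don't help the LLM.
                "_source": {"excludes": excludes or []},
                "size": "{{num_hits}}",
                "knn": knn_clauses,
            },
        }
    }

# Reproduce the template shown above.
template = build_knn_template(
    [("description_embedding", 0.4), ("title_embedding", 0.6)],
    excludes=["title_embedding", "description_embedding", "images"],
)
```

You could then send `template` as the JSON body of a `PUT _scripts/google-template-knn-multioutput` request with your preferred HTTP client.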

Generate grounded responses with Elasticsearch

You can generate grounded responses using the Google Cloud console for quick tests or the REST API for integration into your applications.

  • Google Cloud console: A graphical user interface for interactively building and testing prompts. Best for quick experiments, prototyping, and users who prefer a visual interface without writing code.
  • REST API: A programmatic interface to integrate grounding capabilities directly into your applications. Ideal for automating workflows, building production applications, and integrating with existing systems.

Console

To ground with Elasticsearch in the Google Cloud console, follow these steps:

  1. Go to the Create prompt page in Vertex AI Studio.

    Go to Create prompt

  2. In the Settings panel, click the Grounding: Your data toggle to enable grounding on your data.

  3. In the Customize Grounding pane, select Elasticsearch.

  4. In the Elasticsearch endpoint field, enter the endpoint.

  5. In the Elasticsearch API Key field, enter the API Key.

  6. In the Elasticsearch index field, enter the index.

  7. In the Elasticsearch search template field, enter the search template.

  8. To adjust the number of hits, use the Number of hits slider.

  9. Click Save.

  10. Enter your prompt.

  11. Click Submit.

Understand your response

If your model prompt successfully grounds to your Elasticsearch data store through Vertex AI Studio or the API, the model's responses include metadata with citations and source content. If the source relevance is low or the information within the model's response is incomplete, metadata might not be provided, and the response won't be grounded.

REST

This section explains how to use the Vertex AI API to ground your LLM responses.

Prerequisites

Before you can ground LLM responses with Elasticsearch, you must complete the following:

  1. Enable the Vertex AI API: Ensure that the Vertex AI API is enabled for your Google Cloud project.

  2. Install and sign in to the Google Cloud CLI: Install and initialize the gcloud command-line tool.

  3. Elasticsearch setup: Use an existing Elasticsearch cluster and index that you want to use for grounding. Obtain the following information from your Elasticsearch setup:

    • Endpoint: The URL of your Elasticsearch cluster.
    • Index Name: The name of the index that you want to search, such as my-data-index.
    • API Key: An API key that allows access to your Elasticsearch cluster. The API key must start with the prefix ApiKey.
  4. Create an Elasticsearch search template: Create a search template that returns result data for grounding, as described in the previous section.

API access

Use the following instructions to ground Gemini with your Elasticsearch data source using the Vertex AI API.

Prepare a grounded generation request

To send a text prompt and ground it with Elasticsearch, send a POST request to the Vertex AI API. At a minimum, provide the request body and replace the following values:

  • QUERY: The text prompt to ground.
  • ELASTIC_SEARCH_ENDPOINT: The absolute endpoint path for the Elasticsearch resource to use.
  • ELASTIC_SEARCH_API_KEY: The API key for the Elasticsearch data endpoint.
  • INDEX_NAME: The name of the Elasticsearch index used for grounding.
  • SEARCH_TEMPLATE_NAME: The Elasticsearch search template used for grounding.
  • NUM_HITS: The number of results returned from the Elasticsearch data source and used for grounding.

Request JSON body:

    {
      "contents": [
        {
          "role": "user",
          "parts": [
            {
              "text": "QUERY"
            }
          ]
        }
      ],
      "tools": [{
        "retrieval": {
          "externalApi": {
            "api_spec": "ELASTIC_SEARCH",
            "endpoint": "ELASTIC_SEARCH_ENDPOINT",
            "apiAuth": {
              "apiKeyConfig": {
                "apiKeyString": "ApiKey ELASTIC_SEARCH_API_KEY"
              }
            },
            "elasticSearchParams": {
              "index": "INDEX_NAME",
              "searchTemplate": "SEARCH_TEMPLATE_NAME",
              "numHits": "NUM_HITS"
            }
          }
        }
      }]
    }

For more information on other API fields such as system instructions and multi-turn chats, see Generative AI beginner's guide.
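If you build the request in code, you can assemble the same body programmatically. The following Python sketch uses a hypothetical `build_grounded_request` helper; the field names mirror the request JSON body above, and the query, endpoint, key, and index values are illustrative placeholders:

```python
def build_grounded_request(query, endpoint, api_key, index,
                           search_template, num_hits="10"):
    """Assemble a generateContent request body that grounds the
    prompt with an Elasticsearch data source.

    Note that the API key string must carry the "ApiKey " prefix.
    """
    return {
        "contents": [
            {"role": "user", "parts": [{"text": query}]}
        ],
        "tools": [{
            "retrieval": {
                "externalApi": {
                    "api_spec": "ELASTIC_SEARCH",
                    "endpoint": endpoint,
                    "apiAuth": {
                        "apiKeyConfig": {
                            "apiKeyString": f"ApiKey {api_key}"
                        }
                    },
                    "elasticSearchParams": {
                        "index": index,
                        "searchTemplate": search_template,
                        "numHits": num_hits,
                    },
                }
            }
        }],
    }

# Illustrative values only; substitute your own endpoint, key, and index.
body = build_grounded_request(
    query="What products are in stock?",
    endpoint="https://my-cluster.es.example.com:9243",
    api_key="MY_KEY",
    index="my-data-index",
    search_template="google-template-knn-multioutput",
)
```

You can serialize `body` with `json.dumps` and send it as the POST payload shown in the next section.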

Send the API request

Save the request body in a file named request.json, and then execute the following POST request. In the request URL, replace PROJECT_ID with your Google Cloud project ID, LOCATION with the region of your endpoint, and MODEL_ID with the Gemini model ID.

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:generateContent"

You should receive a JSON response similar to the following:

  {
    "candidates": [
      {
        "content": {
          "role": "model",
          "parts": [
            {
              "text": "Based on the information ..."
            }
          ]
        },
        "finishReason": "STOP",
        "safetyRatings": [ "..." ],
        "groundingMetadata": {
          "groundingChunks": [
            {
              "retrievedContext": {
                "text": "ipsum lorem ..."
              }
            },
            {...},
            {...}
          ],
          "groundingSupports": [
            {
              "segment": {
                "startIndex": 25,
                "endIndex": 147,
                "text": "ipsum lorem ..."
              },
              "groundingChunkIndices": [1, 2],
              "confidenceScores": [0.6626542, 0.82018316]
            }
          ]
        }
      }
    ]
  }

Understand your response

The response includes the LLM-generated text, which is called a candidate. If your model prompt successfully grounds to your Elasticsearch data source, the response includes grounding metadata, which identifies the parts of the response that were derived from your Elasticsearch data. If the source relevance is low or the information within the model's response is incomplete, this metadata might not be provided, and the response won't be grounded.

The following is a breakdown of the output data:

  • Role: Indicates the sender of the grounded answer. Because the response always contains grounded text, the role is always model.
  • Text: The grounded answer generated by the LLM.
  • Grounding metadata: Information about the grounding source, which contains the following elements:
    • Grounding chunks: A list of results from your Elasticsearch index that support the answer.
    • Grounding supports: Information about a specific claim within the answer that can be used to show citations:
      • Segment: The part of the model's answer that is substantiated by a grounding chunk.
      • Grounding chunk index: The index of the grounding chunks in the grounding chunks list that corresponds to this claim.
      • Confidence scores: A number from 0 to 1 that indicates how grounded the claim is in the provided set of grounding chunks. Not available for Gemini 2.5 Pro and Gemini 2.5 Flash and later.
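To show citations in an application, you can walk the grounding metadata and join each supported claim back to the chunks it cites. The following Python sketch uses a hypothetical `extract_citations` helper; the field names follow the sample response above:

```python
def extract_citations(candidate):
    """Map each grounded claim (segment) to the chunk texts that support it."""
    meta = candidate.get("groundingMetadata", {})
    # Chunk texts, in list order, so groundingChunkIndices can index into them.
    chunks = [c["retrievedContext"]["text"]
              for c in meta.get("groundingChunks", [])]
    citations = []
    for support in meta.get("groundingSupports", []):
        citations.append({
            "claim": support["segment"]["text"],
            "sources": [chunks[i] for i in support["groundingChunkIndices"]],
            # May be absent on newer models (e.g. Gemini 2.5 and later).
            "confidence": support.get("confidenceScores", []),
        })
    return citations

# Minimal candidate shaped like the sample response above.
sample = {
    "groundingMetadata": {
        "groundingChunks": [
            {"retrievedContext": {"text": "chunk 0"}},
            {"retrievedContext": {"text": "chunk 1"}},
        ],
        "groundingSupports": [{
            "segment": {"startIndex": 0, "endIndex": 10, "text": "a claim"},
            "groundingChunkIndices": [1],
            "confidenceScores": [0.8],
        }],
    }
}

citations = extract_citations(sample)
```

If the candidate carries no grounding metadata (for example, when source relevance is low), the helper returns an empty list, which your application can treat as an ungrounded response.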


What's next