# Frequently asked questions

Last updated: June 25, 2025

## Gemini 2 general FAQ

### Help! The model I'm using isn't available anymore!

If your application recently started showing errors related to an unavailable PaLM 2, Gemini 1.0, or Gemini 1.5-001 model, this document covers how to transition to a supported model. Google regularly releases new and improved AI models. To make way for these advancements, older models are retired (deprecated). We provide notice when deprecating a model, along with a transition window before access to the model is terminated, but we understand this can still cause interruptions. You have two options for updating your model: upgrade to a Gemini 2 model, or migrate your work to Vertex AI.

### How do the Gemini 2 models compare to the 1.5 generation?

The Gemini 2 models feature a number of upgrades over our 1.5 models. The following table compares the Gemini 2 models:

| Model name | Description | Upgrade path |
| --- | --- | --- |
| Gemini 2.5 Pro | Strongest model quality (especially for code and world knowledge), with a 1M-token long context window | Gemini 1.5 Pro users who want better quality, or who are particularly invested in long context and code |
| Gemini 2.0 Flash | Workhorse model for all daily tasks; features enhanced performance and supports the real-time Live API | |
| Gemini 2.0 Flash-Lite | Our most cost-effective offering to support high throughput | |

To see all benchmark capabilities for Gemini 2, visit the Google DeepMind documentation.

### How do I migrate Gemini on Google AI Studio to Vertex AI Studio?

Migrating to Google Cloud's Vertex AI platform gives you a suite of MLOps tools that streamline the usage, deployment, and monitoring of AI models for efficiency and reliability. To migrate your work to Vertex AI, import and upload your existing data to Vertex AI Studio and use the Gemini API with Vertex AI. For more information, see Migrate from Gemini on Google AI to Vertex AI.

### How does Gemini 2 image generation compare to Imagen 3?

While the experimental version of Gemini 2.0 Flash supports image generation, our generally available Gemini 2 models do not, and the experimental version shouldn't be used in production-level code. If you need image generation in production code, use Imagen 3. This powerful model offers high-quality images, low-latency generation, and flexible editing options.

### Does Gemini 2 in Vertex AI support compositional function calling?

Compositional function calling is only available in Google AI Studio.

### What locations are supported for Gemini 2?

For the full list of locations supported for Gemini 2 models, see Locations.

### What are the default quotas for Gemini 2?

Gemini 2.0 Flash and Gemini 2.0 Flash-Lite use dynamic shared quota and have no default quota. Gemini 2.5 Pro is an experimental model with a limit of 10 queries per minute (QPM).

## Monitoring

### Why does my quota usage show as 0% on the API dashboard when I'm sending requests?

For Gemini models on Vertex AI, we use a Dynamic Shared Quota (DSQ) system. DSQ automatically manages capacity across all users in a region, ensuring optimal performance without manual quota adjustments or requests. As a result, you won't see traditional quota usage displayed in the Quotas & System Limits tab; your project automatically receives the necessary resources based on real-time availability. Use the Vertex AI Model Garden (Monitoring) dashboard to monitor usage.

## Provisioned Throughput

### When should I use Provisioned Throughput?

For generative AI applications in production that require consistent throughput, we recommend using Provisioned Throughput (PT). PT ensures a predictable, consistent user experience, which is critical for time-sensitive workloads, and it provides deterministic monthly or weekly cost structures, enabling accurate budget planning. For more information, see Provisioned Throughput overview.

### What models are supported for Provisioned Throughput?

The models supported for Provisioned Throughput, including their throughput, purchase increment, and burndown rate, are listed on our Supported models page. To purchase Provisioned Throughput for partner models (such as Anthropic's Claude models), you must contact Google; you can't order it through the Google Cloud console. For more information, see Partner models.

### How can I monitor my Provisioned Throughput usage?

There are three ways to measure your Provisioned Throughput usage. When using the built-in monitoring metrics or the HTTP response headers, you can create a chart in the Metrics Explorer to monitor usage.

### What permissions are required to purchase and use Provisioned Throughput?

To buy and manage Provisioned Throughput, follow the instructions in the Permissions section of Purchase Provisioned Throughput. The same permissions that apply to pay-as-you-go usage apply to Provisioned Throughput. If you still run into issues placing an order, you likely need to add one of the roles listed in that section.

### What is a GSU?

A generative AI scale unit (GSU) is an abstract measure of capacity for throughput provisioning that is fixed and standard across all Google models that support Provisioned Throughput. A GSU's price and capacity are fixed, but throughput may vary between models, because different models may require different amounts of capacity to deliver the same throughput.

### How often am I billed for Provisioned Throughput?

You're invoiced at the end of each month for any Provisioned Throughput charges you incur over the course of that month.

### How long does it take to activate my Provisioned Throughput order?

### Can I test Provisioned Throughput before placing an order?

While a direct test environment isn't available, a one-week order with a limited number of GSUs provides a cost-effective way to experience Provisioned Throughput and assess its suitability for your requirements. For more information, see Purchase Provisioned Throughput.

### How can I estimate my GSU needs for Provisioned Throughput?

You can estimate your Provisioned Throughput needs with the following formula:
$$ \begin{aligned} \text{Throughput per sec} = & \\ & \qquad (\text{Inputs per query converted to input chars} \\ & \qquad + \text{Outputs per query converted to input chars}) \\ & \qquad \times \text{QPS} \end{aligned} $$
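As an illustration, the estimate above can be computed directly. Note that every number in this sketch (chars-per-token conversion and the output burndown ratio) is a hypothetical placeholder, not a published rate; check the burndown rates for your model on the Supported models page before sizing an order.

```python
# Hypothetical sketch of the "throughput per sec" estimate above.
# All constants are illustrative placeholders, NOT published rates.

CHARS_PER_TOKEN = 4          # assumed rough token-to-character conversion
OUTPUT_BURNDOWN_RATIO = 4    # assumed: 1 output char "costs" 4 input chars

def throughput_per_sec(input_tokens_per_query: int,
                       output_tokens_per_query: int,
                       qps: float) -> float:
    """Estimate required throughput in input-characters per second."""
    input_chars = input_tokens_per_query * CHARS_PER_TOKEN
    # Output tokens are converted into input-char equivalents at the
    # model's burndown rate before being added to the total.
    output_chars = (output_tokens_per_query * CHARS_PER_TOKEN
                    * OUTPUT_BURNDOWN_RATIO)
    return (input_chars + output_chars) * qps

# Example: 1,000 input tokens and 250 output tokens per query at 5 QPS.
print(throughput_per_sec(1_000, 250, 5))  # 40000.0 chars/sec
```

Dividing an estimate like this by the per-GSU throughput listed for your model then suggests how many GSUs to purchase, rounded up to the model's purchase increment.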