This page explains what Provisioned Throughput is and when to use it. Provisioned Throughput is a fixed-cost, fixed-term subscription, available in several term lengths, that reserves throughput for supported generative AI models on Vertex AI. To reserve throughput, you must specify the model and the available locations in which the model runs.

If any of the following considerations apply to your use case, consider using Provisioned Throughput:

Provisioned Throughput is one of two ways to consume generative AI models. The other is pay-as-you-go, also referred to as on-demand.
Provisioned Throughput overview
Last updated 2025-08-21 UTC.