使用 Gemini 生成圖像

Gemini 可以生成及處理圖片，並以對話方式提供相關資訊。你可以使用文字、圖片或兩者組合提示 Gemini，完成各種圖片相關工作，例如生成和編輯圖片。所有生成的圖像都會加上 SynthID 浮水印。

圖片生成功能可能不適用於所有地區和國家/地區，詳情請參閱「Gemini 模型」頁面。

圖像生成 (文字轉圖像)

以下程式碼示範如何根據描述性提示生成圖片。您必須在設定中加入 responseModalities：["TEXT", "IMAGE"]。這些模型不支援僅輸出圖片。

Python

from google import genai from google.genai import types from PIL import Image from io import BytesIO import base64  client = genai.Client()  contents = ('Hi, can you create a 3d rendered image of a pig '             'with wings and a top hat flying over a happy '             'futuristic scifi city with lots of greenery?')  response = client.models.generate_content(     model="gemini-2.0-flash-preview-image-generation",     contents=contents,     config=types.GenerateContentConfig(       response_modalities=['TEXT', 'IMAGE']     ) )  for part in response.candidates[0].content.parts:   if part.text is not None:     print(part.text)   elif part.inline_data is not None:     image = Image.open(BytesIO((part.inline_data.data)))     image.save('gemini-native-image.png')     image.show()

JavaScript

import { GoogleGenAI, Modality } from "@google/genai"; import * as fs from "node:fs";  async function main() {    const ai = new GoogleGenAI({});    const contents =     "Hi, can you create a 3d rendered image of a pig " +     "with wings and a top hat flying over a happy " +     "futuristic scifi city with lots of greenery?";    // Set responseModalities to include "Image" so the model can generate  an image   const response = await ai.models.generateContent({     model: "gemini-2.0-flash-preview-image-generation",     contents: contents,     config: {       responseModalities: [Modality.TEXT, Modality.IMAGE],     },   });   for (const part of response.candidates[0].content.parts) {     // Based on the part type, either show the text or save the image     if (part.text) {       console.log(part.text);     } else if (part.inlineData) {       const imageData = part.inlineData.data;       const buffer = Buffer.from(imageData, "base64");       fs.writeFileSync("gemini-native-image.png", buffer);       console.log("Image saved as gemini-native-image.png");     }   } }  main();

Go

package main  import (   "context"   "fmt"   "os"   "google.golang.org/genai" )  func main() {    ctx := context.Background()   client, err := genai.NewClient(ctx, nil)   if err != nil {       log.Fatal(err)   }    config := &genai.GenerateContentConfig{       ResponseModalities: []string{"TEXT", "IMAGE"},   }    result, _ := client.Models.GenerateContent(       ctx,       "gemini-2.0-flash-preview-image-generation",       genai.Text("Hi, can you create a 3d rendered image of a pig " +                  "with wings and a top hat flying over a happy " +                  "futuristic scifi city with lots of greenery?"),       config,   )    for _, part := range result.Candidates[0].Content.Parts {       if part.Text != "" {           fmt.Println(part.Text)       } else if part.InlineData != nil {           imageBytes := part.InlineData.Data           outputFilename := "gemini_generated_image.png"           _ = os.WriteFile(outputFilename, imageBytes, 0644)       }   } }

REST

curl -s -X POST   "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \   -H "x-goog-api-key: $GEMINI_API_KEY" \   -H "Content-Type: application/json" \   -d '{     "contents": [{       "parts": [         {"text": "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"}       ]     }],     "generationConfig":{"responseModalities":["TEXT","IMAGE"]}   }' \   | grep -o '"data": "[^"]*"' \   | cut -d'"' -f4 \   | base64 --decode > gemini-native-image.png

圖像編輯 (文字和圖像轉圖像)

如要編輯圖片，請先加入圖片做為輸入內容。下列範例示範如何上傳 Base64 編碼的圖片。如要處理多張圖片和較大的酬載，請參閱「圖片輸入」一節。

Python

from google import genai from google.genai import types from PIL import Image from io import BytesIO  import PIL.Image  image = PIL.Image.open('/path/to/image.png')  client = genai.Client()  text_input = ('Hi, This is a picture of me.'             'Can you add a llama next to me?',)  response = client.models.generate_content(     model="gemini-2.0-flash-preview-image-generation",     contents=[text_input, image],     config=types.GenerateContentConfig(       response_modalities=['TEXT', 'IMAGE']     ) )  for part in response.candidates[0].content.parts:   if part.text is not None:     print(part.text)   elif part.inline_data is not None:     image = Image.open(BytesIO((part.inline_data.data)))     image.show()

JavaScript

import { GoogleGenAI, Modality } from "@google/genai"; import * as fs from "node:fs";  async function main() {    const ai = new GoogleGenAI({});    // Load the image from the local file system   const imagePath = "path/to/image.png";   const imageData = fs.readFileSync(imagePath);   const base64Image = imageData.toString("base64");    // Prepare the content parts   const contents = [     { text: "Can you add a llama next to the image?" },     {       inlineData: {         mimeType: "image/png",         data: base64Image,       },     },   ];    // Set responseModalities to include "Image" so the model can generate an image   const response = await ai.models.generateContent({     model: "gemini-2.0-flash-preview-image-generation",     contents: contents,     config: {       responseModalities: [Modality.TEXT, Modality.IMAGE],     },   });   for (const part of response.candidates[0].content.parts) {     // Based on the part type, either show the text or save the image     if (part.text) {       console.log(part.text);     } else if (part.inlineData) {       const imageData = part.inlineData.data;       const buffer = Buffer.from(imageData, "base64");       fs.writeFileSync("gemini-native-image.png", buffer);       console.log("Image saved as gemini-native-image.png");     }   } }  main();

Go

package main  import (  "context"  "fmt"  "os"  "google.golang.org/genai" )  func main() {   ctx := context.Background()  client, err := genai.NewClient(ctx, nil)  if err != nil {      log.Fatal(err)  }   imagePath := "/path/to/image.png"  imgData, _ := os.ReadFile(imagePath)   parts := []*genai.Part{    genai.NewPartFromText("Hi, This is a picture of me. Can you add a llama next to me?"),    &genai.Part{      InlineData: &genai.Blob{        MIMEType: "image/png",        Data:     imgData,      },    },  }   contents := []*genai.Content{    genai.NewContentFromParts(parts, genai.RoleUser),  }   config := &genai.GenerateContentConfig{      ResponseModalities: []string{"TEXT", "IMAGE"},  }   result, _ := client.Models.GenerateContent(      ctx,      "gemini-2.0-flash-preview-image-generation",      contents,      config,  )   for _, part := range result.Candidates[0].Content.Parts {      if part.Text != "" {          fmt.Println(part.Text)      } else if part.InlineData != nil {          imageBytes := part.InlineData.Data          outputFilename := "gemini_generated_image.png"          _ = os.WriteFile(outputFilename, imageBytes, 0644)      }  } }

REST

IMG_PATH=/path/to/your/image1.jpeg  if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then   B64FLAGS="--input" else   B64FLAGS="-w0" fi  IMG_BASE64=$(base64 "$B64FLAGS" "$IMG_PATH" 2>&1)  curl -X POST \   "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \     -H "x-goog-api-key: $GEMINI_API_KEY" \     -H 'Content-Type: application/json' \     -d "{       \"contents\": [{         \"parts\":[             {\"text\": \"'Hi, This is a picture of me. Can you add a llama next to me\"},             {               \"inline_data\": {                 \"mime_type\":\"image/jpeg\",                 \"data\": \"$IMG_BASE64\"               }             }         ]       }],       \"generationConfig\": {\"responseModalities\": [\"TEXT\", \"IMAGE\"]}     }"  \   | grep -o '"data": "[^"]*"' \   | cut -d'"' -f4 \   | base64 --decode > gemini-edited-image.png

其他圖片生成模式

Gemini 支援其他圖像互動模式，包括：

文字生成圖片和文字 (交錯)：輸出圖片和相關文字。
- 提示範例：「Generate an illustrated recipe for a paella.」(生成西班牙海鮮飯的插圖食譜)。
圖片和文字轉圖片和文字 (交錯)：使用輸入的圖片和文字，建立新的相關圖片和文字。
- 提示範例：(附上附家具的房間圖片)「我的空間適合什麼顏色的沙發？可以更新圖片嗎？」
多輪圖像編輯 (對話)：以對話方式持續生成 / 編輯圖像。
- 提示範例：[上傳藍色汽車的圖片。] 「將這輛車變成敞篷車。」「Now change the color to yellow.」(現在將顏色改為黃色)。

限制

為獲得最佳成效，請使用下列語言：英文、西班牙文 (墨西哥)、日文、中文 (中國)、印地文。
圖像生成功能不支援音訊或影片輸入內容。
系統不一定會觸發圖像生成功能：
- 模型可能只會輸出文字。請嘗試明確要求生成圖片 (例如「生成圖片」、「在過程中提供圖片」、「更新圖片」)。
- 模型可能會中途停止生成內容。請再試一次或改用其他提示。
為圖片生成文字時，建議先生成文字，然後要求 Gemini 根據文字生成圖片。
圖像生成功能不適用於部分國家/地區。詳情請參閱「模型」。

Imagen 的適用時機

除了使用 Gemini 內建的圖像生成功能，你也可以透過 Gemini API 存取專門的圖像生成模型 Imagen。

在下列情況下選擇「Gemini」Gemini：

您需要運用世界知識和推理能力，生成與情境相關的圖片。
文字和圖片的融合程度很重要。
您希望在長篇文字序列中嵌入準確的圖像。
您想在對話中編輯圖片，同時保留情境。

在下列情況下選擇 Imagen：

圖片品質、相片擬真度、藝術細節或特定風格 (例如印象派、動漫) 是首要考量。
執行專業編輯工作，例如更新產品背景或放大圖片。
融入品牌、風格，或生成標誌和產品設計。

Imagen 4 是生成圖像的首選模型。如要處理進階用途或需要最佳圖片品質，請選擇 Imagen 4 Ultra。請注意，Imagen 4 Ultra 一次只能生成一張圖片。

後續步驟

請參閱 Veo 指南，瞭解如何使用 Gemini API 生成影片。
如要進一步瞭解 Gemini 模型，請參閱「Gemini 模型」和「實驗模型」。