You can ask a Gemini model to generate and edit images, prompting with text only or with text and images. When you use Firebase AI Logic, you can make this request directly from your app.
With this capability, you can do things like the following:
- Iteratively generate images through natural-language conversation, adjusting images while maintaining consistency and context.
- Generate images with high-quality text rendering, including long strings of text.
- Generate interleaved text-and-image output, for example a blog post with text and images in a single turn. Previously, this required chaining together multiple models.
- Generate images using Gemini's world knowledge and reasoning capabilities.
Later on this page, you'll find a complete list of supported modalities and capabilities, along with example prompts.
For image output, you must use the Gemini model gemini-2.0-flash-preview-image-generation and include responseModalities: ["TEXT", "IMAGE"] in your model configuration.
Jump to code for text-to-image Jump to code for interleaved text and images
Jump to code for image editing Jump to code for iterative image editing
| See other guides for more options for working with images: Analyze images Analyze images on-device Generate structured output |
Choosing between Gemini and Imagen models
The Firebase AI Logic SDKs support image generation with either a Gemini model or an Imagen model. For most use cases, start with Gemini, and then choose Imagen for specialized tasks where image quality is critical.
Note that the Firebase AI Logic SDKs don't yet support image input (for example, for editing) with Imagen models. So, if you want to work with input images, use a Gemini model instead.
Choose Gemini when you want to:
- Use world knowledge and reasoning to generate contextually relevant images
- Seamlessly blend images and text
- Embed accurate visuals within long text sequences
- Edit images conversationally while maintaining context
Choose Imagen when you want to:
- Prioritize image quality, photorealism, artistic detail, or specific styles (for example, impressionism or anime)
- Explicitly specify the aspect ratio or format of generated images
Before you begin
| Click your Gemini API provider to view provider-specific content and code on this page. |
If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a GenerativeModel instance.
For testing and iterating on your prompts, and even getting a generated code snippet, we recommend using Google AI Studio.
Models that support this capability
Image output from Gemini is only supported by gemini-2.0-flash-preview-image-generation (not by gemini-2.0-flash).
Note that the SDKs also support image generation using Imagen models.
Generate and edit images
You can generate and edit images using a Gemini model.
Generate images (text-only input)
| Before trying this sample, complete the Before you begin section of this guide to set up your project and app. In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page. |
You can ask a Gemini model to generate images by prompting with text.
Make sure you create a GenerativeModel instance that includes responseModalities: ["TEXT", "IMAGE"] in its configuration, and then call generateContent.
Swift
import FirebaseAI

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [.text, .image])
)

// Provide a text prompt instructing the model to generate an image
let prompt = "Generate an image of the Eiffel tower with fireworks in the background."

// To generate an image, call `generateContent` with the text input
let response = try await model.generateContent(prompt)

// Handle the generated image
guard let inlineDataPart = response.inlineDataParts.first else {
  fatalError("No image data in response.")
}
guard let uiImage = UIImage(data: inlineDataPart.data) else {
  fatalError("Failed to convert data to UIImage.")
}

Kotlin
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    generationConfig = generationConfig {
        responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE)
    }
)

// Provide a text prompt instructing the model to generate an image
val prompt = "Generate an image of the Eiffel tower with fireworks in the background."

// To generate image output, call `generateContent` with the text input
val generatedImageAsBitmap = model.generateContent(prompt)
    // Handle the generated image
    .candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }

Java
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
    "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    new GenerationConfig.Builder()
        .setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
        .build()
);

GenerativeModelFutures model = GenerativeModelFutures.from(ai);

// Provide a text prompt instructing the model to generate an image
Content prompt = new Content.Builder()
    .addText("Generate an image of the Eiffel Tower with fireworks in the background.")
    .build();

// To generate an image, call `generateContent` with the text input
ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        // Iterate over all the parts in the first candidate in the result object
        for (Part part : result.getCandidates().get(0).getContent().getParts()) {
            if (part instanceof ImagePart) {
                ImagePart imagePart = (ImagePart) part;
                // The returned image as a bitmap
                Bitmap generatedImageAsBitmap = imagePart.getImage();
                break;
            }
        }
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
  model: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: {
    responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
  },
});

// Provide a text prompt instructing the model to generate an image
const prompt = 'Generate an image of the Eiffel Tower with fireworks in the background.';

// To generate an image, call `generateContent` with the text input
const result = await model.generateContent(prompt);

// Handle the generated image
try {
  const inlineDataParts = result.response.inlineDataParts();
  if (inlineDataParts?.[0]) {
    const image = inlineDataParts[0].inlineData;
    console.log(image.mimeType, image.data);
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.0-flash-preview-image-generation',
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [ResponseModality.text, ResponseModality.image]),
);

// Provide a text prompt instructing the model to generate an image
final prompt = [Content.text('Generate an image of the Eiffel Tower with fireworks in the background.')];

// To generate an image, call `generateContent` with the text input
final response = await model.generateContent(prompt);

if (response.inlineDataParts.isNotEmpty) {
  final imageBytes = response.inlineDataParts[0].bytes;
  // Process the image
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

Unity
using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: new GenerationConfig(
    responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);

// Provide a text prompt instructing the model to generate an image
var prompt = "Generate an image of the Eiffel Tower with fireworks in the background.";

// To generate an image, call `GenerateContentAsync` with the text input
var response = await model.GenerateContentAsync(prompt);

var text = response.Text;
if (!string.IsNullOrWhiteSpace(text)) {
  // Do something with the text
}

// Handle the generated image
var imageParts = response.Candidates.First().Content.Parts
    .OfType<ModelContent.InlineDataPart>()
    .Where(part => part.MimeType == "image/png");
foreach (var imagePart in imageParts) {
  // Load the image into a Unity Texture2D object
  UnityEngine.Texture2D texture2D = new(2, 2);
  if (texture2D.LoadImage(imagePart.Data.ToArray())) {
    // Do something with the image
  }
}

Generate interleaved images and text
| Before trying this sample, complete the Before you begin section of this guide to set up your project and app. In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page. |
You can ask a Gemini model to generate images interleaved with its text responses. For example, you can generate images of what each step of a generated recipe might look like alongside that step's instructions, without having to make separate requests to the model or to different models.
Make sure you create a GenerativeModel instance that includes responseModalities: ["TEXT", "IMAGE"] in its configuration, and then call generateContent.
Swift
import FirebaseAI

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [.text, .image])
)

// Provide a text prompt instructing the model to generate interleaved text and images
let prompt = """
Generate an illustrated recipe for a paella.
Create images to go alongside the text as you generate the recipe
"""

// To generate interleaved text and images, call `generateContent` with the text input
let response = try await model.generateContent(prompt)

// Handle the generated text and image
guard let candidate = response.candidates.first else {
  fatalError("No candidates in response.")
}
for part in candidate.content.parts {
  switch part {
  case let textPart as TextPart:
    // Do something with the generated text
    let text = textPart.text
  case let inlineDataPart as InlineDataPart:
    // Do something with the generated image
    guard let uiImage = UIImage(data: inlineDataPart.data) else {
      fatalError("Failed to convert data to UIImage.")
    }
  default:
    fatalError("Unsupported part type: \(part)")
  }
}

Kotlin
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    generationConfig = generationConfig {
        responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE)
    }
)

// Provide a text prompt instructing the model to generate interleaved text and images
val prompt = """
    Generate an illustrated recipe for a paella.
    Create images to go alongside the text as you generate the recipe
    """.trimIndent()

// To generate interleaved text and images, call `generateContent` with the text input
val responseContent = model.generateContent(prompt).candidates.first().content

// The response will contain image and text parts interleaved
for (part in responseContent.parts) {
    when (part) {
        is ImagePart -> {
            // ImagePart as a bitmap
            val generatedImageAsBitmap: Bitmap? = part.asImageOrNull()
        }
        is TextPart -> {
            // Text content from the TextPart
            val text = part.text
        }
    }
}

Java
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
    "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    new GenerationConfig.Builder()
        .setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
        .build()
);

GenerativeModelFutures model = GenerativeModelFutures.from(ai);

// Provide a text prompt instructing the model to generate interleaved text and images
Content prompt = new Content.Builder()
    .addText("Generate an illustrated recipe for a paella.\n" +
             "Create images to go alongside the text as you generate the recipe")
    .build();

// To generate interleaved text and images, call `generateContent` with the text input
ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        Content responseContent = result.getCandidates().get(0).getContent();
        // The response will contain image and text parts interleaved
        for (Part part : responseContent.getParts()) {
            if (part instanceof ImagePart) {
                // ImagePart as a bitmap
                Bitmap generatedImageAsBitmap = ((ImagePart) part).getImage();
            } else if (part instanceof TextPart) {
                // Text content from the TextPart
                String text = ((TextPart) part).getText();
            }
        }
    }

    @Override
    public void onFailure(Throwable t) {
        System.err.println(t);
    }
}, executor);

Web
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
  model: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: {
    responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
  },
});

// Provide a text prompt instructing the model to generate interleaved text and images
const prompt = 'Generate an illustrated recipe for a paella.\n' +
  'Create images to go alongside the text as you generate the recipe';

// To generate interleaved text and images, call `generateContent` with the text input
const result = await model.generateContent(prompt);

// Handle the generated text and image
try {
  const response = result.response;
  if (response.candidates?.[0].content?.parts) {
    for (const part of response.candidates?.[0].content?.parts) {
      if (part.text) {
        // Do something with the text
        console.log(part.text);
      }
      if (part.inlineData) {
        // Do something with the image
        const image = part.inlineData;
        console.log(image.mimeType, image.data);
      }
    }
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.0-flash-preview-image-generation',
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [ResponseModality.text, ResponseModality.image]),
);

// Provide a text prompt instructing the model to generate interleaved text and images
final prompt = [Content.text(
  'Generate an illustrated recipe for a paella\n ' +
  'Create images to go alongside the text as you generate the recipe'
)];

// To generate interleaved text and images, call `generateContent` with the text input
final response = await model.generateContent(prompt);

// Handle the generated text and image
final parts = response.candidates.firstOrNull?.content.parts;
if (parts != null && parts.isNotEmpty) {
  for (final part in parts) {
    if (part is TextPart) {
      // Do something with the text part
      final text = part.text;
    }
    if (part is InlineDataPart) {
      // Process the image
      final imageBytes = part.bytes;
    }
  }
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

Unity
using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: new GenerationConfig(
    responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);

// Provide a text prompt instructing the model to generate interleaved text and images
var prompt = "Generate an illustrated recipe for a paella \n" +
  "Create images to go alongside the text as you generate the recipe";

// To generate interleaved text and images, call `GenerateContentAsync` with the text input
var response = await model.GenerateContentAsync(prompt);

// Handle the generated text and image
foreach (var part in response.Candidates.First().Content.Parts) {
  if (part is ModelContent.TextPart textPart) {
    if (!string.IsNullOrWhiteSpace(textPart.Text)) {
      // Do something with the text
    }
  } else if (part is ModelContent.InlineDataPart dataPart) {
    if (dataPart.MimeType == "image/png") {
      // Load the image into a Unity Texture2D object
      UnityEngine.Texture2D texture2D = new(2, 2);
      if (texture2D.LoadImage(dataPart.Data.ToArray())) {
        // Do something with the image
      }
    }
  }
}

Edit images (text-and-image input)
| Before trying this sample, complete the Before you begin section of this guide to set up your project and app. In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page. |
You can ask a Gemini model to edit images by prompting with text and one or more images.
Make sure you create a GenerativeModel instance that includes responseModalities: ["TEXT", "IMAGE"] in its configuration, and then call generateContent.
Swift
import FirebaseAI

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [.text, .image])
)

// Provide an image for the model to edit
guard let image = UIImage(named: "scones") else { fatalError("Image file not found.") }

// Provide a text prompt instructing the model to edit the image
let prompt = "Edit this image to make it look like a cartoon"

// To edit the image, call `generateContent` with the image and text input
let response = try await model.generateContent(image, prompt)

// Handle the generated image
guard let inlineDataPart = response.inlineDataParts.first else {
  fatalError("No image data in response.")
}
guard let uiImage = UIImage(data: inlineDataPart.data) else {
  fatalError("Failed to convert data to UIImage.")
}

Kotlin
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    generationConfig = generationConfig {
        responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE)
    }
)

// Provide an image for the model to edit
val bitmap = BitmapFactory.decodeResource(context.resources, R.drawable.scones)

// Provide a text prompt instructing the model to edit the image
val prompt = content {
    image(bitmap)
    text("Edit this image to make it look like a cartoon")
}

// To edit the image, call `generateContent` with the prompt (image and text input)
val generatedImageAsBitmap = model.generateContent(prompt)
    // Handle the generated text and image
    .candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }

Java
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
    "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    new GenerationConfig.Builder()
        .setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
        .build()
);

GenerativeModelFutures model = GenerativeModelFutures.from(ai);

// Provide an image for the model to edit
Bitmap bitmap = BitmapFactory.decodeResource(resources, R.drawable.scones);

// Provide a text prompt instructing the model to edit the image
Content promptContent = new Content.Builder()
    .addImage(bitmap)
    .addText("Edit this image to make it look like a cartoon")
    .build();

// To edit the image, call `generateContent` with the prompt (image and text input)
ListenableFuture<GenerateContentResponse> response = model.generateContent(promptContent);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        // Iterate over all the parts in the first candidate in the result object
        for (Part part : result.getCandidates().get(0).getContent().getParts()) {
            if (part instanceof ImagePart) {
                ImagePart imagePart = (ImagePart) part;
                Bitmap generatedImageAsBitmap = imagePart.getImage();
                break;
            }
        }
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
  model: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: {
    responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
  },
});

// Prepare an image for the model to edit
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

// Provide a text prompt instructing the model to edit the image
const prompt = "Edit this image to make it look like a cartoon";

const fileInputEl = document.querySelector("input[type=file]");
const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

// To edit the image, call `generateContent` with the image and text input
const result = await model.generateContent([prompt, imagePart]);

// Handle the generated image
try {
  const inlineDataParts = result.response.inlineDataParts();
  if (inlineDataParts?.[0]) {
    const image = inlineDataParts[0].inlineData;
    console.log(image.mimeType, image.data);
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.0-flash-preview-image-generation',
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [ResponseModality.text, ResponseModality.image]),
);

// Prepare an image for the model to edit
final image = await File('scones.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// Provide a text prompt instructing the model to edit the image
final prompt = TextPart("Edit this image to make it look like a cartoon");

// To edit the image, call `generateContent` with the image and text input
final response = await model.generateContent([
  Content.multi([prompt, imagePart])
]);

// Handle the generated image
if (response.inlineDataParts.isNotEmpty) {
  final imageBytes = response.inlineDataParts[0].bytes;
  // Process the image
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

Unity
using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: new GenerationConfig(
    responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);

// Prepare an image for the model to edit
var imageFile = System.IO.File.ReadAllBytes(System.IO.Path.Combine(
  UnityEngine.Application.streamingAssetsPath, "scones.jpg"));
var image = ModelContent.InlineData("image/jpeg", imageFile);

// Provide a text prompt instructing the model to edit the image
var prompt = ModelContent.Text("Edit this image to make it look like a cartoon.");

// To edit the image, call `GenerateContentAsync` with the image and text input
var response = await model.GenerateContentAsync(new [] { prompt, image });

var text = response.Text;
if (!string.IsNullOrWhiteSpace(text)) {
  // Do something with the text
}

// Handle the generated image
var imageParts = response.Candidates.First().Content.Parts
    .OfType<ModelContent.InlineDataPart>()
    .Where(part => part.MimeType == "image/png");
foreach (var imagePart in imageParts) {
  // Load the image into a Unity Texture2D object
  Texture2D texture2D = new Texture2D(2, 2);
  if (texture2D.LoadImage(imagePart.Data.ToArray())) {
    // Do something with the image
  }
}

Iterate and edit images using multi-turn chat
| Before trying this sample, complete the Before you begin section of this guide to set up your project and app. In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page. |
With multi-turn chat, you can iterate with a Gemini model on the images that it generates or that you supply.
Make sure you create a GenerativeModel instance that includes responseModalities: ["TEXT", "IMAGE"] in its configuration, call startChat(), and then use sendMessage() to send new user messages.
Swift
import FirebaseAI

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [.text, .image])
)

// Initialize the chat
let chat = model.startChat()

guard let image = UIImage(named: "scones") else { fatalError("Image file not found.") }

// Provide an initial text prompt instructing the model to edit the image
let prompt = "Edit this image to make it look like a cartoon"

// To generate an initial response, send a user message with the image and text prompt
let response = try await chat.sendMessage(image, prompt)

// Inspect the generated image
guard let inlineDataPart = response.inlineDataParts.first else {
  fatalError("No image data in response.")
}
guard let uiImage = UIImage(data: inlineDataPart.data) else {
  fatalError("Failed to convert data to UIImage.")
}

// Follow up requests do not need to specify the image again
let followUpResponse = try await chat.sendMessage("But make it old-school line drawing style")

// Inspect the edited image after the follow up request
guard let followUpInlineDataPart = followUpResponse.inlineDataParts.first else {
  fatalError("No image data in response.")
}
guard let followUpUIImage = UIImage(data: followUpInlineDataPart.data) else {
  fatalError("Failed to convert data to UIImage.")
}

Kotlin
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    generationConfig = generationConfig {
        responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE)
    }
)

// Provide an image for the model to edit
val bitmap = BitmapFactory.decodeResource(context.resources, R.drawable.scones)

// Create the initial prompt instructing the model to edit the image
val prompt = content {
    image(bitmap)
    text("Edit this image to make it look like a cartoon")
}

// Initialize the chat
val chat = model.startChat()

// To generate an initial response, send a user message with the image and text prompt
var response = chat.sendMessage(prompt)

// Inspect the returned image
var generatedImageAsBitmap = response
    .candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }

// Follow up requests do not need to specify the image again
response = chat.sendMessage("But make it old-school line drawing style")
generatedImageAsBitmap = response
    .candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }

Java
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI()).generativeModel(
    "gemini-2.0-flash-preview-image-generation",
    // Configure the model to respond with text and images
    new GenerationConfig.Builder()
        .setResponseModalities(Arrays.asList(ResponseModality.TEXT, ResponseModality.IMAGE))
        .build()
);

GenerativeModelFutures model = GenerativeModelFutures.from(ai);

// Provide an image for the model to edit
Bitmap bitmap = BitmapFactory.decodeResource(resources, R.drawable.scones);

// Initialize the chat
ChatFutures chat = model.startChat();

// Create the initial prompt instructing the model to edit the image
Content prompt = new Content.Builder()
    .setRole("user")
    .addImage(bitmap)
    .addText("Edit this image to make it look like a cartoon")
    .build();

// To generate an initial response, send a user message with the image and text prompt
ListenableFuture<GenerateContentResponse> response = chat.sendMessage(prompt);

// Extract the image from the initial response
ListenableFuture<@Nullable Bitmap> initialRequest = Futures.transform(response, result -> {
    for (Part part : result.getCandidates().get(0).getContent().getParts()) {
        if (part instanceof ImagePart) {
            ImagePart imagePart = (ImagePart) part;
            return imagePart.getImage();
        }
    }
    return null;
}, executor);

// Follow up requests do not need to specify the image again
ListenableFuture<GenerateContentResponse> modelResponseFuture = Futures.transformAsync(
    initialRequest,
    generatedImage -> {
        Content followUpPrompt = new Content.Builder()
            .addText("But make it old-school line drawing style")
            .build();
        return chat.sendMessage(followUpPrompt);
    },
    executor);

// Add a final callback to check the reworked image
Futures.addCallback(modelResponseFuture, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        for (Part part : result.getCandidates().get(0).getContent().getParts()) {
            if (part instanceof ImagePart) {
                ImagePart imagePart = (ImagePart) part;
                Bitmap generatedImageAsBitmap = imagePart.getImage();
                break;
            }
        }
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, ResponseModality } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, {
  model: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: {
    responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE],
  },
});

// Prepare an image for the model to edit
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

const fileInputEl = document.querySelector("input[type=file]");
const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

// Provide an initial text prompt instructing the model to edit the image
const prompt = "Edit this image to make it look like a cartoon";

// Initialize the chat
const chat = model.startChat();

// To generate an initial response, send a user message with the image and text prompt
const result = await chat.sendMessage([prompt, imagePart]);

// Request and inspect the generated image
try {
  const inlineDataParts = result.response.inlineDataParts();
  if (inlineDataParts?.[0]) {
    // Inspect the generated image
    const image = inlineDataParts[0].inlineData;
    console.log(image.mimeType, image.data);
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

// Follow up requests do not need to specify the image again
const followUpResult = await chat.sendMessage("But make it old-school line drawing style");

// Request and inspect the returned image
try {
  const followUpInlineDataParts = followUpResult.response.inlineDataParts();
  if (followUpInlineDataParts?.[0]) {
    // Inspect the generated image
    const followUpImage = followUpInlineDataParts[0].inlineData;
    console.log(followUpImage.mimeType, followUpImage.data);
  }
} catch (err) {
  console.error('Prompt or candidate was blocked:', err);
}

Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.0-flash-preview-image-generation',
  // Configure the model to respond with text and images
  generationConfig: GenerationConfig(responseModalities: [ResponseModality.text, ResponseModality.image]),
);

// Prepare an image for the model to edit
final image = await File('scones.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// Provide an initial text prompt instructing the model to edit the image
final prompt = TextPart("Edit this image to make it look like a cartoon");

// Initialize the chat
final chat = model.startChat();

// To generate an initial response, send a user message with the image and text prompt
final response = await chat.sendMessage([
  Content.multi([prompt, imagePart])
]);

// Inspect the returned image
if (response.inlineDataParts.isNotEmpty) {
  final imageBytes = response.inlineDataParts[0].bytes;
  // Process the image
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

// Follow up requests do not need to specify the image again
final followUpResponse = await chat.sendMessage([
  Content.text("But make it old-school line drawing style")
]);

// Inspect the returned image
if (followUpResponse.inlineDataParts.isNotEmpty) {
  final followUpImageBytes = followUpResponse.inlineDataParts[0].bytes;
  // Process the image
} else {
  // Handle the case where no images were generated
  print('Error: No images were generated.');
}

Unity
using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a Gemini model that supports image output
var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
  modelName: "gemini-2.0-flash-preview-image-generation",
  // Configure the model to respond with text and images
  generationConfig: new GenerationConfig(
    responseModalities: new[] { ResponseModality.Text, ResponseModality.Image })
);

// Prepare an image for the model to edit
var imageFile = System.IO.File.ReadAllBytes(System.IO.Path.Combine(
  UnityEngine.Application.streamingAssetsPath, "scones.jpg"));
var image = ModelContent.InlineData("image/jpeg", imageFile);

// Provide an initial text prompt instructing the model to edit the image
var prompt = ModelContent.Text("Edit this image to make it look like a cartoon.");

// Initialize the chat
var chat = model.StartChat();

// To generate an initial response, send a user message with the image and text prompt
var response = await chat.SendMessageAsync(new [] { prompt, image });

// Inspect the returned image
var imageParts = response.Candidates.First().Content.Parts
    .OfType<ModelContent.InlineDataPart>()
    .Where(part => part.MimeType == "image/png");
// Load the image into a Unity Texture2D object
UnityEngine.Texture2D texture2D = new(2, 2);
if (texture2D.LoadImage(imageParts.First().Data.ToArray())) {
  // Do something with the image
}

// Follow up requests do not need to specify the image again
var followUpResponse = await chat.SendMessageAsync("But make it old-school line drawing style");

// Inspect the returned image
var followUpImageParts = followUpResponse.Candidates.First().Content.Parts
    .OfType<ModelContent.InlineDataPart>()
    .Where(part => part.MimeType == "image/png");
// Load the image into a Unity Texture2D object
UnityEngine.Texture2D followUpTexture2D = new(2, 2);
if (followUpTexture2D.LoadImage(followUpImageParts.First().Data.ToArray())) {
  // Do something with the image
}

Supported features, limitations, and best practices
Supported modalities and capabilities
The following are the supported modalities and capabilities for image output from a Gemini model. Each capability shows an example prompt and has a code sample above.

Text to image (text-only to image)
- Generate an image of the Eiffel Tower with fireworks in the background.

Text to image (text rendering)
- Generate a cinematic photo of a large building with this giant text projection on the facade.

Text to image(s) and text (interleaved)
- Generate an illustrated recipe for a paella. Create images alongside the text as you generate the recipe.
- Generate a story about a dog in a 3D cartoon animation style. For each scene, generate an image.

Image(s) and text to image(s) and text (interleaved)
- [image of a furnished room] + What other couch colors would work in my space? Can you update the image?

Image editing (text and image to image)
- [image of scones] + Edit this image to make it look like a cartoon
- [image of a cat] + [image of a pillow] + Create a cross stitch of my cat on this pillow. (A sketch of passing multiple input images follows this list.)

Multi-turn image editing (chat)
- [image of a blue car] + Turn this car into a convertible, then Now change the color to yellow.
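The editing samples above pass a single image alongside the prompt. For a multi-image request like the cross-stitch example, you can attach several images to one prompt. The following Kotlin sketch is a rough illustration rather than an official sample: it assumes the same model configuration shown earlier, and `catBitmap` and `pillowBitmap` are hypothetical bitmaps loaded from your own resources.

// Minimal sketch (assumptions noted above): two input images plus a text instruction
// in a single request, using the same content builder as the editing samples.
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    modelName = "gemini-2.0-flash-preview-image-generation",
    generationConfig = generationConfig {
        responseModalities = listOf(ResponseModality.TEXT, ResponseModality.IMAGE)
    }
)

val prompt = content {
    image(catBitmap)     // hypothetical Bitmap of the cat
    image(pillowBitmap)  // hypothetical Bitmap of the pillow
    text("Create a cross stitch of my cat on this pillow.")
}

// The generated composite image, extracted the same way as in the samples above
val generatedImageAsBitmap = model.generateContent(prompt)
    .candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }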
Limitations and best practices
The following are limitations and best practices for image output from a Gemini model.
In this public experimental release, Gemini supports the following:
- Generating PNG images with a maximum dimension of 1024 px.
- Generating and editing images of people.
- Using safety filters that provide a flexible, less restrictive user experience.
For best performance, use the following languages: en, es-mx, ja-jp, zh-cn, and hi-in.
Image generation doesn't support audio or video inputs.
Image generation may not always trigger. Known issues include the following:
- The model may output only text. Try explicitly asking for image output (for example, "generate an image", "provide images as you go", "update the image").
- The model may stop generating partway through. Try again or try a different prompt.
- The model may generate text as an image. Try explicitly asking for text output (for example, "generate narrative text along with illustrations").
When generating text for an image, Gemini works best if you first generate the text and then ask for an image that includes the text.
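As a rough illustration of this tip, here is a hedged Kotlin sketch that reuses the multi-turn chat API shown earlier: the first turn asks only for the text, and the second turn asks for an image that renders it. It assumes `model` is configured as in the samples above, and the bakery-slogan prompt is purely a hypothetical example.

// Minimal sketch of the "text first, then image" tip, reusing the chat API shown above.
val chat = model.startChat()

// Turn 1: generate the text you want to appear in the image
val textResponse = chat.sendMessage("Write a short, catchy slogan for a neighborhood bakery.")
val slogan = textResponse.text ?: ""

// Turn 2: ask for an image that renders that text
val imageResponse = chat.sendMessage(
    "Generate an image of the bakery's storefront sign displaying this slogan: $slogan"
)
val generatedImageAsBitmap = imageResponse
    .candidates.first().content.parts.firstNotNullOf { it.asImageOrNull() }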