
Apple has released Pico-Banana-400K, a research dataset containing 400,000 images. Notably, the dataset was built with the help of Google's Gemini 2.5 models.
The accompanying paper, "Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing," describes the full 400,000-image dataset. It is released under a non-commercial research license: researchers and academic institutions may use it freely, but not for commercial purposes.
Several months ago, Google launched the Gemini 2.5 Flash Image model, better known as Nano-Banana, which performs exceptionally well in image editing tasks and is widely considered one of the most advanced image editing models available. Despite significant recent advances in image generation and editing models, Apple's research team points out that "despite continuous technological progress, open research remains hampered by a lack of large-scale, high-quality, and fully shareable image editing datasets. Existing datasets often rely on synthetic data generated by proprietary models or contain only limited, manually selected subsets. Furthermore, these datasets commonly suffer from domain shifts, uneven distribution of editing types, and inconsistent quality control, severely hindering the development of robust image editing models."
To address this bottleneck, the Apple team set out to build a more comprehensive and representative image editing dataset.
The research team first selected a large number of real-world photographs from the OpenImages dataset, ensuring diverse content including people, objects, and scenes containing text.
The team then designed 35 different types of image modification instructions, grouping them into eight main categories; examples include:
Pixel & Photometric Adjustments: such as adding film grain or retro filters;
Human-Centric Editing: such as transforming a person into a Funko-Pop style toy;
Scene Composition & Multi-Subject Editing: such as changing weather conditions (sunny/rainy/snowy);
Object-Level Semantic Editing: such as moving objects or adjusting spatial relationships;
Image Scaling: such as zooming in.
Next, the researchers fed each original image, along with an editing instruction, into the Nano-Banana model. Every generated result was then automatically evaluated by the Gemini 2.5 Pro model, which judged whether the edit faithfully followed the instruction and whether it had good visual quality. Only results that passed both checks were included in the final dataset.
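The generate-then-judge loop described above can be sketched as follows. This is an illustrative outline only: the model names are real, but the function signatures and record fields are invented here, since Apple's actual pipeline code is not public.

```python
def filter_edits(samples, edit_model, judge_model):
    """Keep only edits that pass both automated checks.

    samples:     iterable of (source_image, instruction) pairs
    edit_model:  callable standing in for Nano-Banana; returns an edited image
    judge_model: callable standing in for Gemini 2.5 Pro; returns a
                 (follows_instruction, good_quality) pair of booleans
    """
    kept = []
    for source, instruction in samples:
        edited = edit_model(source, instruction)             # generation step
        follows, quality = judge_model(source, edited, instruction)  # judging step
        if follows and quality:                              # both checks must pass
            kept.append({"source": source, "instruction": instruction, "edited": edited})
    return kept
```

In this sketch the judge sees the source image, the edited image, and the instruction together, so it can score both instruction-following and visual quality in one call.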
Pico-Banana-400K includes not only single-turn edits (i.e., edits completed with a single prompt), but also multi-turn edit sequences, and "preference pairs"—comparative samples of successful and unsuccessful edits—to help the model learn to distinguish between ideal and poor output.
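The three kinds of examples can be pictured as records like the following. The field names and file names here are purely illustrative, not the dataset's actual schema:

```python
# A single-turn edit: one instruction, one edited output.
single_turn = {
    "image": "000123.jpg",
    "instruction": "add film grain",
    "edited": "000123_edit.jpg",
}

# A multi-turn sequence: each instruction applies to the previous turn's output.
multi_turn = {
    "image": "000456.jpg",
    "turns": [
        {"instruction": "change the weather to snowy", "edited": "000456_t1.jpg"},
        {"instruction": "zoom in on the cabin", "edited": "000456_t2.jpg"},
    ],
}

# A preference pair: a successful and a failed edit of the same instruction,
# so a model can learn to prefer the "chosen" output over the "rejected" one.
preference_pair = {
    "image": "000789.jpg",
    "instruction": "move the cup to the left",
    "chosen": "000789_good.jpg",    # passed the automated checks
    "rejected": "000789_bad.jpg",   # failed them
}
```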
While the research team acknowledges that Nano-Banana still has limitations in fine-grained spatial control, layout extrapolation, and text typography, they emphasize that Pico-Banana-400K aims to provide a solid and reproducible foundation for training and evaluating the next generation of text-guided image editing models.
The research paper has been published on the preprint platform arXiv, and the complete Pico-Banana-400K dataset is freely available to researchers worldwide on GitHub.