
ChatGPT just launched GPT Image 2.0 (and it's WOW…)
AI Summary
This video introduces and tests Image 2.0, a new image generation model from ChatGPT, positioning it as a significant advancement in AI image creation, potentially surpassing existing leading models like Nano Banana 2. The core of the video is a direct comparison between Image 2.0 and Nano Banana 2 across various image formats and themes, using prompts provided by OpenAI.
The presenter begins by highlighting the rapid pace of AI development, emphasizing the difficulty in keeping up with new tools and identifying genuine business opportunities. To address this, they mention their newsletter, WolfPM, designed to provide daily insights into the AI world.
A key piece of evidence presented is a benchmark from Arena, where users vote on AI models. Image 2.0 scores significantly higher than Gemini 3.1 (Nano Banana 2), with a difference of over 250 points, suggesting a substantial generational leap.
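To give a rough sense of what a gap of that size means, the sketch below converts a score difference into an expected head-to-head preference rate. It assumes the Arena scores sit on an Elo-style scale with the conventional 400-point logistic curve; the summary does not state how Arena computes its scores, so the function name, the scale, and the resulting figure are illustrative assumptions only.

```python
def expected_win_probability(score_gap: float, scale: float = 400.0) -> float:
    """Expected share of pairwise votes won by the higher-rated model.

    Assumption: scores follow an Elo-style logistic curve in which a gap of
    `scale` points corresponds to 10:1 odds. This is a hypothetical reading
    of the Arena leaderboard, not a documented formula from the video.
    """
    return 1.0 / (1.0 + 10.0 ** (-score_gap / scale))


if __name__ == "__main__":
    # A 250-point gap, the lower bound mentioned in the summary.
    print(round(expected_win_probability(250), 3))
```

Under those assumptions, a 250-point lead would correspond to the higher-rated model being preferred in roughly four out of five direct comparisons.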
According to the presenter, the announcement emphasizes that Image 2.0 can generate complex visuals and produce immediately usable images with greater clarity, richness, and intelligence. It also has a "thinking mode", a reasoning system that analyzes the request and iterates on the image before delivering it, which is presented as a key differentiator. Image 2.0 is further noted for its precision in object placement and text rendering, and for generating images in various formats with more control, including handwritten text and complex designs. The model's ability to generate deepfakes is also mentioned.
Specific features of Image 2.0 highlighted include:
* **Detailed Instructions:** Ability to follow detailed instructions, preserve elements, and reproduce them accurately.
* **Sophistication and Photorealism:** Capable of generating high-quality, photorealistic images with stylistic options like color grading, shooting styles, and cartoon or retro aesthetics.
* **Format Flexibility:** Supports various image aspect ratios (16:9, 1:3) and can generate files ready for adaptation, unlike previous models that often produced a single, fixed format.
* **Reasoning Capability:** Possesses a "thinking" or reasoning mode, allowing it to analyze and reflect before generating an image.
* **Web Search and QR Codes:** Can search the web for real-time information and generate QR codes.
* **Knowledge Cut-off:** Knowledge is updated to December.
* **Expert Task Generation:** Can assist with tasks ranging from advertising to design composition.
The presenter notes that Image 2.0 is available to all ChatGPT and Codex users, with advanced features like reasoning available to GPT Pro and Business users.
The testing phase begins with a screenshot prompt. The prompt asks for a screenshot of ChatGPT on macOS showing a dog drawn in ASCII art, inside a cluttered desktop with a terminal and other open applications. Both models are given the same prompt, and the results are compared.
In the screenshot test, both models produce a visual that matches the prompt and replicate the cluttered macOS desktop reasonably well. The presenter notes the ASCII-art format, describing it as a way of generating drawings with HTML (ASCII art is conventionally built from plain text characters). The one difference the presenter perceives is that Nano Banana 2's output is of slightly lower quality and shows some pixelation.
The second test involves a detailed prompt for a mound of rice in which one grain must have the inscription "GPT-image 2" engraved on it, be the same size as the other grains, and be imperceptible at first glance. The presenter initially sets up the test incorrectly, comparing Nano Banana 2's "fast" mode with Image 2.0's "reasoning" mode, then corrects this so that both models run in "reasoning" mode. Image 2.0 is praised for its detail. Nano Banana 2's image is also judged good, but the key difference identified is that Image 2.0 holds its detail when zoomed in, while Nano Banana 2 tends to pixelate.
A third test focuses on a cinematic, gritty realism prompt, asking for a photorealistic image of twins on a deserted, foggy American road, taken with an analog medium format camera using 85mm film. The prompt specifies a 3K image format. This is where a significant difference emerges. The presenter states that the output from Image 2.0 is clearly superior, with Nano Banana 2's output showing more obvious AI artifacts and lacking the photorealistic quality. The presenter remarks that Image 2.0 generates images that don't look like AI-generated images, while Nano Banana 2's output still carries a distinct AI feel.
The presenter then discusses their personal experience using Nano Banana 2 for YouTube thumbnails, which previously required extensive prompting and iteration. They explain how they had to meticulously describe every detail (placement, effects, depth of field, fonts) to achieve a desired result, often by analyzing and replicating existing thumbnails. This process was time-consuming, even when paying for professional design. Image 2.0, however, is described as understanding the objective and producing exceptional, photo-like results with less detailed prompting. The presenter highlights the quality, detail, and realism of a thumbnail generated by Image 2.0, contrasting it with Nano Banana 2's output, which required more iterations and still exhibited AI characteristics. Image 2.0's ability to generate a realistic photo with a specific cinematic effect and an aged texture is lauded.
The next test focuses on the accuracy of text generation, specifically an "iPhone panorama" prompt. The presenter notes that text generation has been a challenge for AI image models, often resulting in gibberish. They show an example of a screenshot released by OpenAI for Image 2.0, which contains clear and legible text, indicating a significant improvement. The presenter specifically points to the text on a "Game Boy Advance SP" as being exceptionally well-rendered.
A final test covers handwriting. The prompt asks for a photorealistic image, taken with an iPhone, of a handwritten essay in pencil on lined paper about the history of baseball in Toronto. The handwriting should be natural, varied, and slightly messy, with a coffee stain in the upper right corner. The presenter notes the difference in generation speed, with Nano Banana 2 taking 52 seconds to produce its output. The results are compared, and the presenter emphasizes the difference between an image that clearly looks like AI and one that appears genuinely photographic. Image 2.0's output is described as having a "smoothing" effect and superior contrast, making it exceptionally realistic. The presenter considers Image 2.0's result indistinguishable from a real photo, unlike Nano Banana 2's.
In conclusion, the presenter reiterates that Image 2.0 is significantly faster and produces higher quality results, particularly in realism and detail. They emphasize that Image 2.0 generates "images," not just "AI images," implying a level of realism that transcends typical AI output. The video concludes by encouraging viewers to explore OpenAI's page for further tests and information on pricing, and to subscribe to the newsletter for more AI updates.