Artificial intelligence is capable of incredible feats, and image generation is one of them. Today we are comparing two leading text-to-image models: FLUX.1 [pro] and DALL-E 3. Both promise to turn our imagination into visual reality, but which one does it better?
Let's get ready to rumble. We'll put these models through tough tests and see who triumphs in the end.
Top view of the hands of a pianist playing the piano.
Fingers reaching for the keys, performing a complex passage.
Results:
In this case, the fingers from DALL-E 3 look a little uncanny because the model made quite a few anatomical mistakes. This point goes to FLUX.1.
If such pressure-testing is not for you and you'd rather see the aesthetic side of FLUX.1, check out 10 cool prompts for the model.
Create an image of a futuristic library in the year 2050.
The slogan “Knowledge is Power” is written on the main wall
in large neon letters.
Next to it stands a robot librarian.
In the background, holographic screens
with the text Future Now float in the air.
Results:
This prompt shows that FLUX.1 is better at rendering text in an image, getting even the background sign right.
Create a hyper-realistic close-up of a human eye.
The iris should contain a detailed miniature landscape
with tiny trees and winding river.
Include visible individual eyelashes, fine blood vessels
in the sclera, and a reflection of a cityscape in the pupil.
This prompt tests how precisely each model follows a detailed prompt.
Results:
It's hard to call DALL-E 3's blood vessels fine, so the point certainly goes to FLUX.1.
Create an image of an artist's desk.
Four brushes of different sizes lie on the desk.
An open sketchbook shows an unfinished drawing of a flower.
A cup of unfinished coffee stands next to a paint palette.
Sunlight falls on the desk from a window on the left.
Results:
Let's give both models a point, although DALL-E 3 clearly overdid the number of brushes.
Create an image of four books on a shelf, with their spines
forming the word ROAD.
Each letter of the word ROAD is on the spine of a separate book.
The books should be of different thicknesses and shades of blue.
A small compass statuette stands next to the books on the shelf.
Results:
To my surprise, even the best of 10 attempts with DALL-E 3 doesn't fully match the prompt: the books all have the same width, and the model struggles with the lettering throughout. FLUX.1 wins this task.
Pricing is given in AI/ML API tokens. We list pricing for FLUX.1 [pro], although some of the generations above were made with the cheaper FLUX.1 [dev].
You've seen what these models can do. Now try them for your own use case: plug the code below into Google Colab or any IDE, add your API key, and start testing!
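Adding up the points awarded in the five tests gives the final score. The snippet below only tallies the verdicts stated in each test. The test names are shortened labels, and a tie gives both models a point, as in the artist's desk test.

```python
from collections import Counter

# Winners of each test as judged in this comparison; a tie awards both models a point.
winners_per_test = {
    "Pianist's hands": ["FLUX.1"],
    "Futuristic library (text rendering)": ["FLUX.1"],
    "Eye close-up": ["FLUX.1"],
    "Artist's desk": ["FLUX.1", "DALL-E 3"],
    "Books spelling ROAD": ["FLUX.1"],
}

# One point per model for every test it won or tied.
score = Counter(model for winners in winners_per_test.values() for model in winners)

for model, points in score.most_common():
    print(f"{model}: {points}")
# FLUX.1: 5
# DALL-E 3: 1
```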
# Colab/Jupyter only; in a regular terminal run: pip install requests
%pip install requests
import requests
# AI/ML API image-generation endpoint
url = "https://api.aimlapi.com/images/generations/"
# Two models to compare; swap in any other image model ID from the AI/ML API lineup
model1 = "flux-pro"
model2 = "stable-diffusion-v3-medium"
prompt = """
Create an image of four books on a shelf.
"""
# Send the same prompt to both models so the outputs are directly comparable
payload1 = {
    "prompt": prompt,
    "model": model1,
}
payload2 = {
    "prompt": prompt,
    "model": model2,
}
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",  # paste your AI/ML API key here
    "content-type": "application/json",
}
response1 = requests.post(url, json=payload1, headers=headers)
response2 = requests.post(url, json=payload2, headers=headers)
print("\n RESPONSES BELOW")
print(f"{model1}: {response1.json()}")
print("\n")
print(f"{model2}: {response2.json()}")
print("\n")
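If you want to repeat a prompt several times and keep the best result, as with the 10 DALL-E 3 attempts in the book test, it helps to wrap the request in a function and extract the image links from the JSON. The following is a minimal sketch, not the official client. `compare_models` and `find_image_urls` are hypothetical helper names. It assumes that each generated image is exposed under a `url` key somewhere in the response, so check the raw output printed by the script above before relying on it. The HTTP call is passed in as `post` and defaults to `requests.post`.

```python
def find_image_urls(data):
    """Recursively collect every string stored under a "url" key.

    Assumption: the API returns generated images as objects with a "url"
    field; adjust this if the real response is shaped differently.
    """
    urls = []
    if isinstance(data, dict):
        for key, value in data.items():
            if key == "url" and isinstance(value, str):
                urls.append(value)
            else:
                urls.extend(find_image_urls(value))
    elif isinstance(data, list):
        for item in data:
            urls.extend(find_image_urls(item))
    return urls


def compare_models(models, prompt, api_key, attempts=1, post=None):
    """Send the same prompt to each model `attempts` times and collect image URLs.

    Hypothetical helper; `post` defaults to requests.post.
    """
    if post is None:
        import requests  # imported lazily so the helpers can be tested without it
        post = requests.post
    url = "https://api.aimlapi.com/images/generations/"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "content-type": "application/json",
    }
    results = {}
    for model in models:
        results[model] = []
        for _ in range(attempts):
            response = post(url, json={"prompt": prompt, "model": model},
                            headers=headers, timeout=120)
            response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
            results[model].extend(find_image_urls(response.json()))
    return results
```

For example, `compare_models(["flux-pro", "stable-diffusion-v3-medium"], "Create an image of four books on a shelf.", api_key="<YOUR_API_KEY>", attempts=3)` returns a dictionary that maps each model ID to the image links it produced. You can then open the links side by side and pick the best attempt.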
The results of this comparison are revealing. FLUX.1 [pro] demonstrates impressive image-generation capabilities, especially in detail accuracy and realism. DALL-E 3, despite its fame, did not win any of the five tests outright and only tied on the artist's desk. This suggests that popularity doesn't always equal superior performance.
You can check our model lineup here and try any of them for yourself with our API key.