Fabian Stelzer recently ran an image comparison test of three artificial intelligence (AI) text-to-image generators: DALL-E, Midjourney, and Stable Diffusion.
Stelzer published his findings in a Twitter thread in which he explained his method: each generator received exactly the same prompt, and every image used a 1:1 aspect ratio.
Stelzer tells PetaPixel that he sees each program, which he dubs an "image synth," as an "instrument," with each generator producing its own style, tone, and mood.
“I look at these image synths as instruments, each with their own timbre, strengths, and weaknesses,” he explains.
“Midjourney reminds me of a beautiful analog Moog synthesizer — it’s almost impossible to make it sound bad and you can do incredible things with it, but in exchange, its range is more limited. The artifacts it does are like analog distortion, very pleasing.”
Stelzer likens DALL-E 2, perhaps the best-known AI image generator, to "a digital workstation synth — incredible range, but it almost always sounds a bit too digital."
“Stable Diffusion is like a complex modular synthesizer, you can get almost any tone out of it, but it’s a bit harder to play and prompt.”
The experiment offers insight into how each AI image generator interprets a prompt, and it hints at each generator's overall visual style.
Midjourney has a consistently darker feel than the other two. Take the "Behind the scenes of the moon landing" image: while DALL-E 2 and Stable Diffusion generate far more realistic images, Midjourney's offering has an apocalyptic feel, with the astronaut looking as if they have just stepped out of a horror film.
However, Midjourney does not seem able to create a photorealistic image; in Stelzer's trial, Stable Diffusion appears best at that.
“AI image synths are going to revolutionize creative work in ways we haven’t seen since the advent of photography — what photography was to painting, image synths are to photographs, and what film was to theater, image synths are to film,” says Berlin-based Stelzer.
“This isn’t just about being able to summon any image on the fly, but about what these tools will enable — in a few years anyone will be able to create film-like content by merely typing it out in rich literal detail.”
DALL-E 2 vs Midjourney vs StableDiffusion mega thread: photography, illustration, painters, abstract
these image synths are like instruments – it’s amazing we’ll get so many of them, each with a unique “sound” 🤯
rules: same prompt, 1:1 aspect ratio, no living artists pic.twitter.com/47syy7uPJJ
— fabians.eth (@fabianstelzer) August 20, 2022
Stelzer says that what was once difficult will become easy. Indeed, AI is already making effortless some tasks that once took hours of practice to master; software that can repair old photographs is one example.
More of Stelzer's work can be found on his Twitter account.
Image credits: All images courtesy of Fabian Stelzer.