DALL·E sucks
A real comparison between DALL·E 2 and other image generation algorithms such as DALL·E mini, VQGAN, Latent Diffusion and Midjourney
DALL·E 2 is a new AI system that can create realistic images and art from natural language descriptions. It can combine concepts, attributes, and styles. In January 2021, OpenAI introduced DALL·E. One year later, their newest system, DALL·E 2, generates more realistic and accurate images with 4x greater resolution. We have all seen really impressive images generated by DALL·E 2.
When OpenAI accepted my request to access DALL·E 2, I was super excited and eager to try its image generation capabilities. I have a long history of using image generation technologies such as GANs (DCGAN, SRGAN) and DALL·E alternatives (Craiyon, formerly DALL·E mini, DALL·E Flow, VQGAN, NeuroGen). Even so, when I saw the results from DALL·E 2 I was really amazed!
There is a lot of discussion about OpenAI's practices. Some people ask how this company can be named “Open” AI when all its systems and algorithms are closed. I largely agree with those who expect more transparency in the field. However, I have a feeling that it's harder than we think. Image generation algorithms typically need many training images and huge processing power to train. Those requirements make it hard for an individual researcher to recreate or adjust existing algorithms. Furthermore, we have to keep in mind that a set of limitations has been imposed on the generative power of those algorithms, e.g. against generating nudes or perverted or distorted images. Lastly, a lot of people think that image generation algorithms should also address the stereotypes we hold about sexes and races. The topic is more complex than most people think, but it's a good discussion that we should keep having without restricting further research.
To make a good comparison between DALL·E 2 and its alternatives, I will be using a nice YouTube video created for the Tomorrowland festival’s Love Tomorrow Conference in July 2022. What a wonderful video!
I split the video into individual images and ran DALL·E 2 with the same text prompts, so that we can compare outputs produced from exactly the same inputs.
1. painting, bowl of soup floating in space surrounded by galaxies and stars, John Atkinson Grimshaw
Here we ask our AI to generate a painting with a style similar to John Atkinson Grimshaw. John Atkinson Grimshaw (6 September 1836–13 October 1893) was an English Victorian-era artist best known for his nocturnal scenes of urban landscapes. He was called a “remarkable and imaginative painter” by the critic and historian Christopher Wood in Victorian Painting (1999). Some examples of his paintings can be found below:
Let’s see what DALL·E mini generates:
Comparison Result: Equal
2. galaxies and stars in space, John Atkinson Grimshaw
Again we ask for an image inspired by John Atkinson Grimshaw’s work.
Comparison Result: DALL·E 2
3. oil painting, atoms dancing in a bowl of soup in space
This is an interesting experiment since we do not specify a style. Again we compare DALL·E mini with DALL·E 2.
Comparison Result: DALL·E 2 (I was inclined to call it a tie here, but I think DALL·E 2 shows the dancing atoms more clearly)
4. painting, then came the trees, John Atkinson Grimshaw
This input again uses Grimshaw’s style; however, the input is really abstract.
Comparison Result: DALL·E mini. Another hard decision here, since I think the DALL·E 2 results have more detail and better resolution, but the style of the mini matches the whole mood of the text and video really well.
5. picture, prehistoric trees on fire, Henri Rousseau
Let’s change painter and ask for a Henri Rousseau style.
Henri Julien Félix Rousseau (21 May 1844–2 September 1910) was a French post-impressionist painter in the Naïve or Primitive manner. He was also known as Le Douanier (the customs officer), a humorous description of his occupation as a toll and tax collector. He started painting seriously in his early forties; by age 49, he retired from his job to work on his art full-time.
Ridiculed during his lifetime by critics, he came to be recognized as a self-taught genius whose works are of high artistic quality. Rousseau’s work exerted an extensive influence on several generations of avant-garde artists.
In this case, we will compare VQGAN with DALL·E 2. VQGAN is a generative adversarial neural network that is good at generating images that look similar to others.
Comparison Result: Equal. If I had to choose, I would probably pick VQGAN though.
6. painting, clever apes making fire, Alexej von Jawlensky
This time we will use a new painter, Alexej von Jawlensky. Alexej Georgewitsch von Jawlensky (13 March 1864–15 March 1941), surname also spelt Yavlensky, was a Russian expressionist painter active in Germany. He was a key member of the New Munich Artist’s Association (Neue Künstlervereinigung München), the Der Blaue Reiter (The Blue Rider) group and later Die Blaue Vier (The Blue Four).
Comparison Result: DALL·E 2.
7. painting, clever apes making fire, Henri Rousseau
Now let’s see the same input in a different painting style.
Comparison Result: DALL·E 2.
8. painting, clever apes making fire, John Atkinson Grimshaw
And lastly, the same input in Grimshaw’s style.
Comparison Result: DALL·E 2. (I have to point out how important it is to specify the painting style each time. The difference between the last three images is big and distinctive.)
9. painting, sunlight kissing leaves gave birth to trees, Jean-Francois Millet
Jean-François Millet (4 October 1814–20 January 1875) was a French artist and one of the founders of the Barbizon school in rural France. Millet is noted for his paintings of peasant farmers and can be categorized as part of the Realism art movement. Toward the end of his career he became increasingly interested in painting pure landscapes. He is known best for his oil paintings but is also noted for his pastels, conte crayon drawings, and etchings. Examples of his paintings are shown below:
For this input, we will compare the Latent Diffusion generative model with DALL·E 2.
Comparison Result: DALL·E 2. I was blown away!
10. painting, sunlight kissing leaves gave birth to trees
Let’s try the same input as above without mentioning any specific painting style. Here we will compare PixRay with DALL·E 2.
Comparison Result: PixRay. The DALL·E 2 results are OK, but I prefer the PixRay results, where the kissing action is more obvious.
11. painting, trees kissing firelight gave birth to fields of ash
Comparison Result: PixRay. The DALL·E 2 results are OK, but I prefer the PixRay results, where the ash is more obvious.
12. painting, thousand-year old forests were felled for heat and light
Comparison Result: PixRay. The DALL·E 2 results are OK, but I prefer the PixRay results, where the fallen trees are obvious.
13. painting, across the known world, band of humans sit by fires and rejoiced
Comparison Result: PixRay.
14. painting, humans celebrating another day of survival
Comparison Result: PixRay.
15. painting, we cut down the trees for fuel, Arnold Bocklin
Arnold Böcklin (16 October 1827–16 January 1901) was a Swiss symbolist painter. Influenced by Romanticism, Böcklin’s symbolist use of imagery derived from mythology and legend often overlapped with the aesthetic of the Pre-Raphaelites. Many of his paintings are imaginative interpretations of the classical world, or portray mythological subjects in settings involving classical architecture, often allegorically exploring death and mortality in the context of a strange, fantasy world.
Böcklin is best known for his five versions (painted 1880 to 1886) of the Isle of the Dead, which partly evokes the English Cemetery, Florence, which was close to his studio and where his baby daughter Maria had been buried. An early version of the painting was commissioned by a Madame Berna, a widow who wanted a painting with a dreamlike atmosphere.
Comparison Result: VQGAN (it seems to render the fuel part better)
16. painting, a road made of dead trees in the morning light, Richard Dadd
Richard Dadd (1 August 1817–7 January 1886) was an English painter of the Victorian era, noted for his depictions of fairies and other supernatural subjects, Orientalist scenes, and enigmatic genre scenes, rendered with obsessively minuscule detail. Most of the works for which he is best known were created while he was a patient in Bethlem and Broadmoor hospitals.
Comparison Result: VQGAN. DALL·E 2 doesn’t like the word ‘dead’, even when we are talking about ‘dead trees’.
17. painting, caveman lit a fire under organic soup in the forest. things began to bubble, Henri Rousseau
Comparison Result: DALL·E 2
18. painting, a city made of dead trees, Dorothea Tanning
Dorothea Margaret Tanning (25 August 1910–31 January 2012) was an American painter, printmaker, sculptor, writer, and poet. Her early work was influenced by Surrealism.
Comparison Result: Latent Diffusion (no DALL·E 2 result available because of the word ‘dead’)
19. painting, a city made of dead trees, Arnold Bocklin
Comparison Result: VQGAN (no DALL·E 2 result available because of the word ‘dead’)
20. painting, they gave their bodies to build our cities, Wayne Barlowe
Wayne Barlowe is a world-renowned science fiction and fantasy author and artist who has created images for books, film and galleries and written novels, screenplays and a number of art books. After attending Cooper Union he started his career painting hundreds of paperback covers for all of the major publishers and magazine illustrations for LIFE, TIME and NEWSWEEK.
Comparison Result: Latent Diffusion
21. painting, we wrote our histories, philosophies and shopping lists on their pulpy flesh, Wayne Barlowe
Comparison Result: Latent Diffusion (no DALL·E 2 result available because of the word ‘flesh’)
22. painting, but this was not sufficient, Wayne Barlowe
Comparison Result: Latent Diffusion
23. painting, we cut down the trees for fuel, Wayne Barlowe
Comparison Result: DALL·E 2 (mainly because of the crisper images)
24. painting, trees crushed and compressed for millions of years in the pressure cooker of the earth, Tom Thomson
Thomas John Thomson (5 August 1877–8 July 1917) was a Canadian artist active in the early 20th century. During his short career, he produced roughly 400 oil sketches on small wood panels and approximately 50 larger works on canvas. His works consist almost entirely of landscapes, depicting trees, skies, lakes, and rivers. He used broad brush strokes and a liberal application of paint to capture the beauty and colour of the Ontario landscape. Thomson’s accidental death by drowning at 39 shortly before the founding of the Group of Seven is seen as a tragedy for Canadian art.
Comparison Result: DALL·E 2
25. painting, extinct trees transformed to coal, Dorothea Tanning, dark and smoky
Comparison Result: VQGAN
26. painting, smoky clouds over crowded tenements, Hubert Robert
Hubert Robert (22 May 1733–15 April 1808) was a French painter, noted for his landscape paintings and capriccio, or semi-fictitious picturesque depictions of ruins in Italy and of France.
Comparison Result: VQGAN (Where is the crowd in the DALL·E 2 output?)
27. painting, pickaxes scarred the earth, Zdzislaw Beksinski
Zdzisław Beksiński (24 February 1929–21 February 2005) was a Polish painter, photographer, and sculptor, specializing in the field of dystopian surrealism.
Beksiński made his paintings and drawings in what he called either a Baroque or a Gothic manner. His creations were made mainly in two periods. The first period of work is generally considered to contain expressionistic color, with a strong style of “utopian realism” and surreal architecture, like a doomsday scenario. The second period contained more abstract style, with the main features of formalism.
Beksiński was stabbed to death at his Warsaw apartment in February 2005 by a 19-year-old acquaintance from Wołomin, reportedly because he refused to lend him money.
Comparison Result: Latent Diffusion
28. painting, black dust scarred the lungs, Zdzislaw Beksinski
Comparison Result: Equal (but different)
29. painting, forges spat out locomotives, Zdzislaw Beksinski
Comparison Result: Latent Diffusion
30. painting, railways demolished distance, Dorothea Tanning
Here we use Midjourney. Midjourney describes itself as an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species.
Comparison Result: Midjourney
I think we now have a pretty good idea of the differences between the image generation models. Let’s summarize the winners:
DALL-E 2: 12
Latent Diffusion: 7
VQGAN: 6
PIXRAY: 5
DALL-E Mini: 2
Midjourney: 1
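For anyone who wants to check the tally, here is a minimal Python sketch that recounts the 30 comparison results above. It assumes that a result of "Equal" counts as a win for both models, which is how the totals add up to 33 rather than 30. It also assumes that the tie in comparison 28 was between DALL·E 2 and Latent Diffusion; the text does not name the other model there, but this is the pairing that matches the totals. The names `RESULTS`, `tally` and `outright_wins` are made up for this illustration.

```python
from collections import Counter

D2, MINI, VQGAN = "DALL-E 2", "DALL-E Mini", "VQGAN"
LD, PIXRAY, MJ = "Latent Diffusion", "PixRay", "Midjourney"

# Winner(s) of comparisons 1..30, in order. A tie lists both models.
# Assumption: comparison 28 ("Equal") is DALL-E 2 vs Latent Diffusion.
RESULTS = [
    (D2, MINI), (D2,), (D2,), (MINI,), (D2, VQGAN),          # 1-5
    (D2,), (D2,), (D2,), (D2,), (PIXRAY,),                   # 6-10
    (PIXRAY,), (PIXRAY,), (PIXRAY,), (PIXRAY,), (VQGAN,),    # 11-15
    (VQGAN,), (D2,), (LD,), (VQGAN,), (LD,),                 # 16-20
    (LD,), (LD,), (D2,), (D2,), (VQGAN,),                    # 21-25
    (VQGAN,), (LD,), (D2, LD), (LD,), (MJ,),                 # 26-30
]


def tally(results):
    """Count wins per model, crediting a tie to every model involved."""
    counts = Counter()
    for winners in results:
        counts.update(winners)
    return counts


def outright_wins(results):
    """Count only the comparisons that had a single winner."""
    return Counter(w[0] for w in results if len(w) == 1)


if __name__ == "__main__":
    for model, wins in tally(RESULTS).most_common():
        print(f"{model}: {wins}")
    print("DALL-E 2 outright wins:", outright_wins(RESULTS)[D2])
```

Under these assumptions the script reproduces the totals listed above, and it also shows that 3 of the 12 results credited to DALL·E 2 are ties, leaving 9 outright wins.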
DALL·E 2 clearly comes out on top overall. However, case by case we can get really good results from the other methods too: DALL·E 2 won or tied in only 12 of the 30 cases (a tie is counted for both models in the totals above).
Do you agree with the above?
How would you compare the different generated images?
Would you like me to add the rest of the images here as well? So far I have processed only 2.5 minutes of the 10-minute video, which means there are plenty more images to compare.
Thanks and cheers!