DALL·E 2: the AI that creates images from natural-language descriptions

Pedro Alvarado
3 min read · Apr 8, 2022

“An astronaut lounging in a tropical resort in space in a vaporwave style” — from OpenAI

OpenAI’s new AI system is striking: it can not only create realistic images and art, but also edit an existing image and create new variations inspired by an original image, all from a description in natural language.

DALL·E 2 is the successor to DALL·E 1. In January 2021, OpenAI announced DALL·E 1, and a year later it introduced DALL·E 2. Compared with its predecessor, DALL·E 2 generates more realistic and accurate images with 4x greater resolution.

Comparison of “a painting of a fox sitting in a field at sunrise in the style of Claude Monet”
Image from OpenAI

How does it work?

This AI model is based on a neural network. The network was trained on pairs of images and their corresponding descriptions; that is, the images were accurately labeled. This is important because if an image is labeled incorrectly and the model is trained on that data, the model may output an image that has nothing to do with the description. This limitation comes from the fact that the model learns from labeled examples (supervised learning).
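
To make the labeling problem concrete, here is a toy sketch in Python. It is not how DALL·E 2 works internally: the `train` and `generate` functions, the caption data, and the `cat_image` / `dog_image` labels are all made up for illustration. The "model" just counts which image label appears most often with each caption word, which is enough to show how wrong labels lead to wrong outputs.

```python
from collections import Counter, defaultdict

def train(pairs):
    """Count, for each caption word, how often it appears with each image label."""
    counts = defaultdict(Counter)
    for caption, image_label in pairs:
        for word in caption.lower().split():
            counts[word][image_label] += 1
    return counts

def generate(model, prompt):
    """Return the image label that the prompt's words vote for most often."""
    votes = Counter()
    for word in prompt.lower().split():
        votes.update(model.get(word, {}))
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical, correctly labeled training pairs (caption, image).
clean_pairs = [
    ("a cat sleeping", "cat_image"),
    ("a cat on a sofa", "cat_image"),
    ("a dog running", "dog_image"),
    ("a dog in a park", "dog_image"),
]

# The same kind of data, but three cat images were wrongly captioned as dogs.
noisy_pairs = [
    ("a dog sleeping", "cat_image"),
    ("a dog on a sofa", "cat_image"),
    ("a dog on a chair", "cat_image"),
    ("a dog running", "dog_image"),
]

print(generate(train(clean_pairs), "dog"))  # dog_image
print(generate(train(noisy_pairs), "dog"))  # cat_image
```

With the clean pairs, asking for "dog" gives a dog. With the mislabeled pairs, the same request gives a cat, because the model can only be as good as the labels it learned from.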

Thanks to deep learning, the model understands not only individual objects but also the relationships between them. Because of this, the AI can take what it learned from other images and apply that knowledge to a new image.

For example, you have probably seen a teddy bear (maybe you played with one). You probably also have an idea of what a mad scientist looks like (maybe you saw one on television). Now, if I ask you to imagine “Teddy bears mixing sparkling chemicals as mad scientists”, you can do it, because you know the two separate ideas and can combine them into a single picture in your mind. DALL·E 2 can do this too.

“Teddy bears mixing sparkling chemicals as mad scientists in a steampunk style” — from OpenAI

To prevent misuse of this technology, DALL·E 2 is restricted from generating violent or explicit images, and it cannot generate portraits that resemble real people, which is why it tends to create more generic images.

Possible uses and some personal thoughts

I have been thinking about possible uses for this AI. For example, instead of searching for images that were created or taken by people, we could use images generated by this AI. The advantage is that the images would be more specific to what we describe.

This AI could also potentially be used to create NFTs and digital art in general (I’m not going to expand on this idea because I don’t know much about NFTs and digital art, but it is a possibility).

It was thought that AI would first replace routine jobs and tasks, the ones that are easiest to automate. But surprise! Here is this AI performing tasks that we (humans) consider highly creative. Who would have thought that the work of the illustrator (creative work) would be among the first to potentially be automated?

Final thoughts

This new AI makes me think that a wonderful future awaits AI, and that an uncertain future awaits humans. This is not an inherently bad thing. It just shows that one of the most important skills in the 21st century is to “never stop learning”.

Thank you for reading. See you next time!

The information in this article was taken from the OpenAI site.