AI based illustrator draws pictures to go with text captions.

Introduction

An AI-based illustrator draws pictures to go with text captions. It is a good example of what AI can do: it interprets a sentence from the contextual words used and draws a matching illustration. It is aptly called an AI-based illustrator.

Pushing The Boundary Through AI

Artificial Intelligence (AI) continues to push the boundaries of what is possible in the realm of creative endeavors. One fascinating application is the development of AI-based illustrators that can generate pictures that closely match text captions. This technology opens up new possibilities for visual storytelling, content creation, and artistic expression.

Traditionally, illustrations have been hand-drawn by artists, requiring time, skill, and creativity. However, with AI-based illustrators, the process becomes automated, enabling the generation of illustrations in a matter of seconds. By analyzing the content and context of text captions, these AI systems employ sophisticated algorithms to produce visually stunning and contextually relevant images that seamlessly accompany the written text.

The implications of this technology are vast and exciting. It has the potential to revolutionize various industries, from publishing and advertising to social media and entertainment. With AI-based illustrators, authors can bring their written works to life with captivating visuals, marketers can enhance their campaigns with engaging graphics, and social media users can effortlessly create eye-catching posts.

In this blog post, we will explore the world of AI-based illustrators and delve into the underlying technologies and techniques that make this possible. We will examine the benefits and potential applications of AI-generated illustrations, as well as discuss the ethical considerations surrounding their use. Join us as we embark on a journey into the innovative realm of AI-based illustration and discover how this technology is reshaping the way we tell stories and communicate visually.

AI images generated from the text prompts “a baby daikon radish in a tutu walking a dog” and “an armchair in the shape of an avocado”

A neural network uses text captions to create outlandish images – such as armchairs in the shape of avocados – demonstrating it understands how language shapes visual culture.

OpenAI, an artificial intelligence company that recently partnered with Microsoft, developed the neural network, which it calls DALL-E. It is a version of the company’s GPT-3 language model, which can create expansive written works based on short text prompts, but DALL-E produces images instead.

“The world isn’t just text,” says Ilya Sutskever, co-founder of OpenAI. “Humans don’t just talk: we also see. A lot of important context comes from looking.”

DALL-E is trained using a set of images already associated with text prompts, and then uses what it learns to try to build an appropriate image when given a new text prompt.

It does this by trying to understand the text prompt, then producing an appropriate image. It builds the image element-by-element based on what has been understood from the text. If it has been presented with parts of a pre-existing image alongside the text, it also considers the visual elements in that image.

“We can give the model a prompt, like ‘a pentagonal green clock’, and given the preceding [elements], the model is trying to predict the next one,” says Aditya Ramesh of OpenAI.

For instance, if given an image of the head of a T. rex, and the text prompt “a T. rex wearing a tuxedo”, DALL-E can draw the body of the T. rex underneath the head and add appropriate clothing.
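The element-by-element prediction Ramesh describes, including the case where part of an image is supplied alongside the text, can be sketched as a simple sampling loop. This is a toy illustration under stated assumptions, not OpenAI's implementation: `toy_next_token_probs` is a hypothetical stand-in for the trained network, and the vocabulary size and token values are arbitrary.

```python
import random


def toy_next_token_probs(context, vocab_size):
    """Hypothetical stand-in for a trained model.

    Returns a probability distribution over the next image token given
    every preceding token (text tokens followed by image tokens so far).
    """
    # Pseudo-scores derived from the context, purely for illustration.
    seed = sum((i + 1) * t for i, t in enumerate(context))
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]


def generate_image_tokens(text_tokens, num_image_tokens, vocab_size=16,
                          image_prefix=None, seed=0):
    """Sample image tokens one at a time, each conditioned on all earlier ones."""
    rng = random.Random(seed)
    # A partial image (if given) is kept unchanged; only the rest is generated.
    image_tokens = list(image_prefix or [])
    while len(image_tokens) < num_image_tokens:
        context = list(text_tokens) + image_tokens
        probs = toy_next_token_probs(context, vocab_size)
        next_token = rng.choices(range(vocab_size), weights=probs, k=1)[0]
        image_tokens.append(next_token)
    return image_tokens


if __name__ == "__main__":
    prompt = [3, 1, 4]  # assumed token ids for a text prompt
    print(generate_image_tokens(prompt, 12, image_prefix=[7, 7, 2]))
```

Passing `image_prefix` mirrors the T. rex example: the supplied elements are kept as they are, and only the remainder of the image is sampled.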

The neural network, which is described today on the OpenAI website, can trip up on poorly worded prompts and struggles to position objects relative to each other – or to count.

“The more concepts that a system is able to sensibly blend together, the more likely the AI system both understands the semantics of the request and can demonstrate that understanding creatively,” says Mark Riedl at the Georgia Institute of Technology in the US.

“I’m not really sure how to define what creativity is,” says Ramesh, who admits he was impressed with the range of images DALL-E produced.

The model produces 512 images for each prompt, which are then filtered using a separate computer model developed by OpenAI, called CLIP, into what CLIP believes are the 32 “best” results.
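The filtering step can be sketched as a rerank: score every candidate image against the prompt and keep the highest-scoring ones. The sketch below assumes that a CLIP-like model has already turned the prompt and each candidate into an embedding vector, and that cosine similarity is the score; the function names and the random vectors in the demo are illustrative assumptions, not OpenAI's code.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(text_embedding, candidate_embeddings, keep=32):
    """Return the indices of the `keep` candidates most similar to the text."""
    scored = [
        (cosine_similarity(text_embedding, emb), idx)
        for idx, emb in enumerate(candidate_embeddings)
    ]
    # Highest similarity first; ties are broken by the original index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:keep]]


if __name__ == "__main__":
    rng = random.Random(0)
    prompt_vec = [rng.gauss(0, 1) for _ in range(8)]
    candidates = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(512)]
    best = rerank(prompt_vec, candidates, keep=32)
    print(len(best), "candidates kept out of", len(candidates))
```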

CLIP is trained on 400 million images available online. “We find image-text pairs across the internet and train a system to predict which pieces of text will be paired with which images,” says Alec Radford of OpenAI, who developed CLIP.
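Predicting "which pieces of text will be paired with which images" is commonly framed as a contrastive objective: within a batch, each image should score highest against its own caption and vice versa. The following is a minimal sketch of that idea using a symmetric cross-entropy over a similarity matrix. The article does not spell out the loss, so the fixed `temperature` value and the function names are assumptions made for illustration.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def _cross_entropy(logits, target):
    """Numerically stable -log(softmax(logits)[target])."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric loss: image i and text i are assumed to be the true pair."""
    images = [_normalize(v) for v in image_embs]
    texts = [_normalize(v) for v in text_embs]
    n = len(images)
    # logits[i][j] is the scaled similarity between image i and text j.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    image_to_text = sum(_cross_entropy(logits[i], i) for i in range(n)) / n
    text_to_image = sum(
        _cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (image_to_text + text_to_image) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    print("aligned pairs:", contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]]))
    print("swapped pairs:", contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]]))
```

The loss is low when matching image and text embeddings point in the same direction and high when the pairs are swapped, which is the signal a model of this kind would be trained to minimize.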

“This is really impressive work,” says Serge Belongie at Cornell University, New York. He says further work is required to look at the ethical implications of such a model, such as the risk of creating completely faked images, for example ones involving real people.

Effie Le Moignan at Newcastle University, UK, also calls the work impressive. “But the thing with natural language is although it’s clever, it’s very cultural and context-appropriate,” she says.

For instance, Le Moignan wonders whether DALL-E, confronted by a request to produce an image of Admiral Nelson wearing gold lamé pants, would put the military hero in leggings or underpants – potential evidence of the gap between British and American English.

Characteristics Of AI based Illustrators

Controlling attributes

Controlling attributes in AI-driven text-based illustration refers to the capacity of an AI model to manage specific details, characteristics, or variables of an image based on text inputs. This is essential in the generation of intricate and relevant images corresponding to the descriptions provided. By manipulating these attributes, the AI can create illustrations that closely match the described scenario, object, or scene. This capacity for fine-grained control of illustrations significantly enhances the applicability and efficiency of AI in domains such as art, design, entertainment, and education.

The development of such a system depends on an advanced understanding of both natural language processing and computer vision, along with the ability to establish intricate connections between these two domains. AI models must be trained using large datasets, comprising pairs of descriptions and corresponding images, to comprehend and execute the representation of text into relevant visual features. The AI is thereby trained to understand not only simple attributes like color, size, or shape but also more complex ones like perspective, lighting, and texture. This complex network of learned relationships allows for the AI to generate highly detailed and accurate illustrations based on the attributes mentioned in the text input.

Drawing multiple objects

Drawing multiple objects with AI-driven text-based illustration is a sophisticated feat that requires intricate understanding and interpretation of textual descriptions. This capability extends the potential uses of AI models in creating complex scenes, storyboarding, and even graphic design. The ability to accurately portray multiple entities based on textual input allows these systems to generate detailed visual narratives, where relationships between different elements can be understood and represented. For instance, an AI can be instructed to draw “a cat sitting on a red mat in front of a blue house,” and it should ideally generate an illustration that contains all these elements in the correct configuration.

The challenge lies in the AI’s ability to not only understand the individual objects mentioned but also their relative positioning, sizes, and interactions. Training AI for such a task necessitates large and diverse datasets, featuring multiple objects per image along with their corresponding descriptive texts. In addition, the model needs to effectively handle overlapping objects, occlusion, and perspective. Sophisticated AI models such as these can potentially revolutionize several sectors, from entertainment and education to marketing and advertising, by creating custom, detailed illustrations based on simple text prompts.

Visualizing perspective and three-dimensionality

Visualizing perspective and three-dimensionality with AI-driven text-based illustration is an exciting advancement that brings a depth of realism to AI-generated artwork. By understanding and implementing concepts such as vanishing points, horizon lines, and foreshortening, AI can create drawings that convincingly represent three-dimensional space on a two-dimensional plane. This involves interpreting text-based instructions for not just what objects to draw, but also where and how they should be placed in relation to each other and the viewer. For example, a command like “a large tree in the foreground with a small house in the distance” should result in an illustration that accurately represents these size and distance relationships. This kind of nuanced interpretation is a testament to the evolving complexity of AI models, providing an increasingly sophisticated tool for visual expression in fields ranging from entertainment and education, to design and marketing.

Visualizing internal and external structure

Visualizing internal and external structure with AI-driven text-based illustration involves generating detailed and accurate representations of both the outside appearance and the inside make-up of objects. This complex task requires the AI to interpret and visualize text descriptions that may involve intricate details, cross-sections, or cut-away views. For instance, if given a prompt like “an apple cut in half, exposing its core and seeds,” the AI would need to illustrate the exterior of the apple and its internal structure with equal accuracy. This capacity for detailed rendering opens up significant potential in educational fields, such as biology and engineering, where complex structures often need to be visualized for better understanding. It also has potential in fields like architecture and product design, where an accurate visual representation of both the exterior design and the interior structure can be invaluable.

Inferring contextual details

Inferring contextual details with AI-driven text-based illustration involves the AI’s ability to understand and visualize not only the explicit details provided in the text, but also the implicit information that shapes the overall scene. This ability involves recognizing and interpreting cues from the given descriptions to generate a coherent and contextually accurate illustration. For instance, a text input like “a child playing in the snow” implies that the setting is likely to be outdoors, possibly during winter, and the child might be dressed in warm clothing.

The AI would need to infer these contextual details to create an illustration that accurately captures the described scene. This level of interpretation greatly enhances the value of AI-driven illustration, making it capable of generating sophisticated and nuanced visual narratives that extend beyond the explicit details provided in the text prompts.

Applications of preceding capabilities

The aforementioned capabilities of AI-driven text-based illustration can be applied across various fields to enhance creativity, simplify tasks, and make complex concepts more accessible. In education, such technology can create illustrative content on demand, helping students visualize complex scientific or mathematical concepts, or bringing historical events to life. In the entertainment industry, it can aid in storyboarding or character design, where multiple objects, perspectives, and contextual details can be brought together to visually depict a script or a plot.

Similarly, in graphic design and advertising, AI can generate bespoke illustrations based on specific requirements, potentially streamlining the process of visual content creation. In fields like architecture and product design, visualizing internal and external structures can help in the design and prototyping phase. Thus, the application of these capabilities can revolutionize how we engage with and create visual content, making it more interactive, dynamic, and tailored to our specific needs.

Combining unrelated concepts

Combining unrelated concepts with AI-driven text-based illustration allows for the creation of novel and imaginative scenes that defy conventional associations. This capability involves the AI’s understanding of disparate elements and their successful integration into a coherent image. For instance, a prompt like “a dolphin flying in the sky with birds” requires the AI to illustrate a scenario that doesn’t exist in reality but is plausible in a creative or fantastical context. This potential for visualizing imaginative scenarios enhances the use of AI in creative industries such as storytelling, art, and advertising, where surreal or metaphorical imagery can be a powerful tool for conveying unique ideas or emotions.

Applications That Use AI Illustration

Artificial intelligence has a broad range of applications that involve illustration, each serving different purposes and industries.

One key application is in graphic design and digital art, where AI can generate images or modify existing ones based on text descriptions or creative prompts. This allows artists and designers to quickly generate concept art, experiment with different styles, or produce large amounts of visual content.

In the educational field, AI-driven illustration can be used to create visual aids and resources, such as diagrams, infographics, or interactive visual experiences. For example, an AI system could generate an accurate illustration of a cell’s structure based on a text description, assisting in biology education.

In the entertainment industry, particularly in video game development and animation, AI can aid in character design and environment creation. With the ability to illustrate complex scenarios and designs based on text descriptions, AI can streamline the creation process and allow for more dynamic and unique visual outcomes.

AI also finds application in architecture and engineering, where it can generate 3D models and detailed drawings of structures based on descriptions or blueprints. This not only facilitates the visualization of the final product but also aids in identifying potential design flaws or improvements.

How Does AI Illustration Work?

AI-based illustration, often based on generative models, typically works through a combination of advanced techniques in natural language processing and computer vision. Here is a simplified explanation of the process:

Data Collection and Preparation

This initial step involves gathering a vast dataset of images paired with corresponding textual descriptions. The data is then cleaned and preprocessed to make it suitable for training the AI model.
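As a rough illustration of the cleaning step, the sketch below normalizes captions and drops unusable or duplicate image-caption pairs. The file names, the minimum caption length, and the `clean_pairs` helper are all hypothetical; a real pipeline would also validate the image files themselves.

```python
def clean_pairs(pairs, min_caption_words=2):
    """Filter (image_path, caption) pairs for training.

    Collapses whitespace in captions, drops pairs with a missing path or a
    too-short caption, and removes exact duplicates (case-insensitive on
    the caption).
    """
    seen = set()
    cleaned = []
    for image_path, caption in pairs:
        caption = " ".join(caption.split())
        if not image_path or len(caption.split()) < min_caption_words:
            continue
        key = (image_path, caption.lower())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((image_path, caption))
    return cleaned


if __name__ == "__main__":
    raw = [
        ("img/001.jpg", "  an armchair   in the shape of an avocado "),
        ("img/001.jpg", "An armchair in the shape of an avocado"),  # duplicate
        ("img/002.jpg", "clock"),                                   # too short
        ("", "a pentagonal green clock"),                           # no image
    ]
    print(clean_pairs(raw))
```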

Model Training

The prepared data is fed into a machine learning model, typically a type of deep learning model such as a Generative Adversarial Network (GAN) or a Transformer-based model. These models learn to understand the relationship between the textual descriptions and their corresponding images.

Feature Extraction

The AI learns to extract features from the textual descriptions, such as identifying objects, understanding their attributes, and their relative positions. Similarly, the AI learns to interpret these features in the visual domain from the image data.
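Before a model can learn any features from text, the prompt is typically converted into integer token ids. The sketch below shows that first step with a simple whitespace vocabulary. This is an assumption made for brevity: production systems generally use subword tokenizers, and the learned feature extraction itself happens inside the network, which is not shown here.

```python
def build_vocab(captions):
    """Assign an integer id to every distinct lowercase word; 0 means unknown."""
    vocab = {"<unk>": 0}
    for caption in captions:
        for word in caption.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(prompt, vocab):
    """Map a prompt to token ids, using the unknown id for unseen words."""
    return [vocab.get(word, vocab["<unk>"]) for word in prompt.lower().split()]


if __name__ == "__main__":
    vocab = build_vocab(["a red mat", "a blue house"])
    print(encode("A red house", vocab))
    print(encode("a green mat", vocab))
```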


Image Generation

After training, when the model is given a text prompt, it interprets the details and generates a corresponding image. It accomplishes this by translating the learned relationships from the textual domain into the visual domain, essentially ‘drawing’ the description.


Refinement

The output can be further refined using techniques such as style transfer, where the AI applies a specific artistic style to the generated illustration.
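One well-known way to quantify "style" in neural style transfer is to compare Gram matrices of feature maps, which capture how strongly feature channels co-occur regardless of where they appear in the image. The sketch below computes that comparison on small hand-written feature lists. The text above does not say which style transfer method is meant, so treat this as one possible formulation; in practice the features would come from a pretrained convolutional network.

```python
def gram_matrix(features):
    """Channel-by-channel correlations.

    `features` is a list of channels, each a flat list of activations over
    the same spatial positions. The result is averaged over positions.
    """
    n_positions = len(features[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n_positions for cj in features]
        for ci in features
    ]


def style_loss(generated_features, style_features):
    """Mean squared difference between the two Gram matrices."""
    g = gram_matrix(generated_features)
    s = gram_matrix(style_features)
    n = len(g)
    return sum(
        (g[i][j] - s[i][j]) ** 2 for i in range(n) for j in range(n)
    ) / (n * n)


if __name__ == "__main__":
    style = [[1.0, 2.0], [3.0, 4.0]]      # 2 channels, 2 positions
    generated = [[0.5, 2.5], [3.0, 3.5]]
    print(gram_matrix(style))
    print(style_loss(generated, style))
```

A style transfer procedure would adjust the generated image to reduce this loss while a separate content term keeps the subject recognizable.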

Tools that use text based illustration

DeepArt and DeepDream

These tools utilize neural networks to transform images in unique and visually striking ways, often emulating the style of famous artists or creating surreal, dream-like modifications.


DALL-E

Developed by OpenAI, DALL-E is a version of the GPT-3 model trained to generate images from textual descriptions. While the original model was not released as a commercial tool, the demonstrations of DALL-E’s capabilities show great promise for the future of AI-based illustration.

Runway ML

This platform provides a variety of machine learning tools for creators, including style transfer and image generation capabilities.


This platform uses Generative Adversarial Networks (GANs) to combine and mutate images, allowing users to create complex and unique illustrations.

Google’s AutoDraw

While not as advanced as some of the other tools, AutoDraw uses AI to guess what you’re trying to draw and offers you a refined version of it.


As AI-based illustration tools continue to evolve, the generation of original and realistic images from textual descriptions is becoming more refined. These technologies employ artificial intelligence to create an array of visual elements, culminating in AI-generated art that, in some instances, rivals creations by human artists. AI tools such as image generators are becoming indispensable in various fields, especially in the realm of social media, where the demand for unique and engaging visual content is high.

The incorporation of AI technologies, including neural style transfer and art generators, has given rise to an impressive array of image tools. These tools can craft an image to fit a given text description, and some, like DALL-E, can even create complex, original scenes. Such AI-generated images are not confined to a single format but can be produced in various forms and sizes based on the user’s requirements.

While there are concerns surrounding the misuse of AI technology, such as in the creation of ‘deep fakes,’ the potential benefits of these tools are immense, particularly for graphic artists who can use them to enhance their work. The development of image generator apps has made these tools more accessible than ever, allowing a broader audience to engage with and benefit from artificial intelligence technologies in the realm of art and design. In conclusion, AI-driven illustration represents a significant stride forward in the intersection of technology and creativity, promising a future where AI aids human creativity, rather than replaces it.


OpenAI. Accessed 4 June 2023.

Gent, Edd. “Dr Dolittle Machines: How AI Is Helping Us Talk to the Animals.” New Scientist, 16 Dec. 2020. Accessed 4 June 2023.