Introduction to Generative Adversarial Networks (GANs)
In recent years, generative adversarial networks (GANs) have received considerable attention following the pioneering work of Goodfellow et al., published in 2014. That attention has led to an explosion of new ideas, techniques, and applications. To understand GANs better, we need to understand their mathematical foundation.
Generative adversarial networks (GANs) are a type of generative model: they observe many samples from a distribution and generate more samples from that same distribution. The GAN architecture has two major components, the generator and the discriminator. The generator’s role is to produce new data (for example, images) that resembles the training data set, and the discriminator’s role is to tell fake images from real ones.
The GAN architecture
A basic GAN architecture consists of two networks: the generator network and the discriminator network. The name comes from the fact that the two networks are trained simultaneously and compete with each other, as if they were playing a zero-sum game such as chess.
In theory, the generator model can create an image from scratch. Its purpose is to produce images that appear so real that the discriminator is fooled. In the simplest GAN architecture for image synthesis, the generator’s input is typically random noise, and its output is a generated image.
The discriminator is just a binary image classifier, which should be familiar if you have worked with binary classification. Its job is to determine whether an image is real or fake.
Putting it all together, a basic GAN architecture works as follows: the generator makes fake images; the real images (from the training dataset) and the fake images are fed into the discriminator in separate batches. The discriminator then analyzes each image and predicts whether it is real or fake.
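The data flow described above can be sketched with a toy example. This is only an illustration under strong assumptions: the "images" are single numbers, and `generator` and `discriminator` are hypothetical hand-set functions rather than trained neural networks.

```python
import math
import random

def generator(z, weight=0.5, bias=1.0):
    """Toy generator: maps a noise value z to a fake 'sample' (affine map)."""
    return weight * z + bias

def discriminator(x, weight=2.0, bias=-4.0):
    """Toy discriminator: logistic score in (0, 1); closer to 1 means 'real'."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

rng = random.Random(0)

# Real batch: stand-in for the training dataset (values clustered near 4.0).
real_batch = [rng.gauss(4.0, 0.5) for _ in range(8)]

# Fake batch: the generator only ever receives random noise as input.
noise = [rng.gauss(0.0, 1.0) for _ in range(8)]
fake_batch = [generator(z) for z in noise]

# The two batches are scored separately by the same discriminator.
real_scores = [discriminator(x) for x in real_batch]
fake_scores = [discriminator(x) for x in fake_batch]

print("mean score on real batch:", sum(real_scores) / len(real_scores))
print("mean score on fake batch:", sum(fake_scores) / len(fake_scores))
```

In a real GAN both functions would be neural networks, and the scores would feed the loss that updates them.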
The Min-Max strategy: G vs. D
Deep learning algorithms (such as image classifiers) are based on optimization: finding the lowest value of a cost function. GANs are unusual because each of the two networks, the generator and the discriminator, has its own cost function, and the two have opposite objectives:
- The generator tries to trick the discriminator into believing that the fake images are real.
- The discriminator tries to classify real and fake images accurately.
This adversarial dynamic during training can be described by a minimax objective function.
Both networks improve as training proceeds: the generator gets better and better at generating images that resemble the training data, while the discriminator gets better and better at telling the real from the fake.
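For reference, the minimax objective is usually written in the form given in the original paper by Goodfellow et al. (2014). The symbols are not defined elsewhere in this article: $D(x)$ is the discriminator's estimated probability that $x$ is real, $G(z)$ is the generator's output for a noise input $z$, $p_{\text{data}}$ is the distribution of the training data, and $p_z$ is the noise distribution.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to make $V$ as large as possible (high scores on real data, low scores on fakes), while the generator tries to make it as small as possible.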
Training a GAN means finding an equilibrium in the game at which:
- Data from the generator looks very similar to the data from the training set.
- The discriminator can no longer distinguish between fake images and real ones.
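The equilibrium conditions above can be made concrete with a small loss calculation. The sketch below assumes the common binary cross-entropy formulation and the "non-saturating" generator loss. The function names and the example scores are hypothetical and chosen only for illustration.

```python
import math

def discriminator_loss(real_scores, fake_scores):
    """Binary cross-entropy: reward high scores on real, low scores on fake."""
    real_term = -sum(math.log(s) for s in real_scores) / len(real_scores)
    fake_term = -sum(math.log(1.0 - s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term

def generator_loss(fake_scores):
    """Non-saturating generator loss: reward fakes that score as 'real'."""
    return -sum(math.log(s) for s in fake_scores) / len(fake_scores)

# Early in training (made-up scores): the discriminator separates the two easily.
early = discriminator_loss([0.95, 0.90], [0.05, 0.10])

# At equilibrium the discriminator outputs 0.5 for every input.
balanced = discriminator_loss([0.5, 0.5], [0.5, 0.5])

print("early discriminator loss:   ", early)
print("balanced discriminator loss:", balanced)
print("2 * log(2):                 ", 2 * math.log(2))
```

When the discriminator can do no better than guessing, its loss settles at 2·log 2 (about 1.386), which is one practical signal to watch for during training.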
The generator vs. discriminator
GAN training can be compared to an artist working with a critic: the generator is the artist, and the discriminator is the critic. The generator has no access to the masterpiece it is trying to copy. Instead, it relies solely on the discriminator’s feedback to improve the images it generates.
A good GAN model needs two things: quality (images should not be blurry and should resemble the training images) and diversity (the generated images should approximate the distribution of the training dataset).
To evaluate a GAN model, you can visually inspect the images it generates, either during training or at inference time with the generator model.
There are two popular evaluation metrics for GANs:
- Inception Score, which tries to capture the quality as well as the diversity of the generated images.
- Fréchet Inception Distance, which compares real and fake images rather than evaluating the generated images in isolation.
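Both metrics can be illustrated with a simplified sketch. The real metrics run images through a pretrained Inception network. Here the class probabilities and the feature statistics are supplied by hand, and the Fréchet distance is shown only for the one-dimensional Gaussian case. The function names are hypothetical.

```python
import math

def inception_score(class_probs):
    """exp of the mean KL divergence between each p(y|x) and the marginal p(y)."""
    n = len(class_probs)
    num_classes = len(class_probs[0])
    marginal = [sum(p[k] for p in class_probs) / n for k in range(num_classes)]
    total_kl = 0.0
    for p in class_probs:
        total_kl += sum(pk * math.log(pk / marginal[k])
                        for k, pk in enumerate(p) if pk > 0)
    return math.exp(total_kl / n)

def frechet_distance_1d(mean_real, std_real, mean_fake, std_fake):
    """Squared Fréchet distance between two univariate Gaussians."""
    return (mean_real - mean_fake) ** 2 + (std_real - std_fake) ** 2

# Confident predictions spread over both classes: quality and diversity.
diverse = inception_score([[1.0, 0.0], [0.0, 1.0]])

# Confident predictions that all land on one class: no diversity.
collapsed = inception_score([[1.0, 0.0], [1.0, 0.0]])

print("diverse:", diverse, "collapsed:", collapsed)
print("identical statistics:", frechet_distance_1d(0.0, 1.0, 0.0, 1.0))
print("shifted statistics:  ", frechet_distance_1d(0.0, 1.0, 3.0, 2.0))
```

A higher Inception Score is better, while a lower Fréchet distance is better because it means the fake statistics sit closer to the real ones.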
The original GAN paper by Goodfellow et al. in 2014 gave rise to many variants. Each new GAN architecture tends to build upon an earlier one, either to solve a particular training problem, to create better images, or to give finer control over the output.
Listed here are a few variants whose breakthroughs laid the foundations for later GAN advances. The list is not exhaustive.
The Wasserstein GAN (WGAN) and WGAN-GP were designed to solve problems in GAN training such as mode collapse, where the generator keeps producing the same image or a small subset of images. For training stability, WGAN-GP uses a gradient penalty instead of the weight clipping used in WGAN.
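The difference between weight clipping and a gradient penalty can be sketched as follows. This is a toy version under stated assumptions: the critic takes a single number as input, and its gradient is estimated by finite differences, whereas a real implementation would use automatic differentiation on a neural network. The default values `c=0.01` and `lam=10.0` follow the values commonly quoted from the WGAN and WGAN-GP papers.

```python
import random

def clip_weights(weights, c=0.01):
    """WGAN-style weight clipping: clamp every weight into [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]

def gradient_penalty(critic, real, fake, lam=10.0, eps=1e-5, rng=None):
    """WGAN-GP-style penalty for a scalar-input critic (toy version)."""
    rng = rng or random.Random(0)
    total = 0.0
    for x_real, x_fake in zip(real, fake):
        t = rng.random()
        x_hat = t * x_real + (1.0 - t) * x_fake  # point between real and fake
        # Central finite difference stands in for automatic differentiation.
        grad = (critic(x_hat + eps) - critic(x_hat - eps)) / (2.0 * eps)
        total += (abs(grad) - 1.0) ** 2          # push gradient norm toward 1
    return lam * total / len(real)

clipped = clip_weights([0.5, -0.003, -2.0])
penalty_slope_1 = gradient_penalty(lambda x: x, [1.0, 2.0], [0.0, -1.0])
penalty_slope_3 = gradient_penalty(lambda x: 3.0 * x, [1.0, 2.0], [0.0, -1.0])

print(clipped)
print(penalty_slope_1, penalty_slope_3)
```

Clipping constrains the weights directly, while the penalty leaves the weights free and instead penalizes the critic wherever its gradient norm drifts away from 1.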
Pix2PixHD (High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs) disentangles the effects of multiple inputs on the resulting image, as demonstrated in the paper example: controlling the color, texture, and shape of the generated image for garment design. It can also generate realistic high-resolution 2K images.
DCGAN (Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks) was the first GAN proposal to use a convolutional neural network (CNN) in its architecture. Most GAN variants today are based on DCGAN to some degree. As a result, DCGAN is most likely your first GAN tutorial, the “Hello-World” of learning GANs.
cGAN (Conditional Generative Adversarial Nets) first introduced the concept of generating images based on a condition, which could be an image class label, image, or text, as in more complex GANs. Pix2Pix and CycleGAN are both conditional GANs, using images as conditions for image-to-image translation.
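One simple way to supply a condition is to append it to the generator's noise input. The sketch below assumes a class-label condition encoded as a one-hot vector. The helper names are hypothetical, and real models may inject the condition in other ways (for example, through embeddings).

```python
import random

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def conditional_generator_input(noise, label, num_classes):
    """Concatenate the noise vector with a one-hot encoding of the condition."""
    return noise + one_hot(label, num_classes)

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]

# Ask the (not shown) generator for a sample of class 2 out of 3 classes.
g_input = conditional_generator_input(noise, label=2, num_classes=3)
print(g_input)
```

The discriminator typically receives the same condition alongside the image, so it can judge whether the image matches the requested class.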
SAGAN (Self-Attention Generative Adversarial Networks) improves image synthesis quality by applying a self-attention module (a concept from NLP models) to CNNs, so that details are generated using cues from all feature locations. Google DeepMind scaled up SAGAN to make BigGAN.
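The idea of using cues from all feature locations can be sketched with a bare-bones self-attention function. This is a simplification: the learned query, key, and value projections used in practice are omitted, so each feature vector plays all three roles. The `gamma` parameter mirrors the learnable residual scale that SAGAN-style blocks start at zero.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features, gamma=1.0):
    """Each location's output mixes in features from all locations."""
    outputs = []
    for query in features:
        # Similarity of this location to every location (dot product).
        scores = [sum(q * k for q, k in zip(query, key)) for key in features]
        weights = softmax(scores)
        # Weighted average of all locations' features.
        attended = [sum(w * v[d] for w, v in zip(weights, features))
                    for d in range(len(query))]
        # Residual connection scaled by gamma.
        outputs.append([x + gamma * a for x, a in zip(query, attended)])
    return outputs

features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three locations, two channels
print(self_attention(features, gamma=1.0))
print(self_attention(features, gamma=0.0))       # gamma = 0 leaves input unchanged
```

With `gamma` at zero the block is an identity function, so the network can start from purely local convolutional behavior and learn how much global context to mix in.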
BigGAN (Large Scale GAN Training for High Fidelity Natural Image Synthesis) can create high-resolution and high-fidelity images.
ProGAN (Progressive Growing of GANs for Improved Quality, Stability, and Variation) grows the network progressively.
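Progressive growing can be sketched in two small pieces: a schedule of resolutions that doubles at each stage, and a fade-in that blends each new layer in gradually. The 4 and 1024 defaults follow the resolutions reported in the ProGAN paper. The function names and the flat-list "images" are simplifications for illustration.

```python
def resolution_schedule(start=4, final=1024):
    """Resolutions visited when each stage doubles the previous one."""
    resolutions = [start]
    while resolutions[-1] < final:
        resolutions.append(resolutions[-1] * 2)
    return resolutions

def fade_in(upsampled_old, new_layer_output, alpha):
    """Blend the new higher-resolution layer in as alpha goes from 0 to 1."""
    return [(1.0 - alpha) * old + alpha * new
            for old, new in zip(upsampled_old, new_layer_output)]

print(resolution_schedule())
print(fade_in([1.0, 1.0], [3.0, 5.0], alpha=0.5))
```

At `alpha=0` the output comes entirely from the previous, lower-resolution stage. At `alpha=1` the new layer has fully taken over.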
StyleGAN (A Style-Based Generator Architecture for Generative Adversarial Networks), introduced by NVIDIA Research, combines the progressive growing of ProGAN with image style transfer through adaptive instance normalization (AdaIN), which gives control over the style of the generated images.
StyleGAN2 (Analyzing and Improving the Image Quality of StyleGAN) improves upon the original StyleGAN in areas such as normalization, progressive growing, and regularization.
ProGAN, StyleGAN, and StyleGAN2 are all capable of creating high-resolution images.
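Adaptive instance normalization can be sketched for a single feature map as follows. This is a minimal illustration: the feature map is a flat list of numbers, and `style_scale` and `style_bias` are passed in directly, whereas in a style-based generator they would be computed from a learned style vector.

```python
import math

def adain(feature_map, style_scale, style_bias, eps=1e-8):
    """Normalize one feature map, then rescale and shift it with style values."""
    n = len(feature_map)
    mean = sum(feature_map) / n
    var = sum((x - mean) ** 2 for x in feature_map) / n
    std = math.sqrt(var + eps)  # eps avoids division by zero on flat maps
    return [style_scale * (x - mean) / std + style_bias for x in feature_map]

styled = adain([1.0, 2.0, 3.0, 4.0], style_scale=2.0, style_bias=5.0)
print(styled)
```

After the operation, the feature map's mean and standard deviation are set by the style values rather than by the content. That is what lets the style input steer the look of the output.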
Applications of GANs
Creating images through image synthesis can be fun, and it also has practical applications, such as providing image augmentation for machine learning (ML) training or creating artwork and design resources.
A GAN can be used to generate images that have never existed before, and this is perhaps what GANs are best known for. They can generate previously unseen images of cats, new faces, artwork, and much more.
In computer vision, image-to-image translation refers to the process of translating an input image into another domain (e.g., color or style) while maintaining the original image content. In terms of the use of GANs in art and design, this may be one of the most important tasks.
Pix2Pix (Image-to-Image Translation with Conditional Adversarial Networks) is a conditional GAN and perhaps the most famous image-to-image translation GAN. One drawback of Pix2Pix is that it requires paired training image datasets.
CycleGAN builds on Pix2Pix and only needs unpaired images, which are easier to obtain in the real world. It can change an image of an apple into an orange, a sunset into a sunrise, or a horse into a zebra, for example. These may not be real-world use cases in themselves, but many other image-to-image GANs have been developed since then for art and design.
Figure: CycleGAN converts a horse to a zebra (image source: CycleGAN Project Page).
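Training on unpaired images is made possible by a cycle-consistency idea: translating an image to the other domain and back should return something close to the original. The sketch below uses hypothetical stand-in functions on short lists of pixel values in place of the two trained translators.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, forward, backward):
    """L1 distance between x and its round trip backward(forward(x))."""
    return l1_distance(x, backward(forward(x)))

# Hypothetical stand-ins for the two learned translators.
def horse_to_zebra(pixels):
    return [p + 0.2 for p in pixels]

def zebra_to_horse(pixels):
    return [p - 0.2 for p in pixels]

def careless_zebra_to_horse(pixels):
    return [0.5 for _ in pixels]  # ignores its input entirely

horse = [0.1, 0.5, 0.9]
good = cycle_consistency_loss(horse, horse_to_zebra, zebra_to_horse)
bad = cycle_consistency_loss(horse, horse_to_zebra, careless_zebra_to_horse)
print("consistent pair:", good, "inconsistent pair:", bad)
```

A low cycle loss means the content of the input survived the round trip, which is the constraint that stands in for paired training examples.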
It is now possible to translate your selfie into comics, paintings, cartoons, or any other style you can imagine. White-box CartoonGAN can be used to turn our pictures into a cartoonized version.
Colorization can be applied to black-and-white photos as well as to color photos, artworks, and design assets. When making artwork or designing UI/UX, we generally start by drawing outlines or contours and color them afterwards. Automated colorization could help provide artists and designers with inspiration.
So far, most of the examples have been GANs that translate images to images. It is also possible to use words as the condition for generating images, which is much more flexible and intuitive than using class labels as the condition.
Here are some examples: StyleCLIP and Taming Transformers for High-Resolution Image Synthesis.
GANs can be used not only for images but also for music and video. For example, GANSynth from the Magenta project can make music.
GANs can also be applied to climate change. Earth Intelligent Engine, an FDL (Frontier Development Lab) 2020 project, uses Pix2PixHD to simulate what an area would look like after flooding.
Other GAN applications
Here are a few other GAN applications:
- Image inpainting: replace the missing portion of the image.
- Image uncropping or extension: this could be useful in simulating camera parameters in virtual reality.
- Super-resolution (SRGAN & ESRGAN): enhance an image from a lower resolution to a higher resolution. This could be very helpful in photo editing or medical image enhancement.
GANs have been demonstrated in papers and research laboratories, as well as in open source projects. In the last few years, real-world commercial applications using GANs have started to appear. It is common for designers to use icons8 assets when designing. As GANs advance, there may be more uses for them in the real world, such as fighting climate change or analyzing historical images that have been lost.