Conditional Generative Adversarial Networks, or cGANs for short, extend regular or 'vanilla' GANs by feeding a condition into both the Generator and the Discriminator networks. The idea is that conditioning allows a GAN to better structure its latent space and its mapping into data space. The concept of a cGAN was proposed by Mirza & Osindero (2014).
In this article, we're going to take a look at cGANs and explain the concepts. After reading it, you will understand what a Conditional GAN is, how conditioning is added to the Generator and Discriminator, and why this can improve what your GAN generates.
Let's take a look!
Generative Adversarial Networks were proposed back in 2014, in a paper written by Ian Goodfellow and others (Goodfellow et al., 2014). The architecture is composed of two neural networks. First, there is a generator \(G\), which is responsible for generating images. Second, there is a discriminator \(D\), whose task is to detect which of the images presented to it are fake and which are real.
As the two networks are trained jointly, each optimizing its own loss component, the following minimax game emerges, as discussed in our article about vanilla GANs:
\(\min_G \max_D V(D, G) = \mathbb{E}_{\bf{x} \sim p_{\text{data}}(\bf{x})}[\log D(\bf{x})] + \mathbb{E}_{\bf{z} \sim p_{\bf{z}}(\bf{z})}[\log(1 - D(G(\bf{z})))]\)

Loss and its components for an unconditioned GAN (Goodfellow et al., 2014).
In this loss, the vector \(\bf{z}\) is sampled from the latent distribution \(\text{Z}\), while the vector \(\bf{x}\) is sampled from the real data distribution; \(G(\bf{z})\) is the image that the generator produces on the basis of \(\bf{z}\).
Through their joint training, the Generator learns to convert samples from the latent distribution in such a way that they produce output images that can no longer be distinguished from real ones. This allows us to draw samples (\(\bf{z}\)s) from the latent distribution and generate images. In effect, through the lens of the Generator, the latent distribution thus compresses information about 'data space' in an accessible way.
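To make this loss concrete, here is a minimal sketch of how its two components could be computed in PyTorch. The networks \(D\) and \(G\) are assumed to exist; binary cross-entropy on logits corresponds to the log terms in the objective above, and the generator uses the non-saturating variant that is common in practice rather than the literal minimax term:

```python
import torch
import torch.nn.functional as F

def gan_losses(D, G, real_x, z):
    """Compute both components of the GAN minimax game.

    D maps an image batch to one logit per image; G maps latent
    vectors to images (both are assumed to be defined elsewhere).
    """
    fake_x = G(z)

    # Discriminator term: maximize log D(x) + log(1 - D(G(z))), i.e.
    # minimize the BCE of real images against label 1 and fake images
    # against label 0. detach() stops gradients from flowing into G
    # during the discriminator update.
    real_logits = D(real_x)
    fake_logits = D(fake_x.detach())
    d_loss = (
        F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
        + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    )

    # Generator term (non-saturating variant commonly used in practice):
    # maximize log D(G(z)) instead of minimizing log(1 - D(G(z))).
    gen_logits = D(fake_x)
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))

    return d_loss, g_loss
```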
According to Goodfellow et al. (2014), vanilla GANs have many straightforward extensions - of which cGANs are one:
A conditional generative model p(x | c) can be obtained by adding c as input to both G and D
Goodfellow et al. (2014)
Yes, vanilla GANs are unconditional. In the quote above, you can read \(p(\bf{x} \mid c)\) as the probability that we generate vector \(\bf{x}\) given some condition \(c\). Adding a condition to our probabilities allows us to teach the Generator to use the latent distribution in an even more structured way.
For this reason, Mirza & Osindero (2014) propose what they call the Conditional GAN, or cGAN. It adds conditioning on some extra information \(\bf{y}\). The probability \(p(\bf{x})\) modeled by \(G\) then turns into \(p(\bf{x|y})\), or "the probability of \(\bf{x}\) given condition \(\bf{y}\)". Note that \(\bf{y}\) plays exactly the same role as \(c\) in the quote above; only the letter differs.
Adding conditioning is expected to allow for better structuring of the latent space and its sampling, and thus to generate better results. Any kind of information can be used for conditioning (Mirza & Osindero, 2014). Simple conditioning can be achieved by adding label data, such as the class label of the image to be generated (e.g. \(y = 1\) if the goal is to generate 1s from the MNIST dataset). It is also possible to use more complex conditioning information, such as data from other modalities (e.g. text instead of images).
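As a small sketch of what such label conditioning looks like in practice (PyTorch; the batch size, latent dimensionality and class count are illustrative assumptions), the label can be one-hot encoded and appended to the latent vector before it enters the generator:

```python
import torch
import torch.nn.functional as F

batch_size, latent_dim, num_classes = 64, 100, 10

z = torch.randn(batch_size, latent_dim)          # samples from the latent distribution Z
y = torch.full((batch_size,), 1)                 # condition: "generate the digit 1"
y_onehot = F.one_hot(y, num_classes).float()     # shape: (64, 10)

gen_input = torch.cat([z, y_onehot], dim=1)      # shape: (64, 110), fed into G
```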
Applying conditioning involves feeding the condition parameter \(\bf{y}\) into both the generator \(G\) and the discriminator \(D\) of your GAN. We use an additional input layer for this purpose. Visually, this looks as follows:
Conditional Generative Adversarial Network (cGAN) architecture (Mirza & Osindero, 2014).
We can see that the conditioning information is first concatenated with the latent vector (in the generator) or the input image (in the discriminator), and is then processed further. It's a straightforward extension of the vanilla GAN: no other significant changes to e.g. hyperparameters were applied compared to the Goodfellow et al. (2014) GAN, at least according to the paper (Mirza & Osindero, 2014).
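A minimal sketch of such an architecture in PyTorch could look like this. The layer sizes and activations are illustrative assumptions, not the exact configuration from the paper; the essential point is that both networks receive the one-hot condition as an extra input:

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Generator G(z, y): concatenates the latent vector with the condition."""
    def __init__(self, latent_dim=100, num_classes=10, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + num_classes, 256),
            nn.ReLU(),
            nn.Linear(256, img_dim),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, z, y_onehot):
        return self.net(torch.cat([z, y_onehot], dim=1))

class ConditionalDiscriminator(nn.Module):
    """Discriminator D(x, y): concatenates the image with the same condition."""
    def __init__(self, num_classes=10, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + num_classes, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),  # single real/fake logit
        )

    def forward(self, x, y_onehot):
        return self.net(torch.cat([x, y_onehot], dim=1))
```

Because both networks receive the same condition \(\bf{y}\), the discriminator learns to reject generated images that do not match their label, which in turn forces the generator to respect the condition.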
Conditional GANs were tested in two settings: unimodal generation of MNIST digits conditioned on their class labels, and multimodal generation of descriptive tags for images. In both cases, promising results were achieved. Interestingly, the reported MNIST log-likelihoods did not yet beat those of the vanilla GAN, but conditioning adds something a vanilla GAN cannot offer: control over which digit is generated.
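As a usage sketch of that control (assuming the hypothetical ConditionalGenerator from the sketch above has been trained on MNIST), conditioning makes it possible to ask for a specific digit:

```python
import torch
import torch.nn.functional as F

G = ConditionalGenerator()   # assumed trained; see the sketch above
G.eval()

z = torch.randn(16, 100)                           # 16 latent samples
y = F.one_hot(torch.full((16,), 7), 10).float()    # condition: the digit 7

with torch.no_grad():
    images = G(z, y)         # 16 images that should all depict a 7
```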
Vanilla GANs were proposed back in 2014 by Goodfellow et al. (2014) as a new mechanism for generative Machine Learning. As with any innovation, significant optimization has taken place since. This is even suggested in the original work, which lists a variety of straightforward extensions that can possibly make GANs even more performant.
Conditional GANs, or cGANs, are one such extension. By feeding an additional condition parameter into both neural networks, a cGAN can much better structure its latent space and the mapping into data space. As a consequence, cGANs offer control over what is generated that 2014 vanilla GANs cannot provide, and adding conditionality, if possible, can be a best practice for training your GAN.
By reading this tutorial, you now understand what cGANs are, how conditioning is added to the Generator and Discriminator, and why this gives you control over what is generated.
I hope that it was useful for your learning process! Please feel free to share what you have learned in the comments section 💬 I'd love to hear from you. Please do the same if you have any questions or other remarks.
Thank you for reading MachineCurve today and happy engineering!
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). Generative adversarial networks. arXiv preprint arXiv:1406.2661.
Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.