A Gentle Introduction to GANs and Their Loss Function!

The most interesting idea in the last 10 years in Machine Learning — Yann LeCun

Devansh Pratap Singh
SRM MIC

--

GANs

Before diving into the deep insights, let's have a quick walkthrough of how a GAN works.

Generative Adversarial Networks are deep neural net architectures consisting of two neural networks competing against each other, hence the name "Adversarial". The networks can be CNNs, DNNs, or simple vanilla networks. The two networks are trained in an adversarial manner to generate data that mimics some distribution.

Machine Learning models used in GAN

#) Discriminative Model: This model discriminates between two different classes of data. It can be considered a binary classifier, for example deciding whether an email is spam or not spam, or whether a face is FAKE or REAL.

#) Generative Model: A generative model G is trained on training data x sampled from some true distribution D. Given samples from some standard random distribution Z, it produces a distribution D' which is close to D according to some closeness metric. Here, a sample z~Z maps to a sample G(z)~D', where D'≈D.

In very simple words this model “generates” data that seems similar to the original data.

Now let’s go through the architecture of GAN!

Here we have two models, discriminative and generative. The role of the Discriminator is to distinguish between two different classes, fake or not fake, i.e., 0 or 1: it discriminates whether the data comes from the Real Samples (1) or from the Fake Samples (0) that have been "generated" by the Generator. The Generator takes a vector z from the Latent Space (Z) and generates a distribution D' which should be similar to the distribution D (Real Samples), such that it can fool the Discriminator.
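To make this concrete, here is a minimal sketch of such a pair of networks in PyTorch. The latent dimension, layer sizes, and the choice of simple fully connected layers are illustrative assumptions, not something prescribed by the GAN framework itself.

```python
import torch
import torch.nn as nn

latent_dim = 64   # size of z ~ Z (illustrative choice)
data_dim = 784    # size of a real sample x, e.g. a flattened 28x28 image

# Generator: maps a latent vector z to a fake sample G(z)
G = nn.Sequential(
    nn.Linear(latent_dim, 128),
    nn.ReLU(),
    nn.Linear(128, data_dim),
    nn.Tanh(),            # outputs scaled to [-1, 1]
)

# Discriminator: maps a sample to D(x) in [0, 1], the probability it is real
D = nn.Sequential(
    nn.Linear(data_dim, 128),
    nn.LeakyReLU(0.2),
    nn.Linear(128, 1),
    nn.Sigmoid(),
)

z = torch.randn(16, latent_dim)   # a batch of 16 latent vectors
fake = G(z)                       # G(z): generated samples
print(D(fake).shape)              # one probability per sample -> torch.Size([16, 1])
```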

Based on the Discriminator's decisions, the weights and biases of the two models, Discriminator and Generator, are adjusted through backpropagation (fine-tuning during training). The role of G is to transform z into G(z) such that the Discriminator fails to discriminate between G(z) and the samples x from the Real Samples (D), and ends up outputting a value of 0.5.

Learning (Backpropagation) in GAN!

Learning is done by changing (w, b) of D and G models through backpropagation.

The important point to keep in mind is that each of them is held constant while training the other.

Training the Discriminator: The Discriminator is a binary classifier with two classes, 0 and 1, for distinguishing artificial instances (generated by the Generator) from real instances (instances from the known dataset). We feed a random noise vector to the Generator, which generates fake images; these are fed to the Discriminator along with the real images, and it learns to classify them as fake or real. It is a simple two-class classification, for example with a CNN.

Training the Generator: While training the Generator, the Discriminator remains fixed, i.e. the weights and biases of the Discriminator don't change. The noise z is sampled from the distribution Z, and we aim to convert it into a fake image G(z) similar to a real image x. We intend to fool the Discriminator so that it outputs y=1 (the label for real images) for generated data (whose true label is y=0). When the Generator fails to fool the Discriminator, the Loss Function is calculated and backpropagated, and the Generator's weights and biases are adjusted accordingly, so the loss reduces over further iterations. Once the Generator has achieved its target, the Discriminator will output 0.5 (it'll be confused about whether the image is fake or real, i.e. whether y=0 or y=1), and the GAN has achieved its objective, i.e. D'≈D; this similarity can be measured in terms of an L1 or L2 norm.
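The two alternating updates described above can be sketched in a few lines of PyTorch. This is a minimal illustration only: it reuses simple fully connected networks like the earlier sketch, and a random tensor stands in for a batch of real images; real code would load an actual dataset and repeat these steps for many iterations.

```python
import torch
import torch.nn as nn

latent_dim, data_dim, batch_size = 64, 784, 32
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                  nn.Linear(128, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())

opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.randn(batch_size, data_dim)   # placeholder for a batch of real images x
ones = torch.ones(batch_size, 1)           # label y = 1 (real)
zeros = torch.zeros(batch_size, 1)         # label y = 0 (fake)

# --- Train the Discriminator (Generator held constant) ---
z = torch.randn(batch_size, latent_dim)
fake = G(z).detach()                       # detach: no gradients flow back into G
loss_D = bce(D(real), ones) + bce(D(fake), zeros)   # maximize log(D(x)) + log(1 - D(G(z)))
opt_D.zero_grad()
loss_D.backward()
opt_D.step()

# --- Train the Generator (Discriminator held constant) ---
z = torch.randn(batch_size, latent_dim)
loss_G = bce(D(G(z)), ones)   # push D(G(z)) towards 1; the common non-saturating form of minimizing log(1 - D(G(z)))
opt_G.zero_grad()
loss_G.backward()
opt_G.step()
```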

Mathematical Derivation for Loss Function of GAN from scratch!

Basic Conventions-

  • x is a real sample drawn from the true distribution D, and z is a noise vector drawn from the latent distribution Z.
  • G(z) is the sample the Generator produces from z.
  • When x is fed into the Discriminator, the output is D(x) ∈[0,1], the probability that the input is real.
  • When G(z) is fed into the Discriminator, the output is D(G(z)) ∈[0,1].
  • y denotes the true label (1 for real, 0 for fake) and ŷ denotes the Discriminator's prediction.

These conventions are important, try to keep them in mind!!

We know what Binary cross-entropy is!
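For a single example with true label y and prediction ŷ, the Binary Cross-Entropy loss is:

L(y, ŷ) = - [ y log(ŷ) + (1 - y) log(1 - ŷ) ]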

Discriminator:
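For real samples, y = 1 and ŷ = D(x); for generated samples, y = 0 and ŷ = D(G(z)). Substituting these into the Binary Cross-Entropy above, and dropping the leading minus sign (so that minimizing the loss becomes maximizing the log terms), gives:

log(D(x))          (eq. A)

log(1 - D(G(z)))   (eq. B)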

(eq. A) and (eq. B) should be maximized so that the Discriminator correctly classifies fake vs real datasets.

So, if you maximize [ log(D(x)) ], it'll force D(x) to be 1. And if you maximize [ log(1-D(G(z))) ], it'll force D(G(z)) to be 0.

Thus, the maximization term for the Discriminator is-
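max over D of [ log(D(x)) + log(1 - D(G(z))) ]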

Intuitively, we can observe that log(D(x)) is maximized when D(x) = 1, and we want the Discriminator to output 1 for real samples x. Likewise, log(1 - D(G(z))) is maximized when D(G(z)) = 0, and that's what we want the Discriminator to output for generated samples G(z).

Generator:

The Generator aims to make the Discriminator produce D(G(z)) = 1, i.e. the Generator wants to fool the Discriminator into classifying a fake image as real!

We know that for the Generator, y = 0 and ŷ = D(G(z)).

When we put these two terms in the Binary-Cross Entropy equation, we get,
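With y = 0 and ŷ = D(G(z)), and dropping the minus sign as before, only the second term of the Binary Cross-Entropy survives:

0 · log(D(G(z))) + (1 - 0) · log(1 - D(G(z))) = log(1 - D(G(z)))          (eq. C)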

We want D(G(z)) to be 1. Looking at the second term of (eq. C), if we minimize [ log(1-D(G(z))) ], we are forcing D(G(z)) to get closer to 1, which is exactly our target, i.e. D(G(z)) = 1, so that the Generator fools the Discriminator by producing samples very close to the original dataset.

Thus, this is the objective function: { log(D(x)) + log(1-D(G(z))) }

The main objective of the Discriminator is to maximize it, and that of the Generator is to minimize it.

So if we write both of these equations together, we write it as:
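min over G, max over D of [ log(D(x)) + log(1 - D(G(z))) ]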

When G is in consideration, only the second term will be considered, and for D, both the terms will be considered.

The above equation is derived for just one instance of x, so now let's write the equation over all the instances, which gives the original equation from Ian Goodfellow's paper.
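Written in expectation, with p_data denoting the real data distribution (called D above) and p_z denoting the noise distribution Z, the value function becomes:

min over G, max over D of V(D, G) = E[x ~ p_data] [ log(D(x)) ] + E[z ~ p_z] [ log(1 - D(G(z))) ]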

The loss function for GAN

Conclusion:

So, in this blog we have discussed:

  • GANs and the ML models that they use.
  • An intuitive working of GANs.
  • Derivation of loss function for GAN from scratch.

This article aimed to provide an overall intuition behind the development of Generative Adversarial Networks. Hopefully, it gave you a better feel for GANs and their loss function. Thanks for reading!
