A tutorial for the Deep Dream project from Google AI
In this project, we will take an input image and apply some hallucination effects to it using generative models. This tutorial is based on this blog post by Google AI and the Keras implementation. It requires some basic understanding of Machine Learning and Deep Neural Networks.
Level: Intermediate — Advanced
All code references are provided at the end. The author modified the original code and added some extra implementation to fit this tutorial.
(Don’t worry! For the remainder of this tutorial, I will not be testing the model on my dog anymore.)
For advanced features like controlling dreams with a guidance image, see the Going Further section.
To accomplish this task, we will:
- Process an input image
- Feed this image through a pre-trained Image Classification model
- Take the output of some hidden layers and “amplify” the activation signals
- Re-construct a new image with these neurons amplified
The biggest technical challenge is in “amplifying” the activation signals. Since the model is already trained to detect and classify images, we can safely assume that the activations of hidden layers capture important information about the image: shapes, edges, context, and so on.
That means the model may “see” things that are not visible to us. What we can do is “amplify” these signals, inject them back into the original picture, and feed it into the model again in an iterative process. As our image gains more “hidden contexts”, the model can receive those new signals and discover even more contexts, creating a feedback loop.
The procedure at this step is:
- Create a new model to extract features from some hidden layers
- Feed an input image through the model
- Compute the activation signal from these features
- Enhance the input image using these signals
- Repeat from step two
I will be using TensorFlow/Keras for this task. One advantage of TF over PyTorch (at the time of this article) is that one can easily obtain the output of the hidden layers with Keras. Thus, constructing a new model to extract features becomes a trivial task.
In general, it is advantageous to be fluent in both frameworks and understand which one is best to use for your project.
This tutorial runs on Python3. I use Anaconda for Windows to manage and install all dependencies.
Pre- and post-process
We apply the same preprocessing that was used during training.
- Preprocess: Load and normalize the input image to between [-1, 1]
- Postprocess: Convert the range [-1, 1] to [0, 255] and data type to uint8
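As a minimal sketch (assuming the image is already loaded as a NumPy array; the Keras reference implementation uses `keras.applications.inception_v3.preprocess_input`, which applies the same [-1, 1] scaling):

```python
import numpy as np

def preprocess(img):
    # Map pixel values from [0, 255] to [-1, 1], matching InceptionV3's training setup
    return np.asarray(img, dtype=np.float32) / 127.5 - 1.0

def deprocess(x):
    # Map [-1, 1] back to [0, 255] and cast to uint8 for display/saving
    x = (np.asarray(x, dtype=np.float32) + 1.0) * 127.5
    return np.clip(x, 0, 255).astype(np.uint8)
```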
Defining the Model
We will use InceptionV3 as the baseline model.
For this task, we will construct a new model that outputs the activations of some hidden layers from our baseline model.
How can I tell which layers to choose?
Method 1: Use the summary() method in TF to get a list of layers’ names.
Layer (type)                 Output Shape              Param #   Connected to
input_1 (InputLayer)         [(None, None, None, 3)]   0
conv2d (Conv2D)              (None, None, None, 32)    864       ['input_1']
batch_normalization          (None, None, None, 32)    96        ['conv2d']
  (BatchNormalization)
activation (Activation)      (None, None, None, 32)    0         ['batch_normalization']
Method 2: Visualize with Netron (this is my preferred method):
- Load and inspect the model with Netron.
- There are some “traffic intersections” in this model that look like they should contain useful signals; let’s use these.
- Each layer of the network learns the image at a different level of abstraction. Generally, the first layers are more sensitive to basic features such as edges, colors, and shapes, while the deeper layers may contain more information about objects, context, and so on.
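Building the feature extractor is then a few lines of Keras. A sketch (the `mixed3`/`mixed5` layer names are an example choice on my part, not a prescription; pick your own candidates from summary() or Netron):

```python
import tensorflow as tf

# Load InceptionV3 without the classification head; pretrained weights
# are downloaded on first use
base_model = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")

# Example layer choice (an assumption; substitute your own candidates)
layer_names = ["mixed3", "mixed5"]
layer_outputs = [base_model.get_layer(name).output for name in layer_names]

# New model that maps an input image to the chosen hidden-layer activations
dream_model = tf.keras.Model(inputs=base_model.input, outputs=layer_outputs)
```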
Defining Loss Function
This is where things get interesting. We will use the following loss function:
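Reconstructed from the description below and the Keras reference implementation (the use of the squared norm here is an assumption):

```latex
\mathcal{L}(x) = \sum_{i} \frac{\lVert y_i \rVert^{2}}{d_i}
```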
Here, the y_i's are the activation outputs of the feature extractor model, and the d_i's are the sizes (the product of dimensions) of these outputs.
Why use this loss function?
- We want to capture where the neurons “activate” the most
- The norm of feature maps corresponds to the signals from these layers
- A large loss value means the feature maps detected lots of “hidden contexts” in the image
- We normalize with respect to the output sizes so that all chosen layers can contribute equally to the loss
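A sketch of this loss in TF (assuming the feature extractor constructed earlier returns a list of hidden-layer activations; the squared L2 norm is one reasonable choice, and the exact norm varies between implementations):

```python
import tensorflow as tf

def calc_loss(image, model):
    # Add a batch dimension and collect the chosen hidden-layer activations
    activations = model(tf.expand_dims(image, axis=0))
    if not isinstance(activations, (list, tuple)):
        activations = [activations]
    losses = []
    for y in activations:
        # d_i: total number of elements in this feature map
        d = tf.cast(tf.reduce_prod(tf.shape(y)), tf.float32)
        # ||y_i||^2 / d_i: normalized so every layer contributes equally
        losses.append(tf.reduce_sum(tf.square(y)) / d)
    return tf.reduce_sum(losses)
```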
Defining Update Rule
In a typical Deep Learning problem, we would use Gradient Descent to minimize the objective loss. However, that is NOT the case here:
- Minimizing the loss function means minimizing the norm of output activations. That is not our true objective.
- The activation signals are what we want to inject into our image (the things that the model sees, but we can’t)
- So we want to keep the model’s weights constant. We will only enhance our input image using the activation signals.
→ Gradient Ascent!
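A minimal gradient-ascent step might look like this (a sketch: `calc_loss` stands for the normalized-norm loss described above, and normalizing the gradients by their standard deviation follows common DeepDream implementations, not something unique to this tutorial):

```python
import tensorflow as tf

def calc_loss(image, model):
    # Normalized norm of the hidden-layer activations (the loss described above)
    activations = model(tf.expand_dims(image, axis=0))
    if not isinstance(activations, (list, tuple)):
        activations = [activations]
    return tf.add_n([
        tf.reduce_sum(tf.square(y)) / tf.cast(tf.reduce_prod(tf.shape(y)), tf.float32)
        for y in activations
    ])

def gradient_ascent_step(image, model, step_size=0.01):
    with tf.GradientTape() as tape:
        tape.watch(image)  # image is a plain tensor, so watch it explicitly
        loss = calc_loss(image, model)
    grads = tape.gradient(loss, image)
    # Normalize gradients so step_size behaves consistently across layers/scales
    grads /= tf.math.reduce_std(grads) + 1e-8
    # Gradient ASCENT: move the image in the direction that increases the loss;
    # the model's weights are never touched
    image = tf.clip_by_value(image + grads * step_size, -1.0, 1.0)
    return loss, image
```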
Putting It All Together
The original blog by Google recommends that we iteratively apply our algorithm to the input image and apply some zooming after each iteration.
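The outer loop can then be sketched as follows (a sketch: `ascent_step` is assumed to be a hypothetical wrapper that applies one gradient-ascent update and returns the new image, and the zoom parameters are illustrative, not the blog's exact values):

```python
import tensorflow as tf

def run_deep_dream(image, ascent_step, steps_per_zoom=50,
                   num_zooms=3, zoom_scale=1.3):
    base_shape = tf.cast(tf.shape(image)[:2], tf.float32)
    for i in range(num_zooms):
        # Zoom in: enlarge the image before the next round of gradient ascent
        new_size = tf.cast(base_shape * zoom_scale ** i, tf.int32)
        image = tf.image.resize(image, new_size)
        for _ in range(steps_per_zoom):
            image = ascent_step(image)
    return image
```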
And here are some results:
To control how the model dreams, we can
- Modify the VARIABLES: Gradient step size, number of zooming steps, the scale of each zoom, and number of iterations in each step
- Choose a different set of layers for feature extraction: Again we can utilize Netron and pick out some candidate layers
- Modify the loss function: Is the normalized norm the best function to capture hidden signals from the model?
- Use a guide image for our dream
Using a Guide Image
Beyond relying on the activation signals, we can also use a guide image as the “GPS” for our iterations. At each step, the hidden signals from the guide image are also injected into our image.
To achieve this task, we need to make the following changes:
- Load a guide image
- Resize both images to the same resolution
- Modify the loss function: We will instead compute the dot product of activations between images.
- Rationale: the model may “see” things in both images that are not visible to our eyes. A high loss means there are strong signals from the guide image boosting the dot product. Hence, with gradient ascent, we are adding extra “abstraction” from the guide image into our original image.
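The modified loss might look like this (a sketch: both images are assumed to have already been resized to the same resolution and passed through the same feature extractor, so the two activation lists line up layer by layer):

```python
import tensorflow as tf

def guided_loss(activations, guide_activations):
    # Dot product between corresponding feature maps of the dream and guide images
    losses = []
    for y, g in zip(activations, guide_activations):
        d = tf.cast(tf.reduce_prod(tf.shape(y)), tf.float32)
        losses.append(tf.reduce_sum(y * g) / d)  # normalized by size, as before
    return tf.reduce_sum(losses)
```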
This project is a nice example of generative models in Machine Learning. Instead of modifying the weights to minimize the loss, we enhance the input image using the gradient to maximize this loss. As we iteratively repeat this process, the image can gradually gain extra signals from the model, previously hidden from our eyes.
Hopefully, you will have fun following along with this tutorial and be able to create some beautiful midsummer night’s dreams.
Blog Post by Google AI:
Keras Deep Dream: