Quick introduction to text-to-image generation using Hugging Face’s diffusers package
In this article, I will show you how to get started with text-to-image generation with stable diffusion models using Hugging Face’s diffusers package.
A while back I got access to the DALL·E 2 model by OpenAI, which lets you create stunning images from text, so I started playing around with it and generated some pretty amazing images.
However, my credits ran out, so I decided to look for alternatives and came across this incredible article by Hugging Face, which explains how to run stable diffusion models using their diffusers package.
So let’s dive into how to generate images from text using diffusers.
First things first, the steps to generate images from text with the diffusers package are:
- Make sure you have GPU access
- Install requirements
- Enable external widgets on Google Colab (for colab notebooks)
- Login to Hugging Face with your user token
- Initialize the StableDiffusionPipeline
- Move the pipeline to the GPU
- Run inference with PyTorch’s autocast
Since I am more or less following the Colab notebook by Hugging Face, I will assume you have access to a Colab notebook with a GPU enabled. Let’s begin!
1. Make sure you have GPU access
!nvidia-smi
# My Output
OK, great! Now that we know we have access to a GPU, let’s set up the requirements for this project.
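One optional aside before that: the same GPU check can be scripted in Python, which is handy if you want the notebook to stop early when no GPU is attached. This is a minimal sketch of my own, not part of the Hugging Face notebook. The helper name `gpu_available` is hypothetical, and it assumes that `nvidia-smi -L` lists one GPU per line when a GPU is present.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    # nvidia-smi is missing entirely on CPU-only runtimes.
    if shutil.which("nvidia-smi") is None:
        return False
    # "-L" asks nvidia-smi to list the detected GPUs, one per line.
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    return result.returncode == 0 and "GPU" in result.stdout


if not gpu_available():
    print("No GPU detected: in Colab, pick a GPU under Runtime > Change runtime type.")
```

If the message is printed, switch the runtime type before continuing, since the later steps move the pipeline to "cuda".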
2. Install requirements
This project has seven requirements, two of which are already installed on Colab:
- diffusers==0.2.4: the main package for running the pipeline
- transformers: Hugging Face’s package with many pre-trained models for text, audio and video
- scipy: Python package for scientific computing
- ftfy: Python package for handling unicode issues
- ipywidgets>=7,<8: package for building widgets on notebooks
- torch: PyTorch package (no need to install if you are in Colab)
- pillow: Python package to process images (no need to install if you are in Colab)
To install everything you actually need in Google Colab, just run:
!pip install diffusers==0.2.4
!pip install transformers scipy ftfy
!pip install "ipywidgets>=7,<8"
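After the install cell finishes, it can be worth confirming which versions actually ended up in the environment, since diffusers is pinned to 0.2.4 while most of the other packages are not. Here is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


required = ["diffusers", "transformers", "scipy", "ftfy", "ipywidgets", "torch", "pillow"]
for name, installed in installed_versions(required).items():
    print(f"{name}: {installed or 'not installed'}")
```

Anything reported as "not installed" needs another pip install before you continue.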
3. Enable external widgets on Google Colab (for colab notebooks)
# enabling widgets (to be able to login to hugging face)
from google.colab import output
output.enable_custom_widget_manager()
4. Login to Hugging Face with your user token
# log in to Hugging Face (you will need an access token from your account)
from huggingface_hub import notebook_login
notebook_login()
You should see a widget where you will input your access token from Hugging Face. After you input it, you should see something like this:
# Expected Output
Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-credential store but this isn't the helper defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default

git config --global credential.helper store
5. Initialize the StableDiffusionPipeline
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True,
)
Here, just as in their Colab notebook, we are using the Stable Diffusion v1-4 model (CompVis/stable-diffusion-v1-4), whose weights are downloaded when this cell runs. Once that is done we can move on to the next step. Feel free to try out the other models for comparison!
6. Move the pipeline to the GPU
pipe = pipe.to("cuda")
7. Run inference with PyTorch’s autocast
from torch import autocast

prompt = "photo of a panda surfing"
with autocast("cuda"):
    # the pipeline returns a list of PIL images under "sample"; take the first one
    image = pipe(prompt)["sample"][0]

image.save("panda_surfer.png")
image
The results are incredible. You will see some variability in what you get from one run to the next, but there are parameters you can tweak, like guidance_scale, the number of inference steps, and the random seed (for deterministic outputs), that should help you get more consistent results.
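To make those knobs concrete, here is a minimal sketch of a wrapper that fixes the seed and exposes the two other parameters. It assumes the `pipe` object from step 6 and the diffusers 0.2.4 behaviour, where calling `pipe(...)` accepts `guidance_scale`, `num_inference_steps` and `generator` and returns a dict with a "sample" list. The function name `generate_image` is hypothetical, and the default values (seed 1024, guidance scale 7.5, 50 steps) are only illustrative.

```python
def generate_image(pipe, prompt, seed=1024, guidance_scale=7.5, num_inference_steps=50):
    """Generate one image from a prompt using a fixed random seed."""
    # Imported here so the helper can be defined before the GPU runtime is ready.
    import torch
    from torch import autocast

    # A seeded generator makes the initial noise reproducible.
    generator = torch.Generator("cuda").manual_seed(seed)
    with autocast("cuda"):
        result = pipe(
            prompt,
            guidance_scale=guidance_scale,  # higher values follow the prompt more closely
            num_inference_steps=num_inference_steps,  # more denoising steps take longer
            generator=generator,
        )
    # The pipeline returns a list of PIL images under the "sample" key.
    return result["sample"][0]
```

Calling `generate_image(pipe, "photo of a panda surfing", seed=42)` twice should give the same picture, while changing only `seed` should give a different variation of the same prompt.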