9: Stable Diffusion

What you need to know

Here’s what you need to know to complete this course:

  • The lesson is presented as a video, which you can jump directly to by clicking the table of contents on the right
  • Each video goes through one or more Jupyter notebooks, which you’ll need to run and experiment with to get the most out of the course
  • All information needed to complete a lesson (including links to the repo with the notebooks) is in the “lesson resources” section of the lesson page
  • Amongst the lesson resources you’ll find a “discuss this lesson” link, which will take you to a Q&A page on our forums for that particular lesson
  • The material covered in this course includes topics that would normally only be taught in post-graduate programs. We try to present them as clearly as possible, but you should expect to work hard and put in plenty of hours of study
  • We assume familiarity with the material in part 1 of this course. If you find yourself unsure about some of the foundational deep learning ideas referred to in the lessons, we’d suggest going back to study the lessons in part 1 that cover those ideas
  • If there are mathematical or coding concepts we use that you’re not comfortable with, don’t be afraid to seek out other tutorials to help fill in your gaps
  • On forums.fast.ai there are many other students you can collaborate with, and many folks are looking for study groups or study buddies. Studying in groups has been shown to be more effective for most people than studying alone
  • In many lessons we’ll include a challenge for you to complete, some of which involve trying novel research directions where you’ll be venturing into the academic unknown.

Lesson overview

This lesson starts with a tutorial on how to use pipelines in the Diffusers library to generate images. Diffusers is (in our opinion!) the best library available at the moment for image generation. It has many features and is very flexible. We explain how to use its many features, and discuss options for accessing the GPU resources needed to use the library.
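
To make the pipeline idea concrete, here is a minimal sketch of generating an image with a pre-trained Diffusers pipeline. The model id shown is just an example checkpoint; any Stable Diffusion checkpoint on the Hugging Face Hub should work, and a GPU is effectively required for reasonable speed.

```python
def generate_image(prompt, model_id="runwayml/stable-diffusion-v1-5"):
    """Load a pre-trained pipeline, move it to the GPU, and run one prompt."""
    import torch
    from diffusers import StableDiffusionPipeline

    # Half precision keeps memory use within reach of consumer GPUs.
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    return pipe(prompt).images[0]  # a PIL image
```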

We talk about some of the nifty tweaks available when using Stable Diffusion in Diffusers, and show how to use them: guidance scale (for varying how strongly the prompt is used), negative prompts (for removing concepts from an image), image initialisation (for starting with an existing image), textual inversion (for adding your own concepts to generated images), and Dreambooth (an alternative approach to textual inversion).
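
The first three of these tweaks are keyword arguments on the pipeline call itself. Here is a hedged sketch of how they look in code; the pipelines are assumed to have been loaded already, and the values shown are illustrative, not recommendations.

```python
def tweaked_generation(pipe, img2img_pipe, prompt, init_image):
    """Sketch of guidance scale, negative prompts, and image-to-image."""
    # Guidance scale: higher values follow the prompt more closely.
    strongly_guided = pipe(prompt, guidance_scale=7.5).images[0]

    # Negative prompt: concepts to steer the generation away from.
    cleaned = pipe(prompt, negative_prompt="blurry, low quality").images[0]

    # Image-to-image: start from an existing image instead of pure noise;
    # `strength` controls how much the original is changed.
    varied = img2img_pipe(prompt=prompt, image=init_image, strength=0.75).images[0]

    return strongly_guided, cleaned, varied
```

Here `img2img_pipe` would be a `StableDiffusionImg2ImgPipeline`, loaded in the same way as the text-to-image pipeline above.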

The second half of the lesson covers the key concepts involved in Stable Diffusion:

  • CLIP embeddings
  • The VAE (variational autoencoder)
  • Predicting noise with the U-Net
  • Removing noise with schedulers
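
The last two concepts fit together as a loop: the U-Net predicts the noise in the current latents, and the scheduler removes a fraction of it at each step. The toy sketch below uses a stand-in "noise predictor" and a deliberately simplified update rule, just to show the shape of the loop; it is not the real scheduler maths.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_noise_predictor(latents, t):
    # Stand-in for the U-Net: pretend everything in the latents is noise.
    return latents

def sample(shape=(4, 8, 8), steps=50):
    latents = rng.standard_normal(shape)  # start from pure noise
    for t in range(steps):
        noise_pred = fake_noise_predictor(latents, t)
        # Simplified "scheduler": remove one remaining slice of the noise.
        latents = latents - noise_pred / (steps - t)
    return latents
```

With this stand-in predictor the loop drives the latents all the way to zero; with a real U-Net, what remains after the noise is stripped away is the generated image's latents.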

Jeremy presents a theoretical foundation for how Stable Diffusion works, using a novel interpretation that provides an easily understood intuition. He introduces finite differencing and analytic derivatives through an example: training a neural network to identify pixel adjustments that make an image look more like a handwritten digit. The derivatives of such a model provide the score needed to form the basis of a diffusion process that generates handwritten digits.
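
Finite differencing itself is simple to show in code: nudge each input slightly, measure how the output changes, and compare with the exact gradient. The function below is a trivial stand-in for the "how much does this look like a digit?" model in the lesson; the idea carries over unchanged.

```python
import numpy as np

def f(x):
    # Stand-in for a scoring model; in the lesson this would be a network.
    return (x ** 2).sum()

def finite_difference_grad(f, x, eps=1e-6):
    grad = np.zeros_like(x)
    for i in range(x.size):
        bumped = x.copy()
        bumped.flat[i] += eps  # nudge one "pixel"
        grad.flat[i] = (f(bumped) - f(x)) / eps
    return grad

x = np.array([1.0, -2.0, 3.0])
numeric = finite_difference_grad(f, x)
analytic = 2 * x  # exact gradient of sum(x**2)
```

The numeric estimate matches the analytic gradient to within roughly `eps`, but needs one function evaluation per input. For images with thousands of pixels that cost is why analytic derivatives (backpropagation) are used instead.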

The lesson also covers autoencoders and U-Nets. Jeremy introduces the idea of a model that can take a sentence and return a vector of numbers representing the image that sentence describes, built from two models trained together: a text encoder and an image encoder. The lesson concludes with a discussion of the similarities between diffusion-based models and deep learning optimizers, suggesting new research directions.
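
The two encoders are trained with a contrastive loss: matched text/image embedding pairs should be more similar to each other than to any mismatched pair in the batch. Below is a toy numpy sketch of a CLIP-style contrastive loss over pre-computed embeddings; the encoders themselves are omitted.

```python
import numpy as np

def contrastive_loss(text_emb, img_emb, temperature=0.07):
    """CLIP-style contrastive loss over a batch of matched embedding pairs."""
    # L2-normalise so that dot products are cosine similarities.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    i = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = t @ i.T / temperature  # (batch, batch) similarity matrix

    # Cross-entropy with the diagonal (matched pairs) as the targets,
    # averaged over the text->image and image->text directions.
    labels = np.arange(len(logits))
    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logprobs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logprobs[labels, labels].mean()

    return (xent(logits) + xent(logits.T)) / 2
```

When every text embedding matches its own image embedding the loss is near zero; shuffling the pairing makes it large, which is exactly the pressure that pulls matched pairs together during training.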

Concepts discussed

  • Stable Diffusion
  • Hugging Face’s Diffusers library
  • Pre-trained pipelines
  • Guidance scale
  • Negative prompts
  • Image-to-image pipelines
  • Finite differencing
  • Analytic derivatives
  • Autoencoders
  • Textual inversion
  • Dreambooth
  • Latents
  • U-Nets
  • Text encoders and image encoders
  • Contrastive loss function
  • CLIP text encoder
  • Deep learning optimizers
  • Perceptual loss

Video

Lesson resources

Useful background on fast.ai courses