25: Latent diffusion
In this final lesson of the series, Johno begins by showing us how we can convert sounds into pictures, and then take advantage of what we’ve learned in this course to generate audio! He builds and demonstrates a very effective bird-song generator using this approach.
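As a rough illustration of what "converting sounds into pictures" can mean, here is a minimal sketch that computes a magnitude spectrogram, a 2D grid of time frames by frequency bins, from a 1D signal. It uses a plain windowed DFT from the Python standard library. The function name `spectrogram` and the parameters `frame_size` and `hop` are made up for this example, and the lesson's notebooks may well use a different representation (for instance a mel spectrogram computed by an audio library).

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame holds one magnitude per frequency bin."""
    # Hann window to reduce leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    n_bins = frame_size // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(n_bins):
            # Naive DFT for bin k; fine for a toy example, slow for real audio.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# A pure 1000 Hz tone sampled at 8000 Hz (values chosen for illustration).
sample_rate = 8000
frame_size = 64
samples = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]

spec = spectrogram(samples, frame_size=frame_size)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / frame_size  # frequency of the brightest "pixel"
```

Each row of `spec` can be treated like a column of pixels, so a stack of frames is an image that an image diffusion model could, in principle, be trained on.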
Then Jeremy wraps up “Stable diffusion from scratch” by showing how to use the latents from a variational autoencoder (VAE) as the “pixels” in a regular diffusion model. He also describes an intriguing new idea for students to follow up on: what if you use latents for other purposes, such as a classification model? Perhaps this would open up a whole world of possibilities, such as latents-FID, latents-perceptual-loss, and new approaches to diffusion guidance!
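To make the "latents as pixels" idea concrete, here is a small sketch of the noising step of a diffusion model applied to a latent tensor instead of an image. Everything here is an assumption for illustration: the latents are random stand-ins for the output of a VAE encoder, the 4×32×32 size is just a plausible latent shape, and the cosine schedule in `alpha_bar` is one possible choice rather than necessarily the one used in the lesson's notebooks.

```python
import math
import random

def alpha_bar(t):
    """Cosine schedule: fraction of signal variance kept at time t in [0, 1]."""
    return math.cos(t * math.pi / 2) ** 2

def noisify(latents, t, rng):
    """Mix latents with Gaussian noise; return (noised latents, noise used)."""
    a = alpha_bar(t)
    noise = [rng.gauss(0.0, 1.0) for _ in latents]
    noised = [math.sqrt(a) * z + math.sqrt(1 - a) * e
              for z, e in zip(latents, noise)]
    return noised, noise

rng = random.Random(0)
# Stand-in for VAE encoder output, flattened (assumed shape 4 x 32 x 32).
latents = [rng.gauss(0.0, 1.0) for _ in range(4 * 32 * 32)]

# Halfway through the schedule, signal and noise carry equal variance.
noised, noise = noisify(latents, t=0.5, rng=rng)
```

A denoising model would then be trained to predict `noise` from `noised` and `t`, exactly as with pixels, and the VAE decoder would turn the final denoised latents back into an image.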
Video
Lesson resources
- Discuss this lesson
- 02_diffusion_for_audio.ipynb
- Riffusion: demo | repo
- Notebooks discussed: nb 29 | nb 30 | nb 31 | Johno’s Simple Diffusion for audio