
Diffusion Model

A generative AI approach that creates images by learning to gradually remove noise from random static until a coherent image emerges.


Creating Images From Noise

Diffusion models work by learning to reverse a destruction process. Take a clear image and gradually add random noise until it becomes pure static. Now train a model to reverse each step - to take slightly noisy images and make them slightly cleaner. Stack enough of these denoising steps together, and you can start from pure noise and end up with a realistic image.
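The forward "destruction" half of this process has a convenient closed form: you can jump straight to any noise level in one step rather than adding noise a thousand times. Here is a minimal NumPy sketch of that idea, assuming a standard DDPM-style linear noise schedule; the function name `forward_noise` and the schedule values are illustrative, not from any particular library.

```python
import numpy as np

def forward_noise(x0, t, alphas_bar):
    """Noise a clean image x0 to step t in one shot, using the
    closed form x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps,
    where eps is standard Gaussian noise and a_bar_t shrinks toward 0."""
    eps = np.random.randn(*x0.shape)
    a_bar = alphas_bar[t]
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1 - a_bar) * eps
    return xt, eps

# A simple linear noise schedule over T steps (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

x0 = np.random.rand(8, 8)              # stand-in for a clean "image"
xt, eps = forward_noise(x0, T - 1, alphas_bar)
# At the final step, alphas_bar[-1] is tiny, so x_t is nearly pure noise.
```

Generation runs this in reverse: start from pure Gaussian noise and repeatedly apply the trained denoiser until a clean image remains.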

This approach powers Stable Diffusion, DALL-E 3, Midjourney, and most other modern image generators. It turns out that learning to denoise is easier than learning to generate images directly.

Why Diffusion Dominates Image Generation

Earlier approaches like GANs (generative adversarial networks) were notoriously hard to train - two networks competing against each other, prone to mode collapse and instability. Diffusion models train with a simple objective: predict the noise that was added. This stability makes them much easier to scale.
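That "predict the noise" objective fits in a few lines. The following sketch shows one training step under the same DDPM-style assumptions as above, with a dummy stand-in for the neural network (in practice a U-Net or transformer); `diffusion_loss` is a hypothetical name for illustration.

```python
import numpy as np

def diffusion_loss(model, x0, alphas_bar, rng):
    """One training example's objective: noise the image to a random
    timestep, ask the model to predict the noise that was added, and
    score the prediction with mean squared error."""
    T = len(alphas_bar)
    t = int(rng.integers(0, T))                # random timestep
    eps = rng.standard_normal(x0.shape)        # the noise we add
    a_bar = alphas_bar[t]
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1 - a_bar) * eps
    eps_pred = model(xt, t)                    # network's noise estimate
    return np.mean((eps_pred - eps) ** 2)      # simple, stable MSE loss

rng = np.random.default_rng(0)
alphas_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
# Toy model that always predicts zero noise, just to exercise the loss.
loss = diffusion_loss(lambda xt, t: np.zeros_like(xt),
                      np.random.rand(8, 8), alphas_bar, rng)
```

Compare this with a GAN's adversarial minimax game: here there is a single network minimizing a single regression loss, which is a large part of why diffusion training scales so smoothly.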

The iterative refinement process also gives you control. You can guide generation at each step with text prompts, reference images, or other conditioning. Want to change a detail? Intervene at the right step. This flexibility is why diffusion models have become the go-to architecture for controllable image generation.
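One common way text prompts steer each step is classifier-free guidance: the network produces two noise predictions, one conditioned on the prompt and one unconditioned, and the sampler extrapolates between them. A minimal sketch, with toy arrays standing in for real model outputs:

```python
import numpy as np

def guided_noise_pred(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: blend the unconditional and
    prompt-conditioned noise predictions at each denoising step.
    scale = 1 reproduces the conditional prediction; larger values
    push the sample harder toward the prompt."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy predictions standing in for two forward passes of the same network.
eps_uncond = np.zeros((4, 4))
eps_cond = np.ones((4, 4))
guided = guided_noise_pred(eps_uncond, eps_cond, scale=7.5)
```

Because this blend happens at every step, you can also swap prompts, mask regions, or inject reference-image features partway through sampling, which is the mechanism behind much of the fine-grained control mentioned above.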

The tradeoff is speed. Each image requires many denoising steps - often dozens of full forward passes through the network. Much current research focuses on reducing the number of steps needed while maintaining quality.
