Forward Diffusion
Mathematical Foundation
The forward diffusion process transforms data $x_0$ into increasingly noisy versions $x_1, x_2, ..., x_T$.
At each timestep $t$, we add Gaussian noise according to:
$$q(x_t | x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}x_{t-1}, \beta_t I)$$
where $\beta_t$ is a variance schedule that controls how much noise is added at each step.
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
# Set random seed for reproducibility
np.random.seed(42)
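Before building the full schedule, the single-step transition $q(x_t \mid x_{t-1})$ can be sketched directly: scale the previous sample by $\sqrt{1-\beta_t}$ and add Gaussian noise with variance $\beta_t$. This is a minimal illustration (the `beta_t` value and toy data here are arbitrary, not part of the pipeline below):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_step(x_prev, beta_t, rng):
    """One step of q(x_t | x_{t-1}): shrink toward 0 by sqrt(1 - beta_t), add noise of variance beta_t."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

x_prev = np.ones(5)  # toy "data"
x_t = forward_step(x_prev, beta_t=0.02, rng=rng)
print(x_t)  # values near sqrt(0.98) ~ 0.99, perturbed by small noise
```

Iterating this step $T$ times is what the rest of the chapter abbreviates with the closed-form formula.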
Variance Schedule
We define a linear schedule for $\beta_t$ from $\beta_1$ to $\beta_T$:
def linear_beta_schedule(timesteps, beta_start=0.0001, beta_end=0.02):
    """
    Linear schedule for the variance parameter beta.

    Args:
        timesteps: Number of diffusion steps
        beta_start: Initial variance
        beta_end: Final variance

    Returns:
        Array of beta values, one per timestep
    """
    return np.linspace(beta_start, beta_end, timesteps)
# Define parameters
T = 1000
betas = linear_beta_schedule(T)
# Pre-compute useful values
alphas = 1.0 - betas
alphas_cumprod = np.cumprod(alphas)
print(f"Number of timesteps: {T}")
print(f"Beta range: [{betas[0]:.6f}, {betas[-1]:.6f}]")
print(f"Alpha_bar at T: {alphas_cumprod[-1]:.6f}")
Closed-Form Forward Process
The elegant property of forward diffusion is that we can sample $x_t$ directly from $x_0$ without iterating through all timesteps:
$$q(x_t | x_0) = \mathcal{N}(x_t; \sqrt{\bar{\alpha}_t}x_0, (1-\bar{\alpha}_t)I)$$
where $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$ and $\alpha_i = 1 - \beta_i$.
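Before implementing this, we can sanity-check the identity numerically (a sketch, not part of the main pipeline; the two-step schedule `[0.1, 0.2]` is an arbitrary example): composing two single steps $q(x_1 \mid x_0)$ and $q(x_2 \mid x_1)$ should reproduce the closed-form mean $\sqrt{\bar{\alpha}_2}\,x_0$ and variance $1-\bar{\alpha}_2$.

```python
import numpy as np

rng = np.random.default_rng(0)
betas = np.array([0.1, 0.2])       # example schedule for two steps
alphas = 1.0 - betas
alpha_bar_2 = np.prod(alphas)      # 0.9 * 0.8 = 0.72

x0 = 1.0
n = 200_000
# Two iterative applications of q(x_t | x_{t-1})
x1 = np.sqrt(alphas[0]) * x0 + np.sqrt(betas[0]) * rng.standard_normal(n)
x2 = np.sqrt(alphas[1]) * x1 + np.sqrt(betas[1]) * rng.standard_normal(n)

print(x2.mean(), np.sqrt(alpha_bar_2) * x0)  # empirical vs closed-form mean
print(x2.var(), 1.0 - alpha_bar_2)           # empirical vs closed-form variance
```

The agreement follows because a Gaussian fed through an affine map plus independent Gaussian noise is again Gaussian, so the two steps merge into one.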
def forward_diffusion_sample(x_0, t, alphas_cumprod):
    """
    Sample from q(x_t | x_0) using the closed-form solution.

    Args:
        x_0: Original data
        t: Index into alphas_cumprod (0-indexed, so index t corresponds
           to the cumulative product over t+1 noising steps)
        alphas_cumprod: Cumulative product of alphas

    Returns:
        Tuple (x_t, noise): the noisy version of x_0 and the noise used
    """
    sqrt_alpha_bar = np.sqrt(alphas_cumprod[t])
    sqrt_one_minus_alpha_bar = np.sqrt(1 - alphas_cumprod[t])
    # Sample standard Gaussian noise with the same shape as the data
    noise = np.random.randn(*x_0.shape)
    # Apply the closed-form forward diffusion formula
    x_t = sqrt_alpha_bar * x_0 + sqrt_one_minus_alpha_bar * noise
    return x_t, noise
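A useful side observation (a standalone sketch that re-derives the same linear schedule as above, so it runs on its own): if $x_0$ has unit variance, then $\operatorname{Var}(x_t) = \bar{\alpha}_t \cdot 1 + (1-\bar{\alpha}_t) = 1$ at every timestep, which is why this parameterization is called variance-preserving.

```python
import numpy as np

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)    # same linear schedule as above
alphas_cumprod = np.cumprod(1.0 - betas)

x_0 = rng.standard_normal(100_000)       # synthetic unit-variance "data"
for t in [10, 500, 999]:
    noise = rng.standard_normal(x_0.shape)
    x_t = np.sqrt(alphas_cumprod[t]) * x_0 + np.sqrt(1 - alphas_cumprod[t]) * noise
    print(t, round(float(x_t.var()), 3))  # stays close to 1.0 at every t
```

Keeping the overall variance fixed is what prevents the samples from blowing up or collapsing as $t$ grows.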
Visualization: 1D Signal
Let's visualize how a simple 1D signal degrades through the forward process:
# Create a simple signal
x = np.linspace(0, 4*np.pi, 200)
signal = np.sin(x)
# Sample at different timesteps
timesteps_to_show = [0, 100, 300, 500, 700, 999]
fig, axes = plt.subplots(2, 3, figsize=(15, 8))
axes = axes.flatten()
for idx, t in enumerate(timesteps_to_show):
    if t == 0:
        noisy_signal = signal
    else:
        # t noising steps correspond to index t-1 in alphas_cumprod
        noisy_signal, _ = forward_diffusion_sample(signal, t - 1, alphas_cumprod)
    axes[idx].plot(x, noisy_signal, linewidth=1)
    axes[idx].set_title(f't = {t}', fontsize=12)
    axes[idx].set_ylim(-3, 3)
    axes[idx].grid(True, alpha=0.3)
plt.tight_layout()
plt.suptitle('Forward Diffusion on 1D Signal', fontsize=14, y=1.02)
plt.show()
print("\nObservation: As t increases, the signal becomes pure noise.")
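The visual degradation can be quantified with the signal-to-noise ratio $\mathrm{SNR}(t) = \bar{\alpha}_t / (1 - \bar{\alpha}_t)$, the ratio of signal variance to noise variance in $q(x_t \mid x_0)$. A short sketch, again re-deriving the same linear schedule so it runs standalone:

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)    # same linear schedule as above
alphas_cumprod = np.cumprod(1.0 - betas)
snr = alphas_cumprod / (1.0 - alphas_cumprod)

for t in [0, 100, 500, 999]:
    print(f"t = {t:4d}   SNR = {snr[t]:10.4g}")
```

The SNR decreases monotonically over the whole trajectory, which is exactly the "signal becomes pure noise" behavior seen in the plots.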
Visualization: 2D Image
Forward diffusion on a 2D pattern:
# Create a simple 2D pattern
size = 64
x = np.linspace(-1, 1, size)
y = np.linspace(-1, 1, size)
X, Y = np.meshgrid(x, y)
# Create concentric circles pattern
pattern = np.sin(10 * np.sqrt(X**2 + Y**2))
# Visualize forward diffusion
timesteps_2d = [0, 100, 250, 500, 750, 999]
fig, axes = plt.subplots(2, 3, figsize=(12, 8))
axes = axes.flatten()
for idx, t in enumerate(timesteps_2d):
    if t == 0:
        noisy_pattern = pattern
    else:
        noisy_pattern, _ = forward_diffusion_sample(pattern, t - 1, alphas_cumprod)
    im = axes[idx].imshow(noisy_pattern, cmap='RdBu_r', vmin=-3, vmax=3)
    axes[idx].set_title(f't = {t}', fontsize=11)
    axes[idx].axis('off')
plt.tight_layout()
plt.suptitle('Forward Diffusion on 2D Pattern', fontsize=14, y=1.00)
plt.show()
Key Insights
1. Monotonic Information Loss: Every step strictly lowers the signal-to-noise ratio of the sample
2. Closed-Form Sampling: We can jump to any timestep directly using $\bar{\alpha}_t$
3. Endpoint Behavior: At $t=T$, $x_T \approx \mathcal{N}(0, I)$ (pure noise)
4. Trainable Reverse: The reverse process learns to denoise, step by step
Given the sampled noise, the forward process is a deterministic function of $x_0$, which is what lets us train a neural network to predict that noise and thereby reverse the process.
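The endpoint claim that $x_T \approx \mathcal{N}(0, I)$ can be checked empirically (a sketch re-deriving the same linear schedule; the constant input data is an arbitrary choice that is deliberately far from zero-mean):

```python
import numpy as np

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)     # same linear schedule as above
alpha_bar_T = np.cumprod(1.0 - betas)[-1]

x_0 = np.full(100_000, 1.0)               # data that is clearly not zero-mean
noise = rng.standard_normal(x_0.shape)
x_T = np.sqrt(alpha_bar_T) * x_0 + np.sqrt(1 - alpha_bar_T) * noise

print(f"alpha_bar_T = {alpha_bar_T:.2e}")  # ~4e-5: almost no signal survives
print(f"mean = {x_T.mean():+.4f}, std = {x_T.std():.4f}")  # near 0 and 1
```

This is why the reverse (generative) process can start from pure Gaussian noise: at $t = T$ the data distribution has effectively been erased.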
Next Steps
- Chapter 3: Reverse Diffusion and Denoising
- Chapter 4: Training the U-Net Architecture
- Chapter 5: Sampling Strategies (DDPM, DDIM)