Forward Diffusion

November 2025

Mathematical Foundation

The forward diffusion process transforms data $x_0$ into increasingly noisy versions $x_1, x_2, ..., x_T$.

At each timestep $t$, we add Gaussian noise according to:

$$q(x_t | x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}x_{t-1}, \beta_t I)$$

where $\beta_t \in (0, 1)$ is the noise variance at step $t$. The sequence $\beta_1, \ldots, \beta_T$ is called the variance schedule and controls how much noise is added at each step.
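A single transition can be sketched in a few lines. This is a minimal illustration, not part of the notebook's own code: the helper name `forward_step` and the value `beta_t = 0.02` are assumptions chosen for the example. It draws one step from all-ones toy data and compares the sample mean and variance with the values the formula predicts.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """Hypothetical helper: draw x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

rng = np.random.default_rng(0)
beta_t = 0.02                      # assumed value, for illustration only
x_prev = np.ones(100_000)          # toy data: every entry equals 1

x_next = forward_step(x_prev, beta_t, rng)

# The sample statistics should sit close to the parameters of the Gaussian
print(f"mean: {x_next.mean():.4f}  target: {np.sqrt(1.0 - beta_t):.4f}")
print(f"var:  {x_next.var():.4f}  target: {beta_t:.4f}")
```

The mean is scaled down by $\sqrt{1-\beta_t}$ while variance $\beta_t$ is added, which is why unit-variance data keeps unit variance under this transition.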

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Set random seed for reproducibility
np.random.seed(42)

Variance Schedule

We define a schedule in which $\beta_t$ increases linearly from $\beta_1 = 10^{-4}$ to $\beta_T = 0.02$:

def linear_beta_schedule(timesteps, beta_start=0.0001, beta_end=0.02):
    """
    Linear schedule for variance parameter beta.
    
    Args:
        timesteps: Number of diffusion steps
        beta_start: Initial variance
        beta_end: Final variance
    
    Returns:
        Array of beta values
    """
    return np.linspace(beta_start, beta_end, timesteps)

# Define parameters
T = 1000
betas = linear_beta_schedule(T)

# Pre-compute useful values
alphas = 1.0 - betas
alphas_cumprod = np.cumprod(alphas)

print(f"Number of timesteps: {T}")
print(f"Beta range: [{betas[0]:.6f}, {betas[-1]:.6f}]")
print(f"Alpha_bar at T: {alphas_cumprod[-1]:.6f}")
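To get a feel for the schedule, it can help to tabulate $\sqrt{\bar{\alpha}_t}$ and $\sqrt{1-\bar{\alpha}_t}$, the weights on the data and on the noise in the closed-form formula of the next section. The sketch below rebuilds the schedule with the same defaults as `linear_beta_schedule` so that it runs on its own; the names `signal_coef` and `noise_coef` are introduced here for illustration only.

```python
import numpy as np

T = 1000
betas = np.linspace(0.0001, 0.02, T)         # same defaults as linear_beta_schedule
alphas_cumprod = np.cumprod(1.0 - betas)

signal_coef = np.sqrt(alphas_cumprod)        # weight on x_0
noise_coef = np.sqrt(1.0 - alphas_cumprod)   # standard deviation of the added noise

for t in [1, 100, 300, 500, 700, 1000]:      # 1-indexed timesteps
    i = t - 1                                # 0-indexed array position
    print(f"t={t:4d}  signal={signal_coef[i]:.4f}  noise={noise_coef[i]:.4f}")
```

The signal weight decays monotonically towards 0 while the noise weight grows towards 1, and their squares always sum to 1.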

Closed-Form Forward Process

A convenient property of forward diffusion is that we can sample $x_t$ directly from $x_0$ without iterating through all intermediate timesteps:

$$q(x_t | x_0) = \mathcal{N}(x_t; \sqrt{\bar{\alpha}_t}x_0, (1-\bar{\alpha}_t)I)$$

where $\alpha_i = 1 - \beta_i$ and $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$.
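The closed form follows from unrolling the single-step transition. A brief sketch of the argument, writing $\alpha_t = 1 - \beta_t$ and $\epsilon_t, \epsilon_{t-1}, \bar{\epsilon}, \epsilon \sim \mathcal{N}(0, I)$ for independent standard Gaussian draws:

$$x_t = \sqrt{\alpha_t}\,x_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_t$$

Substituting the same expression for $x_{t-1}$ gives

$$x_t = \sqrt{\alpha_t \alpha_{t-1}}\,x_{t-2} + \sqrt{\alpha_t(1-\alpha_{t-1})}\,\epsilon_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_t$$

The two noise terms are independent zero-mean Gaussians, so their sum is Gaussian with variance $\alpha_t(1-\alpha_{t-1}) + (1-\alpha_t) = 1 - \alpha_t\alpha_{t-1}$:

$$x_t = \sqrt{\alpha_t \alpha_{t-1}}\,x_{t-2} + \sqrt{1-\alpha_t\alpha_{t-1}}\,\bar{\epsilon}$$

Repeating the substitution down to $x_0$ yields

$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$$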

def forward_diffusion_sample(x_0, t, alphas_cumprod):
    """
    Sample from q(x_t | x_0) using the closed-form solution.
    
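    Args:
        x_0: Original data (NumPy array)
        t: 0-indexed position in alphas_cumprod (index t holds alpha_bar for step t+1)
        alphas_cumprod: Cumulative product of alphas
    
    Returns:
        Tuple (x_t, noise): the noisy version of x_0 and the Gaussian noise used to produce it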
    """
    sqrt_alpha_bar = np.sqrt(alphas_cumprod[t])
    sqrt_one_minus_alpha_bar = np.sqrt(1 - alphas_cumprod[t])
    
    # Sample noise
    noise = np.random.randn(*x_0.shape)
    
    # Apply forward diffusion formula
    x_t = sqrt_alpha_bar * x_0 + sqrt_one_minus_alpha_bar * noise
    
    return x_t, noise
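One way to gain confidence in the closed form is to compare it with the step-by-step process empirically. The sketch below is a sanity check under assumed toy settings (constant data equal to 2.0, 300 steps, 50,000 samples), none of which come from the notebook itself. It applies the single-step transition repeatedly, draws once from the closed form, and compares sample means and variances.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(0.0001, 0.02, T)     # same linear schedule as above
alphas_cumprod = np.cumprod(1.0 - betas)

n_samples = 50_000
x_0 = np.full(n_samples, 2.0)            # toy data: every sample starts at 2.0
t = 300                                  # 1-indexed timestep to reach

# Iterative route: apply q(x_s | x_{s-1}) for s = 1..t
x_iter = x_0.copy()
for beta in betas[:t]:
    eps = rng.standard_normal(n_samples)
    x_iter = np.sqrt(1.0 - beta) * x_iter + np.sqrt(beta) * eps

# Closed-form route: a single draw from q(x_t | x_0); index t-1 holds alpha_bar_t
a_bar = alphas_cumprod[t - 1]
x_closed = np.sqrt(a_bar) * x_0 + np.sqrt(1.0 - a_bar) * rng.standard_normal(n_samples)

print(f"iterative:   mean={x_iter.mean():.3f}  var={x_iter.var():.3f}")
print(f"closed form: mean={x_closed.mean():.3f}  var={x_closed.var():.3f}")
print(f"theory:      mean={2.0 * np.sqrt(a_bar):.3f}  var={1.0 - a_bar:.3f}")
```

The two routes agree only in distribution, not sample by sample, because they consume different random draws.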

Visualization: 1D Signal

Let's visualize how a simple 1D signal degrades through the forward process:

# Create a simple signal
x = np.linspace(0, 4*np.pi, 200)
signal = np.sin(x)

# Sample at different timesteps
timesteps_to_show = [0, 100, 300, 500, 700, 999]

fig, axes = plt.subplots(2, 3, figsize=(15, 8))
axes = axes.flatten()

for idx, t in enumerate(timesteps_to_show):
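    if t == 0:
        # t = 0 is the clean signal; no noise has been added yet
        noisy_signal = signal
    else:
        # alphas_cumprod is 0-indexed, so timestep t is stored at index t - 1
        noisy_signal, _ = forward_diffusion_sample(signal, t-1, alphas_cumprod)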
    
    axes[idx].plot(x, noisy_signal, linewidth=1)
    axes[idx].set_title(f't = {t}', fontsize=12)
    axes[idx].set_ylim(-3, 3)
    axes[idx].grid(True, alpha=0.3)
    
plt.tight_layout()
plt.suptitle('Forward Diffusion on 1D Signal', fontsize=14, y=1.02)
plt.show()

print("\nObservation: As t increases, the signal fades and the samples approach pure Gaussian noise.")

Visualization: 2D Image

Forward diffusion on a 2D pattern:

# Create a simple 2D pattern
size = 64
x = np.linspace(-1, 1, size)
y = np.linspace(-1, 1, size)
X, Y = np.meshgrid(x, y)

# Create concentric circles pattern
pattern = np.sin(10 * np.sqrt(X**2 + Y**2))

# Visualize forward diffusion
timesteps_2d = [0, 100, 250, 500, 750, 999]

fig, axes = plt.subplots(2, 3, figsize=(12, 8))
axes = axes.flatten()

for idx, t in enumerate(timesteps_2d):
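    if t == 0:
        # t = 0 is the clean pattern; no noise has been added yet
        noisy_pattern = pattern
    else:
        # alphas_cumprod is 0-indexed, so timestep t is stored at index t - 1
        noisy_pattern, _ = forward_diffusion_sample(pattern, t-1, alphas_cumprod)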
    
    im = axes[idx].imshow(noisy_pattern, cmap='RdBu_r', vmin=-3, vmax=3)
    axes[idx].set_title(f't = {t}', fontsize=11)
    axes[idx].axis('off')

plt.tight_layout()
plt.suptitle('Forward Diffusion on 2D Pattern', fontsize=14, y=1.00)
plt.show()

Key Insights

1. Monotonic Information Loss: Each timestep irreversibly adds noise

2. Closed-Form Sampling: We can jump to any timestep directly using $\bar{\alpha}_t$

3. Endpoint Behavior: At $t=T$, $\bar{\alpha}_T$ is close to 0, so $q(x_T | x_0) \approx \mathcal{N}(0, I)$ (pure noise)

4. Trainable Reverse: The reverse process learns to denoise, step by step

Given $x_0$ and the sampled noise, $x_t$ is fully determined by the closed-form formula. This makes it possible to train a neural network to reverse the process.
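The role of the noise can be made concrete with a small sketch. It builds one noisy sample from the sine signal used earlier, then inverts the closed-form formula using the true noise to recover $x_0$ up to floating-point error. The choice of timestep 500 and the variable names are assumptions for illustration. Using the noise as a training target, as hinted in the final comment, is one common setup and is not something this chapter specifies.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(0.0001, 0.02, T)            # same linear schedule as above
alphas_cumprod = np.cumprod(1.0 - betas)

x_0 = np.sin(np.linspace(0, 4 * np.pi, 200))    # the 1D sine signal from earlier
t_index = 499                                   # 0-indexed, i.e. timestep t = 500
a_bar = alphas_cumprod[t_index]

# Forward: x_t is a deterministic function of (x_0, noise)
noise = rng.standard_normal(x_0.shape)
x_t = np.sqrt(a_bar) * x_0 + np.sqrt(1.0 - a_bar) * noise

# Inverse: with the true noise in hand, the formula can be solved for x_0
x_0_recovered = (x_t - np.sqrt(1.0 - a_bar) * noise) / np.sqrt(a_bar)

max_error = np.max(np.abs(x_0_recovered - x_0))
print(f"max reconstruction error: {max_error:.2e}")

# One possible training pair for a denoising network: input (x_t, t), target noise
```

At sampling time the true noise is unknown, which is why a model has to estimate it from $x_t$ and $t$.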

Next Steps

- Chapter 3: Reverse Diffusion and Denoising

- Chapter 4: Training the U-Net Architecture

- Chapter 5: Sampling Strategies (DDPM, DDIM)