dynamic programming for reinforcement learning
Exploring dynamic programming for reinforcement learning.
research notes on ml topics
The author is an AI Resident at Google DeepMind, working on enhancing Gemini's cross-lingual and cross-modal transfer capabilities. His research focuses on multilingual learning and post-training. This blog documents his learning journey through short, accessible code-annotated articles on foundational concepts, written whenever time allows alongside full-time research.
Forward diffusion is the process of gradually adding Gaussian noise to data over multiple timesteps. It is a foundational concept in diffusion models such as DDPM (Denoising Diffusion Probabilistic Models).
Exploring Monte Carlo control.
Exploring multi-armed bandits.
Exploring sampling methods.