
12 Generative AI

What is Generative AI?

Generative Artificial Intelligence refers to AI systems that can generate new content (text, images, molecules, proteins, code, etc.) that follows the training data distribution without simply reproducing the training examples.

  • Discriminative AI: predict labels $y$ given input $x$
  • Generative AI: model the data distribution $p(x)$ and generate new samples

Core Objective

Generative AI aims to approximate the true data distribution:

\begin{equation*}
p_\theta(x) \approx p_{\text{data}}(x)
\end{equation*}

This is usually achieved via Maximum Likelihood Estimation (MLE):

\begin{equation*}
\min_\theta \; \mathbb{E}_{x \sim p_{\text{data}}}[-\log p_\theta(x)]
\end{equation*}
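As a concrete illustration, the sketch below estimates this expectation by averaging $-\log p_\theta(x)$ over samples. The categorical model, its random parameters, and the toy "data" are all made up for illustration.

```python
import numpy as np

# Minimal sketch: the MLE objective as an average negative log-likelihood.
# `probs` is a stand-in for p_theta(x); `data` stands in for samples from p_data.
rng = np.random.default_rng(0)
vocab_size = 5
logits = rng.normal(size=vocab_size)           # unnormalized model parameters
probs = np.exp(logits) / np.exp(logits).sum()  # p_theta(x) via softmax

data = rng.integers(0, vocab_size, size=100)   # placeholder samples from p_data

# Monte Carlo estimate of E_{x ~ p_data}[-log p_theta(x)]
nll = -np.log(probs[data]).mean()
print(f"negative log-likelihood: {nll:.3f}")
```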

Autoregressive Factorization

For structured data such as text or sequences, the joint probability is decomposed as:

\begin{equation*}
p(x) = \prod_{i=1}^{n} p(x_i \mid x_{<i})
\end{equation*}

This autoregressive modeling strategy is the foundation of modern language models (e.g., GPT).
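A minimal sketch of the factorization in log space, assuming a hypothetical conditional `cond_logprob(prefix, next_symbol)`; here a uniform distribution stands in for a learned model.

```python
import numpy as np

def log_joint(x, cond_logprob):
    """Autoregressive factorization: log p(x) = sum_i log p(x_i | x_{<i})."""
    return sum(cond_logprob(x[:i], x[i]) for i in range(len(x)))

# Toy conditional: uniform over a 4-symbol vocabulary, standing in for a
# learned model p_theta(x_i | x_{<i}).
def uniform_cond_logprob(prefix, next_symbol, vocab_size=4):
    return -np.log(vocab_size)

x = [2, 0, 3, 1]
print(log_joint(x, uniform_cond_logprob))   # equals 4 * -log(4)
```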

Neural Parameterization

Neural networks are used to parameterize conditional probabilities:

\begin{equation*}
p(x_i \mid x_{<i}) = \text{Softmax}(f_\theta(x_{<i}))
\end{equation*}

where $f_\theta$ is typically implemented using Transformer architectures.
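The sketch below shows this parameterization with a crude stand-in for $f_\theta$ (a random projection of the prefix rather than a real Transformer); only the logits-to-Softmax step reflects the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden = 6, 8

# Stand-in for f_theta: in practice a Transformer maps the prefix x_{<i} to a
# hidden state; here, a random projection of a crude bag-of-tokens encoding.
W_embed = rng.normal(size=(vocab_size, hidden))
W_out = rng.normal(size=(hidden, vocab_size))

def next_token_distribution(prefix):
    h = W_embed[prefix].mean(axis=0)   # crude prefix encoding
    logits = h @ W_out                 # f_theta(x_{<i})
    e = np.exp(logits - logits.max())
    return e / e.sum()                 # Softmax over the vocabulary

p = next_token_distribution([1, 4, 2])
print(p.round(3), p.sum())             # a valid distribution over next tokens
```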

Major Families of Generative Models

Autoregressive Models

  • Example: GPT
  • Advantages: stable training, explicit likelihood
  • Limitation: sequential generation is slow (see the sampling sketch below)
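The sketch below illustrates why generation is sequential: every new token requires another evaluation of the model conditioned on everything generated so far. The uniform `next_token_distribution` is a placeholder for a learned model.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, eos = 5, 0

def next_token_distribution(prefix):
    # Placeholder for a learned p_theta(x_i | x_{<i}); uniform for illustration.
    return np.full(vocab_size, 1.0 / vocab_size)

# Autoregressive sampling: one model call per generated token, which is why
# long sequences are slow to produce.
tokens = [1]                                   # arbitrary start token
while tokens[-1] != eos and len(tokens) < 20:
    p = next_token_distribution(tokens)
    tokens.append(int(rng.choice(vocab_size, p=p)))
print(tokens)
```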

Variational Autoencoders (VAE)

Objective:

\begin{equation*}
\mathcal{L} = \mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] - D_{KL}(q(z \mid x)\,\|\,p(z))
\end{equation*}

Used for latent representation learning and structured generation.
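A minimal numerical sketch of the two ELBO terms for a single input, assuming a diagonal Gaussian $q(z \mid x)$, a standard normal prior, and a unit-variance Gaussian decoder; the encoder outputs and the reconstruction are random placeholders rather than network outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 3

# Hypothetical encoder outputs for one input x: q(z|x) = N(mu, diag(sigma^2)).
mu = rng.normal(size=latent_dim)
log_sigma = rng.normal(scale=0.1, size=latent_dim)
sigma = np.exp(log_sigma)

# Closed-form KL( N(mu, sigma^2) || N(0, I) ): the regularization term of the ELBO.
kl = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * log_sigma)

# Reparameterized sample z = mu + sigma * eps, then a stand-in reconstruction
# term log p(x|z); a real model would compute x_recon with a decoder network.
eps = rng.normal(size=latent_dim)
z = mu + sigma * eps
x, x_recon = rng.normal(size=5), rng.normal(size=5)   # placeholders
recon_loglik = -0.5 * np.sum((x - x_recon) ** 2)      # Gaussian log p(x|z), up to a constant

elbo = recon_loglik - kl
print(f"ELBO estimate: {elbo:.3f}")
```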

Generative Adversarial Networks (GAN)

Two-player game:

\begin{equation*}
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]
\end{equation*}

Produces sharp samples but is difficult to train.
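The sketch below evaluates the minimax objective on one batch of hypothetical discriminator logits; in a real GAN these would come from $D$ applied to data and to generator samples, and the two networks would be updated in alternation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical discriminator logits on a batch of real and generated samples.
d_real_logits = rng.normal(loc=1.0, size=8)    # D(x) for x ~ p_data
d_fake_logits = rng.normal(loc=-1.0, size=8)   # D(G(z)) for z ~ p(z)

# Value of the minimax objective for this batch:
# E[log D(x)] + E[log(1 - D(G(z)))]
value = (np.log(sigmoid(d_real_logits)).mean()
         + np.log(1.0 - sigmoid(d_fake_logits)).mean())

# D is trained to increase this value; G is trained to decrease it
# (in practice often via the non-saturating loss -E[log D(G(z))]).
print(f"objective value: {value:.3f}")
```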

Diffusion Models

Forward noising process:

\begin{equation*}
x_t = \sqrt{\alpha_t}\, x_0 + \sqrt{1 - \alpha_t}\,\epsilon
\end{equation*}

Reverse denoising learns $p_\theta(x_{t-1} \mid x_t)$ to generate high-quality samples.
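A small sketch of the forward noising equation, with a made-up schedule for $\alpha_t$ (decaying from near 1, almost clean, to near 0, almost pure noise) applied to a placeholder "clean" sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy noise schedule: alpha_t decays from ~1 (almost clean) to ~0 (pure noise).
T = 10
alphas = np.linspace(0.99, 0.01, T)

x0 = rng.normal(size=4)   # placeholder for a clean data sample

def forward_noise(x0, t):
    """Sample x_t = sqrt(alpha_t) * x0 + sqrt(1 - alpha_t) * eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alphas[t]) * x0 + np.sqrt(1.0 - alphas[t]) * eps

for t in (0, T // 2, T - 1):
    print(t, np.round(forward_noise(x0, t), 2))
```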

Transformers as the Backbone

Transformers dominate generative AI because they:

  • Model long-range dependencies
  • Scale efficiently with parameters and data
  • Support multi-modal inputs (text, image, molecules)

From Models to Foundation Models

Evolution trend:

  • Single-task → multi-task
  • Single-modal → multi-modal
  • Task-specific → foundation models

Examples include GPT-style large language models.

Generative AI for Science

Scientific objects can be treated as symbolic sequences:

  • Molecules → graphs / SMILES strings (see the tokenization sketch below)
  • Proteins → amino acid sequences
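A minimal sketch of this sequence view: tokenize a SMILES string with a simplified, illustrative regular expression, and treat a protein as its amino-acid character sequence. The regex and the example protein fragment are illustrative choices, not a standard tokenizer.

```python
import re

# Treat a molecule as a symbolic sequence: split a SMILES string into atoms,
# digits, and bond/branch symbols so it can feed an autoregressive model.
smiles = "CC(=O)Oc1ccccc1C(=O)O"   # aspirin
tokens = re.findall(r"Cl|Br|[A-Za-z]|[0-9]|[()=#+\-\[\]@]", smiles)
print(tokens)

# Proteins work the same way: an amino-acid sequence is already a token string.
protein = "MKTAYIAKQR"
print(list(protein))
```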

Training Pipeline

Pretraining

Learn general structure by minimizing negative log-likelihood over massive datasets.
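Concretely, the pretraining loss is the average per-token negative log-likelihood under the autoregressive factorization above; the sketch below uses random probabilities in place of model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, seq_len = 8, 6

# Pretraining loss sketch: average per-token negative log-likelihood.
# `pred_probs[i]` stands in for the model's p_theta(. | x_{<i}).
tokens = rng.integers(0, vocab_size, size=seq_len)
logits = rng.normal(size=(seq_len, vocab_size))
pred_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

loss = -np.mean(np.log(pred_probs[np.arange(seq_len), tokens]))
print(f"per-token NLL: {loss:.3f}")
```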

Supervised Fine-Tuning (SFT)

Align models with instruction-following behavior.

Reinforcement Learning from Human Feedback (RLHF)

Optimize expected reward:

\begin{equation*}
\max_\theta \; \mathbb{E}_{x \sim \pi_\theta}[R(x)]
\end{equation*}
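As an illustration of optimizing this objective, the sketch below runs a generic score-function (REINFORCE) update for a small categorical policy with a made-up reward; practical RLHF instead uses a learned reward model over generated text and algorithms such as PPO.

```python
import numpy as np

rng = np.random.default_rng(0)

# Score-function (REINFORCE) sketch of maximizing E_{x ~ pi_theta}[R(x)]
# for a categorical policy; R is an illustrative reward favoring higher indices.
n_actions, lr = 4, 0.5
theta = np.zeros(n_actions)

def reward(x):
    return float(x)                              # made-up reward

for _ in range(200):
    probs = np.exp(theta) / np.exp(theta).sum()  # pi_theta
    x = rng.choice(n_actions, p=probs)
    grad_logp = -probs
    grad_logp[x] += 1.0                          # d/dtheta log pi_theta(x)
    theta += lr * reward(x) * grad_logp          # gradient ascent on E[R]

# Probability mass should concentrate on the highest-reward action (x = 3).
print(np.round(np.exp(theta) / np.exp(theta).sum(), 2))
```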

Capabilities and Limitations

Strengths

  • Text and code generation
  • Pattern discovery
  • Large-scale hypothesis exploration

Limitations

  • Hallucinations
  • Weak numerical precision
  • Limited physical grounding