
12 Generative AI

What is Generative AI?

Generative Artificial Intelligence refers to AI systems that can generate new content (text, images, molecules, proteins, code, etc.) that follows the training data distribution without simply reproducing the training examples.

  • Discriminative AI: predict labels $y$ given input $x$
  • Generative AI: model the data distribution $p(x)$ and generate new samples

Core Objective

Generative AI aims to approximate the true data distribution:

\begin{equation*}
p_\theta(x) \approx p_{\text{data}}(x)
\end{equation*}

This is usually achieved via Maximum Likelihood Estimation (MLE):

\begin{equation*}
\min_\theta \; \mathbb{E}_{x \sim p_{\text{data}}}[-\log p_\theta(x)]
\end{equation*}
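As a concrete illustration, the sketch below estimates this expectation by averaging $-\log p_\theta(x)$ over samples. The categorical model, its random parameters, and the toy "data" are all made up for illustration.

```python
import numpy as np

# Minimal sketch: the MLE objective as an average negative log-likelihood.
# `probs` is a stand-in for p_theta(x); `data` stands in for samples from p_data.
rng = np.random.default_rng(0)
vocab_size = 5
logits = rng.normal(size=vocab_size)           # unnormalized model parameters
probs = np.exp(logits) / np.exp(logits).sum()  # p_theta(x) via softmax

data = rng.integers(0, vocab_size, size=100)   # placeholder samples from p_data

# Monte Carlo estimate of E_{x ~ p_data}[-log p_theta(x)]
nll = -np.log(probs[data]).mean()
print(f"negative log-likelihood: {nll:.3f}")
```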

Autoregressive Factorization

For structured data such as text or sequences, the joint probability is decomposed as:

\begin{equation*}
p(x) = \prod_{i=1}^{n} p(x_i \mid x_{<i})
\end{equation*}

This autoregressive modeling strategy is the foundation of modern language models (e.g., GPT).
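A minimal sketch of the factorization in log space, assuming a hypothetical conditional `cond_logprob(prefix, next_symbol)`; here a uniform distribution stands in for a learned model.

```python
import numpy as np

def log_joint(x, cond_logprob):
    """Autoregressive factorization: log p(x) = sum_i log p(x_i | x_{<i})."""
    return sum(cond_logprob(x[:i], x[i]) for i in range(len(x)))

# Toy conditional: uniform over a 4-symbol vocabulary, standing in for a
# learned model p_theta(x_i | x_{<i}).
def uniform_cond_logprob(prefix, next_symbol, vocab_size=4):
    return -np.log(vocab_size)

x = [2, 0, 3, 1]
print(log_joint(x, uniform_cond_logprob))   # equals 4 * -log(4)
```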

Neural Parameterization

Neural networks are used to parameterize conditional probabilities:

\begin{equation*}
p(x_i \mid x_{<i}) = \text{Softmax}(f_\theta(x_{<i}))
\end{equation*}

where $f_\theta$ is typically implemented using Transformer architectures.
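The sketch below shows this parameterization with a crude stand-in for $f_\theta$ (a random projection of the prefix rather than a real Transformer); only the logits-to-Softmax step reflects the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden = 6, 8

# Stand-in for f_theta: in practice a Transformer maps the prefix x_{<i} to a
# hidden state; here, a random projection of a crude bag-of-tokens encoding.
W_embed = rng.normal(size=(vocab_size, hidden))
W_out = rng.normal(size=(hidden, vocab_size))

def next_token_distribution(prefix):
    h = W_embed[prefix].mean(axis=0)   # crude prefix encoding
    logits = h @ W_out                 # f_theta(x_{<i})
    e = np.exp(logits - logits.max())
    return e / e.sum()                 # Softmax over the vocabulary

p = next_token_distribution([1, 4, 2])
print(p.round(3), p.sum())             # a valid distribution over next tokens
```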

Major Families of Generative Models

Autoregressive Models

  • Example: GPT
  • Advantages: stable training, explicit likelihood
  • Limitation: sequential generation is slow (see the sampling sketch below)
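The sketch below illustrates why generation is sequential: every new token requires another evaluation of the model conditioned on everything generated so far. The uniform `next_token_distribution` is a placeholder for a learned model.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, eos = 5, 0

def next_token_distribution(prefix):
    # Placeholder for a learned p_theta(x_i | x_{<i}); uniform for illustration.
    return np.full(vocab_size, 1.0 / vocab_size)

# Autoregressive sampling: one model call per generated token, which is why
# long sequences are slow to produce.
tokens = [1]                                   # arbitrary start token
while tokens[-1] != eos and len(tokens) < 20:
    p = next_token_distribution(tokens)
    tokens.append(int(rng.choice(vocab_size, p=p)))
print(tokens)
```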

Variational Autoencoders (VAE)

Objective:

\begin{equation*}
\mathcal{L} = \mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] - D_{KL}(q(z \mid x)\,\|\,p(z))
\end{equation*}

Used for latent representation learning and structured generation.
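A minimal numerical sketch of the two ELBO terms for a single input, assuming a diagonal Gaussian $q(z \mid x)$, a standard normal prior, and a unit-variance Gaussian decoder; the encoder outputs and the reconstruction are random placeholders rather than network outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 3

# Hypothetical encoder outputs for one input x: q(z|x) = N(mu, diag(sigma^2)).
mu = rng.normal(size=latent_dim)
log_sigma = rng.normal(scale=0.1, size=latent_dim)
sigma = np.exp(log_sigma)

# Closed-form KL( N(mu, sigma^2) || N(0, I) ): the regularization term of the ELBO.
kl = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * log_sigma)

# Reparameterized sample z = mu + sigma * eps, then a stand-in reconstruction
# term log p(x|z); a real model would compute x_recon with a decoder network.
eps = rng.normal(size=latent_dim)
z = mu + sigma * eps
x, x_recon = rng.normal(size=5), rng.normal(size=5)   # placeholders
recon_loglik = -0.5 * np.sum((x - x_recon) ** 2)      # Gaussian log p(x|z), up to a constant

elbo = recon_loglik - kl
print(f"ELBO estimate: {elbo:.3f}")
```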

Generative Adversarial Networks (GAN)

Two-player game:

\begin{equation*}
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]
\end{equation*}

Produces sharp samples but is difficult to train.
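The sketch below evaluates the minimax objective on one batch of hypothetical discriminator logits; in a real GAN these would come from $D$ applied to data and to generator samples, and the two networks would be updated in alternation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical discriminator logits on a batch of real and generated samples.
d_real_logits = rng.normal(loc=1.0, size=8)    # D(x) for x ~ p_data
d_fake_logits = rng.normal(loc=-1.0, size=8)   # D(G(z)) for z ~ p(z)

# Value of the minimax objective for this batch:
# E[log D(x)] + E[log(1 - D(G(z)))]
value = (np.log(sigmoid(d_real_logits)).mean()
         + np.log(1.0 - sigmoid(d_fake_logits)).mean())

# D is trained to increase this value; G is trained to decrease it
# (in practice often via the non-saturating loss -E[log D(G(z))]).
print(f"objective value: {value:.3f}")
```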

Diffusion Models

Forward noising process:

\begin{equation*}
x_t = \sqrt{\alpha_t}\, x_0 + \sqrt{1 - \alpha_t}\,\epsilon
\end{equation*}

Reverse denoising learns $p_\theta(x_{t-1} \mid x_t)$ to generate high-quality samples.
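A small sketch of the forward noising equation, with a made-up schedule for $\alpha_t$ (decaying from near 1, almost clean, to near 0, almost pure noise) applied to a placeholder "clean" sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy noise schedule: alpha_t decays from ~1 (almost clean) to ~0 (pure noise).
T = 10
alphas = np.linspace(0.99, 0.01, T)

x0 = rng.normal(size=4)   # placeholder for a clean data sample

def forward_noise(x0, t):
    """Sample x_t = sqrt(alpha_t) * x0 + sqrt(1 - alpha_t) * eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alphas[t]) * x0 + np.sqrt(1.0 - alphas[t]) * eps

for t in (0, T // 2, T - 1):
    print(t, np.round(forward_noise(x0, t), 2))
```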

Transformers as the Backbone

Transformers dominate generative AI because they:

  • Model long-range dependencies
  • Scale efficiently with parameters and data
  • Support multi-modal inputs (text, image, molecules)

From Models to Foundation Models

Evolution trend:

  • Single-task → multi-task
  • Single-modal → multi-modal
  • Task-specific → foundation models

Examples include GPT-style large language models.

Generative AI for Science

Scientific objects can be treated as symbolic sequences:

  • Molecules → graphs / SMILES strings (see the tokenization sketch below)
  • Proteins → amino acid sequences
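A minimal sketch of this sequence view: tokenize a SMILES string with a simplified, illustrative regular expression, and treat a protein as its amino-acid character sequence. The regex and the example protein fragment are illustrative choices, not a standard tokenizer.

```python
import re

# Treat a molecule as a symbolic sequence: split a SMILES string into atoms,
# digits, and bond/branch symbols so it can feed an autoregressive model.
smiles = "CC(=O)Oc1ccccc1C(=O)O"   # aspirin
tokens = re.findall(r"Cl|Br|[A-Za-z]|[0-9]|[()=#+\-\[\]@]", smiles)
print(tokens)

# Proteins work the same way: an amino-acid sequence is already a token string.
protein = "MKTAYIAKQR"
print(list(protein))
```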

Training Pipeline

Pretraining

Learn general structure by minimizing negative log-likelihood over massive datasets.
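Concretely, the pretraining loss is the average per-token negative log-likelihood under the autoregressive factorization above; the sketch below uses random probabilities in place of model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, seq_len = 8, 6

# Pretraining loss sketch: average per-token negative log-likelihood.
# `pred_probs[i]` stands in for the model's p_theta(. | x_{<i}).
tokens = rng.integers(0, vocab_size, size=seq_len)
logits = rng.normal(size=(seq_len, vocab_size))
pred_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

loss = -np.mean(np.log(pred_probs[np.arange(seq_len), tokens]))
print(f"per-token NLL: {loss:.3f}")
```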

Supervised Fine-Tuning (SFT)

Align models with instruction-following behavior.

Reinforcement Learning from Human Feedback (RLHF)

Optimize expected reward:

\begin{equation*}
\max_\theta \; \mathbb{E}_{x \sim \pi_\theta}[R(x)]
\end{equation*}
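As an illustration of optimizing this objective, the sketch below runs a generic score-function (REINFORCE) update for a small categorical policy with a made-up reward; practical RLHF instead uses a learned reward model over generated text and algorithms such as PPO.

```python
import numpy as np

rng = np.random.default_rng(0)

# Score-function (REINFORCE) sketch of maximizing E_{x ~ pi_theta}[R(x)]
# for a categorical policy; R is an illustrative reward favoring higher indices.
n_actions, lr = 4, 0.5
theta = np.zeros(n_actions)

def reward(x):
    return float(x)                              # made-up reward

for _ in range(200):
    probs = np.exp(theta) / np.exp(theta).sum()  # pi_theta
    x = rng.choice(n_actions, p=probs)
    grad_logp = -probs
    grad_logp[x] += 1.0                          # d/dtheta log pi_theta(x)
    theta += lr * reward(x) * grad_logp          # gradient ascent on E[R]

# Probability mass should concentrate on the highest-reward action (x = 3).
print(np.round(np.exp(theta) / np.exp(theta).sum(), 2))
```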

Capabilities and Limitations

Strengths

  • Text and code generation
  • Pattern discovery
  • Large-scale hypothesis exploration

Limitations

  • Hallucinations
  • Weak numerical precision
  • Limited physical grounding