10 AI for Drug Design

Why Drug Design Needs AI

Drug discovery is extremely expensive, slow, and risky. On average, developing a new drug costs over $1.3 billion and takes more than 10 years.

Drug Discovery as an AI Problem

Drug discovery can be formulated as a set of AI tasks across multiple biological scales.

Key AI Tasks

  • Target Identification – identify disease-related genes or proteins
  • Molecule Generation – generate candidate drug molecules
  • Structure & Property Prediction – predict binding, toxicity, solubility
  • Clinical Outcome Prediction – predict trial success

Multi-scale Biological Systems

Biological systems span multiple scales: molecule → cell → tissue → organ → disease.

Biological Foundations (Minimal)

  • DNA stores genetic information
  • RNA carries that information from DNA to the protein-making machinery
  • Proteins perform biological functions

Molecules as Data

Molecules as Sequences

Molecules and proteins can be represented as discrete sequences: SMILES strings for small molecules and amino-acid sequences for proteins.
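
To make the sequence view concrete, here is a minimal sketch of turning a SMILES string into tokens and integer ids, the form a sequence model consumes. The regular expression is a simplified pattern that covers common SMILES syntax rather than the full grammar, and the helper names (`tokenize_smiles`, `build_vocab`, `encode`) are illustrative, not taken from any particular library.

```python
import re

# Simplified SMILES token pattern (an assumption, not the full grammar):
# bracket atoms, two-letter halogens, organic-subset atoms, aromatic atoms,
# two-digit ring closures, single-digit ring closures, bonds and branches.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[=#\-\+\\/\(\)\.:~@\*\$])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens

def build_vocab(corpus):
    """Map every token seen in the corpus to an integer id."""
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    for smiles in corpus:
        for token in tokenize_smiles(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(smiles, vocab):
    """Wrap the token ids with begin/end markers."""
    ids = [vocab[token] for token in tokenize_smiles(smiles)]
    return [vocab["<bos>"]] + ids + [vocab["<eos>"]]

aspirin = "CC(=O)Oc1ccccc1C(=O)O"
vocab = build_vocab([aspirin, "CCl", "[NH4+]"])
aspirin_ids = encode(aspirin, vocab)
```

Tokenizing at the level of atoms rather than characters keeps `Cl` and bracket atoms such as `[NH4+]` as single symbols, so the model never has to learn that `C` followed by `l` is chlorine.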

Molecules as Graphs

Atoms are nodes and bonds are edges, so a molecule is naturally a graph that Graph Neural Networks (GNNs) can process.
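
The graph view can be sketched in a few lines. The example below encodes ethanol (SMILES `CCO`, hydrogens left implicit) as an atom list plus a bond list and runs one round of sum-aggregation message passing. This is a deliberately bare version of what a GNN layer does: a real layer would also apply learned weights and a nonlinearity, which are omitted here, and the three-element feature vocabulary is an assumption made for illustration.

```python
# Ethanol with implicit hydrogens: C0 - C1 - O2.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

ELEMENTS = ["C", "N", "O"]  # toy feature vocabulary (assumption)

def one_hot(symbol):
    """Initial node feature: one-hot encoding of the element."""
    return [1.0 if symbol == element else 0.0 for element in ELEMENTS]

def neighbors(num_atoms, bonds):
    """Build an undirected adjacency list from the bond list."""
    adj = {i: [] for i in range(num_atoms)}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj

def message_passing_step(features, adj):
    """New feature of each atom = its own feature + sum of neighbor features."""
    updated = []
    for i, own in enumerate(features):
        aggregated = list(own)
        for j in adj[i]:
            aggregated = [a + b for a, b in zip(aggregated, features[j])]
        updated.append(aggregated)
    return updated

def embed(atoms, bonds):
    """One message-passing round followed by a sum readout over atoms."""
    features = [one_hot(symbol) for symbol in atoms]
    hidden = message_passing_step(features, neighbors(len(atoms), bonds))
    return [sum(column) for column in zip(*hidden)]

graph_embedding = embed(atoms, bonds)
```

Because both the aggregation and the readout are sums, renumbering the atoms (for example writing the same molecule as `OCC`) gives the same embedding, which is the property that makes graphs a better fit than raw strings for many property-prediction tasks.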

Molecules as 3D Objects

3D geometry is critical for molecular function, because binding depends on how a molecule's shape fits its target.
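
One practical consequence of working in 3D is that a model's output should not depend on how the molecule happens to be positioned or oriented. The sketch below illustrates this with a pairwise distance matrix, a simple geometric description that is unchanged by rotation and translation. The water coordinates are approximate, illustrative values in angstroms.

```python
import math

# Approximate water geometry in angstroms (illustrative values).
coords = [
    (0.000, 0.000, 0.000),   # O
    (0.957, 0.000, 0.000),   # H
    (-0.240, 0.927, 0.000),  # H
]

def distance_matrix(points):
    """All pairwise Euclidean distances between atoms."""
    return [[math.dist(p, q) for q in points] for p in points]

def rotate_z(point, angle):
    """Rotate a point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(point, shift):
    """Shift a point by a fixed offset."""
    return tuple(a + b for a, b in zip(point, shift))

# Move the whole molecule rigidly: rotate, then translate.
moved = [translate(rotate_z(p, 1.0), (3.0, -2.0, 5.0)) for p in coords]

d_original = distance_matrix(coords)
d_moved = distance_matrix(moved)
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(d_original, d_moved)
    for a, b in zip(row_a, row_b)
)
```

Here `max_diff` differs from zero only by floating-point rounding: the raw coordinates change completely, while the interatomic distances do not.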

Generative Models for Drug Design

Generative models can directly create new molecules instead of selecting from databases.

Main Generative Paradigms

  • Autoregressive Models (LMs)
  • Graph Generation
  • Diffusion Models (state-of-the-art)

Diffusion Models (Key Idea)

Diffusion models generate data by starting from random noise and denoising it step by step.
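
The denoising idea is easier to see alongside the noising process it inverts. The sketch below follows the common DDPM-style formulation, which is an assumption here since the notes do not fix one: a linear variance schedule, the closed-form forward sample x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, and the inversion that recovers x_0 from x_t given a noise estimate. In a real model a trained network supplies that estimate; here the true noise is passed in as an oracle, purely to show the algebra.

```python
import math
import random

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and the cumulative products abar_t."""
    betas = [
        beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        for t in range(num_steps)
    ]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Forward process: sample x_t directly from x_0 in one shot."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    scale = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    xt = [scale * x + sigma * e for x, e in zip(x0, eps)]
    return xt, eps

def predict_x0(xt, eps_hat, t, alpha_bars):
    """Invert the forward formula given a noise estimate eps_hat."""
    scale = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [(x - sigma * e) / scale for x, e in zip(xt, eps_hat)]

rng = random.Random(0)
betas, alpha_bars = make_schedule(1000)
x0 = [1.0, -0.5, 0.25]                            # stand-in for atom coordinates
xt, eps = add_noise(x0, 500, alpha_bars, rng)
x0_hat = predict_x0(xt, eps, 500, alpha_bars)     # oracle noise, not a model
```

With this schedule `alpha_bars` shrinks toward zero, so by the final step almost nothing of `x0` remains and the sample is close to pure Gaussian noise. Generation runs the process the other way, replacing the oracle `eps` with a network's prediction at each step.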

MARS: Small Molecule Drug Generation

MARS uses MCMC-style iterative graph editing to optimize multiple objectives.
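
The following toy sketch shows what MCMC-style iterative editing with several objectives can look like. It is not the MARS implementation: the "molecule" is a plain list of made-up fragment names instead of a molecular graph, the scoring functions are hypothetical stand-ins for real property predictors, the proposal is uniform random rather than learned, and the acceptance rule ignores the proposal-ratio correction that a strict Metropolis-Hastings sampler would include.

```python
import math
import random

FRAGMENTS = ["ring", "amine", "hydroxyl", "methyl", "halogen"]  # hypothetical

def potency(mol):
    """Hypothetical objective: rewards rings, saturating at two."""
    return min(mol.count("ring"), 2) / 2

def solubility(mol):
    """Hypothetical objective: rewards polar groups, saturating at two."""
    return min(mol.count("hydroxyl") + mol.count("amine"), 2) / 2

def size_penalty(mol):
    """Hypothetical constraint: discourage more than six fragments."""
    return 1.0 if len(mol) <= 6 else 0.5 ** (len(mol) - 6)

def target(mol):
    """Unnormalized target density: product of the objectives, with a small floor."""
    return (0.01 + potency(mol)) * (0.01 + solubility(mol)) * size_penalty(mol)

def propose(mol, rng):
    """Local edit: add, delete, or replace one fragment."""
    new = list(mol)
    move = rng.choice(["add", "delete", "replace"])
    if move == "add" or len(new) <= 1:
        new.insert(rng.randrange(len(new) + 1), rng.choice(FRAGMENTS))
    elif move == "delete":
        del new[rng.randrange(len(new))]
    else:
        new[rng.randrange(len(new))] = rng.choice(FRAGMENTS)
    return new

def sample(num_steps=2000, seed=0):
    """Annealed accept/reject loop; returns the best state visited."""
    rng = random.Random(seed)
    current = ["methyl"]
    best = current
    for step in range(num_steps):
        temperature = max(0.05, 1.0 - step / num_steps)  # cool over time
        candidate = propose(current, rng)
        ratio = target(candidate) / target(current)
        if rng.random() < min(1.0, ratio ** (1.0 / temperature)):
            current = candidate
        if target(current) > target(best):
            best = current
    return best

best = sample()
```

Multiplying the objectives means a candidate scores well only if every property is acceptable at once, which is one simple way to encode "multiple objectives" in a single target density. Lowering the temperature makes the sampler progressively greedier.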

EnzyGen: Generative Enzyme Design

EnzyGen is a unified generative model for enzyme sequence–structure–function co-design.

PPDiff: Protein–Protein Complex Design

PPDiff performs diffusion in hybrid sequence–structure space for protein complex design.

Key Challenges in AI Drug Design

  • Multiple objectives must be satisfied simultaneously
  • Data is scarce and noisy
  • Novelty is hard to achieve
  • Physical and biological constraints are strong
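
The first challenge, satisfying several objectives at once, is often handled by keeping the set of candidates that no other candidate beats on every objective (the Pareto front) instead of a single "best" molecule. A minimal sketch follows. The candidate names and scores are invented for illustration, and every objective is oriented so that higher is better.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    }

# Hypothetical scores: (binding affinity, solubility, 1 - toxicity).
candidates = {
    "mol_a": (0.9, 0.2, 0.7),
    "mol_b": (0.6, 0.8, 0.6),
    "mol_c": (0.5, 0.7, 0.5),  # worse than mol_b on every objective
    "mol_d": (0.3, 0.9, 0.9),
}

front = pareto_front(candidates)
```

In this example `mol_c` drops out because `mol_b` is better on all three scores, while the remaining three each represent a different trade-off that a chemist would still need to choose between.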