Skip to content

Building Protein-Ligand Systems

PRISM automates the complex workflow of preparing protein-ligand systems for molecular dynamics simulations. This guide covers all aspects of system building.

Quick Start

prism protein.pdb ligand.mol2 -o my_system

Overview

PRISM's automated building process includes:

  1. Ligand Parameterization - Generate force field parameters for small molecules
  2. Protein Preparation - Clean and prepare protein structure
  3. System Assembly - Combine protein and ligand
  4. Solvation - Add water box and ions
  5. Topology Generation - Create GROMACS input files
  6. MDP File Creation - Generate simulation parameters

All of this happens automatically with a single command!

Quick Start

Command-Line Interface

The simplest way to build a system:

# Default: GAFF ligand FF, AMBER99SB protein, TIP3P water
prism protein.pdb ligand.mol2 -o my_system

# With different force fields
prism protein.pdb ligand.sdf -o my_system \
  --ligand-forcefield openff \
  --forcefield amber14sb \
  --water tip4p

Python API

For programmatic control:

import prism

# Method 1: High-level API (recommended)
system = prism.system("protein.pdb", "ligand.mol2", output_dir="my_system")
system.build()

# Method 2: Using PRISMBuilder directly
from prism import PRISMBuilder

builder = PRISMBuilder(
    protein_path="protein.pdb",
    ligand_path="ligand.mol2",
    output_dir="my_system",
    ligand_forcefield="gaff",
    forcefield="amber99sb",
    water_model="tip3p"
)
builder.run()

The Building Workflow

What PRISM Does Automatically

When you run prism protein.pdb ligand.mol2 -o output, PRISM executes this workflow:

1. Configuration Setup

  • Loads default or custom configuration
  • Validates force field availability
  • Sets up output directory structure

2. Ligand Force Field Generation

Depending on the chosen force field:

GAFF/GAFF2:

ligand.mol2 → antechamber → AM1-BCC charges
           → parmchk2 → missing parameters
           → ACPYPE → GROMACS format (.itp, .gro)

OpenFF:

ligand.sdf → OpenFF Toolkit → force field assignment
          → OpenFF Interchange → GROMACS format

OPLS-AA:

ligand.mol2 → LigParGen API → server-side parameterization
           → Download results → GROMACS format

CGenFF:

Downloaded files → Conversion scripts → GROMACS format

Output: LIG.amb2gmx/ or LIG.openff2gmx/ directory with: - LIG.itp - Ligand topology - LIG.gro - Ligand coordinates - atomtypes_LIG.itp - Atom type definitions - posre_LIG.itp - Position restraints

3. Protein Preparation

Intelligent Cleaning:

# What PRISM does:
- Remove HETATM records (except ligand)
- Smart metal ion handling (keep Zn2+/Ca2+/Mg2+, remove Na+/Cl-)
- Remove crystallization artifacts (GOL, EDO, PEG, etc.)
- Fix terminal residue naming (ACE, NME, NH2, etc.)
- Optional: Protonation state optimization (via Meeko/PrepWizard)

Metal Ion Handling: PRISM has three modes: - smart (default): Keep structural metals (Zn, Mg, Ca, Fe), remove buffer ions - keep_all: Keep all metals - remove_all: Remove all metals

Protonation Optimization (Optional):

# In config file
protonation:
  optimize: true
  ph: 7.4
  his_state: auto  # or HID/HIE/HIP

Output: protein_clean.pdb or protein_protonated.pdb

4. System Assembly with GROMACS

# PRISM runs these GROMACS commands internally:

# 1. Process protein topology
gmx pdb2gmx -f protein_clean.pdb -o protein_processed.gro \
  -p topol.top -ff amber99sb -water tip3p

# 2. Combine protein and ligand
# (Manual coordinate merging + topology inclusion)

# 3. Define box
gmx editconf -f complex.gro -o boxed.gro \
  -c -d 1.5 -bt cubic

# 4. Add solvent
gmx solvate -cp boxed.gro -cs spc216.gro \
  -o solv.gro -p topol.top

# 5. Add ions (neutralize + 0.15 M NaCl)
gmx genion -s ions.tpr -o solv_ions.gro -p topol.top \
  -neutral -conc 0.15

Output: GMX_PROLIG_MD/ directory with: - solv_ions.gro - Complete solvated system - topol.top - System topology

5. MDP File Generation

PRISM generates optimized MDP files for each stage:

Energy Minimization (em.mdp):

integrator = steep
emtol = 200.0
nsteps = 10000

NVT Equilibration (nvt.mdp):

integrator = md
dt = 0.002
nsteps = 250000  ; 500 ps
tcoupl = V-rescale
ref_t = 310

NPT Equilibration (npt.mdp):

pcoupl = C-rescale
ref_p = 1.0

Production MD (md.mdp):

nsteps = 250000000  ; 500 ns default
nstxout-compressed = 250000  ; save every 500 ps

6. Run Script Generation

Creates localrun.sh with checkpoint restart support:

#!/bin/bash
# Auto-generated by PRISM

# Energy Minimization
if [ -f ./em/em.gro ]; then
    echo "EM already completed"
else
    gmx grompp -f ../mdps/em.mdp ...
    gmx mdrun -deffnm ./em/em -gpu_id 0 ...
fi

# NVT, NPT, Production...


Advanced Building Options

Force Field Selection

Ligand Force Fields (8+ Options)

# GAFF - General AMBER Force Field (default)
prism protein.pdb ligand.mol2 -o output --ligand-forcefield gaff

# GAFF2 - Improved version
prism protein.pdb ligand.mol2 -o output --ligand-forcefield gaff2

# OpenFF - Modern, data-driven
prism protein.pdb ligand.sdf -o output --ligand-forcefield openff

# CGenFF - CHARMM General Force Field
prism protein.pdb ligand.mol2 -o output --ligand-forcefield cgenff \
  --forcefield-path /path/to/cgenff_download

# OPLS-AA - Via LigParGen server
prism protein.pdb ligand.mol2 -o output --ligand-forcefield opls

# MMFF - Via SwissParam
prism protein.pdb ligand.mol2 -o output --ligand-forcefield mmff

# MATCH - Via SwissParam
prism protein.pdb ligand.mol2 -o output --ligand-forcefield match

# Hybrid MMFF-MATCH
prism protein.pdb ligand.mol2 -o output --ligand-forcefield hybrid

Protein Force Fields

# List available force fields
prism --list-forcefields

# Common options:
--forcefield amber99sb      # Default, well-tested
--forcefield amber99sb-ildn # Improved side chains
--forcefield amber14sb      # Recommended for most cases
--forcefield charmm36       # For membranes/lipids
--forcefield oplsaa         # For small molecule interactions

Box Configuration

# Box shape
--box-shape cubic          # Default, simple
--box-shape dodecahedron   # ~29% fewer water molecules
--box-shape octahedron     # Alternative

# Box distance (distance from protein to edge)
--box-distance 1.0   # Minimal (faster but may have issues)
--box-distance 1.5   # Default (recommended)
--box-distance 2.0   # Conservative (for unfolding studies)
--box-distance 3.0   # Very large (for long-range effects)

# Centering
--no-center          # Don't center protein in box

Solvation and Ions

# Water model
--water tip3p   # Default, fastest
--water tip4p   # Better properties, slightly slower
--water spce    # SPC/E, good diffusion

# Salt concentration
--salt-concentration 0.0    # No salt
--salt-concentration 0.15   # Physiological (default)
--salt-concentration 0.5    # High salt

# Ion types
--positive-ion NA   # Sodium (default)
--positive-ion K    # Potassium
--negative-ion CL   # Chloride (default)

# Neutralization
--no-neutralize     # Don't neutralize (keep system charged)

Ligand Charge

# For charged ligands
--ligand-charge 0    # Neutral (default)
--ligand-charge 1    # +1 charge
--ligand-charge -1   # -1 charge
--ligand-charge -2   # -2 charge

Simulation Parameters

# Temperature
--temperature 300    # Room temperature
--temperature 310    # Body temperature (default)
--temperature 350    # High temperature

# Pressure
--pressure 1.0       # 1 bar (default)

# pH (for protonation states)
--pH 7.0            # Neutral
--pH 7.4            # Physiological (default)
--pH 5.0            # Acidic

# Production time
--production-ns 100   # Short (testing)
--production-ns 500   # Default
--production-ns 1000  # Long (1 microsecond)

# Time step
--dt 0.001          # 1 fs (for flexible bonds)
--dt 0.002          # 2 fs (default, with constraints)

# Equilibration times
--nvt-ps 500        # NVT time (default)
--npt-ps 500        # NPT time (default)

Configuration Files

Using YAML Configuration

For complex setups, use configuration files:

# Export default template
prism --export-config my_config.yaml

# Edit the file, then use it
prism protein.pdb ligand.mol2 -o output --config my_config.yaml

Example Configuration

# my_config.yaml

general:
  overwrite: false

box:
  distance: 2.0
  shape: dodecahedron
  center: true

simulation:
  temperature: 300
  pressure: 1.0
  pH: 7.4
  ligand_charge: -1
  production_time_ns: 1000
  dt: 0.002
  equilibration_nvt_time_ps: 1000
  equilibration_npt_time_ps: 1000

ions:
  neutral: true
  concentration: 0.15
  positive_ion: NA
  negative_ion: CL

# Protein preparation
protein_preparation:
  ion_handling_mode: smart  # smart, keep_all, remove_all
  metal_distance_cutoff: 5.0
  keep_crystal_water: false
  remove_crystallization_artifacts: true

# Protonation optimization (optional)
protonation:
  optimize: false
  ph: 7.4
  his_state: auto
  preserve_existing_h: false

output:
  trajectory_interval_ps: 500
  energy_interval_ps: 10
  compressed_trajectory: true

constraints:
  algorithm: lincs
  type: h-bonds

Step-by-Step Manual Building

For fine control, build step-by-step:

from prism import PRISMBuilder

# Initialize builder
builder = PRISMBuilder(
    "protein.pdb",
    "ligand.mol2",
    "output",
    ligand_forcefield="gaff"
)

# Step 1: Generate ligand force field
lig_ff_dir = builder.generate_ligand_forcefield()
print(f"Ligand FF: {lig_ff_dir}")

# Step 2: Clean protein
cleaned_protein = builder.clean_protein(
    ion_mode='smart',           # Keep structural metals
    distance_cutoff=5.0,        # Within 5 Å of protein
    keep_crystal_water=False,   # Remove crystal waters
    remove_artifacts=True       # Remove GOL, EDO, etc.
)
print(f"Cleaned: {cleaned_protein}")

# Step 3: Build GROMACS model
model_dir = builder.build_model(cleaned_protein)
print(f"Model: {model_dir}")

# Step 4: Generate MDP files
builder.generate_mdp_files()

# Step 5: Generate run script
script_path = builder.generate_localrun_script()
print(f"Script: {script_path}")

# Step 6: Save configuration
builder.save_config()

Special Cases

Charged Systems

For systems with net charge:

# Ligand with -1 charge
prism protein.pdb ligand.mol2 -o output --ligand-charge -1

# Don't neutralize (keep charge)
prism protein.pdb ligand.mol2 -o output --no-neutralize

Multiple Ligands

PRISM supports building systems with multiple ligands in a single command:

# Multiple ligands (use -pf and -lf flags)
prism -pf protein.pdb -lf lig1.mol2 -lf lig2.mol2 -o output

# CGenFF with multiple ligands (one --forcefield-path per ligand)
prism -pf protein.pdb -lf lig1.mol2 -lf lig2.mol2 -o output \
  --ligand-forcefield cgenff -ffp /path/to/cgenff1 -ffp /path/to/cgenff2

Output structure for multi-ligand builds:

output/
├── Ligand_gaff/
│   ├── LIG.amb2gmx_1/
│   └── LIG.amb2gmx_2/
└── GMX_PROLIG_MD/

Metal-Binding Proteins

For proteins with essential metals:

# config.yaml
protein_preparation:
  ion_handling_mode: keep_all  # Keep all metals
  # Or specify custom metals to keep
  keep_custom_metals: [ZN, MG, CA, FE, CU]
prism protein.pdb ligand.mol2 -o output --config config.yaml

Membrane Proteins

PRISM is optimized for soluble proteins. For membrane proteins:

  1. Use CHARMM-GUI or similar tools to build membrane
  2. Use PRISM only for ligand parameterization:
builder = PRISMBuilder("protein.pdb", "ligand.mol2", "output")
lig_ff_dir = builder.generate_ligand_forcefield()
# Then manually integrate ligand.itp into membrane system

Validation and Quality Control

Check Generated Files

import prism
import os

system = prism.system("protein.pdb", "ligand.mol2", output_dir="output")
system.build()

# List all generated files
files = system.get_output_files()
for name, path in files.items():
    exists = "✓" if os.path.exists(path) else "✗"
    print(f"{exists} {name}: {path}")

Visualize System

import mdtraj as md

# Load solvated system
traj = md.load("output/GMX_PROLIG_MD/solv_ions.gro")

# System statistics
print(f"Total atoms: {traj.n_atoms}")
print(f"Protein atoms: {traj.topology.select('protein').size}")
print(f"Ligand atoms: {traj.topology.select('resname LIG').size}")
print(f"Water molecules: {traj.topology.select('water').size // 3}")
print(f"Box vectors: {traj.unitcell_vectors[0]}")

# Count ions
na_ions = traj.topology.select("resname NA")
cl_ions = traj.topology.select("resname CL")
print(f"Na+ ions: {len(na_ions)}")
print(f"Cl- ions: {len(cl_ions)}")

Check Topology

cd output/GMX_PROLIG_MD

# Check total charge (should be near 0)
grep "qtot" topol.top | tail -1

# Count molecules
grep "^SOL" topol.top  # Water count
grep "^NA\|^CL" topol.top  # Ion count

Troubleshooting

Build Failures

"Force field not found"

# List available force fields
prism --list-forcefields

# Use an available one
prism protein.pdb ligand.mol2 -o output --forcefield amber99sb

"Cannot generate ligand parameters"

Try different force field:

# GAFF failed? Try OpenFF
prism protein.pdb ligand.sdf -o output --ligand-forcefield openff

# Or OPLS-AA
prism protein.pdb ligand.mol2 -o output --ligand-forcefield opls

"System has non-zero charge"

Check ligand charge:

prism protein.pdb ligand.mol2 -o output --ligand-charge -1

"Atom type not recognized"

This usually means ligand has unusual chemistry:

# Try OpenFF (better coverage)
prism protein.pdb ligand.sdf -o output --ligand-forcefield openff

# Or check ligand structure in visualization tool

Performance Optimization

# For high-throughput screening
prism protein.pdb ligand.mol2 -o output \
  --production-ns 100 \      # Shorter
  --box-distance 1.0 \       # Smaller box
  --box-shape dodecahedron   # Fewer waters

Special Modes

PRISM also supports advanced calculation modes that extend the basic build workflow:

# PMF calculation (steered MD + umbrella sampling)
prism protein.pdb ligand.mol2 -o output --pmf

# MM/PBSA binding energy
prism protein.pdb ligand.mol2 -o output --mmpbsa

# REST2 replica exchange
prism protein.pdb ligand.mol2 -o output --rest2

# Gaussian RESP charges
prism protein.pdb ligand.mol2 -o output --gaussian hf

See PMF Calculations, Running Simulations, and Force Fields for details.