Building Protein-Ligand Systems¶
PRISM automates the complex workflow of preparing protein-ligand systems for molecular dynamics simulations. This guide covers all aspects of system building.
Overview¶
PRISM's automated building process includes:
- Ligand Parameterization - Generate force field parameters for small molecules
- Protein Preparation - Clean and prepare protein structure
- System Assembly - Combine protein and ligand
- Solvation - Add water box and ions
- Topology Generation - Create GROMACS input files
- MDP File Creation - Generate simulation parameters
All of this happens automatically with a single command!
Quick Start¶
Command-Line Interface¶
The simplest way to build a system:
# Default: GAFF ligand FF, AMBER99SB protein, TIP3P water
prism protein.pdb ligand.mol2 -o my_system
# With different force fields
prism protein.pdb ligand.sdf -o my_system \
--ligand-forcefield openff \
--forcefield amber14sb \
--water tip4p
Python API¶
For programmatic control:
import prism
# Method 1: High-level API (recommended)
system = prism.system("protein.pdb", "ligand.mol2", output_dir="my_system")
system.build()
# Method 2: Using PRISMBuilder directly
from prism import PRISMBuilder
builder = PRISMBuilder(
protein_path="protein.pdb",
ligand_path="ligand.mol2",
output_dir="my_system",
ligand_forcefield="gaff",
forcefield="amber99sb",
water_model="tip3p"
)
builder.run()
The Building Workflow¶
What PRISM Does Automatically¶
When you run prism protein.pdb ligand.mol2 -o output, PRISM executes this workflow:
1. Configuration Setup¶
- Loads default or custom configuration
- Validates force field availability
- Sets up output directory structure
2. Ligand Force Field Generation¶
Depending on the chosen force field:
GAFF/GAFF2:
ligand.mol2 → antechamber → AM1-BCC charges
→ parmchk2 → missing parameters
→ ACPYPE → GROMACS format (.itp, .gro)
OpenFF:
OPLS-AA:
CGenFF:
Output: LIG.amb2gmx/ or LIG.openff2gmx/ directory with:
- LIG.itp - Ligand topology
- LIG.gro - Ligand coordinates
- atomtypes_LIG.itp - Atom type definitions
- posre_LIG.itp - Position restraints
3. Protein Preparation¶
Intelligent Cleaning:
# What PRISM does:
- Remove HETATM records (except ligand)
- Smart metal ion handling (keep Zn2+/Ca2+/Mg2+, remove Na+/Cl-)
- Remove crystallization artifacts (GOL, EDO, PEG, etc.)
- Fix terminal residue naming (ACE, NME, NH2, etc.)
- Optional: Protonation state optimization (via Meeko/PrepWizard)
Metal Ion Handling:
PRISM has three modes:
- smart (default): Keep structural metals (Zn, Mg, Ca, Fe), remove buffer ions
- keep_all: Keep all metals
- remove_all: Remove all metals
Protonation Optimization (Optional):
Output: protein_clean.pdb or protein_protonated.pdb
4. System Assembly with GROMACS¶
# PRISM runs these GROMACS commands internally:
# 1. Process protein topology
gmx pdb2gmx -f protein_clean.pdb -o protein_processed.gro \
-p topol.top -ff amber99sb -water tip3p
# 2. Combine protein and ligand
# (Manual coordinate merging + topology inclusion)
# 3. Define box
gmx editconf -f complex.gro -o boxed.gro \
-c -d 1.5 -bt cubic
# 4. Add solvent
gmx solvate -cp boxed.gro -cs spc216.gro \
-o solv.gro -p topol.top
# 5. Add ions (neutralize + 0.15 M NaCl)
gmx genion -s ions.tpr -o solv_ions.gro -p topol.top \
-neutral -conc 0.15
Output: GMX_PROLIG_MD/ directory with:
- solv_ions.gro - Complete solvated system
- topol.top - System topology
5. MDP File Generation¶
PRISM generates optimized MDP files for each stage:
Energy Minimization (em.mdp):
NVT Equilibration (nvt.mdp):
NPT Equilibration (npt.mdp):
Production MD (md.mdp):
6. Run Script Generation¶
Creates localrun.sh with checkpoint restart support:
#!/bin/bash
# Auto-generated by PRISM
# Energy Minimization
if [ -f ./em/em.gro ]; then
echo "EM already completed"
else
gmx grompp -f ../mdps/em.mdp ...
gmx mdrun -deffnm ./em/em -gpu_id 0 ...
fi
# NVT, NPT, Production...
Advanced Building Options¶
Force Field Selection¶
Ligand Force Fields (8+ Options)¶
# GAFF - General AMBER Force Field (default)
prism protein.pdb ligand.mol2 -o output --ligand-forcefield gaff
# GAFF2 - Improved version
prism protein.pdb ligand.mol2 -o output --ligand-forcefield gaff2
# OpenFF - Modern, data-driven
prism protein.pdb ligand.sdf -o output --ligand-forcefield openff
# CGenFF - CHARMM General Force Field
prism protein.pdb ligand.mol2 -o output --ligand-forcefield cgenff \
--forcefield-path /path/to/cgenff_download
# OPLS-AA - Via LigParGen server
prism protein.pdb ligand.mol2 -o output --ligand-forcefield opls
# MMFF - Via SwissParam
prism protein.pdb ligand.mol2 -o output --ligand-forcefield mmff
# MATCH - Via SwissParam
prism protein.pdb ligand.mol2 -o output --ligand-forcefield match
# Hybrid MMFF-MATCH
prism protein.pdb ligand.mol2 -o output --ligand-forcefield hybrid
Protein Force Fields¶
# List available force fields
prism --list-forcefields
# Common options:
--forcefield amber99sb # Default, well-tested
--forcefield amber99sb-ildn # Improved side chains
--forcefield amber14sb # Recommended for most cases
--forcefield charmm36 # For membranes/lipids
--forcefield oplsaa # For small molecule interactions
Box Configuration¶
# Box shape
--box-shape cubic # Default, simple
--box-shape dodecahedron # ~29% fewer water molecules
--box-shape octahedron # Alternative
# Box distance (distance from protein to edge)
--box-distance 1.0 # Minimal (faster but may have issues)
--box-distance 1.5 # Default (recommended)
--box-distance 2.0 # Conservative (for unfolding studies)
--box-distance 3.0 # Very large (for long-range effects)
# Centering
--no-center # Don't center protein in box
Solvation and Ions¶
# Water model
--water tip3p # Default, fastest
--water tip4p # Better properties, slightly slower
--water spce # SPC/E, good diffusion
# Salt concentration
--salt-concentration 0.0 # No salt
--salt-concentration 0.15 # Physiological (default)
--salt-concentration 0.5 # High salt
# Ion types
--positive-ion NA # Sodium (default)
--positive-ion K # Potassium
--negative-ion CL # Chloride (default)
# Neutralization
--no-neutralize # Don't neutralize (keep system charged)
Ligand Charge¶
# For charged ligands
--ligand-charge 0 # Neutral (default)
--ligand-charge 1 # +1 charge
--ligand-charge -1 # -1 charge
--ligand-charge -2 # -2 charge
Simulation Parameters¶
# Temperature
--temperature 300 # Room temperature
--temperature 310 # Body temperature (default)
--temperature 350 # High temperature
# Pressure
--pressure 1.0 # 1 bar (default)
# pH (for protonation states)
--pH 7.0 # Neutral
--pH 7.4 # Physiological (default)
--pH 5.0 # Acidic
# Production time
--production-ns 100 # Short (testing)
--production-ns 500 # Default
--production-ns 1000 # Long (1 microsecond)
# Time step
--dt 0.001 # 1 fs (for flexible bonds)
--dt 0.002 # 2 fs (default, with constraints)
# Equilibration times
--nvt-ps 500 # NVT time (default)
--npt-ps 500 # NPT time (default)
Configuration Files¶
Using YAML Configuration¶
For complex setups, use configuration files:
# Export default template
prism --export-config my_config.yaml
# Edit the file, then use it
prism protein.pdb ligand.mol2 -o output --config my_config.yaml
Example Configuration¶
# my_config.yaml
general:
overwrite: false
box:
distance: 2.0
shape: dodecahedron
center: true
simulation:
temperature: 300
pressure: 1.0
pH: 7.4
ligand_charge: -1
production_time_ns: 1000
dt: 0.002
equilibration_nvt_time_ps: 1000
equilibration_npt_time_ps: 1000
ions:
neutral: true
concentration: 0.15
positive_ion: NA
negative_ion: CL
# Protein preparation
protein_preparation:
ion_handling_mode: smart # smart, keep_all, remove_all
metal_distance_cutoff: 5.0
keep_crystal_water: false
remove_crystallization_artifacts: true
# Protonation optimization (optional)
protonation:
optimize: false
ph: 7.4
his_state: auto
preserve_existing_h: false
output:
trajectory_interval_ps: 500
energy_interval_ps: 10
compressed_trajectory: true
constraints:
algorithm: lincs
type: h-bonds
Step-by-Step Manual Building¶
For fine control, build step-by-step:
from prism import PRISMBuilder
# Initialize builder
builder = PRISMBuilder(
"protein.pdb",
"ligand.mol2",
"output",
ligand_forcefield="gaff"
)
# Step 1: Generate ligand force field
lig_ff_dir = builder.generate_ligand_forcefield()
print(f"Ligand FF: {lig_ff_dir}")
# Step 2: Clean protein
cleaned_protein = builder.clean_protein(
ion_mode='smart', # Keep structural metals
distance_cutoff=5.0, # Within 5 Å of protein
keep_crystal_water=False, # Remove crystal waters
remove_artifacts=True # Remove GOL, EDO, etc.
)
print(f"Cleaned: {cleaned_protein}")
# Step 3: Build GROMACS model
model_dir = builder.build_model(cleaned_protein)
print(f"Model: {model_dir}")
# Step 4: Generate MDP files
builder.generate_mdp_files()
# Step 5: Generate run script
script_path = builder.generate_localrun_script()
print(f"Script: {script_path}")
# Step 6: Save configuration
builder.save_config()
Special Cases¶
Charged Systems¶
For systems with net charge:
# Ligand with -1 charge
prism protein.pdb ligand.mol2 -o output --ligand-charge -1
# Don't neutralize (keep charge)
prism protein.pdb ligand.mol2 -o output --no-neutralize
Multiple Ligands¶
PRISM supports building systems with multiple ligands in a single command:
# Multiple ligands (use -pf and -lf flags)
prism -pf protein.pdb -lf lig1.mol2 -lf lig2.mol2 -o output
# CGenFF with multiple ligands (one --forcefield-path per ligand)
prism -pf protein.pdb -lf lig1.mol2 -lf lig2.mol2 -o output \
--ligand-forcefield cgenff -ffp /path/to/cgenff1 -ffp /path/to/cgenff2
Output structure for multi-ligand builds:
Metal-Binding Proteins¶
For proteins with essential metals:
# config.yaml
protein_preparation:
ion_handling_mode: keep_all # Keep all metals
# Or specify custom metals to keep
keep_custom_metals: [ZN, MG, CA, FE, CU]
Membrane Proteins¶
PRISM is optimized for soluble proteins. For membrane proteins:
- Use CHARMM-GUI or similar tools to build membrane
- Use PRISM only for ligand parameterization:
builder = PRISMBuilder("protein.pdb", "ligand.mol2", "output")
lig_ff_dir = builder.generate_ligand_forcefield()
# Then manually integrate ligand.itp into membrane system
Validation and Quality Control¶
Check Generated Files¶
import prism
import os
system = prism.system("protein.pdb", "ligand.mol2", output_dir="output")
system.build()
# List all generated files
files = system.get_output_files()
for name, path in files.items():
exists = "✓" if os.path.exists(path) else "✗"
print(f"{exists} {name}: {path}")
Visualize System¶
import mdtraj as md
# Load solvated system
traj = md.load("output/GMX_PROLIG_MD/solv_ions.gro")
# System statistics
print(f"Total atoms: {traj.n_atoms}")
print(f"Protein atoms: {traj.topology.select('protein').size}")
print(f"Ligand atoms: {traj.topology.select('resname LIG').size}")
print(f"Water molecules: {traj.topology.select('water').size // 3}")
print(f"Box vectors: {traj.unitcell_vectors[0]}")
# Count ions
na_ions = traj.topology.select("resname NA")
cl_ions = traj.topology.select("resname CL")
print(f"Na+ ions: {len(na_ions)}")
print(f"Cl- ions: {len(cl_ions)}")
Check Topology¶
cd output/GMX_PROLIG_MD
# Check total charge (should be near 0)
grep "qtot" topol.top | tail -1
# Count molecules
grep "^SOL" topol.top # Water count
grep "^NA\|^CL" topol.top # Ion count
Troubleshooting¶
Build Failures¶
"Force field not found"¶
# List available force fields
prism --list-forcefields
# Use an available one
prism protein.pdb ligand.mol2 -o output --forcefield amber99sb
"Cannot generate ligand parameters"¶
Try different force field:
# GAFF failed? Try OpenFF
prism protein.pdb ligand.sdf -o output --ligand-forcefield openff
# Or OPLS-AA
prism protein.pdb ligand.mol2 -o output --ligand-forcefield opls
"System has non-zero charge"¶
Check ligand charge:
"Atom type not recognized"¶
This usually means ligand has unusual chemistry:
# Try OpenFF (better coverage)
prism protein.pdb ligand.sdf -o output --ligand-forcefield openff
# Or check ligand structure in visualization tool
Performance Optimization¶
# For high-throughput screening
prism protein.pdb ligand.mol2 -o output \
--production-ns 100 \ # Shorter
--box-distance 1.0 \ # Smaller box
--box-shape dodecahedron # Fewer waters
Special Modes¶
PRISM also supports advanced calculation modes that extend the basic build workflow:
# PMF calculation (steered MD + umbrella sampling)
prism protein.pdb ligand.mol2 -o output --pmf
# MM/PBSA binding energy
prism protein.pdb ligand.mol2 -o output --mmpbsa
# REST2 replica exchange
prism protein.pdb ligand.mol2 -o output --rest2
# Gaussian RESP charges
prism protein.pdb ligand.mol2 -o output --gaussian hf
See PMF Calculations, Running Simulations, and Force Fields for details.