Output Files Guide
Understanding PRISM's output structure is essential for analysis and troubleshooting. This guide details all files generated during system building and simulation.
Output Directory Structure
output_directory/
├── GMX_PROLIG_MD/ # Main simulation directory
│ ├── solv_ions.gro # Solvated system structure
│ ├── topol.top # System topology
│ ├── index.ndx # Index groups
│ ├── em/ # Energy minimization
│ ├── nvt/ # NVT equilibration
│ ├── npt/ # NPT equilibration
│ └── prod/ # Production MD
├── LIG.amb2gmx/ # Ligand force field (GAFF)
│ ├── LIG.gro # Ligand structure
│ ├── LIG.itp # Ligand topology
│ ├── LIG.top # Full topology
│ ├── atomtypes_LIG.itp # Atom type definitions
│ └── posre_LIG.itp # Position restraints
├── mdps/ # MDP parameter files
│ ├── em.mdp # Energy minimization
│ ├── nvt.mdp # NVT equilibration
│ ├── npt.mdp # NPT equilibration
│ └── md.mdp # Production MD
├── forcefield/ # Temporary force field files
├── protein_clean.pdb # Cleaned protein structure
└── prism_config.yaml # Configuration used
Core System Files
solv_ions.gro
The complete solvated system with ions:
Protein-Ligand Complex in Water
47593
1MET N 1 2.345 1.234 0.567
1MET CA 2 2.456 1.345 0.678
...
47591SOL HW2 47593 10.234 9.876 8.543
10.12345 10.12345 10.12345
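Format: GROMACS coordinate file (.gro)
- Line 1: title
- Line 2: number of atoms (N)
- Lines 3 to N+2: one fixed-width line per atom (residue number and name, atom name, atom number, x/y/z in nm, optionally velocities)
- Last line: box vectors (nm)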
Usage:
import mdtraj as md
# Load structure
structure = md.load("GMX_PROLIG_MD/solv_ions.gro")
print(f"Atoms: {structure.n_atoms}")
print(f"Box: {structure.unitcell_vectors}")
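The .gro layout can also be read without MDTraj. The following is a minimal sketch that assumes the standard fixed-width atom line (`%5d%-5s%5s%5d%8.3f%8.3f%8.3f`, with any velocity columns ignored). `parse_gro_atom` and `summarize_gro` are hypothetical helpers, not part of PRISM.

```python
def parse_gro_atom(line):
    """Split one fixed-width .gro atom line into its fields."""
    return {
        "resnr": int(line[0:5]),
        "resname": line[5:10].strip(),
        "atom": line[10:15].strip(),
        "atomnr": int(line[15:20]),
        # x, y, z occupy 8 columns each, in nm; velocity columns (if any) are ignored
        "xyz": tuple(float(line[i:i + 8]) for i in (20, 28, 36)),
    }


def summarize_gro(text):
    """Return (title, n_atoms, first_atom, box) for the contents of a .gro file."""
    lines = text.splitlines()
    title = lines[0].strip()
    n_atoms = int(lines[1])
    first_atom = parse_gro_atom(lines[2])
    # The box line comes right after the n_atoms atom lines
    box = [float(v) for v in lines[n_atoms + 2].split()]
    return title, n_atoms, first_atom, box
```

To use it, pass the file contents, e.g. `summarize_gro(Path("GMX_PROLIG_MD/solv_ions.gro").read_text())` with `from pathlib import Path`. Slicing by column is more robust than `split()`, because residue and atom names can run into adjacent fields (as in `47591SOL`).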
topol.top
Master topology file:
; System topology for GROMACS
; Generated by PRISM
#include "amber99sb.ff/forcefield.itp"
; Ligand parameters
#include "../LIG.amb2gmx/atomtypes_LIG.itp"
#include "../LIG.amb2gmx/LIG.itp"
; Water and ions
#include "amber99sb.ff/tip3p.itp"
#include "amber99sb.ff/ions.itp"
[ system ]
Protein-Ligand Complex
[ molecules ]
Protein_chain_A 1
LIG 1
SOL 14523
NA 42
CL 38
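Components:
- Force field includes
- Ligand parameters (atomtypes_LIG.itp must be included before any [ moleculetype ])
- Molecule list with counts; the order must match the atom order in solv_ions.gro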
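Reading the [ molecules ] counts back is a quick way to confirm the numbers of waters and ions in the system. The sketch below is minimal and assumes `;` comments and one `name count` pair per line. `read_molecules` is a hypothetical helper.

```python
def read_molecules(top_text):
    """Return the [ molecules ] section of a .top file as (name, count) pairs."""
    molecules = []
    in_section = False
    for raw in top_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.startswith("["):
            # Section header such as "[ molecules ]"
            in_section = line.strip("[] ").lower() == "molecules"
            continue
        if in_section and not line.startswith("#"):
            name, count = line.split()[:2]
            molecules.append((name, int(count)))
    return molecules
```

For the topology above, this returns `[("Protein_chain_A", 1), ("LIG", 1), ("SOL", 14523), ("NA", 42), ("CL", 38)]`.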
Ligand Force Field Files
LIG.itp (GAFF)
Ligand topology with all parameters:
[ moleculetype ]
; Name nrexcl
LIG 3
[ atoms ]
; nr type resnr residue atom cgnr charge mass
1 c3 1 LIG C1 1 -0.094100 12.010
2 c3 1 LIG C2 2 0.142900 12.010
[ bonds ]
; ai aj funct c0 c1
1 2 1 0.15375 259408.0
[ angles ]
; ai aj ak funct c0 c1
1 2 3 1 109.50 418.40
[ dihedrals ]
; ai aj ak al funct c0 c1 c2 c3 c4 c5
1 2 3 4 9 0.0 0.65084 3
#ifdef POSRES
#include "posre_LIG.itp"
#endif
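The [ atoms ] section carries the partial charges, so summing that column gives a quick sanity check against `ligand_charge` in prism_config.yaml. For a complete ligand topology the total should be close to that integer; the two-atom excerpt above is not complete. This is a minimal sketch that assumes the column order given in the `; nr type resnr residue atom cgnr charge mass` comment. `ligand_net_charge` is a hypothetical helper.

```python
def ligand_net_charge(itp_text):
    """Sum the charge column (7th field) of the [ atoms ] section of an .itp file."""
    total = 0.0
    section = None
    for raw in itp_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if not line or line.startswith("#"):
            continue  # skip blank lines and preprocessor lines such as #ifdef
        if line.startswith("["):
            section = line.strip("[] ").lower()
            continue
        if section == "atoms":
            fields = line.split()
            # nr type resnr residue atom cgnr charge mass
            total += float(fields[6])
    return total
```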
LIG.gro
Ligand coordinates in GROMACS .gro format (same layout as solv_ions.gro).
atomtypes_LIG.itp
Atom type definitions:
[ atomtypes ]
; name at.num mass charge ptype sigma epsilon
c3 6 12.010 0.000 A 0.33996 0.45773
ca 6 12.010 0.000 A 0.33996 0.35982
MDP Parameter Files
em.mdp (Energy Minimization)
; Energy Minimization Parameters
integrator = steep ; Steepest descent
emtol = 200.0 ; Converged when max force < 200 kJ/mol/nm
emstep = 0.01 ; Initial step size (nm)
nsteps = 10000 ; Maximum steps
; Output
nstxout = 100 ; Coordinates
nstenergy = 100 ; Energies
nstlog = 100 ; Log file
nvt.mdp (NVT Equilibration)
; NVT Equilibration
integrator = md ; Molecular dynamics
dt = 0.002 ; 2 fs
nsteps = 250000 ; 500 ps
; Temperature coupling
tcoupl = V-rescale ; Thermostat
tc-grps = Protein Non-Protein
tau_t = 0.1 0.1 ; Coupling time (ps)
ref_t = 310 310 ; Target temperature (K)
; Constraints
constraints = h-bonds ; Hydrogen bonds
constraint_algorithm = lincs
npt.mdp (NPT Equilibration)
; NPT Equilibration
pcoupl = C-rescale ; Barostat
pcoupltype = isotropic ; Coupling type
tau_p = 1.0 ; Coupling time (ps)
ref_p = 1.0 ; Target pressure (bar)
compressibility = 4.5e-5 ; Water compressibility (1/bar)
; Continue from NVT
continuation = yes ; Do not re-constrain the starting structure
gen_vel = no ; Don't generate velocities
md.mdp (Production)
; Production MD
nsteps = 250000000 ; 500 ns
dt = 0.002 ; 2 fs
; Output control
nstxout-compressed = 250000 ; 500 ps
compressed-x-grps = System
nstenergy = 5000 ; 10 ps
nstlog = 5000 ; 10 ps
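The step counts and their inline time comments are easier to audit programmatically. The following is a minimal sketch that assumes one `key = value` per line with optional `;` comments. `parse_mdp`, `run_length_ps`, and `output_interval_ps` are hypothetical helpers, not GROMACS tools.

```python
def parse_mdp(text):
    """Parse 'key = value ; comment' lines of an .mdp file into a dict of strings."""
    params = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue  # blank or comment-only line
        key, value = (part.strip() for part in line.split("=", 1))
        # GROMACS treats '-' and '_' in option names as equivalent
        params[key.replace("-", "_")] = value
    return params


def run_length_ps(params):
    """Simulated time in ps: nsteps * dt (dt is in ps)."""
    return int(params["nsteps"]) * float(params["dt"])


def output_interval_ps(params, key):
    """Time between writes for an output option such as nstxout_compressed."""
    return int(params[key]) * float(params["dt"])
```

For the md.mdp settings above, `run_length_ps` gives 500000 ps (500 ns), and `output_interval_ps(params, "nstxout_compressed")` gives 500 ps, which matches the inline comments.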
Simulation Output Files
Energy Minimization
em/
├── em.tpr # Binary run input file
├── em.gro # Minimized structure
├── em.edr # Energy file
├── em.log # Log file
└── em.trr # Full trajectory (optional)
Check minimization:
# View final energy
grep "Potential Energy" em/em.log | tail -1
# Extract the potential-energy profile (term name read from stdin)
echo Potential | gmx energy -f em/em.edr -o em_energy.xvg
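The .xvg files written by gmx energy, and the analysis .xvg files described later, are plain text. Lines starting with `#` are comments, lines starting with `@` are Grace plotting directives, and the remaining lines are whitespace-separated numbers. The following is a minimal reader that assumes no other line types. `read_xvg` and `last_value` are hypothetical helpers.

```python
def read_xvg(text):
    """Parse .xvg text into rows of floats, skipping '#' comments and '@' directives."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "@")):
            continue
        rows.append([float(value) for value in line.split()])
    return rows


def last_value(rows, column=1):
    """Value of a data column (default: the first y column) in the final row."""
    return rows[-1][column]
```

For example, `last_value(read_xvg(Path("em_energy.xvg").read_text()))` returns the final potential energy (kJ/mol) of the minimization, which should agree with the grep result above.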
Equilibration Files
nvt/
├── nvt.tpr # Run input
├── nvt.gro # Final structure
├── nvt.edr # Energies
├── nvt.log # Log file
├── nvt.cpt # Checkpoint
└── nvt.trr # Trajectory (optional)
npt/
├── npt.tpr
├── npt.gro
├── npt.edr
├── npt.log
├── npt.cpt
└── npt.trr
Production Files
prod/
├── md.tpr # Run input
├── md.xtc # Compressed trajectory
├── md.gro # Final structure
├── md.edr # Energy data
├── md.log # Simulation log
├── md.cpt # Checkpoint file
└── md_pullx.xvg # Pull data (if applicable)
File Formats Explained
TPR Files
Binary run input files containing:
- Complete system topology
- Initial coordinates
- Simulation parameters
- Velocities (if present)
Usage:
# View TPR contents
gmx dump -s md.tpr | less
# Extract structure
gmx editconf -f md.tpr -o structure.pdb
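XTC/TRR Trajectories
XTC: compressed trajectory (positions only)
- Smaller files
- Lossy: coordinates are rounded to a fixed precision (compressed-x-precision, default 1000, i.e. 0.001 nm)
- Standard format for analysis
TRR: full trajectory (positions, plus velocities and forces when nstvout/nstfout are set)
- Larger files
- Uncompressed (full precision of the GROMACS build)
- Needed for analyses that require velocities or forces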
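# Load trajectories
# MDTraj cannot read .tpr files, so use a .gro or .pdb file as the topology
import mdtraj as md
# XTC (compressed)
traj = md.load("prod/md.xtc", top="prod/md.gro")
# TRR (uncompressed; md.load reads positions only)
traj_full = md.load("nvt/nvt.trr", top="nvt/nvt.gro")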
EDR Energy Files
Binary files containing energies and system properties:
# Extract several properties to energy.xvg (term names are read from stdin)
gmx energy -f md.edr -o energy.xvg << EOF
Potential
Kinetic-En.
Total-Energy
Temperature
Pressure
Box-X
Box-Y
Box-Z
EOF
# Or pick terms from the interactive menu
gmx energy -f md.edr -o energy.xvg
CPT Checkpoint Files
Binary checkpoint for restarting simulations:
# View checkpoint info
gmx check -f md.cpt
# Restart from checkpoint
gmx mdrun -s md.tpr -cpi md.cpt -deffnm md -append
Log Files
Simulation Logs
Example log file structure:
Log file opened on Mon Sep 1 10:00:00 2024
GROMACS version: 2024.1
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
Running on 1 node with total 12 cores, 24 logical cores, 1 compatible GPU
Using 1 MPI thread and 10 OpenMP threads
Step Time
0 0.00000
Energies (kJ/mol)
Angle Proper Dih. Improper Dih. LJ-14 Coulomb-14
1.23456e+03 2.34567e+03 3.45678e+02 4.56789e+03 5.67890e+04
Performance: 45.2 ns/day
Parse performance:
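def parse_performance(log_file):
    """Return ns/day from the 'Performance:' line of an mdrun log, or None"""
    with open(log_file) as f:
        for line in f:
            if "Performance:" in line:
                # e.g. "Performance:   45.200   0.531" -> first number is ns/day
                return float(line.split()[1])
    return None

perf = parse_performance("prod/md.log")
if perf is not None:
    print(f"Speed: {perf} ns/day")
else:
    # mdrun writes the Performance line only when the run finishes
    print("No performance line found (run may not have finished)")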
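The ns/day figure also gives a rough wall-clock estimate for the production run. The following is a minimal sketch that assumes the rate stays constant for the whole run. `estimate_wall_time` is a hypothetical helper.

```python
def estimate_wall_time(production_ns, ns_per_day):
    """Rough wall-clock time for a run, assuming a constant ns/day rate.

    Returns (days, hours) as floats.
    """
    if ns_per_day <= 0:
        raise ValueError("ns_per_day must be positive")
    days = production_ns / ns_per_day
    return days, days * 24.0


# 500 ns production (production_time_ns in prism_config.yaml) at the 45.2 ns/day in the log above
days, hours = estimate_wall_time(500, 45.2)
print(f"~{days:.1f} days ({hours:.0f} h)")
```

The estimate ignores checkpointing overhead and any change in speed between runs, so treat it as an order-of-magnitude figure.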
Configuration Files
prism_config.yaml
Saved configuration for reproducibility:
general:
  overwrite: false
box:
  distance: 1.5
  shape: cubic
  center: true
simulation:
  temperature: 310
  pressure: 1.0
  pH: 7.0
  ligand_charge: 0
  production_time_ns: 500
forcefield:
  protein: amber14sb
  ligand: gaff
  water: tip3p
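The saved configuration records the requested production length, so it can be cross-checked against md.mdp. The following is a minimal sketch. It assumes the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`, a third-party package) and that the .mdp options are available as strings. `check_production_length` is a hypothetical helper.

```python
def check_production_length(config, mdp_params, tol_ns=1e-6):
    """Compare simulation.production_time_ns with nsteps * dt from md.mdp.

    Returns (matches, expected_ns, actual_ns).
    """
    expected_ns = float(config["simulation"]["production_time_ns"])
    # dt is in ps, so divide by 1000 to get ns
    actual_ns = int(mdp_params["nsteps"]) * float(mdp_params["dt"]) / 1000.0
    return abs(expected_ns - actual_ns) <= tol_ns, expected_ns, actual_ns
```

With the values shown in this guide (500 ns requested; `nsteps = 250000000` and `dt = 0.002` in md.mdp), the check passes.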
Analysis Output Files
contact_analysis.html
Interactive visualization with:
- 2D/3D molecular viewer
- Contact frequency data
- Export capabilities
Analysis Data Files
analysis/
├── rmsd.xvg # RMSD over time
├── rmsf.xvg # RMSF per residue
├── energy.xvg # Energy components
├── contacts.csv # Contact analysis
└── hbonds.dat # Hydrogen bonds
Working with Output Files
Check File Integrity
import os
from pathlib import Path

def check_output_files(output_dir):
    """Check if all expected files exist"""
    required_files = {
        'GMX_PROLIG_MD/solv_ions.gro': 'Solvated system',
        'GMX_PROLIG_MD/topol.top': 'Topology',
        'mdps/em.mdp': 'EM parameters',
        'mdps/nvt.mdp': 'NVT parameters',
        'mdps/npt.mdp': 'NPT parameters',
        'mdps/md.mdp': 'Production parameters'
    }
    missing = []
    for file, description in required_files.items():
        full_path = Path(output_dir) / file
        if not full_path.exists():
            missing.append(f"{file} ({description})")
    if missing:
        print("Missing files:")
        for file in missing:
            print(f"  - {file}")
    else:
        print("All required files present")
    return len(missing) == 0

# Check
check_output_files("output")
File Size Management
import os

def get_directory_size(path):
    """Return the total size of all files under path, in GiB"""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            filepath = os.path.join(dirpath, filename)
            total += os.path.getsize(filepath)
    return total / (1024**3)  # bytes -> GiB

# Check sizes
print(f"Total size: {get_directory_size('output'):.2f} GiB")
print(f"Trajectory: {os.path.getsize('output/GMX_PROLIG_MD/prod/md.xtc')/(1024**3):.2f} GiB")
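Compress Output
# Thin the trajectory: -dt (ps) keeps only frames whose time is a multiple of it.
# md.mdp writes a frame every 500 ps, so the value must be larger than that (here 1000 ps)
gmx trjconv -f md.xtc -o md_thinned.xtc -dt 1000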
# Archive results
tar czf simulation_results.tar.gz output/
# Selective archiving (exclude large files)
tar czf analysis_only.tar.gz \
--exclude='*.xtc' \
--exclude='*.trr' \
output/
Data Extraction
Extract Specific Frames
# Extract every 100th frame
gmx trjconv -f md.xtc -o sampled.xtc -skip 100
# Extract time range (100-200 ns)
gmx trjconv -f md.xtc -o segment.xtc -b 100000 -e 200000
# Extract final frame
gmx trjconv -f md.xtc -s md.tpr -o final.pdb -dump 500000
Convert Formats
# Convert to PDB (for visualization)
gmx trjconv -f md.gro -s md.tpr -o structure.pdb
# Convert trajectory to PDB (warning: large file)
gmx trjconv -f md.xtc -s md.tpr -o trajectory.pdb
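# gmx trjconv cannot write DCD, and VMD reads XTC directly.
# If DCD is still needed, MDTraj's mdconvert command-line tool can convert it
mdconvert -t md.gro -o trajectory.dcd md.xtc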
Clean Up
Remove Temporary Files
from pathlib import Path

def cleanup_temporary_files(output_dir):
    """Remove GROMACS backup files and other temporary files"""
    patterns = [
        '#*#',        # GROMACS backups of overwritten files (e.g. #md.log.1#)
        '*~',         # Editor backup files
        '*.log.*',    # Old log files
        'step*.pdb'   # Structures dumped by mdrun on constraint (LINCS) warnings
    ]
    for pattern in patterns:
        # Collect matches first so nothing is deleted while the tree is being walked
        for file in list(Path(output_dir).rglob(pattern)):
            if file.is_file():
                print(f"Removing: {file}")
                file.unlink()

# Clean
cleanup_temporary_files("output")
Archive Project
import shutil
from datetime import datetime

def archive_project(output_dir):
    """Create a timestamped .tar archive of output_dir"""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    archive_name = f"prism_project_{timestamp}"
    # make_archive appends the extension and returns the archive path
    archive_path = shutil.make_archive(archive_name, 'tar', root_dir=output_dir)
    print(f"Created archive: {archive_path}")
    return archive_path

# Archive
archive_project("output")
Best Practices
- Regular Backups: Keep checkpoint files; mdrun writes one every 15 minutes by default (adjust with -cpt)
- Compression: Use XTC format for trajectories
- Documentation: Keep prism_config.yaml with results
- Version Control: Track MDP files and configurations
- Data Management: Remove unnecessary temporary files
- Metadata: Document file contents and parameters