Output Files Guide

Understanding PRISM's output structure is essential for analysis and troubleshooting. This guide details all files generated during system building and simulation.

Quick Start

# After building, find your files here:
ls my_system/GMX_PROLIG_MD/     # Main simulation directory
ls my_system/LIG.amb2gmx/       # Ligand force field files
ls my_system/mdps/              # MDP parameter files

Output Directory Structure

output_directory/
├── GMX_PROLIG_MD/          # Main simulation directory
│   ├── solv_ions.gro       # Solvated system structure
│   ├── topol.top           # System topology
│   ├── index.ndx           # Index groups
│   ├── em/                 # Energy minimization
│   ├── nvt/                # NVT equilibration
│   ├── npt/                # NPT equilibration
│   └── prod/               # Production MD
├── LIG.amb2gmx/            # Ligand force field (GAFF)
│   ├── LIG.gro             # Ligand structure
│   ├── LIG.itp             # Ligand topology
│   ├── LIG.top             # Full topology
│   ├── atomtypes_LIG.itp   # Atom type definitions
│   └── posre_LIG.itp       # Position restraints
├── mdps/                   # MDP parameter files
│   ├── em.mdp              # Energy minimization
│   ├── nvt.mdp             # NVT equilibration
│   ├── npt.mdp             # NPT equilibration
│   └── md.mdp              # Production MD
├── forcefield/             # Temporary force field files
├── protein_clean.pdb       # Cleaned protein structure
└── prism_config.yaml       # Configuration used

Core System Files

solv_ions.gro

The complete solvated system with ions:

Protein-Ligand Complex in Water
47593
    1MET      N    1   2.345   1.234   0.567
    1MET     CA    2   2.456   1.345   0.678
    ...
47591SOL    HW2 47593  10.234   9.876   8.543
  10.12345  10.12345  10.12345

Format: GROMACS coordinate file

- Line 1: Title
- Line 2: Number of atoms
- Lines 3-n: Atom information
- Last line: Box vectors

Usage:

import mdtraj as md

# Load structure
structure = md.load("GMX_PROLIG_MD/solv_ions.gro")
print(f"Atoms: {structure.n_atoms}")
print(f"Box: {structure.unitcell_vectors}")
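The .gro layout described above can also be read without extra dependencies. The following is a minimal sketch, assuming the standard fixed-width GROMACS columns and a rectangular box; `read_gro_summary` is a hypothetical helper name, not part of PRISM.

```python
def read_gro_summary(text):
    """Summarize the contents of a .gro file passed as a string."""
    lines = text.rstrip("\n").splitlines()
    title = lines[0].strip()                 # Line 1: title
    n_atoms = int(lines[1])                  # Line 2: number of atoms
    atom_lines = lines[2:2 + n_atoms]        # One line per atom
    box = [float(v) for v in lines[2 + n_atoms].split()]  # Last line: box vectors (nm)
    # Fixed-width columns: residue number (5 chars), then residue name (5 chars)
    residue_names = sorted({line[5:10].strip() for line in atom_lines})
    return {
        "title": title,
        "n_atoms": n_atoms,
        "box_nm": box,
        "residue_names": residue_names,
    }

# Illustrative two-atom file built with the .gro column widths
gro_format = "%5d%-5s%5s%5d%8.3f%8.3f%8.3f"
example_gro = "\n".join([
    "Protein-Ligand Complex in Water",
    "    2",
    gro_format % (1, "MET", "N", 1, 2.345, 1.234, 0.567),
    gro_format % (1, "MET", "CA", 2, 2.456, 1.345, 0.678),
    "  10.12345  10.12345  10.12345",
])
summary = read_gro_summary(example_gro)
print(summary)
```

For a real system, pass `Path("GMX_PROLIG_MD/solv_ions.gro").read_text()` instead of the example string.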

topol.top

Master topology file:

; System topology for GROMACS
; Generated by PRISM

#include "amber99sb.ff/forcefield.itp"

; Ligand parameters
#include "../LIG.amb2gmx/atomtypes_LIG.itp"
#include "../LIG.amb2gmx/LIG.itp"

; Water and ions
#include "amber99sb.ff/tip3p.itp"
#include "amber99sb.ff/ions.itp"

[ system ]
Protein-Ligand Complex

[ molecules ]
Protein_chain_A    1
LIG                1
SOL             14523
NA                 42
CL                 38

Components:

- Force field includes
- Ligand parameters
- Molecule list with counts
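The molecule list is often the first thing to check when `gmx grompp` reports a mismatch between coordinates and topology. As a rough sketch, the counts can be pulled out of the `[ molecules ]` section with plain Python; `read_molecules` is a hypothetical helper, and the net ion charge assumes monovalent NA and CL ions.

```python
def read_molecules(top_text):
    """Return {molecule name: count} from the [ molecules ] section."""
    counts = {}
    in_molecules = False
    for raw in top_text.splitlines():
        line = raw.split(";", 1)[0].strip()   # drop comments
        if not line:
            continue
        if line.startswith("["):
            # Track whether we are inside the [ molecules ] section
            in_molecules = line.replace(" ", "").lower() == "[molecules]"
            continue
        if in_molecules:
            name, count = line.split()[:2]
            # The same molecule name may be listed more than once
            counts[name] = counts.get(name, 0) + int(count)
    return counts

example_top = """
[ system ]
Protein-Ligand Complex

[ molecules ]
Protein_chain_A    1
LIG                1
SOL             14523
NA                 42
CL                 38
"""
counts = read_molecules(example_top)
print(counts)
print("Net ion charge:", counts["NA"] - counts["CL"])
```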

Ligand Force Field Files

LIG.itp (GAFF)

Ligand topology with all parameters:

[ moleculetype ]
; Name            nrexcl
LIG                3

[ atoms ]
;   nr       type  resnr residue  atom   cgnr     charge       mass
     1         c3      1    LIG     C1      1   -0.094100    12.010
     2         c3      1    LIG     C2      2    0.142900    12.010

[ bonds ]
;  ai    aj funct    c0    c1
    1     2     1  0.15375  259408.0

[ angles ]
;  ai    aj    ak funct    c0    c1
    1     2     3     1  109.50  418.40

[ dihedrals ]
;  ai    aj    ak    al funct    c0    c1    c2    c3    c4    c5
    1     2     3     4     9  0.0  0.65084  3

#ifdef POSRES
#include "posre_LIG.itp"
#endif
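A quick sanity check on a ligand topology is that the partial charges in `[ atoms ]` sum to the intended net charge (the `ligand_charge` value in the configuration). The sketch below assumes the column order shown in the header comment above, with the charge in the seventh column; `ligand_net_charge` is a hypothetical helper name.

```python
def ligand_net_charge(itp_text):
    """Sum the partial charges listed in the [ atoms ] section."""
    total = 0.0
    section = None
    for raw in itp_text.splitlines():
        line = raw.split(";", 1)[0].strip()   # drop comments
        if not line or line.startswith("#"):  # skip blanks and preprocessor lines
            continue
        if line.startswith("["):
            section = line.strip("[] ").lower()
            continue
        if section == "atoms":
            fields = line.split()
            total += float(fields[6])         # nr type resnr residue atom cgnr charge mass
    return total

example_itp = """
[ moleculetype ]
; Name            nrexcl
LIG                3

[ atoms ]
;   nr       type  resnr residue  atom   cgnr     charge       mass
     1         c3      1    LIG     C1      1   -0.094100    12.010
     2         c3      1    LIG     C2      2    0.142900    12.010

[ bonds ]
;  ai    aj funct    c0    c1
    1     2     1  0.15375  259408.0
"""
charge = ligand_net_charge(example_itp)
print(f"Net charge: {charge:+.4f} e")
```

For a complete ligand the sum should be within rounding error of an integer; the two-atom excerpt here is truncated, so its sum is not.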

LIG.gro

Ligand coordinates:

LIG
   42
    1LIG     C1    1   0.123   0.456   0.789
    1LIG     C2    2   0.234   0.567   0.890
    ...
   1.50000   1.50000   1.50000

atomtypes_LIG.itp

Atom type definitions:

[ atomtypes ]
; name    at.num    mass    charge   ptype   sigma   epsilon
c3          6      12.010    0.000     A    0.33996   0.45773
ca          6      12.010    0.000     A    0.33996   0.35982

MDP Parameter Files

em.mdp (Energy Minimization)

; Energy Minimization Parameters
integrator  = steep         ; Steepest descent
emtol       = 200.0        ; kJ/mol/nm
emstep      = 0.01         ; Step size
nsteps      = 10000        ; Maximum steps

; Output
nstxout     = 100          ; Coordinates
nstenergy   = 100          ; Energies
nstlog      = 100          ; Log file

nvt.mdp (NVT Equilibration)

; NVT Equilibration
integrator  = md           ; Molecular dynamics
dt          = 0.002        ; 2 fs
nsteps      = 250000       ; 500 ps

; Temperature coupling
tcoupl      = V-rescale    ; Thermostat
tc-grps     = Protein Non-Protein
tau_t       = 0.1    0.1   ; Coupling time
ref_t       = 310    310   ; Target temperature

; Constraints
constraints = h-bonds      ; Constrain bonds involving hydrogen
constraint_algorithm = lincs

npt.mdp (NPT Equilibration)

; NPT Equilibration
pcoupl      = C-rescale    ; Barostat
pcoupltype  = isotropic    ; Coupling type
tau_p       = 1.0          ; Coupling time
ref_p       = 1.0          ; Target pressure
compressibility = 4.5e-5   ; Water compressibility

; Continue from NVT
continuation = yes         ; Do not re-apply constraints to the starting structure
gen_vel     = no          ; Don't generate velocities

md.mdp (Production)

; Production MD
nsteps      = 250000000    ; 500 ns
dt          = 0.002        ; 2 fs

; Output control
nstxout-compressed = 250000  ; 500 ps
compressed-x-grps  = System
nstenergy         = 5000     ; 10 ps
nstlog            = 5000     ; 10 ps
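The time annotations in the MDP comments follow from `nsteps * dt`. A small sketch, assuming simple `key = value ; comment` lines, reads an MDP file and recomputes the simulated time and the approximate number of trajectory frames. `read_mdp`, `simulated_time_ns`, and `expected_frames` are hypothetical helper names.

```python
def read_mdp(text):
    """Parse 'key = value ; comment' lines into a dict of strings."""
    params = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()   # drop comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # GROMACS accepts '-' and '_' interchangeably in option names
        params[key.strip().lower().replace("-", "_")] = value.strip()
    return params

def simulated_time_ns(params):
    """Total simulated time: nsteps * dt (ps), converted to ns."""
    return int(params["nsteps"]) * float(params["dt"]) / 1000.0

def expected_frames(params):
    """Approximate XTC frame count, counting the frame at step 0."""
    return int(params["nsteps"]) // int(params["nstxout_compressed"]) + 1

example_mdp = """
; Production MD
nsteps      = 250000000    ; 500 ns
dt          = 0.002        ; 2 fs

; Output control
nstxout-compressed = 250000  ; 500 ps
compressed-x-grps  = System
"""
params = read_mdp(example_mdp)
print(f"Simulated time: {simulated_time_ns(params):.1f} ns")
print(f"Expected frames: {expected_frames(params)}")
```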

Simulation Output Files

Energy Minimization

em/
├── em.tpr          # Binary run input file
├── em.gro          # Minimized structure
├── em.edr          # Energy file
├── em.log          # Log file
└── em.trr          # Full trajectory (optional)

Check minimization:

# View final energy
grep "Potential Energy" em/em.log | tail -1

# Extract energy profile
gmx energy -f em/em.edr -o em_energy.xvg

Equilibration Files

nvt/
├── nvt.tpr         # Run input
├── nvt.gro         # Final structure
├── nvt.edr         # Energies
├── nvt.log         # Log file
├── nvt.cpt         # Checkpoint
└── nvt.trr         # Trajectory (optional)

npt/
├── npt.tpr
├── npt.gro
├── npt.edr
├── npt.log
├── npt.cpt
└── npt.trr

Production Files

prod/
├── md.tpr          # Run input
├── md.xtc          # Compressed trajectory
├── md.gro          # Final structure
├── md.edr          # Energy data
├── md.log          # Simulation log
├── md.cpt          # Checkpoint file
└── md_pullx.xvg    # Pull data (if applicable)

File Formats Explained

TPR Files

Binary run input files containing:

- Complete system topology
- Initial coordinates
- Simulation parameters
- Velocities (if present)

Usage:

# View TPR contents
gmx dump -s md.tpr | less

# Extract structure
gmx editconf -f md.tpr -o structure.pdb

XTC/TRR Trajectories

XTC: Compressed trajectory (positions only)

- Smaller file size
- Lossy compression
- Standard for analysis

TRR: Full trajectory (positions, velocities, forces)

- Larger files
- Complete information
- Needed for some analyses

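# Load trajectories
import mdtraj as md

# MDTraj cannot read .tpr files, so use a structure file
# with the same atoms (e.g. solv_ions.gro) as the topology

# XTC (compressed)
traj = md.load("prod/md.xtc", top="solv_ions.gro")

# TRR (full precision)
traj_full = md.load("nvt/nvt.trr", top="solv_ions.gro")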

EDR Energy Files

Binary files containing energies and system properties:

# Extract properties
gmx energy -f md.edr << EOF
Potential
Kinetic-En.
Total-Energy
Temperature
Pressure
Box-X
Box-Y
Box-Z
EOF

# Convert to text
gmx energy -f md.edr -o energy.xvg
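The resulting `.xvg` file is plain text: lines starting with `#` are comments, lines starting with `@` are plotting directives (including the series legends), and the rest are whitespace-separated numbers. The following is a minimal reader sketched under those assumptions; `read_xvg` is a hypothetical helper, and the values in the example are illustrative, not real output.

```python
import re

def read_xvg(text):
    """Return (legends, rows) from the contents of an .xvg file."""
    legends, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                            # comment or blank line
        if line.startswith("@"):
            # Series names look like: @ s0 legend "Potential"
            match = re.match(r'@\s*s\d+\s+legend\s+"(.*)"', line)
            if match:
                legends.append(match.group(1))
            continue
        rows.append([float(value) for value in line.split()])
    return legends, rows

example_xvg = """\
# This file was created by gmx energy
@    title "GROMACS Energies"
@    xaxis  label "Time (ps)"
@ s0 legend "Potential"
@ s1 legend "Temperature"
    0.000000  -512345.0  309.8
   10.000000  -512410.5  310.3
"""
legends, rows = read_xvg(example_xvg)
print(legends)
for time_ps, *values in rows:
    print(time_ps, dict(zip(legends, values)))
```

The first column of each row is the time in ps; the remaining columns follow the order of the legends.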

CPT Checkpoint Files

Binary checkpoint for restarting simulations:

# View checkpoint info
gmx check -f md.cpt

# Restart from checkpoint
gmx mdrun -s md.tpr -cpi md.cpt -deffnm md -append

Log Files

Simulation Logs

Example log file structure:

Log file opened on Mon Sep  1 10:00:00 2024

GROMACS version:    2024.1
Precision:          mixed
Memory model:       64 bit
MPI library:        thread_mpi

Running on 1 node with total 12 cores, 24 logical cores, 1 compatible GPU

Using 1 MPI thread and 10 OpenMP threads

    Step           Time
       0        0.00000

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    1.23456e+03    2.34567e+03    3.45678e+02    4.56789e+03    5.67890e+04

Performance: 45.2 ns/day

Parse performance:

def parse_performance(log_file):
    with open(log_file) as f:
        for line in f:
            if "Performance:" in line:
                ns_per_day = float(line.split()[1])
                return ns_per_day
    return None

perf = parse_performance("prod/md.log")
if perf is None:
    print("No performance line found (run may be incomplete)")
else:
    print(f"Speed: {perf} ns/day")

Configuration Files

prism_config.yaml

Saved configuration for reproducibility:

general:
  overwrite: false

box:
  distance: 1.5
  shape: cubic
  center: true

simulation:
  temperature: 310
  pressure: 1.0
  pH: 7.0
  ligand_charge: 0
  production_time_ns: 500

forcefield:
  protein: amber14sb
  ligand: gaff
  water: tip3p

Analysis Output Files

contact_analysis.html

Interactive visualization with:

- 2D/3D molecular viewer
- Contact frequency data
- Export capabilities

Analysis Data Files

analysis/
├── rmsd.xvg        # RMSD over time
├── rmsf.xvg        # RMSF per residue
├── energy.xvg      # Energy components
├── contacts.csv    # Contact analysis
└── hbonds.dat      # Hydrogen bonds

Working with Output Files

Check File Integrity

import os
from pathlib import Path

def check_output_files(output_dir):
    """Check if all expected files exist"""

    required_files = {
        'GMX_PROLIG_MD/solv_ions.gro': 'Solvated system',
        'GMX_PROLIG_MD/topol.top': 'Topology',
        'mdps/em.mdp': 'EM parameters',
        'mdps/nvt.mdp': 'NVT parameters',
        'mdps/npt.mdp': 'NPT parameters',
        'mdps/md.mdp': 'Production parameters'
    }

    missing = []
    for file, description in required_files.items():
        full_path = Path(output_dir) / file
        if not full_path.exists():
            missing.append(f"{file} ({description})")

    if missing:
        print("Missing files:")
        for file in missing:
            print(f"  - {file}")
    else:
        print("All required files present")

    return len(missing) == 0

# Check
check_output_files("output")

File Size Management

import os

def get_directory_size(path):
    """Calculate total size of directory"""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            filepath = os.path.join(dirpath, filename)
            total += os.path.getsize(filepath)
    return total / (1024**3)  # GB

# Check sizes
print(f"Total size: {get_directory_size('output'):.2f} GB")
print(f"Trajectory: {os.path.getsize('output/GMX_PROLIG_MD/prod/md.xtc')/(1024**3):.2f} GB")

Compress Output

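# Thin the trajectory: keep one frame per ns (-dt is in ps;
# it must exceed the 500 ps output interval to have any effect)
gmx trjconv -f md.xtc -o md_compressed.xtc -dt 1000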

# Archive results
tar czf simulation_results.tar.gz output/

# Selective archiving (exclude large files)
tar czf analysis_only.tar.gz \
  --exclude='*.xtc' \
  --exclude='*.trr' \
  output/

Data Extraction

Extract Specific Frames

# Extract every 100th frame
gmx trjconv -f md.xtc -o sampled.xtc -skip 100

# Extract time range (100-200 ns)
gmx trjconv -f md.xtc -o segment.xtc -b 100000 -e 200000

# Extract final frame
gmx trjconv -f md.xtc -s md.tpr -o final.pdb -dump 500000

Convert Formats

# Convert to PDB (for visualization)
gmx trjconv -f md.gro -s md.tpr -o structure.pdb

# Convert trajectory to PDB (warning: large file)
gmx trjconv -f md.xtc -s md.tpr -o trajectory.pdb

# Convert to DCD (gmx trjconv cannot write DCD; use MDTraj's mdconvert.
# VMD can also read md.xtc directly)
mdconvert md.xtc -o trajectory.dcd

Clean Up

Remove Temporary Files

from pathlib import Path

def cleanup_temporary_files(output_dir):
    """Remove temporary and backup files"""

    patterns = [
        '#*#',      # GROMACS and Emacs backups (e.g. #md.log.1#)
        '*~',       # Editor backup files
        '*.log.*',  # Old log files
        'step*.pdb' # Intermediate PDB files
    ]

    for pattern in patterns:
        for file in Path(output_dir).rglob(pattern):
            # Skip directories and files already removed by an earlier pattern
            if file.is_file():
                print(f"Removing: {file}")
                file.unlink()

# Clean
cleanup_temporary_files("output")

Archive Project

import shutil
from datetime import datetime

def archive_project(output_dir):
    """Create project archive with timestamp"""

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    archive_name = f"prism_project_{timestamp}"

    # Create archive
    shutil.make_archive(
        archive_name,
        'tar',
        output_dir
    )

    print(f"Created archive: {archive_name}.tar")
    return f"{archive_name}.tar"

# Archive
archive_project("output")

Best Practices

  1. Regular Backups: Save checkpoint files frequently
  2. Compression: Use XTC format for trajectories
  3. Documentation: Keep prism_config.yaml with results
  4. Version Control: Track MDP files and configurations
  5. Data Management: Remove unnecessary temporary files
  6. Metadata: Document file contents and parameters
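
One way to act on the documentation and metadata points is to store a small manifest next to the results, recording the size and checksum of each parameter and topology file. This is only a sketch: `build_manifest` and the default file patterns are assumptions made for illustration, not something PRISM generates.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def build_manifest(output_dir, patterns=("*.mdp", "*.top", "*.itp", "*.yaml")):
    """Map each matching file (relative path) to its size and SHA-256."""
    root = Path(output_dir)
    manifest = {}
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            if path.is_file():
                manifest[path.relative_to(root).as_posix()] = {
                    "bytes": path.stat().st_size,
                    "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
                }
    return manifest

# Demonstration on a throwaway directory standing in for the output directory
with tempfile.TemporaryDirectory() as tmp:
    mdps = Path(tmp) / "mdps"
    mdps.mkdir()
    (mdps / "em.mdp").write_bytes(b"integrator = steep\n")
    manifest = build_manifest(tmp)
    print(json.dumps(manifest, indent=2))
```

Writing the JSON into the archived directory lets a later reader verify that the MDP and topology files are the ones actually used for the run.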