GROMACS

Overview

The GROMACS site describes the package as follows:

GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.

About GROMACS

This package is made available to all cluster users in both environments by leveraging an Apptainer container that contains the GROMACS package as well as all of the libraries needed to run it. This allows us to easily update the package when requested, and to keep older versions around if a researcher needs them.

Because of the way that the GROMACS container is deployed in our environment, we will leverage a small wrapper script that starts the container, calls the real GROMACS (gmx) binary, and stages your data in the appropriate directories.

Example Files

Download the Needed Files

Download all of the following files and place them in the same directory in the cluster environment. If you downloaded them to your local system instead of directly to the cluster, you can use SCP to copy them to the cluster.

If you didn’t download and extract the gromacs_example.tar.xz file in the sidebar, you will need to ensure that the gmx.sh and OMP+MPIscript.parallel files are executable, which you can do by running chmod +x gmx.sh OMP+MPIscript.parallel on the cluster head node where you’ve put the files. The mdrun.htsub submission file will also create its log files in a logs/ directory. This too is created for you if you download and extract the gromacs_example.tar.xz file. You can create your own logs directory by running mkdir logs on the cluster head node where you’ve put your files, or modify mdrun.htsub and remove the logs/ prefix on the three lines where logging is configured.
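
As a rough sketch of those steps (the full head-node hostname below is an assumption; substitute the head node you normally log into), the copy-and-setup commands look like this:

scp gromacs_example.tar.xz USER@cse-head.cluster.cs.wwu.edu:   # from your local system; hostname is an assumption
tar -xf gromacs_example.tar.xz                                 # on the head node; the archive includes the logs/ directory
cd gromacs_example
chmod +x gmx.sh OMP+MPIscript.parallel                         # only needed if you downloaded the files individually
mkdir -p logs                                                  # only needed if logs/ does not already exist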

Interacting with GROMACS via HTCondor

Many of the GROMACS commands expect the user to type input to answer a selection prompt. It’s possible to script this out and provide the input needed via HTCondor, but unless you know the exact number of your inputs ahead of time, this becomes difficult to get right. The easy workaround is to have HTCondor allocate resources on a compute node and provide you an interactive shell there so you can run commands as if you were sitting down at the compute node instead of the head node.
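
For reference, one way to script a single known answer, once you have a shell on a compute node as described below, is to pipe it in on standard input. The selection number here is purely hypothetical:

echo 6 | ./gmx.sh pdb2gmx -f 1ubq.pdb -o 1ubq.gro -p 1ubq.top -ignh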

Getting an Interactive Prompt on a Compute Node

HTCondor’s condor_submit makes getting an interactive shell prompt easy. Simply add the -interactive flag when you run the command. Be sure to request some resources for yourself, since the defaults may be too low for your needs.

condor_submit -interactive request_cpus=4 request_memory=4GB

The command will inform you that it has submitted the interactive job to the queue and then wait for the job to start on a compute node. You’ll see a warning about the limitations of interactive jobs, and finally be presented with a new prompt informing you which compute node you’re using.

USER@cse-head:~/gromacs_example$ condor_submit -interactive request_cpus=4 request_memory=4GB
Submitting job(s).
1 job(s) submitted to cluster 1182.
Welcome to slot1_1@c-X-X.cluster.cs.wwu.edu!
You will be logged out after 3600 seconds of inactivity.


**********
 WARNING!
**********
This HTCondor session will end after 24 hours, or 1 hour of idle time.
To run for more than 24 hours, submit the job without "-interactive".


USER@c-X-X:~/gromacs_example$

Once you have a prompt at the compute node, you can run the gmx.sh script to access all of the GROMACS commands, including mdrun.

Tip

Be careful! The mdrun command can run for a long time and needs a lot of resources to complete. Because of this, it has its own job submission file that you should use to ensure that it has the proper runtime and resources allocated to it. Unless you know your run will complete quickly, it is suggested that you use the mdrun.htsub provided here and submit the mdrun job separately from your interactive session.

Example use case (after getting an interactive shell on a compute node):

USER@c-X-X:~/gromacs_example$ ./gmx.sh pdb2gmx -f 1ubq.pdb -o 1ubq.gro -p 1ubq.top -ignh
                   :-) GROMACS - gmx pdb2gmx, 2022 (-:

Executable:   /gromacs/bin.AVX_256/gmx_mpi
Data prefix:  /gromacs
Working dir:  /cluster/home/USER/gromacs_example
Command line:
  gmx_mpi pdb2gmx -f 1ubq.pdb -o 1ubq.gro -p 1ubq.top -ignh

Select the Force Field:

From '/gromacs/share/gromacs/top':

 1: AMBER03 protein, nucleic AMBER94 (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)

...

You’ll notice that the printed “Command line” is calling a different program. This is because gmx.sh is setting up the Apptainer environment for GROMACS and running the command inside of it. Don’t worry about this. The script inside the Apptainer environment will also determine the fastest available executable for your compute node, so the executable shown may differ as well.

Note

There is a leading ./ on gmx.sh to indicate that the command you want to run is the file gmx.sh in your current working directory. gmx.sh is not in your $PATH, so typing gmx.sh without the ./ will print gmx.sh: command not found.
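
For example, a quick check from the directory that contains the script (GROMACS’s --version flag simply prints version information):

./gmx.sh --version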

Once you’ve prepared all of your files for the mdrun command, you can submit the mdrun job to run its larger, longer calculations.
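
For instance (the filenames and the .mdp parameter file here are hypothetical, but the pattern is typical), gmx grompp is the command that packs your structure, topology, and run parameters into the .tpr file that mdrun consumes:

./gmx.sh grompp -f nvt.mdp -c 1ubq.gro -p 1ubq.top -o nvt.tpr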

Submitting the mdrun job

While it is possible to run the mdrun command in an interactive session, it may run for longer than the 24 hours allowed, or need multiple computers’ resources working together in parallel to get its results. Because of this, we submit a separate job to the queue to take advantage of HTCondor’s parallel universe, which will allocate multiple computers to one job.

Note

You will need to modify the mdrun.htsub file to adjust the filename(s) and any options that need to be passed to the mdrun command before submitting the job. You may also want to adjust the amount of resources being requested to match your needs. See the line-by-line explanation of the file in mdrun.htsub below for information about which lines do what.

Tip

Editing files in a terminal can be daunting, but nano(1) makes it easy! The bottom two lines print what key commands are available. The ^ symbol represents the Ctrl key. So Ctrl + O will save a file, and Ctrl + X will exit.
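
For example, to open the submission file for editing:

nano mdrun.htsub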

After modifying the mdrun.htsub for your needs, you can submit it using the condor_submit command.

condor_submit mdrun.htsub

Tip

You can add the option -batch-name with the argument “$(basename $(pwd))” to set your current directory’s name as your job’s batch name. This is really useful when you want to see at a glance the status of multiple jobs. It’s not required, but it is nice to have!

condor_submit -batch-name "$(basename $(pwd))" mdrun.htsub

Once the mdrun job has been submitted, you can check its status in the queue with condor_q or condor_watch_q. Check the FAQ for explanations. After the job is in a running state, the output of the main node that is running the job will be in two files, logs/mdrun.$(filename).0.e and logs/mdrun.$(filename).0.o, where $(filename) expands to whatever you set filename to in mdrun.htsub (nvt in the supplied example). The 0 indicates it’s the first node (the node index begins counting at 0), and the .e and .o indicate whether the output came from STDERR or STDOUT. GROMACS tends to put most of its messages on STDERR, so you can follow along as your job runs by using the tail command’s -f option.

tail -f logs/mdrun.nvt.0.e

This will continuously watch the file for new data to be written to it and display it on the screen. When you want to stop watching the file, press Ctrl+C to exit.
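
If you just want a quick look at the job’s state rather than its output, the queue commands mentioned above can be run from the head node at any time:

condor_q
condor_watch_q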

Example Files Explained

What follows is an explanation of each of the supplied files, and how they play a role in running GROMACS in our environment.

gmx.sh

Think of the gmx.sh file as a replacement for anywhere you would have run the gmx command. Any flags or commands you would have given to it, you should pass to this script and it will pass them on to the real command. You will most likely not need to modify this file for your work.
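
For example (the energy subcommand and filenames are just an illustration), anywhere another guide says to run a gmx command, substitute ./gmx.sh for gmx:

./gmx.sh energy -f nvt.edr -o energy.xvg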

GROMACS wrapper script to start up the Apptainer container and run the gmx command – gmx.sh
 1  #!/bin/sh
 2
 3  exec apptainer exec -B "$(pwd)":/data /cluster/share/singularity/gromacs-2022.2.sif gmx ${@}

The exec line (line 3) has a lot to unpack, but the first line is simple: it tells the system that this is a shell script and should be run with the sh shell. If you’re curious about the internals, keep reading. Otherwise, feel free to skip ahead to the next file.

(Optional) Technical breakdown of the exec line

  • exec

    Tells the shell to replace itself with the command that follows.

  • apptainer

    The real command we want to run, apptainer. Everything that follows this is a “sub-command” of, or argument to, the apptainer command.

    • exec

      Tells Apptainer to execute the command it’s given on the command line, not the command it may have been pre-configured to run.

      • -B “$(pwd)”:/data

        Tells Apptainer to “bind” the current directory to /data inside the container, which is where GROMACS intends to operate on its data inside the contained environment.

      • /cluster/share/singularity/gromacs-2022.2.sif

        Tells Apptainer the path to the GROMACS Apptainer Image File (sif) that holds the container we want to run. It is made available on all nodes, so you don’t have to worry about copying it over in your submission file.

      • gmx

        Tells Apptainer which command we want to run inside the container.

      • ${@}

        Finally, any arguments that were given to the wrapper script are passed on to the gmx command above. You may be tempted to modify this to include your own arguments, but doing so would break the wrapper for everything else; pass your arguments to gmx.sh instead (see the example after this list).
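
For example (the arguments are illustrative), running:

./gmx.sh mdrun -v -s nvt.tpr

results in the following command being executed inside the container environment:

apptainer exec -B "$(pwd)":/data /cluster/share/singularity/gromacs-2022.2.sif gmx mdrun -v -s nvt.tpr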

OMP+MPIscript.parallel

The OMP+MPIscript.parallel script is a customized form of the openmpiscript that HTCondor provides as an example.

Due to the length of the file, it is not inlined here, but a high-level overview follows:

This shell script does the heavy lifting of setting up and tearing down the support nodes when you run an MPI (Message Passing Interface) based job. Its work can be broken down into three cases.

  1. On all nodes, it sets up the environment variables that configure the library paths and executable search paths needed to find a specified version of OpenMPI. It also sets any relevant options, such as enabling OpenMP (Open Multi-Processing) so that multiple cores can be used on each node we run on.

  2. On all nodes except the first node, it starts the orted (Open Run-Time Environment Daemon) process to wait and accept work from the first node.

  3. On the first node, it starts the main mpirun process with the specified arguments. This part is crucial for us, as its arguments will be the gmx.sh script and all of its commands, flags, etc. This shows the layered approach we take, where each component sets up the next component for success.

mdrun.htsub

From the mdrun manual page:

gmx mdrun is the main computational chemistry engine within GROMACS. Obviously, it performs Molecular Dynamics simulations, but it can also perform Stochastic Dynamics, Energy Minimization, test particle insertion or (re)calculation of energies. Normal mode analysis is another option. In this case mdrun builds a Hessian matrix from single conformation. For usual Normal Modes-like calculations, make sure that the structure provided is properly energy-minimized. The generated matrix can be diagonalized by gmx nmeig.

gmx mdrun manual page

Because the mdrun command tends to need more resources and cores than the other, shorter-running commands, it gets its own submission file that leverages HTCondor’s parallel universe in order to split its work across multiple compute nodes at the same time.

HTCondor submission script to run the mdrun command of gmx – mdrun.htsub
 1  # Sample
 2  filename=nvt
 3
 4  # Basic job attributes
 5  Executable      = OMP+MPIscript.parallel
 6  Arguments       = gmx.sh mdrun -v -s $(filename).tpr -o $(filename).trr -x $(filename).xtc -cpo $(filename).cpt -c $(filename).gro -e $(filename).edr -g $(filename).log
 7
 8  # Need to call out not vanilla universe because we want to mix hosts
 9  Universe        = parallel
10
11  Output          = logs/mdrun.$(filename).$(Node).o
12  Error           = logs/mdrun.$(filename).$(Node).e
13  Log             = logs/mdrun.$(filename).log
14
15  # How much memory each process will need --
16  # please adjust this to reflect your needs.
17  request_memory  = 1GB
18
19  # 4 compute nodes, 3 cpus on each node, 1GB on each node
20  # Adjust as you need more or less resources
21  request_cpus    = 3
22  machine_count   = 4
23
24  queue

Lines 1-2:

This defines a variable (used only by the HTCondor submission script) holding the name of the file set to work on, so that it does not have to be written out multiple times on the arguments line. It is also reused to name each log file after the associated filename.

Lines 4-6:

The executable we want to run is the wrapper script that sets up the OpenMPI environment. The gmx.sh script, which stands in for the gmx command, becomes the first argument, followed by its usual mdrun subcommand and associated flags. The previously defined filename variable is used here multiple times so that this submission file can be quickly reused in the future. Line 6 is the line where you will adjust the arguments to the mdrun command to specify any needed parameters (see the adapted example after the additional notes below).

Lines 8-9:

We specify that we want to use HTCondor’s “parallel” universe, because we want the ability to run on multiple nodes instead of one single node.

Lines 11-13:

Logs are written to the logs/ directory to reduce clutter in our directory, and the files are aptly named after the job, with the output, error, and HTCondor log being saved as separate files. Most of the output of the mdrun process will be in the logs/mdrun.$(filename).0.e or logs/mdrun.$(filename).0.o file.

Lines 15-17:

Takes a guess at how much memory you will need. This will be the amount used per machine, and is not scaled by the cpu count. Please be considerate of others and only request what you need.

Lines 19-22:

Asks for the number of cpus per requested machine. If you request three (3) cpus and four (4) machines, you will consume a total of 12 cpus for this job. GROMACS can run in a hybrid OpenMP+OpenMPI mode, where OpenMPI distributes the work to the requested machines and OpenMP scales across the requested cpus on each machine.

Line 24:

Finally we enqueue one instance of the job we just defined above.

Additional notes:

The total of 12 cpus was an arbitrary example. Using fewer machines and a higher number of cpus per machine can decrease runtime, but it may take longer to find available nodes depending on how busy the cluster is. If you request 24 cpus and 3 machines, you are essentially asking for three entire machines to yourself, which may be very hard to come by when many researchers are running smaller jobs. Mixing and matching smaller counts may yield reasonable wait times in the queue along with decent performance.
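
For example (all values here are hypothetical), a longer production run adapted along those lines might change the filename, append mdrun’s -maxh flag so the run stops cleanly after a set number of hours, and shift the resource mix toward fewer, larger machines:

filename=md_prod
Arguments       = gmx.sh mdrun -v -maxh 23 -s $(filename).tpr -o $(filename).trr -c $(filename).gro -g $(filename).log
request_memory  = 2GB
request_cpus    = 8
machine_count   = 2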

Currently Available Versions

There is currently only one version available, 2022.0. If updated or additional versions are needed, please reach out to .

Additional References

  1. GROMACS documentation

  2. HTCondor User’s Manual

  3. WWU’s McCarty Group’s Wiki Example