GROMACS
Overview
The GROMACS site describes the package as follows:
GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
This package is made available to all cluster users in both environments by leveraging an Apptainer container that contains both the GROMACS package and all the libraries needed to run it. This allows us to easily update the package when requested, and to keep older versions available if a researcher needs one.
Because of the way the GROMACS container is deployed in our environment, we use a small wrapper script that starts the container, calls the real GROMACS (gmx) binary, and stages your data in the appropriate directories.
Example Files
Download the Needed Files
Download all of the following files and place them in the same directory in the cluster environment. If you downloaded them to your local system instead of directly to the cluster, you can use SCP to copy them over. If you didn't download and extract the gromacs_example.tar.xz file from the sidebar, you will need to make sure the gmx.sh and gmx-gpu.sh files are executable, which you can do by running chmod +x gmx*.sh on the cluster head node where you've put the files. The mdrun-gpu.htsub submission file will also write its log files to a logs/ directory. This directory is created for you if you download and extract the gromacs_example.tar.xz file. Otherwise, you can create your own logs directory by running mkdir logs on the cluster head node where you've put your files, or modify mdrun-gpu.htsub and remove the logs/ prefix on the three lines where logging is configured.
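For example, if the archive is sitting on your local machine, the copy and setup steps might look something like the following sketch. The head node hostname shown here is a placeholder; use whatever address you normally SSH to.
# From your local machine: copy the archive to your cluster home directory
scp gromacs_example.tar.xz USER@cse-head.cluster.cs.wwu.edu:~/

# On the cluster head node: unpack it, then make sure the wrappers are
# executable and the logs/ directory exists (extracting the archive normally
# sets both up already, so these last two commands are just a safety net)
tar -xf gromacs_example.tar.xz
cd gromacs_example
chmod +x gmx*.sh
mkdir -p logs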
Interacting with GROMACS via HTCondor
Many of the GROMACS commands expect the user to type input to answer a selection prompt. It’s possible to script this out and provide the input needed via HTCondor, but unless you know the exact number of your inputs ahead of time, this becomes difficult to get right. The easy workaround is to have HTCondor allocate resources on a compute node and provide you an interactive shell there so you can run commands as if you were sitting down at the compute node instead of the head node.
Getting an Interactive Prompt on a Compute Node
HTCondor's condor_submit makes getting an interactive shell prompt easy. Simply add the -interactive flag when you run the command. Be sure to request some resources for yourself, since the defaults may be too low for your needs.
condor_submit -interactive request_cpus=4 request_memory=4GB
The command will inform you that it has submitted the interactive job to the queue and then wait for it to start on a compute node. You'll see a warning about the limitations of interactive jobs, and finally be presented with a new prompt telling you which compute node you're using.
USER@cse-head:~/gromacs_example$ condor_submit -interactive request_cpus=4 request_memory=4GB
Submitting job(s).
1 job(s) submitted to cluster 1182.
Welcome to slot1_1@c-X-X.cluster.cs.wwu.edu!
You will be logged out after 3600 seconds of inactivity.
**********
WARNING!
**********
This HTCondor session will end after 24 hours, or 1 hour of idle time.
To run for more than 24 hours, submit the job without "-interactive".
USER@c-X-X:~/gromacs_example$
Once you have a prompt at the compute node, you can run the gmx.sh script to access all of the GROMACS commands, including mdrun.
Tip
Be careful! The mdrun command can run for a long time and needs a lot of resources to complete. Because of this, it has its own job submission file that you should use to ensure it is allocated the proper runtime and resources. Unless you know your run will complete quickly, it is suggested that you use the mdrun-gpu.htsub provided here and submit the mdrun job separately from your interactive session.
Example use case (after getting an interactive shell on a compute node):
USER@c-X-X:~/gromacs_example$ ./gmx.sh pdb2gmx -f 1ubq.pdb -o 1ubq.gro -p 1ubq.top -ignh
:-) GROMACS - gmx pdb2gmx, 2022 (-:
Executable: /gromacs/bin.AVX_256/gmx_mpi
Data prefix: /gromacs
Working dir: /cluster/home/USER/gromacs_example
Command line:
gmx_mpi pdb2gmx -f 1ubq.pdb -o 1ubq.gro -p 1ubq.top -ignh
Select the Force Field:
From '/gromacs/share/gromacs/top':
1: AMBER03 protein, nucleic AMBER94 (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)
...
You'll notice that the printed "Command line" is calling a different program. This is because gmx.sh sets up the Apptainer environment for GROMACS and runs the command inside of it. Don't worry about this. The script inside the Apptainer environment will also determine the fastest available executable for your compute node, so you may see a different response there as well.
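If you are curious which build the container picked on a given node, you can ask the wrapper for version information; GROMACS prints the executable path and the SIMD instruction set it was built for. The exact output will vary by node.
./gmx.sh --version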
Note
There is a leading ./ on gmx.sh to indicate that the command you want to run is the file gmx.sh in your current working directory. gmx.sh is not in your $PATH, so typing gmx.sh without the ./ will print gmx.sh: command not found.
There will often be multiple interactive runs of gmx.sh with various commands to prepare your data for the mdrun command. Once you've prepared all of your files, you can submit the mdrun job for its larger, longer calculations.
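As a rough sketch only, such a preparation pass might chain several commands before handing off to mdrun. The structure and .mdp parameter file names below are illustrative placeholders, and a real workflow will usually involve additional steps.
# Generate a topology from the PDB structure (as in the example above)
./gmx.sh pdb2gmx -f 1ubq.pdb -o 1ubq.gro -p 1ubq.top -ignh
# Define a simulation box around the molecule
./gmx.sh editconf -f 1ubq.gro -o 1ubq_box.gro -c -d 1.0 -bt cubic
# Fill the box with solvent
./gmx.sh solvate -cp 1ubq_box.gro -cs spc216.gro -o 1ubq_solv.gro -p 1ubq.top
# Assemble the portable run input file that mdrun will consume
./gmx.sh grompp -f nvt.mdp -c 1ubq_solv.gro -p 1ubq.top -o production_nvt.tpr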
Submitting the mdrun job
While it is possible to run the mdrun command in an interactive session, it may run for longer than the 24 hours allowed, or need additional resources such as a GPU to get its results in a timely manner. Because of this, we submit a separate job to the queue to request a node that has a GPU we can use.
Note
You will need to modify the mdrun-gpu.htsub file to adjust the filename(s) and any options that need to be passed to the mdrun command before submitting the job. You may also want to adjust the amount of resources requested to match your needs. See the line-by-line explanation in the mdrun-gpu.htsub section below for information about what each line does.
Tip
Editing files in a terminal can be daunting, but
nano(1) makes it easy! The bottom two lines print what
key commands are available. The ^
symbol represents the
Ctrl key. So Ctrl + O will save a file, and
Ctrl + X will exit.
After modifying mdrun-gpu.htsub for your needs, you can submit it using the condor_submit command from the head node. You will not be able to submit the job from within a currently running interactive session.
condor_submit mdrun-gpu.htsub
Tip
You can add the option -batch-name with the argument "$(basename $(pwd))" to set your current directory's name as your job's batch name. This is really useful when you want to see the status of multiple jobs at a glance. It's not required, but is nice to have!
condor_submit -batch-name "$(basename $(pwd))" mdrun-gpu.htsub
Once the mdrun job has been submitted, you can check its status in the queue with condor_q or condor_watch_q. Check the FAQ for explanations. After the job is in a running state, the output of the main node running the job will be in two files, logs/mdrun.filename.e and logs/mdrun.filename.o. The filename is whatever filename you specified in the submission file, and the .e and .o extensions indicate whether the output came from STDERR or STDOUT. GROMACS tends to put most of its messages on STDERR, so you can follow along as your job runs by using the tail command's -f option.
tail -f logs/mdrun.filename.e
This will continuously watch the file for new data to be written to it and display it on the screen. When you want to stop watching the file, press Ctrl+C to exit.
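If you only need a quick look at the job's state in the queue rather than its output, the two commands mentioned above can be run from the head node:
# One-time snapshot of your jobs in the queue
condor_q
# Continuously updating view; press Ctrl+C when done
condor_watch_q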
Example Files Explained
What follows is an explanation of each of the supplied files, and how they play a role in running GROMACS in our environment.
gmx.sh
Think of the gmx.sh
file as a replacement for anywhere you would
have run the gmx command. Any flags or commands you would
have given to it, you should pass to this script and it will pass them
on to the real command. You will most likely not need to modify this
file for your work.
1  #!/bin/sh
2
3  exec apptainer exec -B "$(pwd)":/data /cluster/share/singularity/gromacs-2022.2.sif gmx ${@}
The final line has a lot to unpack, but the first line is simple: it tells the system that this is a shell script and should be run with the sh shell. If you're curious about the internals, keep reading. Otherwise, feel free to skip on to the next file.
(Optional) Technical breakdown of the final line
- exec: Tells the shell to replace itself with the following command.
- apptainer: The real command we want to run, apptainer. Everything that follows it is sub-commands and arguments to the Apptainer command.
- exec: Tells Apptainer to execute the command it's given on the command line, not the command it may have been pre-configured to run.
- -B "$(pwd)":/data: Tells Apptainer to "bind" the current directory to /data inside the container, which is where GROMACS expects to operate on its data inside the contained environment.
- .../gromacs-2022.2.sif: Tells Apptainer the path to the GROMACS Apptainer Image File (sif) that holds the container we want to run. It is made available on all nodes, so you don't have to worry about copying it over in your submission file.
- gmx: Tells Apptainer which command we want to run inside the container.
- ${@}: Finally, any arguments given to the wrapper script are passed on to the gmx command above. You may be tempted to modify this to include your arguments, but that will break the wrapper for everything else.
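Putting it all together: running the earlier pdb2gmx example through the wrapper is roughly equivalent to typing the following by hand, with "$(pwd)" already expanded to your working directory:
apptainer exec -B /cluster/home/USER/gromacs_example:/data /cluster/share/singularity/gromacs-2022.2.sif gmx pdb2gmx -f 1ubq.pdb -o 1ubq.gro -p 1ubq.top -ignh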
gmx-gpu.sh
The one difference here is the added option --nvccli. This tells Apptainer to invoke the NVIDIA container command line tools so that it can find the system's libraries for using the GPU. Without this, the GPU cannot be used inside the container.
Every other option is the same as in the normal gmx.sh script.
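The file is not reproduced line by line here, but based on the description above it should look essentially like the sketch below; treat the downloaded gmx-gpu.sh as authoritative.
#!/bin/sh

# Sketch only -- assumes gmx-gpu.sh mirrors gmx.sh with the extra GPU flag
exec apptainer exec --nvccli -B "$(pwd)":/data /cluster/share/singularity/gromacs-2022.2.sif gmx ${@}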
mdrun-gpu.htsub
From the mdrun manual page:
gmx mdrun is the main computational chemistry engine within GROMACS. Obviously, it performs Molecular Dynamics simulations, but it can also perform Stochastic Dynamics, Energy Minimization, test particle insertion or (re)calculation of energies. Normal mode analysis is another option. In this case mdrun builds a Hessian matrix from single conformation. For usual Normal Modes-like calculations, make sure that the structure provided is properly energy-minimized. The generated matrix can be diagonalized by gmx nmeig.
Because the mdrun command tends to need more resources and cores than the other, shorter-running commands, it needs its own submission file to leverage HTCondor's ability to request a GPU node for acceleration.
 1  # Which file to process
 2  filename = production
 3
 4  # Maybe get a CSCI GPU?
 5  #+WantFlocking = True
 6
 7  # Maybe get a CSE GPU?
 8  #+WantPreempt = True
 9
10  # Basic job attributes
11  Executable = gmx-gpu.sh
12  Arguments = mdrun -v -s $(filename)_nvt.tpr -deffnm $(filename)_nvt
13
14  Output = logs/mdrun.$(filename).o
15  Error = logs/mdrun.$(filename).e
16  Log = logs/mdrun.$(filename).log
17
18  # How much memory each process will need --
19  # please adjust this to reflect your needs.
20  request_memory = 32GB
21
22  # 8 Cores is the cap for one of the static slots on CSCI
23  request_cpus = 8
24
25  # GPU please!
26  request_gpus = 1
27
28  queue
- Lines 1-2: Defines a variable (used only by the HTCondor submission script) holding the name of the file to use, so it doesn't have to be written multiple times on the arguments line. It is also reused to name each log file after the associated filename.
- Lines 4-8: When submitting from the CSE cluster there are no generally available GPUs, but some CSE faculty graciously share their resources with the cluster, as does the CSCI department. +WantFlocking sends an extra variable to the scheduler saying you're OK with leaving the CSE pool and going somewhere else to run. +WantPreempt says you're OK with your job being evicted early; this is usually fine, though it may not work for all use cases. Both options default to off and are commented out here, so uncomment the relevant line to opt in.
- Lines 10-12: The executable we want to run is the wrapper script that sets up the GROMACS container. gmx-gpu.sh stands in for the gmx command as the executable, followed by its usual mdrun argument and associated flags. The previously defined filename variable is used here multiple times so this submission file can be quickly reused in the future. Line 12 is where you will adjust the arguments to the mdrun command to specify any needed parameters.
- Lines 14-16: Logs are written to the logs/ directory to reduce clutter, and the files are aptly named after the job, with the output, error, and HTCondor log saved as separate files. Most of the output of the mdrun process will be in the logs/mdrun.$(filename).e file.
- Lines 18-20: Takes a guess at how much memory you will need. You can get a better estimate for your next submission after completing a job and looking at the logs/mdrun.$(filename).log file. Please be considerate of others and only request what you need.
- Lines 22-23: Asks for the number of CPUs for GROMACS to use. GROMACS will automatically leverage OpenMP because HTCondor sets the OMP_NUM_THREADS environment variable for you.
- Lines 25-26: Asks for a GPU. You should only ask for 1; asking for more will not make GROMACS faster at this time.
- Line 28: Finally, we enqueue one instance of the job we just defined above.
- Additional notes: The request for 8 CPUs was chosen to fit nicely within some of the static slots used by the CSCI GPU nodes. Requesting more will not allow your job to run on those nodes, so 8 is a good place to start. If you need access to additional CPUs you can request them, but you may be stuck waiting in the queue for quite a while.
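For example, once a run completes, HTCondor appends a resource-usage summary to the job log. With the default filename = production from the submission file above, one way to review it when estimating your next request_memory might be the following; the exact lines in the summary vary by HTCondor version.
grep -A 5 "Partitionable Resources" logs/mdrun.production.log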
Currently Available Versions
There is currently only one version available, 2022.2. If updated or additional versions are needed, please reach out to .