Python
Python comes in many flavors, and the way you submit jobs that run Python varies from use case to use case. Here are three examples of common uses:
Basic Python
Let’s look at a simple Python script, “my_script.py”, that only uses modules from the standard library:
#!/usr/bin/env python3

import getpass, os, sys

print('Hello, ', getpass.getuser(), '!', sep='')
print("You're running on the host", os.uname()[1])
print("The process arguments are:")
for i, arg in enumerate(sys.argv):
    print('argv[{}]:'.format(i), arg)
If you just need to run a simple Python script such as the one above that only uses things from Python’s standard library, you can use a submission file like this:
Universe = vanilla
Executable = my_script.py
Arguments = -arg1 -arg2 -arg3

Output = out.log
Error = err.log
Log = condor.log

Request_Cpus = 1
Request_Memory = 2GB

Queue
Tip
Don’t forget to set the execute bit on your Python script’s permissions!
You can do this with the chmod(1) command:
USER@cluster-head:~$ chmod +x my_script.py
If you don’t set the execute bit, HTCondor will put your job in a hold state with the following error:
Error from slotX_X@c-X-X.cluster.cs.wwu.edu: Failed to execute ‘my_script.py’: (errno=8: ‘Exec format error’)
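If your job has already been held for this reason, you don’t need to resubmit it. A hedged sketch of the recovery (the cluster ID 59 is illustrative): fix the permissions, then release the job.

USER@cluster-head:~$ condor_q -hold
USER@cluster-head:~$ chmod +x my_script.py
USER@cluster-head:~$ condor_release 59

condor_q -hold lists your held jobs along with their hold reasons, and condor_release puts the job back in the queue to be matched again.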
And finally a sample session with output:
USER@cluster-head:~/example_submit_scripts/python_basic$ condor_submit basic_python.job
Submitting job(s).
1 job(s) submitted to cluster 59.
USER@cluster-head:~/example_submit_scripts/python_basic$ cat out.log
Hello, USER!
You're running on the host c-X-X.cluster.cs.wwu.edu
The process arguments are:
argv[0]: /cluster/home/USER/example_submit_scripts/python_basic/my_script.py
argv[1]: -arg1
argv[2]: -arg2
argv[3]: -arg3
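While a job is queued or running, you can keep an eye on it with condor_q. A hedged sketch; the exact columns and formatting vary between HTCondor versions:

USER@cluster-head:~/example_submit_scripts/python_basic$ condor_q
OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
USER  ID: 59         11/6 09:45      _      1      _      1 59.0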
Virtual Environments
A common technique with Python is to create a virtual environment (also known as a venv or virtualenv) for each project, since each project might need different, incompatible versions of libraries. Python makes this very easy with the built-in venv module.
You can quickly create and load a new virtual environment with the venv module:
1  USER@cluster-head:~$ python3 -m venv my_venv
2  USER@cluster-head:~$ source my_venv/bin/activate
3  (my_venv) USER@cluster-head:~$
Here line 1 creates the virtual environment, line 2 activates it, and line 3 shows that it’s active by adjusting the prompt to display the name. To deactivate a previously activated environment you can use the deactivate command provided by the activate script.
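For example, deactivating restores your original prompt:

(my_venv) USER@cluster-head:~$ deactivate
USER@cluster-head:~$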
Tip
When you activate your virtual environment for the first time it’s a good idea to update the core packages used by pip for package management: pip, setuptools, and wheel.
(my_venv) USER@cluster-head:~$ python3 -m pip install --upgrade pip setuptools wheel
If you don’t update these tools to compatible versions, pip may later fail to install packages that expect them to be up to date.
For the purposes of this example I’m going to install a simple library that’s not available systemwide, just to prove the virtual environment is activated and being used. To play along with the example Python script, activate your virtual environment and install the figcow package:
(my_venv) USER@cluster-head:~/example_submit_scripts/python_venv$ python3 -m pip install figcow
Collecting figcow
Downloading figcow-1.0.2-py3-none-any.whl (5.5 kB)
Installing collected packages: figcow
Successfully installed figcow-1.0.2
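If you later need to recreate this environment, on another machine or for a collaborator, pip can record and replay the installed packages. A minimal sketch using pip’s standard freeze and install commands (the new_venv name is illustrative):

(my_venv) USER@cluster-head:~$ python3 -m pip freeze > requirements.txt
(new_venv) USER@cluster-head:~$ python3 -m pip install -r requirements.txt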
Now we need a simple script to use our newly downloaded package:
#!/usr/bin/env python3

import figcow, getpass, os, sys

print('Hello, ', getpass.getuser(), '!', sep='')
print("You're running on the host", os.uname()[1])

print(figcow.cow(' '.join(sys.argv[1:])))
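Before wiring this into HTCondor, it’s worth a quick interactive sanity check while the venv is active (a hedged example; output omitted):

(my_venv) USER@cluster-head:~/example_submit_scripts/python_venv$ chmod +x test_venv.py
(my_venv) USER@cluster-head:~/example_submit_scripts/python_venv$ ./test_venv.py moo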
The problem with virtual environments is that you need to “load” the environment in order to use it. The activate script provided by venv can automatically set up the correct environment variables, but HTCondor only lets us specify one Executable in the submission file. There are two solutions to this problem:
Use a small wrapper script to activate the environment, and then invoke Python.
Manually set up the environment variables from the submission file.
Both options are presented below.
Using a Wrapper Script
We’ll need to write a small shell script that activates our virtual environment, then executes Python with your script and any additional arguments you may need to pass to it.
1  #!/bin/sh
2
3  . my_venv/bin/activate
4  exec python "$@"
Here line 3 sources the activate script (the syntax is . in sh; if you want to use bash, you can instead use the source keyword), then line 4 does an in-place execution of the Python binary, passing it any arguments that were originally passed to this script when it was called. The in-place execution is useful to free the memory of the shell when you run Python, though it’s not strictly necessary with the huge amount of memory on our cluster nodes. Think of it as being technically correct, the best kind of correct.
Make sure you set the execute bit on this script before trying to use it. See above for an example of how to do this.
Now that we have a script to load our environment and run Python, we just need to tell HTCondor to run the script and pass along our script name and arguments.
1   Universe = vanilla
2   Executable = run_py.sh
3   Arguments = test_venv.py hello world
4
5   Output = output.log
6   Error = error.log
7   Log = condor.log
8
9   Request_Cpus = 1
10  Request_Memory = 2GB
11
12  Queue
Lines 2 and 3 are the only things that changed from the basic Python example above. Here the executable becomes our wrapper script, and the first argument we pass is the name of the Python file to run, followed by its arguments, if there are any.
Note
Make sure that the my_venv directory, the run_py.sh script, and test_venv.py are all in the same directory.
Finally, here’s a sample session of putting it all together and running it.
(my_venv) USER@cluster-head:~/example_submit_scripts/python_venv$ ls
my_venv run_py.sh test_venv.py venv_python.job
(my_venv) USER@cluster-head:~/example_submit_scripts/python_venv$ condor_submit venv_python.job
Submitting job(s).
1 job(s) submitted to cluster 68.
(my_venv) USER@cluster-head:~/example_submit_scripts/python_venv$ cat output.log
Hello, USER!
You're running on the host c-X-X.cluster.cs.wwu.edu
________________________________________________________________
/ _ _ _ _ _ \
| | |__ ___ | || | ___ __ __ ___ _ __ | | __| | |
| | '_ \ / _ \| || | / _ \ \ \ /\ / / / _ \ | '__|| | / _` | |
| | | | || __/| || || (_) | \ V V / | (_) || | | || (_| | |
| |_| |_| \___||_||_| \___/ \_/\_/ \___/ |_| |_| \__,_| |
\ /
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
Manually setting the PATH environment variable
While the wrapper script is much more flexible in terms of scripting out actions as if you were at the shell, there is a way around it if you don’t want to use one: manually setting your PATH environment variable to prefix your virtual environment’s full path. This approach depends on a few things that the wrapper script does not:
The submission file being in the same directory as the venv
Your shell properly setting the PWD environment variable (most do)
1   Universe = vanilla
2   PyVenv = my_venv
3   Executable = test_venv.py
4   Arguments = hello world
5   Environment = PATH=$ENV(PWD)/$(PyVenv)/bin:$ENV(PATH);
6
7   Output = output.log
8   Error = error.log
9   Log = condor.log
10
11  Request_Cpus = 1
12  Request_Memory = 2GB
13
14  Queue
Here lines 2-5 are what is important:
Line 2 specifies which environment to load. This is a local submit-file variable, and won’t be sent to the scheduler; it’s just there to make this submission file easily reusable between projects.
Line 3 executes the Python script directly. (Make sure it has the execute bit set!)
Line 4 lists any arguments you want to pass to your program.
Line 5 does all of the work: it overrides your PATH environment variable by prefixing it with the bin directory of the virtual environment named by the PyVenv variable.
And to show it in action, a simple session of submitting the file and viewing the results:
USER@cluster-head:~/example_submit_scripts/python_venv$ ls
my_venv old test_venv.py venv_manual_python.job
USER@cluster-head:~/example_submit_scripts/python_venv$ condor_submit venv_manual_python.job
Submitting job(s).
1 job(s) submitted to cluster 79.
USER@cluster-head:~/example_submit_scripts/python_venv$ cat output.log
Hello, USER!
You're running on the host c-X-X.cluster.cs.wwu.edu
________________________________________________________________
/ _ _ _ _ _ \
| | |__ ___ | || | ___ __ __ ___ _ __ | | __| | |
| | '_ \ / _ \| || | / _ \ \ \ /\ / / / _ \ | '__|| | / _` | |
| | | | || __/| || || (_) | \ V V / | (_) || | | || (_| | |
| |_| |_| \___||_||_| \___/ \_/\_/ \___/ |_| |_| \__,_| |
\ /
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
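A hedged alternative worth knowing, though not used in these examples: a venv’s interpreter locates its packages relative to its own path, so on a shared filesystem you can often point Executable straight at the venv’s python3 and skip the PATH manipulation entirely. Transfer_Executable = false asks HTCondor to run the interpreter in place rather than copying it:

Universe = vanilla
Executable = my_venv/bin/python3
Transfer_Executable = false
Arguments = test_venv.py hello world

Output = output.log
Error = error.log
Log = condor.log

Request_Cpus = 1
Request_Memory = 2GB

Queue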
Anaconda / Miniconda
While you can install Anaconda, it will consume most of your disk quota in your home directory. This is because it installs hundreds of packages that you most likely will never use or need. Fortunately, you can leverage Miniconda to install a base system, and use the Conda package manager to install the few packages you do want or need. Miniconda can still consume a lot of disk space, so it’s best to keep your virtual environments in your shared research directory.
Downloading and Installing Miniconda
You can download the latest release of the installer using wget and the conveniently named “latest” package. This package always points to whatever the latest version is, so the URL never needs to change. The URL to fetch Miniconda3’s latest package is currently: https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
This sample shell-session shows the process of fetching the latest installer:
USER@c-X-X:~/demo$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
--2023-11-06 09:44:39-- https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.131.3, 104.16.130.3, 2606:4700::6810:8203, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.131.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 120771089 (115M) [application/x-sh]
Saving to: ‘Miniconda3-latest-Linux-x86_64.sh’
Miniconda3-latest-L 100%[===================>] 115.18M 70.8MB/s in 1.6s
2023-11-06 09:44:41 (70.8 MB/s) - ‘Miniconda3-latest-Linux-x86_64.sh’ saved [120771089/120771089]
Note
The example below uses the variable $PI, which identifies the Principal Investigator (your research advisor). You will either want to set this variable to your PI’s surname, or type the name in manually if you don’t want to set it. Below is an example of setting it in the default Bash shell:
USER@c-X-X:~/demo$ export PI=researcher_lastname
USER@c-X-X:~/demo$ echo $PI
researcher_lastname
Once you have the latest installer, you can run it with the Bash shell. If you run the installer with no arguments, it will interactively prompt you to agree to the license, ask where you want to install Miniconda, and finally ask whether you want to auto-initialize it in your shell startup. Instead, we will pass it command-line options to bypass all of these questions, which is really handy.
The example below passes three options to the installer:
-b
    Enables batch install mode, i.e. don’t prompt for any input.
-u
    Enables update mode. If Miniconda is already installed at the specified path (see -p below), it is updated rather than replaced. This makes this copy/paste example safer.
-p
    The path where Miniconda will be installed. Make sure it ends in a “miniconda” name of some sort (miniconda3 is suggested).
It’s important to note that the path specified in this example uses the above-mentioned $PI variable as well as the $USER variable. You will need to set $PI, or specify the path manually on the command line if you don’t. You will not need to set $USER if you used getenv=true when you ran condor_submit -i. The example path may not be where your PI would like you to put your install and environments; speak with them to confirm where they would like it installed.
USER@c-X-X:~/demo$ bash ./Miniconda3-latest-Linux-x86_64.sh -b -u -p /cluster/research-groups/$PI/workspace/$USER/miniconda3
PREFIX=/cluster/research-groups/researcher_lastname/workspace/USER/miniconda3
Unpacking payload ...
Installing base environment...
Downloading and Extracting Packages
Downloading and Extracting Packages
Preparing transaction: done
Executing transaction: done
installation finished.
Now that the installer has been run, we will want to enable Conda in our shell’s startup script. To do this, we call the full path to the conda command, and specify the sub-command init bash. After it installs the startup code, it prints a notice that we should close and re-open our shell, but there’s a workaround for this as well.
USER@c-X-X:~/demo$ /cluster/research-groups/$PI/workspace/$USER/miniconda3/bin/conda init bash
no change /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/condabin/conda
no change /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/bin/conda
no change /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/bin/conda-env
no change /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/bin/activate
no change /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/bin/deactivate
no change /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/etc/profile.d/conda.sh
no change /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/etc/fish/conf.d/conda.fish
no change /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/shell/condabin/Conda.psm1
no change /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/shell/condabin/conda-hook.ps1
no change /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/lib/python3.11/site-packages/xontrib/conda.xsh
no change /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/etc/profile.d/conda.csh
modified /cluster/home/USER/.bashrc
==> For changes to take effect, close and re-open your current shell. <==
While the above example ends with “For changes to take effect, close and re-open your current shell.”, we can cheat and just load our .bashrc file again.
USER@c-X-X:~/demo$ source ~/.bashrc
(base) USER@c-X-X:~/demo$
Notice that the prompt has changed to show that you have a conda environment named “base” loaded. You can now use the conda command to create and manage environments and packages.
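For instance, conda env list shows every environment Conda knows about, with the active one starred (output illustrative):

(base) USER@c-X-X:~/demo$ conda env list
# conda environments:
#
base                  *  /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3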
In the following examples we will create a small environment, add a package to it, write a python script to leverage it, and setup an HTCondor submission file to run it.
First, we create a new environment using the conda create command. It accepts many useful options, as detailed in the Conda documentation, but we’ll just specify a friendly name to keep it easy.
(base) USER@c-X-X:~/demo$ conda create --name cow
Retrieving notices: ...working... done
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/envs/cow
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate cow
#
# To deactivate an active environment, use
#
# $ conda deactivate
Now that we have an environment, we can use conda activate to make it our active environment, so that when we install packages or want to use them they come from the correct environment.
(base) USER@c-X-X:~/demo$ conda activate cow
(cow) USER@c-X-X:~/demo$
Notice that the prompt has changed to reflect that we now have the “cow” environment activated, where we will want to install the “cowpy” package to be used in our script.
(cow) USER@c-X-X:~/demo$ conda install -c conda-forge cowpy
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/envs/cow
added / updated specs:
- cowpy
The following packages will be downloaded:
package | build
---------------------------|-----------------
_libgcc_mutex-0.1 | conda_forge 3 KB conda-forge
_openmp_mutex-4.5 | 2_gnu 23 KB conda-forge
bzip2-1.0.8 | h7f98852_4 484 KB conda-forge
ca-certificates-2023.7.22 | hbcca054_0 146 KB conda-forge
cowpy-1.1.5 | pyhd8ed1ab_1 22 KB conda-forge
ld_impl_linux-64-2.40 | h41732ed_0 688 KB conda-forge
libexpat-2.5.0 | hcb278e6_1 76 KB conda-forge
libffi-3.4.2 | h7f98852_5 57 KB conda-forge
libgcc-ng-13.2.0 | h807b86a_2 753 KB conda-forge
libgomp-13.2.0 | h807b86a_2 411 KB conda-forge
libnsl-2.0.1 | hd590300_0 33 KB conda-forge
libsqlite-3.44.0 | h2797004_0 826 KB conda-forge
libuuid-2.38.1 | h0b41bf4_0 33 KB conda-forge
libzlib-1.2.13 | hd590300_5 60 KB conda-forge
ncurses-6.4 | h59595ed_2 864 KB conda-forge
openssl-3.1.4 | hd590300_0 2.5 MB conda-forge
pip-23.3.1 | pyhd8ed1ab_0 1.3 MB conda-forge
python-3.12.0 |hab00c5b_0_cpython 30.6 MB conda-forge
readline-8.2 | h8228510_1 275 KB conda-forge
setuptools-68.2.2 | pyhd8ed1ab_0 454 KB conda-forge
tk-8.6.13 | h2797004_0 3.1 MB conda-forge
tzdata-2023c | h71feb2d_0 115 KB conda-forge
wheel-0.41.3 | pyhd8ed1ab_0 57 KB conda-forge
xz-5.2.6 | h166bdaf_0 409 KB conda-forge
------------------------------------------------------------
Total: 43.3 MB
The following NEW packages will be INSTALLED:
_libgcc_mutex conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
_openmp_mutex conda-forge/linux-64::_openmp_mutex-4.5-2_gnu
bzip2 conda-forge/linux-64::bzip2-1.0.8-h7f98852_4
ca-certificates conda-forge/linux-64::ca-certificates-2023.7.22-hbcca054_0
cowpy conda-forge/noarch::cowpy-1.1.5-pyhd8ed1ab_1
ld_impl_linux-64 conda-forge/linux-64::ld_impl_linux-64-2.40-h41732ed_0
libexpat conda-forge/linux-64::libexpat-2.5.0-hcb278e6_1
libffi conda-forge/linux-64::libffi-3.4.2-h7f98852_5
libgcc-ng conda-forge/linux-64::libgcc-ng-13.2.0-h807b86a_2
libgomp conda-forge/linux-64::libgomp-13.2.0-h807b86a_2
libnsl conda-forge/linux-64::libnsl-2.0.1-hd590300_0
libsqlite conda-forge/linux-64::libsqlite-3.44.0-h2797004_0
libuuid conda-forge/linux-64::libuuid-2.38.1-h0b41bf4_0
libzlib conda-forge/linux-64::libzlib-1.2.13-hd590300_5
ncurses conda-forge/linux-64::ncurses-6.4-h59595ed_2
openssl conda-forge/linux-64::openssl-3.1.4-hd590300_0
pip conda-forge/noarch::pip-23.3.1-pyhd8ed1ab_0
python conda-forge/linux-64::python-3.12.0-hab00c5b_0_cpython
readline conda-forge/linux-64::readline-8.2-h8228510_1
setuptools conda-forge/noarch::setuptools-68.2.2-pyhd8ed1ab_0
tk conda-forge/linux-64::tk-8.6.13-h2797004_0
tzdata conda-forge/noarch::tzdata-2023c-h71feb2d_0
wheel conda-forge/noarch::wheel-0.41.3-pyhd8ed1ab_0
xz conda-forge/linux-64::xz-5.2.6-h166bdaf_0
Proceed ([y]/n)? y
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(cow) USER@c-X-X:~/demo$
You may have noticed that it installed far more than just our intended “cowpy” package. This is because Conda ensures it has ALL the libraries and files it needs to run something in that environment, including the gcc libraries needed to run Python itself. This is complete overkill for a lot of projects, but it is incredibly useful for those who need their research reproduced exactly.
Now that the package we want is installed, we can write a small Python example that uses it, then a wrapper script that ensures the correct environment is activated before running it, and finally a small HTCondor submission file so that the whole thing can be scheduled to run on an execute point.
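On the subject of exact reproduction: Conda can snapshot an environment to a YAML file and rebuild it later. A minimal sketch using the standard conda env export and conda env create commands (the environment.yml file name is just a convention):

(cow) USER@c-X-X:~/demo$ conda env export > environment.yml
(base) USER@c-X-X:~/demo$ conda env create -f environment.yml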
The example script below is largely the same as the one in the previous Virtual Environments section, with two lines changed to use the different library that we installed in our Conda environment instead.
#!/usr/bin/env python3

import cowpy.cow, getpass, os, sys

print('Hello, ', getpass.getuser(), '!', sep='')
print("You're running on the host", os.uname()[1])

print(cowpy.cow.Cowacter().milk(' '.join(sys.argv[1:])))
Now that we have a tiny example script using our newly installed package, we’ll want to try and run a quick test with it interactively, then with a wrapper script, and finally via a job submission file to ensure that we can submit the jobs correctly.
Since we’re already in our activated environment, we can make sure it has the execute bit set, then execute it directly.
(cow) USER@c-X-X:~/demo$ ./test_conda.py Hello, World!
Hello, USER!
You're running on the host c-X-X.cluster.cs.wwu.edu
_______________
< Hello, World! >
---------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
Being able to run the Python code without manually activating the correct environment is critical for submitting jobs to the cluster. To accomplish this, we will use a tiny wrapper script that loads Conda, activates the desired environment, and finally runs our script.
1  #!/bin/sh
2
3  . /cluster/research-groups/researcher_lastname/workspace/USER/miniconda3/etc/profile.d/conda.sh
4  conda activate cow
5  exec python3 "$@"
This script is again similar to the one used in the Virtual Environments section, with a couple of small but important changes. Line 3 now has a full path to the conda.sh script, because it isn’t easily available as a relative path from where our code lives. Line 4 is different entirely, because we must now explicitly call out which environment to activate. Finally, line 5 is identical to the previous example: now that the environment is activated, we just call Python and pass it any arguments the script was given. Don’t forget to chmod +x the run_conda.sh script!
Note
Be sure to adjust the path to the conda.sh script, as your username is not “USER” and your PI is not “researcher_lastname”. You can see the full path in your ~/.bashrc file; it should be in a section near the bottom labeled “>>> conda initialize >>>”.
To confirm the script works, we first deactivate our cow environment, and run the new wrapper script in our interactive session.
(cow) USER@c-X-X:~/demo$ conda deactivate
(base) USER@c-X-X:~/demo$ ./run_conda.sh ./test_conda.py Hello, World!
Hello, USER!
You're running on the host c-X-X.cluster.cs.wwu.edu
_______________
< Hello, World! >
---------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
Now that we have confirmed our wrapper script can load the correct environment and run our Python script, we will want to write an HTCondor job submission file to package the whole thing together so we can schedule jobs in the cluster environment.
Universe = vanilla
Executable = run_conda.sh
Arguments = test_conda.py hello world

Output = output.log
Error = error.log
Log = condor.log

Request_Cpus = 1
Request_Memory = 2GB

Queue
This submission file is again similar to the one used in the Virtual Environments section, with two small changes. The executable now points to run_conda.sh instead of run_py.sh, and the first argument is now test_conda.py instead of test_venv.py. All that changed is the filenames. This is an example of how easy HTCondor submission files are to write, and how quickly they can be reused.
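Since only the filenames differ between these submission files, you could take the reuse one step further with custom submit-file macros, just as the manual-PATH example did with its PyVenv variable. A hedged sketch; the Wrapper and Script names are arbitrary:

Universe = vanilla
Wrapper = run_conda.sh
Script = test_conda.py
Executable = $(Wrapper)
Arguments = $(Script) hello world

Output = output.log
Error = error.log
Log = condor.log

Request_Cpus = 1
Request_Memory = 2GB

Queue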
Of course, we’ll want to submit the job and verify the results; close your interactive session (or open a new connection to the head node so you can submit a job).
(base) USER@cse-head:~/demo$ ls
conda.job run_conda.sh test_conda.py
(base) USER@cse-head:~/demo$ condor_submit conda.job
Submitting job(s).
1 job(s) submitted to cluster 10576.
(base) USER@cse-head:~/demo$ cat output.log
Hello, USER!
You're running on the host c-X-X.cluster.cs.wwu.edu
_____________
< hello world >
-------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
Conda offers a more flexible environment than a standard Python virtual environment, but that flexibility can come with increased complexity when you try to use all the features it offers. Please see the full Conda documentation for more help with and usage of Conda.