Notifications

Overview

Have you ever wanted to know when your job finished, completed successfully, or errored out?

The HTCondor notification variable does just that! When enabled, it can send you an email when any of the above happens!

There are four possible settings: two behave identically, one limits when emails are sent, and one disables notifications entirely.

The HTCondor man page for condor_submit says…

notification = <Always | Complete | Error | Never>

Owners of HTCondor jobs are notified by e-mail when certain events occur. If defined by Always or Complete, the owner will be notified when the job terminates. If defined by Error, the owner will only be notified if the job terminates abnormally, (as defined by JobSuccessExitCode, if defined) or if the job is placed on hold because of a failure, and not by user request. If defined by Never (the default), the owner will not receive e-mail, regardless to what happens to the job. The HTCondor User’s manual documents statistics included in the e-mail.

There’s a lot there, even though it’s said very succinctly. To make it even shorter, here’s the TL;DR version:

  • Always

  • Complete

    If a job finishes, send an email. It doesn’t matter if the job finished with an error or not, just send the summary email.

  • Error

    If the job terminates abnormally, or is put on hold because of a failure (not at your request), send an email.

  • Never

    Don’t send an email. This is the default if you don’t set the notification variable.

Example Submission Files

HTCondor basic notification submit example
Executable = test.sh

Output     = out.log
Error      = err.log
Log        = condor.log

Request_Cpus   = 1
Request_Memory = 2GB

Notification = Always

Queue

The Notification line is the only line that matters in the example: it turns notifications on. Set it to Error instead if you only want to be notified when the job fails. Assuming the test.sh script ran correctly, the submit file above would send a message like the example email below.
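For the Error case, a minimal sketch of the same submit file might look like the following. Only the Notification line differs from the basic example; the script name, log names, and resource requests are carried over unchanged, and the comment line is added here for illustration.

```
Executable = test.sh

Output     = out.log
Error      = err.log
Log        = condor.log

Request_Cpus   = 1
Request_Memory = 2GB

# Only send mail if the job terminates abnormally or is held because of a failure
Notification = Error

Queue
```

With this setting, a run of test.sh that exits normally should not produce any email.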

Example Emails

Sample email for a completed job
From: csaw.support@wwu.edu
To: USER@wwu.edu
Subject: [Condor] Condor Job 17521.0
Body:

This is an automated email from the Condor system
on machine "cse-head.cluster.cs.wwu.edu".  Do not reply.

Condor job 17521.0
     /cluster/home/USER/example_submit_scripts/email/test.sh
     submitted from directory /cluster/home/USER/example_submit_scripts/email
exited normally with status 0


Submitted at:        Fri Jul 26 08:35:15 2024
Completed at:        Fri Jul 26 08:35:22 2024
Real Time:           0 00:00:07

Virtual Image Size:  720 Kilobytes

Statistics from last run:
Allocation/Run time:     0 00:00:06
Remote User CPU Time:    0 00:00:00
Remote System CPU Time:  0 00:00:00
Total Remote CPU Time:   0 00:00:00

Statistics totaled from all runs:
Allocation/Run time:     0 00:00:06

Network:
  0.0 B  Run Bytes Received By Job
  0.0 B  Run Bytes Sent By Job
  0.0 B  Total Bytes Received By Job
  0.0 B  Total Bytes Sent By Job


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Questions about this message or HTCondor in general?
Email address of the local HTCondor administrator: csaw.support@wwu.edu
The Official HTCondor Homepage is http://www.cs.wisc.edu/htcondor

Limits

Currently, the HTCondor configuration on the backend prohibits you from submitting a batch of more than 5 jobs when notifications are enabled. This keeps you from submitting 10,000 jobs, having all 10,000 fail or complete in rapid succession, and flooding your inbox with mail.
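As an illustration of staying within that limit, the sketch below queues exactly 5 jobs with notifications enabled. The $(Process) macro is standard HTCondor submit syntax, used here so the jobs don't overwrite each other's output; the per-job file names are made up for this example and are not part of the cluster's setup.

```
Executable = test.sh

# $(Process) expands to 0 through 4, giving each job its own output files
Output     = out.$(Process).log
Error      = err.$(Process).log
Log        = condor.log

Request_Cpus   = 1
Request_Memory = 2GB

Notification = Error

# 5 jobs is the largest batch allowed while notifications are enabled
Queue 5
```

Anything larger than this would need Notification left unset or set to Never.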