Scheduled Maintenance & Downtime Archive

09/05/2024 – CF Building Power Shutdown

On Thursday, September 5th, from 6 AM to 10 AM the power will be shut off in the Communications Facility. While the cluster systems are housed in the data center in the AC building, there is networking infrastructure housed in the CF building. Without these systems running, no new connections can be made to or from the cluster environment. Existing jobs should continue to run without issue unless they depend on external data sources.

Access will be unavailable beginning the evening before the power outage so that systems in CF can be safely shut down, and will be restored as quickly as possible once power is restored.

03/26/2024 - 03/28/2024 Cluster Inaccessible Due to CF Power Shutdown

The Communications Facility (CF) is scheduled to have its power shut off from 5AM - 9PM on Wednesday, March 27th, to connect power to the new Kaiser Borsari Hall (KB). To prepare for the power outage, the network equipment and servers in the CF building will be shut down the night before, Tuesday, March 26th. An additional power outage is scheduled from 5AM - 3PM on Thursday, March 28th. Once power is restored after the second outage window, the network equipment and servers will be brought back online and connectivity to the cluster will be restored. Jobs that are currently running on the cluster will continue running, but users will be unable to submit new jobs or monitor existing ones.

12/12/2022 - 12/27/2022 Move to Rocky Linux and WWU Domain

Operating System Changes

The underlying operating system that runs on all the cluster systems will be changed from Ubuntu 18.04 to Rocky Linux 8. While this change means that most compiled software will need to be rebuilt to run in the new environment, it ultimately brings us in line with the requirements for joining the OSG (Open Science Grid) and gives us the opportunity to do so.

I have already been in contact with some researchers leveraging compiled research programs and am in the process of getting things rebuilt for them in the new environment.

Account / Login Changes

After this update, your cluster login will be your WWU universal account. There will no longer be a separate cluster account. Your WWU universal account password can be used to log in to the cluster. SSH keys will no longer be required for account creation or login, though they will still be supported. Updated documentation for account creation will go live after all the upgrades have been performed.

Because logins are tied to universal accounts, cluster access will be subject to the standard account removal policy when a student graduates. This gives students one term after graduation to work with their PI on moving their data to an agreed-upon location before the account and home directories are removed. If a student will continue doing research with faculty after graduation, the department can sponsor their account for up to one year at a time to keep it active.

Please see the ATUS page for information about account removals: https://atus.wwu.edu/kb/after-leaving-western-when-will-my-account-be-deleted

Downtime / File Server Updates

The cluster will be offline starting Monday, December 12th, and should return to service by Tuesday, December 27th. While the upgrade to Rocky Linux should only take a day or two, the account changes mentioned above require every file on the file server to be updated to match your new account information. That is, for every file, the old user and group ID must be read from disk, mapped to a new user and group ID, and written back to disk. While I have automated this task, applying it to the hundreds of millions of small files that have accumulated on the file server is estimated to take a couple of weeks. If it finishes sooner, I will send the access announcement early.
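The per-file remapping described above can be sketched roughly as follows. This is a minimal illustration, not the actual script used on the file server: the function names `plan_remap` and `apply_remap`, the `uid_map`/`gid_map` dictionaries, and the choice of Python are all assumptions. It walks a directory tree, reads each entry's numeric user and group ID with `os.lstat`, looks them up in the mappings, and rewrites ownership with `os.lchown` (which does not follow symlinks) only when an ID actually changes.

```python
import os
from typing import Dict, Iterator, Tuple

# Hypothetical mapping from old (cluster account) numeric IDs to new
# (WWU universal account) numeric IDs; IDs not in the map are left alone.
IdMap = Dict[int, int]


def plan_remap(root: str, uid_map: IdMap, gid_map: IdMap) -> Iterator[Tuple[str, int, int]]:
    """Yield (path, new_uid, new_gid) for every entry under root whose owner or group would change."""

    def entries() -> Iterator[str]:
        yield root
        # os.walk does not follow symlinked directories by default,
        # so each symlink is visited once as an entry of its own.
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                yield os.path.join(dirpath, name)

    for path in entries():
        st = os.lstat(path)  # read ownership of the entry itself, not a symlink's target
        new_uid = uid_map.get(st.st_uid, st.st_uid)
        new_gid = gid_map.get(st.st_gid, st.st_gid)
        if (new_uid, new_gid) != (st.st_uid, st.st_gid):
            yield path, new_uid, new_gid


def apply_remap(root: str, uid_map: IdMap, gid_map: IdMap) -> int:
    """Rewrite ownership in place and return the number of entries changed (needs root)."""
    changed = 0
    for path, uid, gid in plan_remap(root, uid_map, gid_map):
        os.lchown(path, uid, gid)  # write the new owner/group back without following symlinks
        changed += 1
    return changed
```

Giving a file to another user requires root, so in this sketch `apply_remap` would run as root, while `plan_remap` can be used on its own as a dry run to list what would change. Each entry costs at least one metadata read and possibly a write, which is consistent with a multi-week estimate across hundreds of millions of files.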

08/31/2022 – CF Building Power Shutdown

On Wednesday, August 31st, from 7 AM to 12 PM the power will be shut off in the Communications Facility. While the cluster systems are housed in the data center in the AC building, there is networking infrastructure as well as login servers housed in the CF building. Without these systems, no new connections can be made to or from the cluster environment. Existing jobs should continue to run without issue unless they depend on external data sources.

Access will be unavailable beginning the evening before the power outage so that systems can be safely shut down, and will be restored as quickly as possible once power is restored.

09/07/2021 - 09/13/2021 – UPDATED HTCondor 9.0 Upgrade & File Server Maintenance

UPDATE - 09/10 5:45PM: The HTCondor 9.0 upgrade is complete, but the file server work is ongoing. Due to the enormous number of files that needed migration, it has progressed much more slowly than anticipated. Currently 146 out of 305 home directories have been migrated. I will keep the migration process running over the weekend and monitor its progress. If the migration is not completed by Monday morning (9/13), I will stop it and reach out to any remaining users who will be affected when I open up access to the cluster(s) again. My apologies for the extended downtime. -Zach

Beginning the morning of Tuesday, September 7th, and up until the evening of Friday, September 10th the cluster environments will be offline for upgrades to the next major release of HTCondor (9.0), and upgrades to the file server.

The HTCondor 9.0 release includes an update to the networking protocol used between the central manager node(s) and the execute nodes, requiring that they all run the same version. This means all jobs must be stopped, the upgrade performed, and then jobs resumed. There have also been many changes to the configuration structure and security implementation that require additional setup on each node before it can communicate, which is why this is an extended downtime for the environment.

While HTCondor is being upgraded, cluster account home directories and shared research directories will be migrated to ZFS datasets. This migration will enable nightly snapshots and data backups on a per-user and per-research-group basis.

09/01/2021 – CF Building Power Shutdown

On Wednesday, September 1st, from 7 to 11 AM the power will be shut off in the Communications Facility. While the cluster systems are housed in the data center in the AC building, there is networking infrastructure as well as login servers housed in the CF building. Without these systems, no new connections can be made to or from the cluster environment. Existing jobs should continue to run without issue unless they depend on external data sources.