BioMonth - CSC supercomputing and data management for bioscientists

Using GREASY for running multiple Gaussian jobs on Puhti

This tutorial requires that you have a user account at CSC and that your account belongs to a project that has access to the Puhti service. You should also belong to the Gaussian users group. In the commands below, replace yourprojectname with the name of your CSC project.

Overview

The workflow of this exercise is:

Download 10 sample 3D molecular structures

mkdir -p /scratch/yourprojectname/gaussian_greasy
cd /scratch/yourprojectname/gaussian_greasy 

wget https://a3s.fi/C6H12_structures_10/C6H12_structures_10.tgz

tar xzf C6H12_structures_10.tgz

cd C6H12_structures_10

Convert these structures to Gaussian format

module load openbabel
obabel *.mol -ocom -m
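A quick sanity check at this point is to confirm that every .mol file got a matching .com file. This is a minimal Bash sketch; the function name missing_com is hypothetical and not part of Open Babel.

```bash
#!/bin/bash
# Hypothetical helper: print every .mol file in directory $1
# that has no .com file with the same base name.
missing_com() {
    local dir=${1:-.} mol
    for mol in "$dir"/*.mol; do
        [ -e "$mol" ] || continue              # glob matched nothing: no .mol files
        [ -e "${mol%.mol}.com" ] || echo "$mol"
    done
}

# Run inside C6H12_structures_10 after obabel; no output means all files were converted
missing_com .
```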

Construct the corresponding Gaussian input files

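The first loop inserts the Gaussian route line as the first line of each file, requesting a B3LYP/cc-pVDZ single-point energy calculation (no job type keyword is given, so Gaussian runs a single point). The second loop then puts the %NProcShared=4 line above the route line, so that each calculation uses 4 cores.

for i in *.com; do sed -i '1s/^/#b3lyp\/cc-pVDZ \n/' "$i"; done

for i in *.com; do sed -i '1s/^/%NProcShared=4\n/' "$i"; done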
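Because sed prepends its text every time it runs, repeating the two loops above adds the header lines a second time. If you want a rerun-safe variant, the following sketch writes both header lines in one pass and skips files whose first line already starts with %NProcShared. The function name add_header is hypothetical; the settings are the ones used above.

```bash
#!/bin/bash
# Hypothetical helper: prepend a Link 0 line and a route line to a Gaussian
# .com file, unless the file already starts with %NProcShared.
add_header() {
    local file=$1 nproc=$2 route=$3 tmp
    # Skip files that already have the header, so reruns are harmless
    if head -n 1 "$file" | grep -q '^%NProcShared='; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Write the header plus the original content to a temporary file,
    # then copy it back (cat > keeps the original file permissions)
    { printf '%%NProcShared=%s\n%s\n' "$nproc" "$route"; cat "$file"; } > "$tmp" &&
        cat "$tmp" > "$file" && rm -f "$tmp"
}

# Same settings as the sed loops; nullglob avoids a literal '*.com' when nothing matches
shopt -s nullglob
for f in *.com; do
    add_header "$f" 4 '#b3lyp/cc-pVDZ'
done
```

The sketch is meant as an alternative to the two sed loops; running it after them does nothing, because the files already start with %NProcShared.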

Build a GREASY tasklist to run the jobs

Go back to the gaussian_greasy directory:

cd ..

Then save the following Bash script as generate_tasklist.bash. It creates a tasklist file that GREASY can process, with one Gaussian command per line, and a separate output directory for each input file.

#!/bin/bash
#
submission_dir=$PWD                             # Directory from where the job is submitted
com_dir=${submission_dir}/C6H12_structures_10   # Subdirectory containing the com files
Ntasks=$(ls "${com_dir}"/*.com | wc -l)         # Number of tasks equals the number of com files
tasklist=greasy_${Ntasks}.tasklist              # e.g. greasy_10.tasklist
rm -f "${tasklist}"                             # Remove possible old tasklist
# Loop over all com files, create a separate output directory named after
# the input file and append the corresponding Gaussian command to the tasklist
for f in "${com_dir}"/*.com; do
    input_base=$(basename "${f}" .com)
    mkdir -p "output/${input_base}"
    echo "g16 < ${f} > output/${input_base}/${input_base}.log" >> "${tasklist}"
done

Run the script:

bash ./generate_tasklist.bash

After running the script you should have a tasklist file greasy_10.tasklist that contains
the Gaussian commands for the 10 com files, one per line, for example:

g16 < /scratch/yourprojectname/gaussian_greasy/C6H12_structures_10/10737.com > output/10737/10737.log
g16 < /scratch/yourprojectname/gaussian_greasy/C6H12_structures_10/10775.com > output/10775/10775.log
g16 < /scratch/yourprojectname/gaussian_greasy/C6H12_structures_10/10776.com > output/10776/10776.log
g16 < /scratch/yourprojectname/gaussian_greasy/C6H12_structures_10/11109.com > output/11109/11109.log
...
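Before submitting, you may want to confirm that the tasklist contains one line per input file. A minimal sketch, assuming the directory layout above; check_tasklist is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: compare the number of non-empty lines in a tasklist ($1)
# with the number of .com files in a directory ($2).
check_tasklist() {
    local tasklist=$1 com_dir=$2 ntasks ncom
    ntasks=$(grep -c . "$tasklist")
    ncom=$(find "$com_dir" -maxdepth 1 -name '*.com' | wc -l)
    if [ "$ntasks" -eq "$ncom" ]; then
        echo "OK: $ntasks tasks for $ncom input files"
    else
        echo "MISMATCH: $ntasks tasks but $ncom input files" >&2
        return 1
    fi
}

# Usage in the gaussian_greasy directory:
#   check_tasklist greasy_10.tasklist C6H12_structures_10
```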

Submit the GREASY tasklist

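Because the output paths in the tasklist are relative, submit the job from the gaussian_greasy directory. The option --cores 4 gives each task the 4 cores requested with %NProcShared=4.

module load greasy gaussian

sbatch-greasy --cores 4 --time 02:00 --nodes 1 --account yourprojectname greasy_10.tasklist

The command prints a summary similar to the following: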

Task list "greasy_10.tasklist" includes 10 tasks.
The first two rows of the task list are:

g16 < /scratch/yourprojectname/gaussian_greasy/C6H12_structures_10/10737.com > output/10737/10737.log
g16 < /scratch/yourprojectname/gaussian_greasy/C6H12_structures_10/10775.com > output/10775/10775.log

-------------------------------------------------------------------------
Submitting GREASY job consisting of 10 tasks to 1 nodes.
The job will run 10 tasks at the time each using 4 cores.
The maximum runtime reseved to process all the tasks is 0 h 5 m.

Job submitted with ID 5162452

You can monitor the progress of the task with command:
  squeue -j 5162452

Once the job has started you can monitor the progress of the job with command:
 tail -f greasy-5162452.log

Check the GREASY tasklist results

When the job has finished, check the summary line in the GREASY log file:

grep Summary greasy-*.log
INFO: Summary of 10 tasks: 9 OK, 1 FAILED, 0 CANCELLED, 0 INVALID.
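If you script the workflow, the number of failed tasks can be read from this summary line. The sketch below assumes the summary format shown above; failed_tasks is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: read a GREASY summary line on stdin, e.g.
#   INFO: Summary of 10 tasks: 9 OK, 1 FAILED, 0 CANCELLED, 0 INVALID.
# and print the number in front of "FAILED".
failed_tasks() {
    sed -n 's/.* \([0-9][0-9]*\) FAILED.*/\1/p'
}

# Usage after the job has finished:
#   n=$(grep -h Summary greasy-*.log | failed_tasks)
#   [ "$n" -gt 0 ] && echo "$n task(s) failed"
```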

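One of the ten tasks failed. The Gaussian log file of that task ends with an error message like this:

Charge and Multiplicity card seems defective:
 Wanted an integer as input.
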
 ?
 Error termination via Lnk1e in /appl/soft/chem/gaussian/G16RevC.01_new/g16/l101.exe at Tue Mar  9 20:10:15 2021.
 Job cpu time:       0 days  0 hours  0 minutes  0.8 seconds.
 Elapsed time:       0 days  0 hours  0 minutes  0.2 seconds.
 File lengths (MBytes):  RWF=      6 Int=      0 D2E=      0 Chk=      1 Scr=      1
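To find out which input produced an error like this, one option is to list the log files that lack the "Normal termination" line that Gaussian writes at the end of a successful run. The sketch assumes the output/<name>/<name>.log layout created by the tasklist script; failed_logs is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: list Gaussian log files under directory $1
# that do not contain "Normal termination" (failed or unfinished runs).
failed_logs() {
    local dir=${1:-output} log
    for log in "$dir"/*/*.log; do
        [ -e "$log" ] || continue              # glob matched nothing
        grep -q 'Normal termination' "$log" || echo "$log"
    done
}

# Run in the gaussian_greasy directory
failed_logs output
```

The command grep -L 'Normal termination' output/*/*.log gives the same list in one line, since -L prints the names of files that contain no matching line.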
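GREASY writes the tasks that did not complete successfully into a restart file. To rerun only the failed task, submit that restart file:

sbatch-greasy --cores 4 --time 02:00 --nodes 1 --account yourprojectname greasy_10.tasklist-undefined.rst

When this job has finished, check its summary:

grep Summary greasy-*.log
INFO: Summary of 1 tasks: 1 OK, 0 FAILED, 0 CANCELLED, 0 INVALID.

Finally, extract the B3LYP SCF energies from all output files:

grep -rnw 'output/' -e 'E(RB3LYP)'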
output/12446/12446.log:265: SCF Done:  E(RB3LYP) =  -235.836869989     A.U. after   13 cycles
output/10737/10737.log:265: SCF Done:  E(RB3LYP) =  -235.826753630     A.U. after   13 cycles
output/10776/10776.log:246: SCF Done:  E(RB3LYP) =  -235.851091573     A.U. after   12 cycles
output/10775/10775.log:246: SCF Done:  E(RB3LYP) =  -235.835303716     A.U. after   12 cycles
output/11742/11742.log:246: SCF Done:  E(RB3LYP) =  -235.845122585     A.U. after   13 cycles
output/7024/7024.log:246: SCF Done:  E(RB3LYP) =  -235.875921299     A.U. after   11 cycles
output/553629/553629.log:262: SCF Done:  E(RB3LYP) =  -235.838082463     A.U. after   11 cycles
output/12201/12201.log:246: SCF Done:  E(RB3LYP) =  -235.823223660     A.U. after   13 cycles
output/7787/7787.log:246: SCF Done:  E(RB3LYP) =  -235.882771348     A.U. after   10 cycles
output/11109/11109.log:265: SCF Done:  E(RB3LYP) =  -235.823585171     A.U. after   12 cycles
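With more structures it helps to sort the results. This sketch turns the grep output above into a table of structure name, SCF energy in hartree, and energy relative to the lowest one in kJ/mol, using 1 hartree ≈ 2625.4996 kJ/mol. It assumes the SCF Done line format shown above, and its sample input is three of those lines; energy_table is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: read "output/<name>/<name>.log:<line>: SCF Done: ... = <E> A.U. ..."
# lines on stdin and print "<name> <E/hartree> <relative E/kJ mol^-1>", lowest energy first.
energy_table() {
    awk '/SCF Done/ {
             split($1, path, "/")              # path[2] is the structure name
             for (i = 1; i <= NF; i++)
                 if ($i == "=") { print path[2], $(i + 1); break }
         }' |
    LC_ALL=C sort -k2,2n |                     # ascending: most negative energy first
    awk 'NR == 1 { emin = $2 }
         { printf "%s %s %.2f\n", $1, $2, ($2 - emin) * 2625.4996 }'
}

# Sample input: three of the result lines shown above
energy_table <<'EOF'
output/12446/12446.log:265: SCF Done:  E(RB3LYP) =  -235.836869989     A.U. after   13 cycles
output/10737/10737.log:265: SCF Done:  E(RB3LYP) =  -235.826753630     A.U. after   13 cycles
output/10776/10776.log:246: SCF Done:  E(RB3LYP) =  -235.851091573     A.U. after   12 cycles
EOF
```

For the sample input this prints 10776 -235.851091573 0.00, then 12446 -235.836869989 37.34, then 10737 -235.826753630 63.90. For all results, pipe the search into it: grep -rnw 'output/' -e 'E(RB3LYP)' | energy_table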