The Lustre-based project-specific directories, scratch and projappl, can store large amounts of data and are accessible to all compute nodes of Puhti. However, these directories are poorly suited to managing a large number of files. If you need to work with a very large number of small files, consider using the NVMe-based local temporary scratch directories instead, either through normal or interactive batch jobs. Read more about the advantages of using the local scratch drive on the CSC docs pages.
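For example, you can try out the local scratch disk in an interactive session before writing a batch script. The sketch below uses the standard Slurm srun command together with the same --gres=nvme option used later in this tutorial; the partition, time, memory and 100 GB of local disk are placeholder values to adjust to your own needs:
srun --account=project_xxx --partition=small --time=01:00:00 \
     --mem=4G --gres=nvme:100 --pty bash    # open an interactive shell on a compute node
echo $LOCAL_SCRATCH                         # path to the job-specific local scratch disk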
Below is a normal batch job that pulls a Docker image from DockerHub and converts it into a Singularity image compatible with HPC environments such as the CSC Puhti and Mahti supercomputers. During the conversion, the image layers are retrieved, cached and then assembled into a single Singularity image file (.sif format).
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --partition=small
#SBATCH --account=project_xxx
# Direct Singularity temporary and cache files to the project scratch area
export SINGULARITY_TMPDIR=/scratch/project_xxx/$USER
export SINGULARITY_CACHEDIR=/scratch/project_xxx/$USER
singularity pull --name trinity.simg docker://trinityrnaseq/trinityrnaseq
Copy the above script to a file (e.g., batch_job.sh) and modify it, at minimum replacing project_xxx with your own project. You can then submit the script file to the compute nodes using the following command:
sbatch batch_job.sh
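Once submitted, the job can be followed with standard Slurm commands; the job ID below is a placeholder for the number printed by sbatch at submission time:
squeue -u $USER        # check the state of your queued and running jobs
cat slurm-<jobid>.out  # batch job output is written to this file in the submission directory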
The local NVMe scratch disk is not available by default; it must be requested in the batch job with the --gres option:
#SBATCH --gres=nvme:<local_storage_space_per_node> # e.g., to claim 200 GB of storage, use option --gres=nvme:200
Inside the job, the path of the allocated local storage is available in the environment variable $LOCAL_SCRATCH. The batch job below requests 100 GB of local scratch, performs the conversion there and finally moves the resulting image back to the project's scratch directory:
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --partition=small
#SBATCH --account=project_xxx
#SBATCH --gres=nvme:100

# Use the fast job-specific local scratch disk for Singularity temporary and cache files
export SINGULARITY_TMPDIR=$LOCAL_SCRATCH
export SINGULARITY_CACHEDIR=$LOCAL_SCRATCH
unset XDG_RUNTIME_DIR

cd $LOCAL_SCRATCH
#pwd
#df -lh

singularity pull --name trinity.simg docker://trinityrnaseq/trinityrnaseq

# Move the converted image back to the project scratch directory before the job ends,
# as the local scratch disk is cleaned up when the job finishes
mv trinity.simg /scratch/project_xxx/$USER/
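After the job has finished, the converted image is available in /scratch/project_xxx/$USER/. As a quick sanity check you could, for example, inspect the image and run a command inside it; note that the Trinity executable being on the container's path is an assumption about the trinityrnaseq image, so adapt the command to your own container:
cd /scratch/project_xxx/$USER
singularity inspect trinity.simg                 # show the image metadata
singularity exec trinity.simg Trinity --version  # assumption: Trinity is on the container's PATH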