Linux/mac
ssh XXXX@puhti.csc.fi (replace XXXX with your user account)
Windows/PuTTY
host: puhti.csc.fi
login as: XXXX (replace XXXX with your user account)
In Puhti, check your environment with the command:
csc-workspaces
Switch to the scratch directory of your project
cd /scratch/project_2002389
And create your own sub-directory, named after your training account:
mkdir XXXX
(replace XXXX with your user account)
Set the directory permissions so that other group members can read the contents but not modify them:
chmod g-w XXXX
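The effect of a chmod expression on the group bits can be checked safely on a throwaway directory. The sketch below is only an illustration: the helper name show_mode_after is made up for this example, it assumes GNU stat (as found on Linux systems such as Puhti), and it starts from mode 775 rather than from whatever your project directory actually has.

```shell
# Show the mode string of a directory after applying a chmod expression.
# Works on a throwaway directory, so nothing in your project is touched.
# show_mode_after is a hypothetical helper, not a CSC tool.
show_mode_after() {
    tmp=$(mktemp -d)
    chmod 775 "$tmp"        # start from rwxrwxr-x
    chmod "$1" "$tmp"       # apply the change under test, e.g. g-w
    stat -c '%A' "$tmp"     # GNU stat: print permissions as ls -l does
    rmdir "$tmp"
}

show_mode_after g-w     # prints: drwxr-xr-x
show_mode_after g-wx    # prints: drwxr--r-x
```

With g-w the group keeps r and x, so members can list the directory and open files in it. With g-wx the x bit is gone as well, so members can see file names but cannot enter the directory or open the files.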
Move to the new directory:
cd XXXX
Next, download a dataset from the internet and uncompress it. The dataset contains some Pythium genomes with related BWA indexes.
curl https://a3s.fi/course_12.11.2019/pythium.tgz > pythium.tgz
ls -ltr
tar zxvf pythium.tgz
ls -ltr
tree pythium
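Before unpacking an archive it can be useful to see what it will create. The sketch below lists the top-level entries of a gzip-compressed tar file without extracting it. The helper name list_top_level and the demo archive are made up for this illustration and are not part of the course data.

```shell
# List the top-level entries of a gzip-compressed tar archive
# without extracting it. list_top_level is a hypothetical helper.
list_top_level() {
    tar ztf "$1" | cut -d/ -f1 | sort -u
}

# Demo with a small throwaway archive (made-up content).
demo=$(mktemp -d)
mkdir -p "$demo/sample/species_a"
echo ">seq1" > "$demo/sample/species_a/a.fasta"
tar zcf "$demo/sample.tgz" -C "$demo" sample
list_top_level "$demo/sample.tgz"    # prints: sample
rm -r "$demo"
```

For the course data the call would be list_top_level pythium.tgz, which should show the single pythium directory that tar zxvf creates.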
Open connection to Allas:
module load allas
allas-conf
Upload the data from Puhti to Allas with rclone. Use the command below (replace xxxx with your user account):
rclone -P copyto pythium allas:xxxx-genomes-rc/
How long did the data upload take? What was the transfer rate? How long would it take to transfer 100 GB at the same speed?
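The last question is a simple division: time = size / rate. A minimal sketch of the arithmetic is shown below. The helper name estimate_transfer_seconds is made up, the rate of 50 MB/s is only a placeholder (use the rate that rclone -P reported for you), and 1 GB is taken as 1000 MB.

```shell
# Estimate how long a transfer takes at a given rate.
# Usage: estimate_transfer_seconds SIZE_GB RATE_MB_PER_S
# Assumes 1 GB = 1000 MB. estimate_transfer_seconds is a hypothetical helper.
estimate_transfer_seconds() {
    awk -v size_gb="$1" -v rate="$2" \
        'BEGIN { printf "%.0f\n", size_gb * 1000 / rate }'
}

# Placeholder rate of 50 MB/s; substitute your own measured rate.
secs=$(estimate_transfer_seconds 100 50)
echo "100 GB at 50 MB/s takes about ${secs} s (about $((secs / 60)) min)"
```

With the placeholder rate this prints about 2000 s, that is, roughly 33 minutes.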
Then study what you have uploaded to Allas with commands:
rclone lsd allas:
rclone ls allas:xxxx-genomes-rc/
rclone lsl allas:xxxx-genomes-rc/
rclone lsf allas:xxxx-genomes-rc/
Check what this looks like in the Pouta web interface. Open a browser and go to: https://pouta.csc.fi/
In the Pouta interface, go to the object store section and list the buckets (called “Containers” there). Locate your own xxxx-genomes-rc bucket and download one of the uploaded fasta files to your local computer.
Upload the pythium directory from Puhti to Allas using the following commands (replace XXXX with your user account)
A-put case 1: Store everything in one object:
a-put pythium
a-list
a-list projectnumber-puhti-SCRATCH
a-info projectnumber-puhti-SCRATCH/xxxx/pythium.tar.zst
A-put case 2: Each subdirectory (species) as one object:
a-put pythium/*
a-list 2002389-puhti-SCRATCH/trng_xxxx
a-check pythium/*
a-info 2002389-puhti-SCRATCH/xxxx/pythium/pythium_vexans.tar.zst
A-put case 3: Use your own bucket name
a-put pythium/* -b xxxx-genomes-ap
a-list xxxx-genomes-ap
A-put case 4: Upload files without compression.
a-put --nc pythium/pythium_vexans/bwaindex/* -b XXXX_ap_vexans_bwa
a-list XXXX_ap_vexans_bwa
Can you see the difference between the four a-put commands above?
Study the xxxx-genomes-ap bucket with the commands:
a-list xxxx-genomes-ap
rclone ls allas:xxxx-genomes-ap
Why do the two commands above list a different number of objects?
Try the command:
a-info xxxx-genomes-ap/pythium_vexans.tar.zst
which is actually the same as:
rclone cat allas:xxxx-genomes-ap/pythium_vexans.tar.zst_ameta
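As the rclone cat command above suggests, each object uploaded with a-put appears to have a companion object whose name ends in _ameta. Assuming that naming pattern holds, the sketch below counts only the data objects in an rclone ls style listing. The helper name count_data_objects and the sample listing (sizes and the second object name) are made up for illustration.

```shell
# Count the objects in an "rclone ls" style listing read from standard
# input, skipping names that end in _ameta.
# count_data_objects is a hypothetical helper, not part of the a-tools.
count_data_objects() {
    grep -vc '_ameta$'
}

# Demo with a made-up listing (sizes and names are illustrative only).
printf '%s\n' \
    '     1234 pythium_vexans.tar.zst' \
    '      456 pythium_vexans.tar.zst_ameta' \
    '     2345 other_species.tar.zst' \
    '      432 other_species.tar.zst_ameta' | count_data_objects    # prints: 2
```

With a live connection the same filter could be applied as rclone ls allas:xxxx-genomes-ap | count_data_objects.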
Finally, try the command:
a-flip pythium/pythium_vexans/pythium_vexans.fasta
Try opening the public link that a-flip produced, with your browser.
Run commands:
allas-backup --help
allas-backup pythium
allas-backup list
What did these commands do for your data?
The data in the pythium directory is now stored in Allas in several ways, so we can remove the data from Puhti and log out.
rm -r pythium
exit
Linux/mac
ssh xxxx@puhti.csc.fi (replace xxxx with your user account)
Windows/PuTTY
host: puhti.csc.fi
login as: xxxx (replace xxxx with your CSC account)
In Puhti, check your projects with the command:
csc-workspaces
Go to your personal directory in the scratch area of your project:
cd /scratch/project_yourprojectnumber/trng_xxxx
Set up the Allas connection:
module load allas
allas-conf
Then run the commands:
a-list
rclone lsd allas:
a-list xxxx-genomes-ap
rclone ls allas:xxxx-genomes-ap
a-find pythium_vexans.fasta
a-find -a pythium_vexans.fasta
Next download the data in different ways:
mkdir rclone_dir
cd rclone_dir/
Example 1: copy everything in the bucket
mkdir all
rclone ls allas:xxxx-genomes-rc
rclone copyto -P allas:xxxx-genomes-rc all/
ls -l all
Example 2: copy the objects of one species
mkdir vexans
rclone copyto allas:xxxx-genomes-rc/pythium_vexans vexans/
ls -l vexans
Example 3: copy just one object
rclone copyto allas:xxxx-genomes-rc/pythium_vexans/pythium_vexans.fasta ./vexans.fasta
ls -l
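The single object fetched in example 3 should be identical to the copy of the same file that landed under vexans/ in the previous download. A minimal sketch of such a check is shown below. The helper name same_content is made up, and the demo runs on throwaway files rather than the course data.

```shell
# Report whether two files have identical content.
# same_content is a hypothetical helper built on the standard cmp tool.
same_content() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Demo on throwaway files (made-up content).
demo=$(mktemp -d)
printf '>seq1\nACGT\n' > "$demo/a.fasta"
cp "$demo/a.fasta" "$demo/b.fasta"
same_content "$demo/a.fasta" "$demo/b.fasta"    # prints: identical
rm -r "$demo"
```

In rclone_dir the call would be same_content vexans.fasta vexans/pythium_vexans.fasta, assuming the earlier download placed the fasta file directly under vexans/.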
Return to your XXXX directory in Puhti scratch:
cd ..
Check that you are in the right place:
pwd
The pwd command should print /scratch/project_projnum/XXXX
Make a new directory:
mkdir a_dir
cd a_dir/
Create the directory all and go there:
mkdir all
cd all
List your default scratch bucket:
a-list projectnumber-puhti-SCRATCH
a-list projectnumber-puhti-SCRATCH/xxxx
Look for the file pythium_vexans.fasta in the Puhti SCRATCH bucket:
a-find pythium_vexans.fasta -b projectnumber-puhti-SCRATCH
Download the full dataset with the command:
a-get projectnumber-puhti-SCRATCH/trng_xxxx/pythium.tar.zst
And check what you got:
ls -l
ls -R
Now get just one genome dataset:
cd ..
a-get projectnumber-puhti-SCRATCH/xxxx/pythium/pythium_vexans.tar.zst
ls -l pythium/
ls -l pythium/pythium_vexans/
Return to your main scratch directory and make a new directory:
cd ..
mkdir a_backup
cd a_backup/
Use the commands below to find the ID of the most recent backup version of your pythium directory:
allas-backup list
allas-backup list | grep $USER
Then use allas-backup restore to download the data:
allas-backup restore ID-string
ls -l
ls -l pythium