Running QIIME2 on ARGO
Versions installed:
- QIIME2/2019.10
- QIIME2/2020.11 (conda environment)
Setting up your environment
To demonstrate how to use qiime2 on ARGO, we follow the moving-pictures tutorial and run its instructions in the cluster environment. The steps are best run from the /scratch space, which is readable and writable on all nodes.
After logging in, change to your /scratch directory and load the miniconda3 module:
cd $SCRATCH
module load miniconda3
If this is the first time you're using the conda environments in your shell, you need to first run
conda init
source ~/.bashrc
which will configure your shell for the conda environments. Your command prompt will now be prefixed with
(base)
to show that the base conda environment is active.
To see the available conda environments, run
conda env list
Change to the qiime2 environment with
conda activate qiime2-2020.11
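Before continuing, it can be worth confirming that the activation took effect. The following is a minimal sketch, not part of the original instructions: the function name env_is_active is hypothetical, and it relies on conda exporting CONDA_DEFAULT_ENV with the name of the active environment.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the named conda environment is active.
# Assumption: conda exports CONDA_DEFAULT_ENV with the active environment name.
env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

expected_env="qiime2-2020.11"
if env_is_active "$expected_env" && command -v qiime >/dev/null 2>&1; then
    echo "Ready: $expected_env is active and qiime is on the PATH"
else
    echo "Not ready: run 'conda activate $expected_env' first"
fi
```

If the check reports "Not ready", rerun the module load and conda activate commands above.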
Now we follow the steps in the tutorial to download and organize the data files needed:
- Create the directory for the tutorial and move into it:
mkdir qiime2-moving-pictures-tutorial
cd qiime2-moving-pictures-tutorial
wget \
-O "sample-metadata.tsv" \
"https://data.qiime2.org/2020.11/tutorials/moving-pictures/sample_metadata.tsv"
mkdir emp-single-end-sequences
wget \
-O "emp-single-end-sequences/barcodes.fastq.gz" \
"https://data.qiime2.org/2020.11/tutorials/moving-pictures/emp-single-end-sequences/barcodes.fastq.gz"
wget \
-O "emp-single-end-sequences/sequences.fastq.gz" \
"https://data.qiime2.org/2020.11/tutorials/moving-pictures/emp-single-end-sequences/sequences.fastq.gz"
Running Qiime2
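Before running any qiime2 commands, it can help to confirm that the three downloads completed. This is a minimal sketch under the assumption that it is run from inside qiime2-moving-pictures-tutorial; the function name missing_inputs is hypothetical, and the file names are the ones passed to wget -O above.

```shell
#!/bin/sh
# Hypothetical helper: print each expected tutorial input that is missing or
# empty under the directory given as $1 (prints nothing when all are present).
missing_inputs() {
    for f in sample-metadata.tsv \
             emp-single-end-sequences/barcodes.fastq.gz \
             emp-single-end-sequences/sequences.fastq.gz; do
        [ -s "$1/$f" ] || echo "$f"
    done
}

missing=$(missing_inputs .)
if [ -z "$missing" ]; then
    echo "All tutorial input files are present"
else
    echo "Missing or empty input files:"
    echo "$missing"
fi
```

The check only tests that each file exists and is non-empty; it does not validate the contents.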
At this point, we are ready to run the qiime2 commands. There are two ways this can be done on the cluster.
Running qiime2 in an interactive session:
- Request a compute node using salloc
salloc
module purge
module load miniconda3
conda activate qiime2-2020.11
- Set TMPDIR so that temporary files are written to /tmp
export TMPDIR=/tmp
qiime tools import \
--type EMPSingleEndSequences \
--input-path emp-single-end-sequences \
--output-path emp-single-end-sequences.qza
If the import succeeds, qiime prints:
Imported emp-single-end-sequences as EMPSingleEndDirFmt to emp-single-end-sequences.qza
You can keep running the subsequent commands one after the other and the interactive session will persist until you type
exit
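If it is unclear whether the current shell is still inside the allocation, the environment can be inspected. This is a minimal sketch with a hypothetical function name, in_allocation; it assumes only that Slurm sets SLURM_JOB_ID inside an allocation and leaves it unset in a plain login shell.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the shell is inside a Slurm allocation.
# Assumption: Slurm sets SLURM_JOB_ID for shells started by salloc or sbatch.
in_allocation() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_allocation; then
    echo "Inside Slurm job $SLURM_JOB_ID on host $(uname -n)"
else
    echo "No Slurm allocation detected; run salloc before starting qiime"
fi
```

The host name printed alongside the job ID shows which node the commands would actually run on.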
Writing a slurm script and submitting a qiime2 job
The interactive method is useful for checking your work and for running a few commands on small samples. We can also write a slurm script and submit it as a batch job. For the same command that was run in the interactive session, we can create a slurm script, qiime_job.slurm:
#!/bin/sh
#SBATCH --job-name=qiime2_tutorial
#SBATCH --partition=all-HiPri
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=/scratch/%u/qiime2-moving-pictures-tutorial/%x-%N-%j.out # Output file
#SBATCH --error=/scratch/%u/qiime2-moving-pictures-tutorial/%x-%N-%j.err # Error file
#SBATCH --mem=5G # Total memory needed per task (units: K,M,G,T)
##SBATCH --time=<D-HH:MM> # Total time needed for job: Days-Hours:Minutes
## ----- Parallel Threads -----
## Some programs and libraries (OpenMP) implement parallelism using threads
## (light-weight sub-processes). Advantages: Less processing overhead and
## ability to share memory. Disadvantages: All threads must run on the same
## node. Make sure that the resources you request are feasible,
## e.g. --cpus-per-task must be <= # of cores on a node.
##SBATCH --cpus-per-task <C> # Request extra CPUs for threads
##SBATCH --mem-per-cpu <M> # If your threads use a lot of memory, and you
# plan to vary # of threads, use this not --mem=
##SBATCH --reservation=ssilayi_107 #uncomment this line if you're running before 2021-03-03T23:59:00
#otherwise you can delete the line
## Load the relevant modules needed for the job
# - Using the conda environment for qiime2
#module purge
#module load qiime2/2019.10
export TMPDIR=/tmp/
## Run your program or script
qiime tools import \
--type EMPSingleEndSequences \
--input-path emp-single-end-sequences \
--output-path emp-single-end-sequences.qza
In the script, we direct the output and error files to the directory in which we are currently working. We are also submitting to the all-HiPri queue, which has a time limit of 12 hrs, more than sufficient for the importing command to complete given the sample data size.
To submit it, run
sbatch qiime_job.slurm
You can check the status of the job with
sacct -X
Once the job has completed, the .out file should contain the same line as the interactive session:
Imported emp-single-end-sequences as EMPSingleEndDirFmt to emp-single-end-sequences.qza
In both cases you should now have an additional emp-single-end-sequences.qza file in your directory.
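When several jobs are in flight, it can be convenient to capture the job ID at submission time and query only that job. The sketch below is an illustration rather than part of the original workflow: parse_job_id is a hypothetical helper, and it uses sbatch --parsable, which prints the job ID optionally followed by ";" and a cluster name.

```shell
#!/bin/sh
# Hypothetical helper: extract the job ID from `sbatch --parsable` output,
# which has the form "jobid" or "jobid;cluster".
parse_job_id() {
    echo "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1 && [ -f qiime_job.slurm ]; then
    job_id=$(parse_job_id "$(sbatch --parsable qiime_job.slurm)")
    echo "Submitted job $job_id"
    # -X reports one line per job allocation rather than one per job step
    sacct -X -j "$job_id" --format=JobID,JobName,State,Elapsed
else
    echo "sbatch or qiime_job.slurm not found; nothing submitted"
fi
```

Outside the cluster the script submits nothing and only prints the fallback message.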
Writing batch scripts
Instead of running the qiime2 commands one at a time, we can run a sequence of commands by creating a shell script. For example, we take a series of commands from the tutorial and add them to the shell script, qiime.sh:
#!/bin/bash
# importing data
echo 'Importing Data ...'
qiime tools import \
--type EMPSingleEndSequences \
--input-path emp-single-end-sequences \
--output-path emp-single-end-sequences.qza
# check the UUID, type, and format of your newly-imported sequences
echo 'UUID, type, and format imported sequences ...'
qiime tools peek emp-single-end-sequences.qza
# Demultiplexing sequences
echo 'Demultiplexing sequences ...'
qiime demux emp-single \
--i-seqs emp-single-end-sequences.qza \
--m-barcodes-file sample-metadata.tsv \
--m-barcodes-column barcode-sequence \
--o-per-sample-sequences demux.qza \
--o-error-correction-details demux-details.qza
# generate a summary of the demultiplexing results
echo 'Generating summary ...'
qiime demux summarize \
--i-data demux.qza \
--o-visualization demux.qzv
# Sequence quality control and feature table construction
echo 'Sequence quality control and feature table construction ... '
qiime dada2 denoise-single \
--i-demultiplexed-seqs demux.qza \
--p-trim-left 0 \
--p-trunc-len 120 \
--o-representative-sequences rep-seqs-dada2.qza \
--o-table table-dada2.qza \
--o-denoising-stats stats-dada2.qza
qiime metadata tabulate \
--m-input-file stats-dada2.qza \
--o-visualization stats-dada2.qzv
echo 'Renaming outputs ... '
mv rep-seqs-dada2.qza rep-seqs.qza
mv table-dada2.qza table.qza
Make the script executable:
chmod +x qiime.sh
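As written, a script like qiime.sh keeps going after a command fails, so later steps may run against missing outputs. One possible way to stop at the first failure is to wrap each step, sketched below. The run_step function is hypothetical, and the stand-in commands true and false replace the real qiime calls so the sketch runs anywhere.

```shell
#!/bin/sh
# Hypothetical wrapper: announce a step, run the command, report a failure.
run_step() {
    label="$1"
    shift
    echo "$label ..."
    if ! "$@"; then
        echo "Step failed: $label" >&2
        return 1
    fi
}

# Stand-in commands keep this sketch runnable without qiime. In qiime.sh each
# call would wrap a real command, for example:
#   run_step 'Generating summary' qiime demux summarize \
#       --i-data demux.qza --o-visualization demux.qzv
run_step 'Step that succeeds' true &&
run_step 'Step that fails' false &&
run_step 'Step that is skipped' echo "never printed"
echo "Chain exit status: $?"
```

Chaining the calls with && means the third step never runs once the second one fails. A simpler alternative with a similar effect is to put set -e near the top of the script.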
Then we update the slurm script, saved here as qiime_shell_job.slurm, to run the shell script instead of the individual commands:
#!/bin/sh
#SBATCH --job-name=qiime2_moving_pictures
#SBATCH --partition=all-HiPri
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=/scratch/%u/qiime2-moving-pictures-tutorial/%x-%N-%j.out # Output file
#SBATCH --error=/scratch/%u/qiime2-moving-pictures-tutorial/%x-%N-%j.err # Error file
#SBATCH --mem=5G # Total memory needed per task (units: K,M,G,T)
##SBATCH --time=<D-HH:MM> # Total time needed for job: Days-Hours:Minutes
## Load the relevant modules needed for the job
#module purge
#module load miniconda3
#conda activate qiime2-2020.11
#conda init
#source ~/.bashrc
export TMPDIR=/tmp/
## Run your program or script
## Replaced the qiime commands with the executable script
./qiime.sh
Submit the job and check its status as before:
sbatch qiime_shell_job.slurm
sacct -X
Parallel Qiime Jobs
Some QIIME commands can utilize multiple threads with --p-n-threads. Following the example of the script above, we can write a new script, qiime_threads.sh, that includes qiime2 commands with threading taken from the Atacama soil microbiome tutorial:
#!/bin/bash
mkdir -p atacama-tutorial
cd atacama-tutorial
wget \
-O "sample-metadata.tsv" \
"https://data.qiime2.org/2020.11/tutorials/atacama-soils/sample_metadata.tsv"
mkdir -p emp-paired-end-sequences
wget \
-O "emp-paired-end-sequences/forward.fastq.gz" \
"https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/forward.fastq.gz"
wget \
-O "emp-paired-end-sequences/reverse.fastq.gz" \
"https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/reverse.fastq.gz"
wget \
-O "emp-paired-end-sequences/barcodes.fastq.gz" \
"https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/barcodes.fastq.gz"
echo "Paired-end read analysis commands ... "
qiime tools import \
--type EMPPairedEndSequences \
--input-path emp-paired-end-sequences \
--output-path emp-paired-end-sequences.qza
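# --m-barcodes-column replaces the older --m-barcodes-category option
qiime demux emp-paired \
--verbose \
--m-barcodes-file sample-metadata.tsv \
--m-barcodes-column barcode-sequence \
--i-seqs emp-paired-end-sequences.qza \
--o-per-sample-sequences demux.qza \
--o-error-correction-details demux-details.qza \
--p-rev-comp-mapping-barcodes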
qiime demux summarize \
--verbose \
--i-data demux.qza \
--o-visualization demux.qzv
qiime dada2 denoise-paired \
--verbose \
--p-n-threads "$SLURM_CPUS_PER_TASK" \
--i-demultiplexed-seqs demux.qza \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza \
--p-trim-left-f 13 \
--p-trim-left-r 13 \
--p-trunc-len-f 150 \
--p-trunc-len-r 150
qiime feature-table summarize \
--verbose \
--i-table table.qza \
--o-visualization table.qzv \
--m-sample-metadata-file sample-metadata.tsv
qiime feature-table tabulate-seqs \
--verbose \
--i-data rep-seqs.qza \
--o-visualization rep-seqs.qzv
The script combines the bash commands that download the necessary files with the qiime commands to be run. It reads the thread count from SLURM_CPUS_PER_TASK, which slurm sets from the --cpus-per-task value in the submission script:
#!/bin/sh
#SBATCH --job-name=qiime2_w_threads
#SBATCH --partition=all-HiPri
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=/scratch/%u/qiime2_tutorial/moving-pictures/%x-%N-%j.out # Output file
#SBATCH --error=/scratch/%u/qiime2_tutorial/moving-pictures/%x-%N-%j.err # Error file
#SBATCH --mem=5G # Total memory needed per task (units: K,M,G,T)
##SBATCH --time=<D-HH:MM> # Total time needed for job: Days-Hours:Minutes
## ----- Parallel Threads -----
## Some programs and libraries (OpenMP) implement parallelism using threads
## (light-weight sub-processes). Advantages: Less processing overhead and
## ability to share memory. Disadvantages: All threads must run on the same
## node. Make sure that the resources you request are feasible,
## e.g. --cpus-per-task must be <= # of cores on a node.
#SBATCH --nodes 1
#SBATCH --cpus-per-task 16 # Request extra CPUs for threads
##SBATCH --mem-per-cpu <M> # If your threads use a lot of memory, and you
# plan to vary # of threads, use this not --mem=
## Load the relevant modules needed for the job
#module purge
#module load qiime2/2019.10
export TMPDIR=/tmp/
export OMP_NUM_THREADS=16
## Run your program or script
./qiime_threads.sh
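SLURM_CPUS_PER_TASK is only defined when the job requests --cpus-per-task, so qiime_threads.sh would pass an empty value to --p-n-threads if it were run outside such a job. A minimal sketch of a fallback follows; the function name pick_threads and the default of 1 are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: use the CPU count Slurm granted, or 1 when the variable
# is unset (for example when the script runs outside a job, or the job did
# not request --cpus-per-task).
pick_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

threads=$(pick_threads)
# Keep the OpenMP setting in step with the same value
export OMP_NUM_THREADS="$threads"
echo "Using $threads thread(s)"
# In qiime_threads.sh the value would then be passed on as:
#   --p-n-threads "$threads"
```

Deriving both settings from one variable keeps the thread count in a single place, the #SBATCH --cpus-per-task line.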
To view the generated files in all cases, you can transfer the .qzv files to your local machine and use the QIIME2 viewer.
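When there are several visualizations, one option is to bundle them into a single archive before transferring. The sketch below is only a convenience and not part of the original workflow: the function name bundle_qzv and the archive name qiime2-visualizations.tar.gz are hypothetical, and it assumes the .qzv files sit in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: pack every .qzv file in the current directory into the
# tar archive named by $1. Returns non-zero when there is nothing to pack.
bundle_qzv() {
    archive="$1"
    set -- *.qzv
    # With no matches the pattern stays literal, so test that a file exists
    [ -e "$1" ] || return 1
    tar -czf "$archive" "$@"
}

if bundle_qzv qiime2-visualizations.tar.gz; then
    echo "Created qiime2-visualizations.tar.gz"
else
    echo "No .qzv files found in $(pwd)"
fi
```

The archive can then be copied to your local machine with a tool such as scp and unpacked there before opening the files in the viewer.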
When finished, you can deactivate the conda environment with
conda deactivate