Running QIIME2 on Hopper in a Conda Environment
Installing QIIME 2 Natively with Conda
To install QIIME2 on Hopper in your /home directory, follow these steps:
-
Connect to Hopper and set up your environment:
module load gnu10 openmpi
-
Get the miniconda install script:
curl -sL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > "Miniconda3.sh"
-
Install miniconda in your /home:
bash Miniconda3.sh -b -p $PWD/miniconda3
-
Activate the base miniconda environment:
source miniconda3/bin/activate
-
Get the qiime conda environment file:
wget https://data.qiime2.org/distro/amplicon/qiime2-amplicon-2023.9-py38-linux-conda.yml
-
Create the qiime environment:
conda env create -n qiime2-amplicon-2023.9 --file qiime2-amplicon-2023.9-py38-linux-conda.yml
-
Activate the created environment:
conda activate qiime2-amplicon-2023.9
-
Install the additional packages:
conda install q2-picrust2=2023.2 -c conda-forge -c bioconda -c gavinmdouglas
-
You can now run qiime commands:
qiime --help
Usage: qiime [OPTIONS] COMMAND [ARGS]...

  QIIME 2 command-line interface (q2cli)
  --------------------------------------

  To get help with QIIME 2, visit https://qiime2.org.

  To enable tab completion in Bash, run the following command or add it to
  your .bashrc/.bash_profile:

      source tab-qiime

  To enable tab completion in ZSH, run the following commands or add them to
  your .zshrc:

      autoload -Uz compinit && compinit
      autoload bashcompinit && bashcompinit
      source tab-qiime

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  info                Display information about current deployment.
  tools               Tools for working with QIIME 2 files.
  dev                 Utilities for developers and advanced users.
  alignment           Plugin for generating and manipulating alignments.
  composition         Plugin for compositional data analysis.
  cutadapt            Plugin for removing adapter sequences, primers, and
                      other unwanted sequence from sequence data.
  dada2               Plugin for sequence quality control with DADA2.
  deblur              Plugin for sequence quality control with Deblur.
  demux               Plugin for demultiplexing & viewing sequence quality.
  diversity           Plugin for exploring community diversity.
  diversity-lib       Plugin for computing community diversity.
  emperor             Plugin for ordination plotting with Emperor.
  feature-classifier  Plugin for taxonomic classification.
  feature-table       Plugin for working with sample by feature tables.
  fragment-insertion  Plugin for extending phylogenies.
  longitudinal        Plugin for paired sample and time series analyses.
  metadata            Plugin for working with Metadata.
  phylogeny           Plugin for generating and manipulating phylogenies.
  picrust2            Predicts gene families and pathways from 16S sequences.
  quality-control     Plugin for quality control of feature and sequence data.
  quality-filter      Plugin for PHRED-based filtering and trimming.
  sample-classifier   Plugin for machine learning prediction of sample
                      metadata.
  taxa                Plugin for working with feature taxonomy annotations.
  vsearch             Plugin for clustering and dereplicating with vsearch.
-
Once you're done, you can exit the conda environment with:
conda deactivate
You need to run the deactivate command an additional time to exit the base conda environment.
Using QIIME 2 In Jupyter Notebooks from Open OnDemand
To make the installed conda environment available as a kernel in the Jupyter Lab/Jupyter Notebook apps on Open OnDemand:
-
From the command line, activate your installed environment:
source miniconda3/bin/activate
conda activate qiime2-amplicon-2023.9
-
Install the qiime kernel:
conda install -c anaconda ipykernel
python -m ipykernel install --user --name=qiime2-amplicon-2023.9
You should now have 'qiime2-amplicon-2023.9' as one of the kernel options in Jupyter Lab/Jupyter Notebooks.
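To confirm the kernel was registered, you can check Jupyter's user kernelspec directory. This is a sketch: the path below is the usual default location for `--user` installs, which is an assumption about your Jupyter configuration.

```shell
# Where "python -m ipykernel install --user --name=..." places the kernelspec
# by default (an assumption -- your Jupyter data directory may differ).
KERNEL_DIR="$HOME/.local/share/jupyter/kernels/qiime2-amplicon-2023.9"
if [ -d "$KERNEL_DIR" ]; then
    echo "kernel registered at $KERNEL_DIR"
else
    echo "kernel not found -- re-run the ipykernel install step"
fi
```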
Running QIIME2 in batch mode with Slurm
To demonstrate how to use qiime2 on the cluster, we follow the Moving Pictures tutorial and run its steps in the cluster environment. These steps are best run from the /scratch space, which is read/write on all nodes.
After logging in, change to /scratch
cd $SCRATCH
source ~/miniconda3/bin/activate
Change to the qiime2 environment with
conda activate qiime2-amplicon-2024.2
Now we follow the steps in the tutorial to download and organize the data files needed
- Create the directory for the tutorial and move into it:
mkdir qiime2-moving-pictures-tutorial
cd qiime2-moving-pictures-tutorial
wget \
-O "sample-metadata.tsv" \
"https://data.qiime2.org/2024.2/tutorials/moving-pictures/sample_metadata.tsv"
mkdir emp-single-end-sequences
wget \
-O "emp-single-end-sequences/barcodes.fastq.gz" \
"https://data.qiime2.org/2024.2/tutorials/moving-pictures/emp-single-end-sequences/barcodes.fastq.gz"
wget \
-O "emp-single-end-sequences/sequences.fastq.gz" \
"https://data.qiime2.org/2024.2/tutorials/moving-pictures/emp-single-end-sequences/sequences.fastq.gz"
At this point, we are ready to run the qiime2 commands. There are two ways to do this on the cluster.
Running qiime2 in an interactive session:
- Request a compute node using salloc
salloc
source ~/miniconda3/bin/activate
conda activate qiime2-amplicon-2024.2
Set TMPDIR so that QIIME2 writes its temporary files to node-local storage:
export TMPDIR=/tmp
qiime tools import \
--type EMPSingleEndSequences \
--input-path emp-single-end-sequences \
--output-path emp-single-end-sequences.qza
Imported emp-single-end-sequences as EMPSingleEndDirFmt to emp-single-end-sequences.qza
You can keep running the subsequent commands one after the other and the interactive session will persist until you type
exit
Writing a slurm script and submitting a qiime2 job
The interactive method is useful for checking your work and for running a few commands on smaller sample sizes. We can also write Slurm scripts and submit them through Slurm to do the same. For the command that was run in the interactive session, we can create a Slurm script, qiime_job.slurm:
#!/bin/sh
#SBATCH --job-name=qiime2_tutorial
#SBATCH --partition=normal
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=%x-%N-%j.out # Output file
#SBATCH --error=%x-%N-%j.err # Error file
#SBATCH --mem=5G # Total memory needed per task (units: K,M,G,T)
##SBATCH --time=<D-HH:MM> # Total time needed for job: Days-Hours:Minutes
## ----- Parallel Threads -----
## Some programs and libraries (OpenMP) implement parallelism using threads
## (light-weight sub-processes). Advantages: Less processing overhead and
## ability to share memory. Disadvantages: All threads must run on the same
## node. Make sure that the resources you request are feasible,
## e.g. --cpus-per-task must be <= # of cores on a node.
##SBATCH --cpus-per-task <C> # Request extra CPUs for threads
##SBATCH --mem-per-cpu <M> # If your threads use a lot of memory, and you
# plan to vary the # of threads, use this instead of --mem=
# - Using the conda environment for qiime2
source ~/miniconda3/bin/activate
conda activate qiime2-amplicon-2024.2
export TMPDIR=/tmp/
## Run your program or script
qiime tools import \
--type EMPSingleEndSequences \
--input-path emp-single-end-sequences \
--output-path emp-single-end-sequences.qza
In the script, we direct the output and error files to the directory in which we are currently working. We are also submitting to the normal partition, whose time limit is more than sufficient for the import command to complete given the sample data size.
To submit the job:
sbatch qiime_job.slurm
You can check its status with:
sacct -X
Once the job finishes, the .out file should contain the same line as the interactive session:
Imported emp-single-end-sequences as EMPSingleEndDirFmt to emp-single-end-sequences.qza
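A quick way to confirm success from the command line is to search the newest output file for that confirmation line. This is a sketch; the glob matches the job name set in the script above.

```shell
# Find the most recent output file for this job name and look for the
# confirmation line printed by "qiime tools import"
latest_out=$(ls -t qiime2_tutorial-*.out 2>/dev/null | head -n 1)
if [ -n "$latest_out" ] && grep -q "Imported emp-single-end-sequences" "$latest_out"; then
    echo "import succeeded ($latest_out)"
else
    echo "no confirmation line found yet"
fi
```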
In both cases you should now have an additional emp-single-end-sequences.qza
file in your directory.
Writing batch scripts
Instead of running the qiime2 commands one at a time, we can run a sequence of commands by creating a shell script. For example, we take a series of commands from the tutorial and add them to a shell script, qiime.sh:
#!/bin/bash
# importing data
echo 'Importing Data ...'
qiime tools import \
--type EMPSingleEndSequences \
--input-path emp-single-end-sequences \
--output-path emp-single-end-sequences.qza
# check the UUID, type, and format of your newly-imported sequences
echo 'UUID, type, and format imported sequences ...'
qiime tools peek emp-single-end-sequences.qza
# Demultiplexing sequences
echo 'Demultiplexing sequences ...'
qiime demux emp-single \
--i-seqs emp-single-end-sequences.qza \
--m-barcodes-file sample-metadata.tsv \
--m-barcodes-column barcode-sequence \
--o-per-sample-sequences demux.qza \
--o-error-correction-details demux-details.qza
# generate a summary of the demultiplexing results
echo 'Generating summary ...'
qiime demux summarize \
--i-data demux.qza \
--o-visualization demux.qzv
# Sequence quality control and feature table construction
echo 'Sequence quality control and feature table construction ... '
qiime dada2 denoise-single \
--i-demultiplexed-seqs demux.qza \
--p-trim-left 0 \
--p-trunc-len 120 \
--o-representative-sequences rep-seqs-dada2.qza \
--o-table table-dada2.qza \
--o-denoising-stats stats-dada2.qza
qiime metadata tabulate \
--m-input-file stats-dada2.qza \
--o-visualization stats-dada2.qzv
echo 'Renaming outputs ... '
mv rep-seqs-dada2.qza rep-seqs.qza
mv table-dada2.qza table.qza
Make the script executable:
chmod +x qiime.sh
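Before submitting, you can catch shell syntax errors without executing any of the qiime commands: `bash -n` parses a script but does not run it.

```shell
# Parse qiime.sh for syntax errors without executing it
if [ -f qiime.sh ]; then
    bash -n qiime.sh && echo "qiime.sh: syntax OK"
else
    echo "qiime.sh not found in the current directory"
fi
```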
Then we update the Slurm script to run the batch script with the commands in it:
#!/bin/sh
#SBATCH --job-name=qiime2_moving_pictures
#SBATCH --partition=normal
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=%x-%N-%j.out # Output file
#SBATCH --error=%x-%N-%j.err # Error file
#SBATCH --mem=5G # Total memory needed per task (units: K,M,G,T)
##SBATCH --time=<D-HH:MM> # Total time needed for job: Days-Hours:Minutes
## Set up your environment
source ~/miniconda3/bin/activate
conda activate qiime2-amplicon-2024.2
export TMPDIR=/tmp/
## Run your program or script
## Replaced the qiime commands with the executable script
./qiime.sh
Submit and check the job status as before:
sbatch qiime_shell_job.slurm
sacct -X
Parallel Qiime Jobs
Some QIIME commands can use multiple threads via the --p-n-threads option. Following the example above, we can write a new script, qiime_threads.sh, that includes qiime2 commands with threading, taken from the Atacama soil microbiome tutorial:
#!/bin/bash
mkdir -p atacama-tutorial
cd atacama-tutorial
wget \
-O "sample-metadata.tsv" \
"https://data.qiime2.org/2020.11/tutorials/atacama-soils/sample_metadata.tsv"
mkdir -p emp-paired-end-sequences
wget \
-O "emp-paired-end-sequences/forward.fastq.gz" \
"https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/forward.fastq.gz"
wget \
-O "emp-paired-end-sequences/reverse.fastq.gz" \
"https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/reverse.fastq.gz"
wget \
-O "emp-paired-end-sequences/barcodes.fastq.gz" \
"https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/barcodes.fastq.gz"
echo "Paired-end read analysis commands ... "
qiime tools import \
--type EMPPairedEndSequences \
--input-path emp-paired-end-sequences \
--output-path emp-paired-end-sequences.qza
qiime demux emp-paired \
--verbose \
--m-barcodes-file sample-metadata.tsv \
--m-barcodes-column barcode-sequence \
--i-seqs emp-paired-end-sequences.qza \
--o-per-sample-sequences demux.qza \
--o-error-correction-details demux-details.qza \
--p-rev-comp-mapping-barcodes
qiime demux summarize \
--verbose \
--i-data demux.qza \
--o-visualization demux.qzv
qiime dada2 denoise-paired \
--verbose \
--p-n-threads $SLURM_CPUS_PER_TASK \
--i-demultiplexed-seqs demux.qza \
--o-table table \
--o-representative-sequences rep-seqs \
--p-trim-left-f 13 \
--p-trim-left-r 13 \
--p-trunc-len-f 150 \
--p-trunc-len-r 150
qiime feature-table summarize \
--verbose \
--i-table table.qza \
--o-visualization table.qzv \
--m-sample-metadata-file sample-metadata.tsv
qiime feature-table tabulate-seqs \
--verbose \
--i-data rep-seqs.qza \
--o-visualization rep-seqs.qzv
The script combines the bash commands that download the necessary files with the qiime commands to be run. It also uses the number of cpus-per-task that we define in the submission script, with updated Slurm directives for multi-threading:
#!/bin/sh
#SBATCH --job-name=qiime2_w_threads
#SBATCH --partition=all-HiPri
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=%x-%N-%j.out # Output file
#SBATCH --error=%x-%N-%j.err # Error file
#SBATCH --mem=5G # Total memory needed per task (units: K,M,G,T)
##SBATCH --time=<D-HH:MM> # Total time needed for job: Days-Hours:Minutes
## ----- Parallel Threads -----
## Some programs and libraries (OpenMP) implement parallelism using threads
## (light-weight sub-processes). Advantages: Less processing overhead and
## ability to share memory. Disadvantages: All threads must run on the same
## node. Make sure that the resources you request are feasible,
## e.g. --cpus-per-task must be <= # of cores on a node.
#SBATCH --nodes 1
#SBATCH --cpus-per-task 16 # Request extra CPUs for threads
##SBATCH --mem-per-cpu <M> # If your threads use a lot of memory, and you
# plan to vary the # of threads, use this instead of --mem=
## Set up your environment
source ~/miniconda3/bin/activate
conda activate qiime2-amplicon-2024.2
export TMPDIR=/tmp/
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
## Run your program or script
./qiime_threads.sh
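One refinement worth considering inside qiime_threads.sh: read the thread count from Slurm with a fallback, so the same script also works interactively, where SLURM_CPUS_PER_TASK is unset. This is a sketch, not part of the original tutorial.

```shell
# Use the Slurm allocation when present, otherwise fall back to 1 thread
THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "running with $THREADS thread(s)"
# Then pass it to the qiime commands, e.g.:
#   qiime dada2 denoise-paired ... --p-n-threads "$THREADS" ...
```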
To view the generated files in all cases, you can transfer the .qzv
files to your local machine and use the QIIME2 viewer.
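From your local machine, the transfer can be done with scp. The username, hostname, and path below are placeholders (not the real Hopper login host), so the snippet only prints the command; substitute your own values and run it directly.

```shell
# Placeholder values -- replace with your own username, login host, and path.
REMOTE_USER="username"
REMOTE_HOST="hopper.example.edu"
REMOTE_DIR="/scratch/username/qiime2-moving-pictures-tutorial"
# Print the transfer command rather than running it, since the host is a stand-in:
echo scp "$REMOTE_USER@$REMOTE_HOST:$REMOTE_DIR/*.qzv" .
```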
When finished, you can deactivate the conda environments with
conda deactivate