Running QIIME2 on Hopper in a Conda Environment
Installing QIIME 2 Natively with Conda
To install QIIME2 on Hopper in your /home directory, follow these steps:
-
Connect to Hopper and set up your environment:
module load gnu10 openmpi
-
Get the miniconda install script:
curl -sL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > "Miniconda3.sh"
-
Install miniconda in your /home:
bash Miniconda3.sh -b -p $PWD/miniconda3
-
Activate the base miniconda environment:
source miniconda3/bin/activate
-
Get the qiime conda environment file:
wget https://data.qiime2.org/distro/amplicon/qiime2-amplicon-2023.9-py38-linux-conda.yml
-
Create the qiime environment:
conda env create -n qiime2-amplicon-2023.9 --file qiime2-amplicon-2023.9-py38-linux-conda.yml
-
Activate the created environment:
conda activate qiime2-amplicon-2023.9
-
Install the additional packages:
conda install q2-picrust2=2023.2 -c conda-forge -c bioconda -c gavinmdouglas
-
You can now run qiime commands:
qiime --help
Usage: qiime [OPTIONS] COMMAND [ARGS]...

  QIIME 2 command-line interface (q2cli)
  --------------------------------------

  To get help with QIIME 2, visit https://qiime2.org.

  To enable tab completion in Bash, run the following command or add it to
  your .bashrc/.bash_profile:

      source tab-qiime

  To enable tab completion in ZSH, run the following commands or add them to
  your .zshrc:

      autoload -Uz compinit && compinit
      autoload bashcompinit && bashcompinit
      source tab-qiime

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  info                Display information about current deployment.
  tools               Tools for working with QIIME 2 files.
  dev                 Utilities for developers and advanced users.
  alignment           Plugin for generating and manipulating alignments.
  composition         Plugin for compositional data analysis.
  cutadapt            Plugin for removing adapter sequences, primers, and
                      other unwanted sequence from sequence data.
  dada2               Plugin for sequence quality control with DADA2.
  deblur              Plugin for sequence quality control with Deblur.
  demux               Plugin for demultiplexing & viewing sequence quality.
  diversity           Plugin for exploring community diversity.
  diversity-lib       Plugin for computing community diversity.
  emperor             Plugin for ordination plotting with Emperor.
  feature-classifier  Plugin for taxonomic classification.
  feature-table       Plugin for working with sample by feature tables.
  fragment-insertion  Plugin for extending phylogenies.
  longitudinal        Plugin for paired sample and time series analyses.
  metadata            Plugin for working with Metadata.
  phylogeny           Plugin for generating and manipulating phylogenies.
  picrust2            Predicts gene families and pathways from 16S sequences.
  quality-control     Plugin for quality control of feature and sequence data.
  quality-filter      Plugin for PHRED-based filtering and trimming.
  sample-classifier   Plugin for machine learning prediction of sample
                      metadata.
  taxa                Plugin for working with feature taxonomy annotations.
  vsearch             Plugin for clustering and dereplicating with vsearch.
-
Once you're done, you can exit the conda environment with:
conda deactivate
You need to run the deactivate command an additional time to exit the base conda environment.
Using QIIME 2 In Jupyter Notebooks from Open OnDemand
To make the installed conda environment available as a kernel in the Jupyter Lab/Jupyter Notebook apps on Open OnDemand:
-
From the command line, activate your installed environment:
source miniconda3/bin/activate
conda activate qiime2-amplicon-2023.9
-
Install the qiime kernel:
conda install -c anaconda ipykernel
python -m ipykernel install --user --name=qiime2-amplicon-2023.9
You should now have 'qiime2-amplicon-2023.9' as one of the kernel options in Jupyter Lab/Jupyter Notebooks.
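To confirm the kernel was registered, you can check Jupyter's user kernelspec directory. This is a sketch: the path below is the usual default location for `--user` installs, which is an assumption about your Jupyter configuration.

```shell
# Where "python -m ipykernel install --user --name=..." places the kernelspec
# by default (an assumption -- your Jupyter data directory may differ).
KERNEL_DIR="$HOME/.local/share/jupyter/kernels/qiime2-amplicon-2023.9"
if [ -d "$KERNEL_DIR" ]; then
    echo "kernel registered at $KERNEL_DIR"
else
    echo "kernel not found -- re-run the ipykernel install step"
fi
```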
Running QIIME2 in batch mode with Slurm
To demonstrate how to use qiime2 on the cluster, we follow the Moving Pictures tutorial and run its steps in the cluster environment. These steps are best run from the /scratch space, which is read/write on all nodes.
After logging in, change to /scratch
cd $SCRATCH
source ~/miniconda3/bin/activate
Change to the qiime2 environment with
conda activate qiime2-amplicon-2024.2
Now we follow the steps in the tutorial to download and organize the data files needed
- Create the directory for the tutorial and move into it:
mkdir qiime2-moving-pictures-tutorial
cd qiime2-moving-pictures-tutorial
wget \
-O "sample-metadata.tsv" \
"https://data.qiime2.org/2024.2/tutorials/moving-pictures/sample_metadata.tsv"
mkdir emp-single-end-sequences
wget \
-O "emp-single-end-sequences/barcodes.fastq.gz" \
"https://data.qiime2.org/2024.2/tutorials/moving-pictures/emp-single-end-sequences/barcodes.fastq.gz"
wget \
-O "emp-single-end-sequences/sequences.fastq.gz" \
"https://data.qiime2.org/2024.2/tutorials/moving-pictures/emp-single-end-sequences/sequences.fastq.gz"
At this point, we are ready to run the qiime2 commands. There are two ways to do this on the cluster.
Running qiime2 in an interactive session:
- Request a compute node using salloc
salloc
source ~/miniconda3/bin/activate
conda activate qiime2-amplicon-2024.2
Set TMPDIR so that QIIME2 writes its temporary files to node-local storage:
export TMPDIR=/tmp
qiime tools import \
--type EMPSingleEndSequences \
--input-path emp-single-end-sequences \
--output-path emp-single-end-sequences.qza
Imported emp-single-end-sequences as EMPSingleEndDirFmt to emp-single-end-sequences.qza
You can keep running the subsequent commands one after the other and the interactive session will persist until you type
exit
Writing a slurm script and submitting a qiime2 job
The interactive method is useful for checking your work and for running a few commands on smaller sample sizes. We can also write Slurm scripts and submit them through Slurm to do the same. For the command that was run in the interactive session, we can create a Slurm script, qiime_job.slurm:
#!/bin/sh
#SBATCH --job-name=qiime2_tutorial
#SBATCH --partition=normal
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=%x-%N-%j.out # Output file
#SBATCH --error=%x-%N-%j.err # Error file
#SBATCH --mem=5G # Total memory needed per task (units: K,M,G,T)
##SBATCH --time=<D-HH:MM> # Total time needed for job: Days-Hours:Minutes
## ----- Parallel Threads -----
## Some programs and libraries (OpenMP) implement parallelism using threads
## (light-weight sub-processes). Advantages: Less processing overhead and
## ability to share memory. Disadvantages: All threads must run on the same
## node. Make sure that the resources you request are feasible,
## e.g. --cpus-per-task must be <= # of cores on a node.
##SBATCH --cpus-per-task <C> # Request extra CPUs for threads
##SBATCH --mem-per-cpu <M> # If your threads use a lot of memory, and you
# plan to vary the # of threads, use this instead of --mem=
# - Using the conda environment for qiime2
source ~/miniconda3/bin/activate
conda activate qiime2-amplicon-2024.2
export TMPDIR=/tmp/
## Run your program or script
qiime tools import \
--type EMPSingleEndSequences \
--input-path emp-single-end-sequences \
--output-path emp-single-end-sequences.qza
In the script, we direct the output and error files to the directory in which we are currently working. We are also submitting to the normal partition, whose time limit is more than sufficient for the import command to complete given the sample data size.
To submit the job:
sbatch qiime_job.slurm
You can check its status with:
sacct -X
Once the job finishes, the .out file should contain the same line as the interactive session:
Imported emp-single-end-sequences as EMPSingleEndDirFmt to emp-single-end-sequences.qza
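A quick way to confirm success from the command line is to search the newest output file for that confirmation line. This is a sketch; the glob matches the job name set in the script above.

```shell
# Find the most recent output file for this job name and look for the
# confirmation line printed by "qiime tools import"
latest_out=$(ls -t qiime2_tutorial-*.out 2>/dev/null | head -n 1)
if [ -n "$latest_out" ] && grep -q "Imported emp-single-end-sequences" "$latest_out"; then
    echo "import succeeded ($latest_out)"
else
    echo "no confirmation line found yet"
fi
```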
In both cases you should now have an additional emp-single-end-sequences.qza
file in your directory.
Writing batch scripts
Instead of running the qiime2 commands one at a time, we can run a sequence of commands by creating a shell script. For example, we take a series of commands from the tutorial and add them to a shell script, qiime.sh:
#!/bin/bash
# importing data
echo 'Importing Data ...'
qiime tools import \
--type EMPSingleEndSequences \
--input-path emp-single-end-sequences \
--output-path emp-single-end-sequences.qza
# check the UUID, type, and format of your newly-imported sequences
echo 'UUID, type, and format imported sequences ...'
qiime tools peek emp-single-end-sequences.qza
# Demultiplexing sequences
echo 'Demultiplexing sequences ...'
qiime demux emp-single \
--i-seqs emp-single-end-sequences.qza \
--m-barcodes-file sample-metadata.tsv \
--m-barcodes-column barcode-sequence \
--o-per-sample-sequences demux.qza \
--o-error-correction-details demux-details.qza
# generate a summary of the demultiplexing results
echo 'Generating summary ...'
qiime demux summarize \
--i-data demux.qza \
--o-visualization demux.qzv
# Sequence quality control and feature table construction
echo 'Sequence quality control and feature table construction ... '
qiime dada2 denoise-single \
--i-demultiplexed-seqs demux.qza \
--p-trim-left 0 \
--p-trunc-len 120 \
--o-representative-sequences rep-seqs-dada2.qza \
--o-table table-dada2.qza \
--o-denoising-stats stats-dada2.qza
qiime metadata tabulate \
--m-input-file stats-dada2.qza \
--o-visualization stats-dada2.qzv
echo 'Renaming outputs ... '
mv rep-seqs-dada2.qza rep-seqs.qza
mv table-dada2.qza table.qza
Make the script executable:
chmod +x qiime.sh
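Before submitting, you can catch shell syntax errors without executing any of the qiime commands: `bash -n` parses a script but does not run it.

```shell
# Parse qiime.sh for syntax errors without executing it
if [ -f qiime.sh ]; then
    bash -n qiime.sh && echo "qiime.sh: syntax OK"
else
    echo "qiime.sh not found in the current directory"
fi
```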
Then we update the Slurm script to run the batch script with the commands in it:
#!/bin/sh
#SBATCH --job-name=qiime2_moving_pictures
#SBATCH --partition=normal
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=%x-%N-%j.out # Output file
#SBATCH --error=%x-%N-%j.err # Error file
#SBATCH --mem=5G # Total memory needed per task (units: K,M,G,T)
##SBATCH --time=<D-HH:MM> # Total time needed for job: Days-Hours:Minutes
## Set up your environment
source ~/miniconda3/bin/activate
conda activate qiime2-amplicon-2024.2
export TMPDIR=/tmp/
## Run your program or script
## Replaced the qiime commands with the executable script
./qiime.sh
Submit and check the job status as before:
sbatch qiime_shell_job.slurm
sacct -X
Parallel Qiime Jobs
Some QIIME commands can use multiple threads via the --p-n-threads option. Following the example above, we can write a new script, qiime_threads.sh, that includes qiime2 commands with threading, taken from the Atacama soil microbiome tutorial:
#!/bin/bash
mkdir -p atacama-tutorial
cd atacama-tutorial
wget \
-O "sample-metadata.tsv" \
"https://data.qiime2.org/2020.11/tutorials/atacama-soils/sample_metadata.tsv"
mkdir -p emp-paired-end-sequences
wget \
-O "emp-paired-end-sequences/forward.fastq.gz" \
"https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/forward.fastq.gz"
wget \
-O "emp-paired-end-sequences/reverse.fastq.gz" \
"https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/reverse.fastq.gz"
wget \
-O "emp-paired-end-sequences/barcodes.fastq.gz" \
"https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/barcodes.fastq.gz"
echo "Paired-end read analysis commands ... "
qiime tools import \
--type EMPPairedEndSequences \
--input-path emp-paired-end-sequences \
--output-path emp-paired-end-sequences.qza
qiime demux emp-paired \
--verbose \
--m-barcodes-file sample-metadata.tsv \
--m-barcodes-column barcode-sequence \
--i-seqs emp-paired-end-sequences.qza \
--o-per-sample-sequences demux.qza \
--o-error-correction-details demux-details.qza \
--p-rev-comp-mapping-barcodes
qiime demux summarize \
--verbose \
--i-data demux.qza \
--o-visualization demux.qzv
qiime dada2 denoise-paired \
--verbose \
--p-n-threads $SLURM_CPUS_PER_TASK \
--i-demultiplexed-seqs demux.qza \
--o-table table \
--o-representative-sequences rep-seqs \
--p-trim-left-f 13 \
--p-trim-left-r 13 \
--p-trunc-len-f 150 \
--p-trunc-len-r 150
qiime feature-table summarize \
--verbose \
--i-table table.qza \
--o-visualization table.qzv \
--m-sample-metadata-file sample-metadata.tsv
qiime feature-table tabulate-seqs \
--verbose \
--i-data rep-seqs.qza \
--o-visualization rep-seqs.qzv
The script combines the bash commands that download the necessary files with the qiime commands to be run. It also uses the number of cpus-per-task that we define in the submission script, with updated Slurm directives for multi-threading:
#!/bin/sh
#SBATCH --job-name=qiime2_w_threads
#SBATCH --partition=all-HiPri
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=%x-%N-%j.out # Output file
#SBATCH --error=%x-%N-%j.err # Error file
#SBATCH --mem=5G # Total memory needed per task (units: K,M,G,T)
##SBATCH --time=<D-HH:MM> # Total time needed for job: Days-Hours:Minutes
## ----- Parallel Threads -----
## Some programs and libraries (OpenMP) implement parallelism using threads
## (light-weight sub-processes). Advantages: Less processing overhead and
## ability to share memory. Disadvantages: All threads must run on the same
## node. Make sure that the resources you request are feasible,
## e.g. --cpus-per-task must be <= # of cores on a node.
#SBATCH --nodes 1
#SBATCH --cpus-per-task 16 # Request extra CPUs for threads
##SBATCH --mem-per-cpu <M> # If your threads use a lot of memory, and you
# plan to vary the # of threads, use this instead of --mem=
## Set up your environment
source ~/miniconda3/bin/activate
conda activate qiime2-amplicon-2024.2
export TMPDIR=/tmp/
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
## Run your program or script
./qiime_threads.sh
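One refinement worth considering inside qiime_threads.sh: read the thread count from Slurm with a fallback, so the same script also works interactively, where SLURM_CPUS_PER_TASK is unset. This is a sketch, not part of the original tutorial.

```shell
# Use the Slurm allocation when present, otherwise fall back to 1 thread
THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "running with $THREADS thread(s)"
# Then pass it to the qiime commands, e.g.:
#   qiime dada2 denoise-paired ... --p-n-threads "$THREADS" ...
```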
To view the generated files in all cases, you can transfer the .qzv
files to your local machine and use the QIIME2 viewer.
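From your local machine, the transfer can be done with scp. The username, hostname, and path below are placeholders (not the real Hopper login host), so the snippet only prints the command; substitute your own values and run it directly.

```shell
# Placeholder values -- replace with your own username, login host, and path.
REMOTE_USER="username"
REMOTE_HOST="hopper.example.edu"
REMOTE_DIR="/scratch/username/qiime2-moving-pictures-tutorial"
# Print the transfer command rather than running it, since the host is a stand-in:
echo scp "$REMOTE_USER@$REMOTE_HOST:$REMOTE_DIR/*.qzv" .
```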
When finished, you can deactivate the conda environments with
conda deactivate