Running AlphaFold 3

AlphaFold 3 is a machine learning tool, part of the Nobel Prize-winning AlphaFold family, that predicts the 3D structure of proteins and of their complexes with DNA, RNA, ligands, and ions from sequence input alone. AlphaFold 3 is currently available on Hopper as a Singularity container.

Interactively

To start an AlphaFold 3 session, first request an interactive GPU session with salloc:

salloc -p gpuq -q gpu --nodes=1 --ntasks-per-node=4 --gres=gpu:2g.20gb:1 --mem=15GB --time=0-02:00:00
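Once the allocation is granted, it can be worth confirming that the GPU is actually visible from your shell before starting a long run. The following is a minimal sketch, assuming nvidia-smi is installed on the GPU nodes; the function name gpu_check is made up for this example.

```shell
# Sketch: report whether a GPU is visible in the current shell.
# gpu_check is a hypothetical helper name, not part of any Hopper tooling.
gpu_check() {
    # nvidia-smi -L prints one line per visible GPU (and per MIG slice, if any)
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L; then
        echo "GPU visible"
    else
        echo "no GPU visible"
    fi
}

gpu_check
```

If this reports that no GPU is visible, the shell is probably not running inside the allocation.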

Then load the compiler and Singularity modules and set the AlphaFold 3 environment variables:

module load gnu10 
module load singularity
DB_DIR=/datasets/alphafold3/databases
MODEL_PARAMETERS_DIR=/datasets/alphafold3/model_parameters
AF3_SCRIPTS=/containers/dgx/Containers/alphafold3/v3.0.1

Copy the Python scripts into your working directory if you have not already done so:

cp $AF3_SCRIPTS/*.py .

Set the variables used to run the container image:

AF3_IMAGE=/containers/dgx/Containers/alphafold3/v3.0.1/singularity/alphafold3.sif

SINGULARITY_RUN="singularity exec --nv --bind $MODEL_PARAMETERS_DIR:/root/models --bind $DB_DIR:/root/public_databases --bind $AF3_SCRIPTS:/root/scripts"

Your input must be a JSON file in the AlphaFold 3 input format. Here is an example that folds a protein as a homodimer; the two IDs, A and B, request two copies of the same amino acid sequence:

{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}

More in-depth documentation of the AlphaFold 3 input format can be found here
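A misplaced comma or bracket in the input file is easy to miss, so it may help to check the JSON syntax before starting a run. The sketch below assumes python3 is available in your shell; check_json is a hypothetical helper name and fold_input.json is a placeholder file name. It only checks that the file is well-formed JSON, not that it follows the AlphaFold 3 input format.

```shell
# Sketch: syntax-check an input file before running AlphaFold 3.
# check_json is a hypothetical helper; fold_input.json is a placeholder name.
check_json() {
    # json.tool exits non-zero and reports the position of the first syntax error
    python3 -m json.tool "$1" > /dev/null
}

INPUT_JSON=fold_input.json

if check_json "$INPUT_JSON"; then
    echo "$INPUT_JSON is well-formed JSON"
else
    echo "$INPUT_JSON is missing or has a syntax error" >&2
fi
```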

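Run AlphaFold 3, replacing name_of_the_file.json and your_output_directory with your own input file and output folder: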

${SINGULARITY_RUN} ${AF3_IMAGE} python3 run_alphafold.py --json_path=name_of_the_file.json --output_dir=your_output_directory --db_dir=/root/public_databases --model_dir=/root/models

The prediction may take a fair amount of time to complete. The predicted fold can be found in the output folder you specified.
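Because a run is slow, a small pre-flight check before launching the run command above can save a wasted session. The sketch below confirms that the input file exists and creates the output folder. The helper name prepare_run is made up for this example, and whether run_alphafold.py creates a missing output folder on its own is not stated here, so creating it up front is only a harmless precaution.

```shell
# Sketch: pre-flight check before a run.
# prepare_run is a hypothetical helper; the two names below are the
# placeholders used in the run command, so substitute your own.
INPUT_JSON=name_of_the_file.json
OUTPUT_DIR=your_output_directory

prepare_run() {
    # $1 = input JSON file, $2 = output directory
    if [ ! -f "$1" ]; then
        echo "input file $1 not found" >&2
        return 1
    fi
    # -p creates parent folders as needed and is a no-op if the folder exists
    mkdir -p "$2"
}

if prepare_run "$INPUT_JSON" "$OUTPUT_DIR"; then
    echo "ready to run"
fi
```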

Through batch submission

The steps above can also be submitted as a Slurm batch script:

#!/bin/bash
#SBATCH --partition=gpuq
#SBATCH --qos=gpu
#SBATCH --job-name=af3_example
#SBATCH --output=af3.%j.out
#SBATCH --error=af3.%j.err
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:2g.20gb:1
#SBATCH --mem=25GB

#SBATCH --time=0-01:00:00

module load gnu10 
module load singularity
DB_DIR=/datasets/alphafold3/databases
MODEL_PARAMETERS_DIR=/datasets/alphafold3/model_parameters
AF3_SCRIPTS=/containers/dgx/Containers/alphafold3/v3.0.1

cp $AF3_SCRIPTS/*.py .

AF3_IMAGE=/containers/dgx/Containers/alphafold3/v3.0.1/singularity/alphafold3.sif

SINGULARITY_RUN="singularity exec --nv --bind $MODEL_PARAMETERS_DIR:/root/models --bind $DB_DIR:/root/public_databases --bind $AF3_SCRIPTS:/root/scripts"

${SINGULARITY_RUN} ${AF3_IMAGE} python3 run_alphafold.py --json_path=fold_input.json --output_dir=your_output_directory --db_dir=/root/public_databases --model_dir=/root/models
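To submit the script, save it to a file and pass it to sbatch from a login node. The sketch below assumes the script was saved as af3_job.slurm, a file name chosen only for this example; submit_job is likewise a hypothetical helper.

```shell
# Sketch: submit the batch script and show it in the queue.
# af3_job.slurm and submit_job are hypothetical names for this example.
JOB_SCRIPT=af3_job.slurm

submit_job() {
    # $1 = path to the Slurm script
    if ! command -v sbatch >/dev/null 2>&1 || [ ! -f "$1" ]; then
        echo "sbatch or $1 not found; run this on a Hopper login node" >&2
        return 1
    fi
    # --parsable prints only the job ID (followed by ";cluster" on multi-cluster setups)
    job_id=$(sbatch --parsable "$1") || return 1
    job_id=${job_id%%;*}
    echo "submitted job $job_id; standard output goes to af3.${job_id}.out"
    # Show the job's current state in the queue
    squeue -j "$job_id"
}

submit_job "$JOB_SCRIPT" || echo "job was not submitted"
```

While the job is running, `squeue -u $USER` lists all of your queued and running jobs.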

Visualization

Visualizing the predicted structure requires third-party software such as ChimeraX. Each run writes its results to a folder containing a .cif (mmCIF) file, which holds the structure data to visualize.
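The .cif file may sit in a subfolder of the output directory rather than at its top level, so a recursive search is a simple way to locate it. The sketch below uses a hypothetical helper named list_structures and the placeholder output folder from the run command.

```shell
# Sketch: list every .cif file under an output folder.
# list_structures is a hypothetical helper; your_output_directory is a placeholder.
list_structures() {
    # Search recursively for regular files ending in .cif, sorted by path
    find "$1" -type f -name '*.cif' | sort
}

OUTPUT_DIR=your_output_directory

if [ -d "$OUTPUT_DIR" ]; then
    list_structures "$OUTPUT_DIR"
else
    echo "$OUTPUT_DIR does not exist yet" >&2
fi
```

Any of the listed paths can then be opened from within ChimeraX.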

To open ChimeraX, start a Virtual Desktop session through Open OnDemand. Open the terminal application in that session, then type the following:

module load gnu9
module load chimeraX 
ChimeraX

More documentation on how to use ChimeraX on Hopper can be found here