AlphaFold

AlphaFold Overview

AlphaFold is a program that predicts the three-dimensional structure of proteins from their amino acid sequences. AlphaFold 2 and AlphaFold 3 are available as modules on both Alpine and Blanca. For detailed instructions on running each version, see the corresponding section below.

Load the default AlphaFold 2 module:

module load alphafold/2.3.1

View run options:

run_alphafold

AlphaFold 2 Module

Loading the AlphaFold 2 module does the following:

  • redirects temporary files from /tmp to /scratch/alpine/$USER

    • you can override this path by resetting TMPDIR after you load the module:

      module load alphafold/2.3.1
      export TMPDIR=<path/of/your/choosing>
      
  • activates the AlphaFold 2 conda environment

  • sets CURC_AF_DBS and CURC_AF_EXAMPLES environment variables (see “AlphaFold 2 Databases” and “AlphaFold 2 Examples” sections, below)

  • creates a shortcut to the AlphaFold 2 script so you can run the program with run_alphafold
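If several jobs share one temporary path, their files can collide. The sketch below shows one way to give each job its own temporary directory after the module has been loaded. The helper name make_job_tmpdir and the af_ prefix are illustrative and not part of the module; the base path falls back to the current TMPDIR or /tmp so the snippet also runs outside a job.

```shell
#!/bin/bash
# Sketch only: make_job_tmpdir is a hypothetical helper, not part of the module.

# Create a unique directory under a base path and print its name.
# $1: base path (optional; defaults to the current TMPDIR, then /tmp).
make_job_tmpdir() {
    local base="${1:-${TMPDIR:-/tmp}}"
    # SLURM_JOB_ID is set by Slurm inside a job; "nojob" is used elsewhere.
    mktemp -d "${base%/}/af_${SLURM_JOB_ID:-nojob}_XXXXXX"
}

# In a job script this would follow "module load alphafold/2.3.1".
JOB_TMP="$(make_job_tmpdir)"
export TMPDIR="$JOB_TMP"

# Remove the directory when the script exits, whether or not the run succeeded.
trap 'rm -rf "$JOB_TMP"' EXIT

echo "Temporary files will be written to $TMPDIR"
```

Because the directory is created beneath whatever TMPDIR the module set, the files stay on the scratch filesystem unless you pass a different base path as the first argument.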

AlphaFold 2 Databases

The AlphaFold 2 databases are located in /gpfs/alpine1/datasets/bioinformatics/alphafold. Note that this directory is not visible from a login node. Loading the AlphaFold 2 module stores this path in CURC_AF_DBS.

AlphaFold 2 Examples

Several example fasta files are located in /curc/sw/install/bio/alphafold/examples. Loading the AlphaFold 2 module stores this path in CURC_AF_EXAMPLES:

ls $CURC_AF_EXAMPLES
dummy.fasta  multimer.fa  rcsb_pdb_7DDD.fasta  T1050.fasta
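Whether a FASTA file holds one chain or several affects which model preset fits. The sketch below counts the header lines (those starting with >) and suggests a preset. The helper names are hypothetical, and it is an assumption that the wrapper's -m option accepts multimer in addition to the monomer value used in the example job script; check the run_alphafold usage output before relying on it.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical.

# Print the number of sequence records (lines beginning with ">") in a FASTA file.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Print "multimer" for files with more than one record, otherwise "monomer".
# Assumption: "multimer" is a valid value for the wrapper's -m option.
suggest_preset() {
    local n
    n="$(count_fasta_records "$1")"
    if [ "$n" -gt 1 ]; then
        echo "multimer"
    else
        echo "monomer"
    fi
}

# Demonstration on a throwaway two-chain file with made-up sequences.
demo_fasta="$(mktemp)"
printf '>chain_A\nMKTAYIAK\n>chain_B\nGSHMLEDP\n' > "$demo_fasta"
echo "records: $(count_fasta_records "$demo_fasta"), preset: $(suggest_preset "$demo_fasta")"
rm -f "$demo_fasta"
```

On the cluster, the same functions can be pointed at a file under $CURC_AF_EXAMPLES from within a job.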

Example Job Script

The example job script below is saved in /curc/sw/install/bio/alphafold/2.3.1. You can copy it to any location where you have write permission and modify it as needed:

cd /projects/$USER
cp /curc/sw/install/bio/alphafold/2.3.1/alphafold_alpine.sh .

Contents of alphafold_alpine.sh:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --time=06:00:00
#SBATCH --partition=aa100
#SBATCH --qos=normal
#SBATCH --gres=gpu:1
#SBATCH --job-name=multimer_test
#SBATCH --output=multimer_test_%j.out
#SBATCH --ntasks=40
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your email address>

module purge
module load alphafold/2.3.1

#change directory
cd /projects/$USER

#run AlphaFold
run_alphafold -d $CURC_AF_DBS -o . -f $CURC_AF_EXAMPLES/dummy.fasta -t 2020-05-14 -m "monomer" -g true
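To predict several sequences in one job, the run_alphafold call above can be wrapped in a loop. The following is a minimal sketch, not a CURC-provided script: run_all_fastas and DRY_RUN are hypothetical names, the flags are copied from the job script above, and it assumes every .fasta file in the directory is a single-chain target suited to the monomer preset. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/bash
# Sketch only: run_all_fastas and DRY_RUN are hypothetical, not part of the module.

# Build one run_alphafold command per *.fasta file and print or run it.
# $1: directory containing FASTA files   $2: root directory for the outputs
run_all_fastas() {
    local fasta_dir="$1" out_root="$2" fasta name
    local -a cmd
    for fasta in "$fasta_dir"/*.fasta; do
        [ -e "$fasta" ] || continue              # the glob matched nothing
        name="$(basename "$fasta" .fasta)"
        # Same flags as the example job script; one output directory per input.
        cmd=(run_alphafold -d "${CURC_AF_DBS:-}" -o "$out_root/$name"
             -f "$fasta" -t 2020-05-14 -m monomer -g true)
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf '%s\n' "${cmd[*]}"            # print the command only
        else
            mkdir -p "$out_root/$name" && "${cmd[@]}"
        fi
    done
}
```

Inside a job script, a call such as DRY_RUN=0 run_all_fastas "$CURC_AF_EXAMPLES" "/projects/$USER/af2_runs" would run each prediction in turn (af2_runs is a made-up directory name). The *.fasta pattern does not match files ending in .fa, so multimer.fa is skipped.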

AlphaFold 3 has a substantially updated diffusion-based architecture that can predict the joint structure of complexes including proteins, nucleic acids, small molecules, ions, and modified residues. This necessitates a different kind of input than the FASTA files used by AlphaFold 2.

On CURC’s Alpine system, AlphaFold 3 is available as a containerized module. It runs under Apptainer/Singularity and is fully self-contained except for the model parameters, which are required and must be downloaded separately.

AlphaFold 3 Module

Load AlphaFold 3 module:

module load alphafold/3.0.0

View run options:

run_alphafold --help

Loading the AlphaFold 3 module does the following:

  • sets environment variables used by the wrapper script:

    • AF3_IMAGE: Path to the AlphaFold 3 container image

    • AF3_CODE_DIR: Directory containing the AlphaFold 3 codebase

    • AF3_DATABASES_DIR: Location of the required AlphaFold 3 reference databases

  • redirects temporary files to /scratch/alpine/$USER

    • you can override this path by resetting TMPDIR after you load the module:

      module load alphafold/3.0.0
      export TMPDIR=<path/of/your/choosing>
      
  • creates a shortcut to the AlphaFold 3 script so you can run the program with run_alphafold

AlphaFold 3 Model Weights

Important

Because of license restrictions on the AlphaFold 3 model weights, you must read and comply with the Model Parameters and Outputs Terms of Use. In short: only non-profit activity is allowed, unethical use of the outputs is prohibited, and you must cite the AlphaFold 3 paper in any publication. To use AlphaFold 3 at CURC, request access to the weights by filling out the weights request form. You will receive two e-mails: the first acknowledges receipt of the request, and the second, within a day or so, approves it and includes a link to download the weights. After downloading the weights, place them on a filesystem you can access from Alpine. Pass the directory that contains the weights to run_alphafold with the --model_dir=<path to weights> option.

AlphaFold 3 Input

AlphaFold 3 uses JSON input files instead of FASTA. You can either:

  • Provide a single JSON file via --json_path=<path of input>

  • Provide a directory of JSON files via --input_dir=<path of input>
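For orientation, a minimal single-chain input might look like the file written by the sketch below. The top-level fields (name, modelSeeds, sequences, dialect, version) follow the input format documented in the AlphaFold 3 repository; the job name and the amino acid sequence are made-up placeholders, and write_af3_input is a hypothetical helper. Consult the upstream input documentation for ligands, nucleic acids, and other options.

```shell
#!/bin/bash
# Sketch only: write_af3_input is a hypothetical helper.

# Write a minimal AlphaFold 3 input file for one protein chain to the path in $1.
# The job name and sequence are placeholders, not a real target.
write_af3_input() {
    local out="$1"
    cat > "$out" <<'EOF'
{
  "name": "example_monomer",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF
}

# Demonstration: write the file to a throwaway directory.
demo_dir="$(mktemp -d)"
write_af3_input "$demo_dir/example_monomer.json"

# Optional syntax check, run only if python3 is on the PATH.
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$demo_dir/example_monomer.json" >/dev/null \
        && echo "valid JSON: $demo_dir/example_monomer.json"
fi
```

A file like this would be passed with --json_path, or placed in a directory alongside other inputs and passed with --input_dir.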

AlphaFold 3 Databases

Databases used by AlphaFold 3 are pre-installed and accessible via: /gpfs/alpine1/datasets/bioinformatics/alphafold3. Note that this directory is not visible from a login node. Loading the AlphaFold 3 module stores this path in AF3_DATABASES_DIR.

AlphaFold 3 Workflow

AlphaFold 3 runs in two stages:

Stage 1 (MSA Search): CPU and I/O-intensive; uses jackhmmer and hmmsearch.

Stage 2 (Inference): GPU-intensive; performs structure prediction.

To better utilize limited GPU resources, these stages can be split using flags:

  • --norun_inference → Run only the MSA/data pipeline (Stage 1)

  • --norun_data_pipeline → Run only the inference step (Stage 2)

AlphaFold 3 Examples

Example input files and scripts are in /curc/sw/install/bio/alphafold/3.0.0/examples. Loading the AlphaFold 3 module stores this path in AF3_EXAMPLES:

ls $AF3_EXAMPLES
alphafold3_alpine_cpu.sh  alphafold3_alpine_gpu.sh  alphafold3_alpine.sh  fold_protein_2PV7

This folder includes:

  • alphafold3_alpine.sh: Sample batch script to run the complete AlphaFold 3 pipeline.

  • alphafold3_alpine_cpu.sh: Sample batch script to run only Stage 1 (MSA Search).

  • alphafold3_alpine_gpu.sh: Sample batch script to run only Stage 2 (Inference).

You can copy the examples folder to a location where you have write permissions and customize the scripts:

cd /projects/$USER
cp -R /curc/sw/install/bio/alphafold/3.0.0/examples .
cd examples
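Splitting the stages pays off when the two jobs are chained so that the GPU job starts only after the CPU job has finished successfully. A minimal sketch using Slurm job dependencies follows. submit_two_stage is a hypothetical helper, and it assumes the GPU script is set up to read the output that the CPU script writes; review both sample scripts before submitting.

```shell
#!/bin/bash
# Sketch only: submit_two_stage is a hypothetical helper.

# Submit the Stage 1 script, then the Stage 2 script held until Stage 1 succeeds.
# $1: CPU (data pipeline) batch script   $2: GPU (inference) batch script
submit_two_stage() {
    local cpu_script="$1" gpu_script="$2" cpu_job
    # --parsable makes sbatch print only the job ID (optionally ";cluster").
    cpu_job="$(sbatch --parsable "$cpu_script")" || return 1
    cpu_job="${cpu_job%%;*}"
    # afterok: start the GPU job only if the CPU job exits with status 0.
    sbatch --dependency=afterok:"$cpu_job" "$gpu_script"
}

# Example call from the copied examples directory:
#   submit_two_stage alphafold3_alpine_cpu.sh alphafold3_alpine_gpu.sh
```

With this arrangement the GPU is requested only for the inference job, and the second job is not started if the MSA search fails.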

Example Job Script

Path of the script: $AF3_EXAMPLES/alphafold3_alpine.sh

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --time=30:00
#SBATCH --partition=al40
#SBATCH --qos=normal
#SBATCH --gres=gpu:1
#SBATCH --job-name=af3_test
#SBATCH --output=af3_test_%j.out
#SBATCH --ntasks=8
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your email address>

# Load the AlphaFold 3 module
module purge
module load alphafold/3.0.0

# Set input JSON, output directory, and model parameter path
export INPUT_FILE=$AF3_EXAMPLES/fold_protein_2PV7/alphafold_input.json
export OUTPUT_DIR=/path/to/output
export AF3_MODEL_PARAMETERS_DIR=/path/to/alphafold3/params

# Run AlphaFold 3
run_alphafold --json_path=$INPUT_FILE --output_dir=$OUTPUT_DIR --model_dir=$AF3_MODEL_PARAMETERS_DIR
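A job that fails on a wrong path still waits in the queue first, so it can be worth checking the three paths before calling run_alphafold. The sketch below is one way to do that; af3_preflight is a hypothetical helper, and it only checks that the model directory is non-empty, not that the weights are valid.

```shell
#!/bin/bash
# Sketch only: af3_preflight is a hypothetical helper, not part of the module.

# Check the input JSON, output directory, and model directory.
# Prints one message per problem to stderr; returns 0 only if all checks pass.
af3_preflight() {
    local input="$1" out_dir="$2" model_dir="$3" status=0
    if [ ! -r "$input" ]; then
        echo "input JSON not readable: $input" >&2
        status=1
    fi
    # The weights must be downloaded separately, so an empty directory is an error.
    if [ ! -d "$model_dir" ] || [ -z "$(ls -A "$model_dir" 2>/dev/null)" ]; then
        echo "model directory missing or empty: $model_dir" >&2
        status=1
    fi
    # Create the output directory if needed and confirm it is writable.
    if ! mkdir -p "$out_dir" 2>/dev/null || [ ! -w "$out_dir" ]; then
        echo "output directory not writable: $out_dir" >&2
        status=1
    fi
    return "$status"
}

# In the job script, before run_alphafold:
#   af3_preflight "$INPUT_FILE" "$OUTPUT_DIR" "$AF3_MODEL_PARAMETERS_DIR" || exit 1
```

Reporting every problem before returning, rather than stopping at the first, means a single test run surfaces all path mistakes at once.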