AlphaFold
AlphaFold Overview
AlphaFold is a program that predicts the three-dimensional structure of proteins from their amino acid sequences. AlphaFold 2 and AlphaFold 3 are available as modules on both Alpine and Blanca. For detailed instructions on running each version, see the corresponding section below.
Load the default AlphaFold 2 module:
module load alphafold/2.3.1
View run options:
run_alphafold
AlphaFold 2 Module
Loading the AlphaFold 2 module does the following:
redirects temporary files from /tmp to /scratch/alpine/$USER. You can override this path by resetting TMPDIR after you load the module:
module load alphafold/2.3.1
export TMPDIR=<path/of/your/choosing>
activates the AlphaFold 2 conda environment
sets the CURC_AF_DBS and CURC_AF_EXAMPLES environment variables (see the “AlphaFold 2 Databases” and “AlphaFold 2 Examples” sections below)
creates a shortcut to the AlphaFold 2 script so you can run the program with run_alphafold
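As a minimal sketch of the TMPDIR override mentioned in the list above, the snippet below creates a separate temporary directory for each job, so that concurrent jobs do not share one scratch location. The helper name make_job_tmpdir and the af_tmp_ prefix are illustrative assumptions and are not provided by the module. SLURM_JOB_ID is set by Slurm inside a job; outside a job the shell's process ID is used instead.

```shell
# Hypothetical helper (not part of the alphafold module): create a per-job
# temporary directory under a base path and print its location.
make_job_tmpdir() {
  local base="${1:-/scratch/alpine/$USER}"
  local dir="$base/af_tmp_${SLURM_JOB_ID:-$$}"
  mkdir -p "$dir" || return 1
  printf '%s\n' "$dir"
}

# Usage inside a job script, after `module load alphafold/2.3.1`:
#   TMPDIR=$(make_job_tmpdir) && export TMPDIR
#   trap 'rm -rf "$TMPDIR"' EXIT   # remove the directory when the job ends
```

The trap line is optional; it only keeps scratch space tidy once the job finishes.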
AlphaFold 2 Databases
The AlphaFold 2 databases are located in /gpfs/alpine1/datasets/bioinformatics/alphafold.
Note that this directory is not visible from a login node. Loading the AlphaFold 2 module stores this path in CURC_AF_DBS.
AlphaFold 2 Examples
Several example fasta files are located in /curc/sw/install/bio/alphafold/examples.
Loading the AlphaFold 2 module stores this path in CURC_AF_EXAMPLES:
ls $CURC_AF_EXAMPLES
dummy.fasta multimer.fa rcsb_pdb_7DDD.fasta T1050.fasta
Example Job Script
The example job script below is saved in /curc/sw/install/bio/alphafold/2.3.1. You can copy it to any location where you have write permission and make the desired changes:
cd /projects/$USER
cp /curc/sw/install/bio/alphafold/2.3.1/alphafold_alpine.sh .
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=06:00:00
#SBATCH --partition=aa100
#SBATCH --qos=normal
#SBATCH --gres=gpu:1
#SBATCH --job-name=multimer_test
#SBATCH --output=multimer_test_%j.out
#SBATCH --ntasks=40
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your email address>
module purge
module load alphafold/2.3.1
#change directory
cd /projects/$USER
#run AlphaFold
run_alphafold -d $CURC_AF_DBS -o . -f $CURC_AF_EXAMPLES/dummy.fasta -t 2020-05-14 -m "monomer" -g true
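To predict several sequences in one job, the run_alphafold call above can be wrapped in a loop. The following is a sketch only: the function name run_af2_batch, the per-target output layout, and the DRY_RUN switch are assumptions made for illustration, while the flags (-d, -o, -f, -t, -m, -g) are copied from the job script above. It assumes every *.fasta file in the directory is a monomer.

```shell
# Hypothetical wrapper: run the AlphaFold 2 command from the job script
# above once per FASTA file, giving each target its own output directory.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_af2_batch() {
  local fasta_dir="$1" out_root="$2"
  local runner="" fasta name
  [ "${DRY_RUN:-0}" = 1 ] && runner="echo"
  for fasta in "$fasta_dir"/*.fasta; do
    [ -e "$fasta" ] || continue          # glob matched nothing
    name=$(basename "$fasta" .fasta)
    mkdir -p "$out_root/$name" || return 1
    # $runner is intentionally unquoted: empty means "run the command".
    $runner run_alphafold -d "$CURC_AF_DBS" -o "$out_root/$name" \
      -f "$fasta" -t 2020-05-14 -m monomer -g true || return 1
  done
}

# Usage inside a job script, after `module load alphafold/2.3.1`:
#   run_af2_batch /projects/$USER/fastas /projects/$USER/af2_out
```

Running it first with DRY_RUN=1 shows the exact commands before any GPU time is spent. Remember to raise the --time request in the job script to cover all targets.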
AlphaFold 3 has a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues. This necessitates a different kind of input than the FASTA input used by AlphaFold 2.
On CURC’s Alpine system, AlphaFold 3 is available as a containerized module. It uses Apptainer/Singularity under the hood and is fully self-contained except for the separately downloaded model parameters (required).
AlphaFold 3 Module
Load the AlphaFold 3 module:
module load alphafold/3.0.0
View run options:
run_alphafold --help
Loading the AlphaFold 3 module does the following:
sets environment variables used by the wrapper script:
AF3_IMAGE: Path to the AlphaFold 3 container image
AF3_CODE_DIR: Directory containing the AlphaFold 3 codebase
AF3_DATABASES_DIR: Location of the required AlphaFold 3 reference databases
redirects temporary files to /scratch/alpine/$USER. You can override this path by resetting TMPDIR after you load the module:
module load alphafold/3.0.0
export TMPDIR=<path/of/your/choosing>
creates a shortcut to the AlphaFold 3 script so you can run the program with
run_alphafold
AlphaFold 3 Model Weights
Important
Due to license restrictions on the AlphaFold 3 model weights, you must read and comply with the Model Parameters and Outputs Terms of Use. In short: only non-profit activity is allowed, unethical use of the outputs is prohibited, and you must cite the AlphaFold 3 paper in any publication. To gain access to AlphaFold 3 at CURC, request access to the weights by filling out this form. You will receive two emails: the first acknowledges receipt of the request form, and the second, which arrives in a day or so, contains the approval and a link to download the weights. Once you have downloaded them, place them on a filesystem you can access from Alpine.
You will need to specify the path to the directory where you saved the model weights using the --model_dir=<path to weights> flag.
AlphaFold 3 Input
AlphaFold 3 uses JSON input files instead of FASTA. You can either:
Provide a single JSON file via --json_path=<path of input>
Or provide a directory of JSON files via --input_dir=<path of input>
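For orientation, the sketch below writes a minimal single-chain input file. The field names (name, modelSeeds, sequences, dialect, version) reflect our understanding of the upstream AlphaFold 3 input format and should be checked against the upstream documentation or the example in fold_protein_2PV7. The function name write_af3_input, the job name, the chain ID, and the sequence are all placeholders introduced here for illustration.

```shell
# Hypothetical helper: write a minimal AlphaFold 3 input file to the given
# path. The job name, chain ID, and sequence are arbitrary placeholders.
write_af3_input() {
  local out="$1"
  cat > "$out" <<'EOF'
{
  "name": "example_job",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF
}

# Usage:
#   write_af3_input ./example_input.json
#   run_alphafold --json_path=./example_input.json ...
```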
AlphaFold 3 Databases
Databases used by AlphaFold 3 are pre-installed and accessible via:
/gpfs/alpine1/datasets/bioinformatics/alphafold3. Note that this directory is not visible from a login node. Loading the AlphaFold 3 module stores this path in AF3_DATABASES_DIR.
AlphaFold 3 Workflow
AlphaFold 3 runs in two stages:
Stage 1 (MSA Search): CPU- and I/O-intensive; uses jackhmmer and hmmsearch.
Stage 2 (Inference): GPU-intensive; performs structure prediction.
To better utilize limited GPU resources, these stages can be split using flags:
--norun_inference → Run only the MSA/data pipeline (Stage 1)
--norun_data_pipeline → Run only the inference step (Stage 2)
AlphaFold 3 Examples
Example input files and scripts are in /curc/sw/install/bio/alphafold/3.0.0/examples.
Loading the AlphaFold 3 module stores this path in AF3_EXAMPLES:
ls $AF3_EXAMPLES
alphafold3_alpine_cpu.sh alphafold3_alpine_gpu.sh alphafold3_alpine.sh fold_protein_2PV7
This folder includes:
alphafold3_alpine.sh: Sample batch script to run the complete AlphaFold 3 pipeline.
alphafold3_alpine_cpu.sh: Sample batch script to run only Stage 1 (MSA Search).
alphafold3_alpine_gpu.sh: Sample batch script to run only Stage 2 (Inference).
You can copy the examples folder to a location where you have write permissions and customize the scripts:
cd /projects/$USER
cp -R /curc/sw/install/bio/alphafold/3.0.0/examples .
cd examples
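When the two stages are run as separate jobs, Slurm job dependencies can hold the GPU job until the CPU job has finished successfully. The sketch below uses the standard sbatch options --parsable and --dependency=afterok. The function name submit_af3_two_stage and the SBATCH_CMD override are assumptions made for illustration. It also assumes the GPU script is configured to read the output that the CPU script writes, so check the paths in both scripts before submitting.

```shell
# Hypothetical helper: submit Stage 1 (CPU) and make Stage 2 (GPU) wait
# until Stage 1 has completed successfully.
# SBATCH_CMD is an illustrative override for testing; it defaults to sbatch.
submit_af3_two_stage() {
  local cpu_script="$1" gpu_script="$2"
  local submit="${SBATCH_CMD:-sbatch}"
  local cpu_job gpu_job
  # --parsable prints only the job ID, optionally followed by ";cluster".
  cpu_job=$("$submit" --parsable "$cpu_script") || return 1
  cpu_job="${cpu_job%%;*}"
  gpu_job=$("$submit" --parsable --dependency="afterok:$cpu_job" "$gpu_script") || return 1
  gpu_job="${gpu_job%%;*}"
  echo "stage1=$cpu_job stage2=$gpu_job"
}

# Usage from the copied examples directory:
#   submit_af3_two_stage alphafold3_alpine_cpu.sh alphafold3_alpine_gpu.sh
```

With afterok, the GPU job never starts if the MSA search fails, so no GPU allocation is spent on a run that cannot proceed.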
Example Job Script
Path of the script: $AF3_EXAMPLES/alphafold3_alpine.sh
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=30:00
#SBATCH --partition=al40
#SBATCH --qos=normal
#SBATCH --gres=gpu:1
#SBATCH --job-name=af3_test
#SBATCH --output=af3_test_%j.out
#SBATCH --ntasks=8
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your email address>
# Load the AlphaFold 3 module
module purge
module load alphafold/3.0.0
# Set input JSON, output directory, and model parameter path
export INPUT_FILE=$AF3_EXAMPLES/fold_protein_2PV7/alphafold_input.json
export OUTPUT_DIR=/path/to/output
export AF3_MODEL_PARAMETERS_DIR=/path/to/alphafold3/params
# Run AlphaFold 3
run_alphafold --json_path=$INPUT_FILE --output_dir=$OUTPUT_DIR --model_dir=$AF3_MODEL_PARAMETERS_DIR
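Because OUTPUT_DIR and AF3_MODEL_PARAMETERS_DIR in the script above are placeholders, a quick check before the run can catch a wrong path before the job reaches the GPU step. The following is a minimal sketch; the function name af3_preflight is hypothetical and not part of the module.

```shell
# Hypothetical pre-flight check: fail early if the input JSON or the model
# weights are missing, and create the output directory if needed.
af3_preflight() {
  local input_json="$1" output_dir="$2" model_dir="$3"
  if [ ! -r "$input_json" ]; then
    echo "input JSON not readable: $input_json" >&2
    return 1
  fi
  if [ ! -d "$model_dir" ] || [ -z "$(ls -A "$model_dir" 2>/dev/null)" ]; then
    echo "model directory missing or empty: $model_dir" >&2
    return 1
  fi
  mkdir -p "$output_dir" || return 1
}

# Usage in the job script, in place of the bare run_alphafold line:
#   af3_preflight "$INPUT_FILE" "$OUTPUT_DIR" "$AF3_MODEL_PARAMETERS_DIR" &&
#     run_alphafold --json_path="$INPUT_FILE" --output_dir="$OUTPUT_DIR" \
#       --model_dir="$AF3_MODEL_PARAMETERS_DIR"
```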