AlphaFold Documentation
AlphaFold is an artificial intelligence program developed by Google DeepMind, a subsidiary of Alphabet, which predicts protein structures with high accuracy. This documentation provides instructions for running AlphaFold2 and AlphaFold3 on the Midway3 HPC system.
Available Modules
AlphaFold 2
AlphaFold 2 is available as modules on Midway3. You can check the available versions using the following command:
module avail alphafold
---------------------- /software/modulefiles----------------------------------
alphafold/2.0.0(default) alphafold/2.2.0 alphafold/2.3.2
The AlphaFold source code and running scripts (e.g., run_alphafold.py) can be found in the AlphaFold GitHub repository.
The genetic databases and model parameters used by the different AlphaFold 2 versions are located under:
- /software/alphafold-data/
- /software/alphafold-data-2.2/
- /software/alphafold-data-2.3/
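To avoid pairing a module with the wrong data directory, the choice can be scripted. The helper below is a hypothetical sketch: the function name af2_data_dir is illustrative, and the mapping of 2.0.x to /software/alphafold-data is an assumption inferred from the directory names above, not a documented rule.

```shell
# Hypothetical helper: map an AlphaFold 2 module version to its data directory.
# Assumption: 2.3.x -> alphafold-data-2.3, 2.2.x -> alphafold-data-2.2,
# and 2.0.x -> alphafold-data (the unversioned directory).
af2_data_dir() {
  local version="$1"
  case "$version" in
    2.3.*) echo "/software/alphafold-data-2.3" ;;
    2.2.*) echo "/software/alphafold-data-2.2" ;;
    2.0.*) echo "/software/alphafold-data" ;;
    *) echo "unknown AlphaFold 2 version: $version" >&2; return 1 ;;
  esac
}
```

In a job script, DOWNLOAD_DATA_DIR=$(af2_data_dir 2.3.2) would then stay consistent with module load alphafold/2.3.2.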
AlphaFold 3
AlphaFold 3 on Midway3 uses a container-based approach with Singularity (or Apptainer) and requires different input arguments compared to AlphaFold 2. It supports advanced features like FlashAttention and improved accuracy for protein-ligand and protein-DNA interactions. See the example job script below for usage details.
Key Differences Between AlphaFold 2 and AlphaFold 3
Feature | AlphaFold 2 | AlphaFold 3 |
---|---|---|
Input Format | .fasta file | .json file |
Execution Environment | Python-based scripts | Singularity container |
GPU Requirements | Moderate GPU memory (e.g., V100) | High GPU memory (e.g., 2 A100 GPUs) |
FlashAttention Support | Not available | Supported (triton or xla) |
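Which FlashAttention implementation works depends on the GPU generation. The sketch below picks a value for --flash_attention_implementation from a compute capability string. It assumes that triton requires compute capability 8.0 (Ampere, e.g. A100) or newer and that older GPUs such as the V100 (7.0) should use xla; the function name is hypothetical.

```shell
# Hypothetical helper: choose an AlphaFold 3 --flash_attention_implementation
# value from a GPU compute capability such as "8.0" (A100) or "7.0" (V100).
# Assumption: triton needs compute capability >= 8.0; older GPUs use xla.
pick_flash_attention() {
  local cap="$1"
  local major="${cap%%.*}"   # "8.0" -> "8"
  case "$major" in
    ''|*[!0-9]*) echo "invalid compute capability: $cap" >&2; return 1 ;;
  esac
  if [ "$major" -ge 8 ]; then
    echo "triton"
  else
    echo "xla"
  fi
}
```

On recent NVIDIA drivers the compute capability can be queried with nvidia-smi --query-gpu=compute_cap --format=csv,noheader, so a job script could pass --flash_attention_implementation="$(pick_flash_attention "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)")".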
System Requirements
AlphaFold 2
- GPU: V100 or newer (NVIDIA GPU with compute capability 7.0 or higher)
- CUDA: Version 11.3 or higher
- Memory: 32GB RAM minimum (64GB recommended for large proteins or multiple jobs)
AlphaFold 3
- GPU: A100 or higher recommended (80GB GPU RAM may be needed for very large inputs)
- CUDA: Version 12.3 or higher (CUDA 12.6 preferred for best accuracy; 12.2 may work, but not guaranteed)
- Memory: 32GB RAM minimum (more is better for large jobs and databases)
Why a Container-Based Approach for AlphaFold 3?
AlphaFold 3 requires CUDA 12.3 or higher, but the NVIDIA driver installed on the system supports CUDA only up to version 12.2. Running AlphaFold 3 inside a container that ships its own CUDA libraries works around this driver compatibility issue.
This guide will be updated with a module-based method for running AlphaFold 3 once the system NVIDIA driver and CUDA version are upgraded.
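The "CUDA Version" field in the nvidia-smi header is the highest CUDA version the installed driver supports, so it is the number to compare against AlphaFold 3's 12.3 requirement. The following is a minimal sketch, assuming the header text is passed in as a string; the function name driver_cuda_at_least is hypothetical.

```shell
# Hypothetical helper: extract "CUDA Version: X.Y" from nvidia-smi output and
# check that it is at least the required version.
# Returns 0 if new enough, 1 if too old, 2 if no version was found.
driver_cuda_at_least() {
  local header="$1" required="$2" have
  have=$(printf '%s\n' "$header" \
    | sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1)
  if [ -z "$have" ]; then
    echo "CUDA version not found in input" >&2
    return 2
  fi
  local have_major have_minor req_major req_minor
  have_major=${have%%.*}; have_minor=${have#*.}
  req_major=${required%%.*}; req_minor=${required#*.}
  # Newer major version, or same major version with a minor version >= required
  [ "$have_major" -gt "$req_major" ] ||
    { [ "$have_major" -eq "$req_major" ] && [ "$have_minor" -ge "$req_minor" ]; }
}
```

On a GPU node, driver_cuda_at_least "$(nvidia-smi)" 12.3 fails when the driver reports CUDA 12.2, which is the situation the container is meant to handle.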
Example Job Scripts
AlphaFold 2 Job Script
The following example demonstrates how to run AlphaFold 2 on a GPU node with 2 GPUs and up to 16 CPU cores for multithreading on Midway3.
#!/bin/bash
#SBATCH --job-name=alphafold2
#SBATCH --account=[your-accountname]
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --time=04:00:00
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:2
#SBATCH --constraint=v100
#SBATCH --mem=64G
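# Load AlphaFold 2.3.2 and CUDA 11.3
module load alphafold/2.3.2 cuda/11.3
# Run from the directory the job was submitted from
cd "$SLURM_SUBMIT_DIR"
# Databases and model parameters for AlphaFold 2.3.x
DOWNLOAD_DATA_DIR=/software/alphafold-data-2.3
# Predict monomer structures for two targets, one FASTA file per target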
python run_alphafold.py \
--data_dir=$DOWNLOAD_DATA_DIR \
--uniref90_database_path=$DOWNLOAD_DATA_DIR/uniref90/uniref90.fasta \
--mgnify_database_path=$DOWNLOAD_DATA_DIR/mgnify/mgy_clusters_2022_05.fa \
--bfd_database_path=$DOWNLOAD_DATA_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniref30_database_path=$DOWNLOAD_DATA_DIR/uniref30/UniRef30_2021_03 \
--pdb70_database_path=$DOWNLOAD_DATA_DIR/pdb70/pdb70 \
--template_mmcif_dir=$DOWNLOAD_DATA_DIR/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=$DOWNLOAD_DATA_DIR/pdb_mmcif/obsolete.dat \
--model_preset=monomer \
--max_template_date=2022-01-01 \
--db_preset=full_dbs \
--use_gpu_relax=true \
--output_dir=out_alphafold_2.1.1_multi-monomer \
--fasta_paths=T1083.fasta,T1084.fasta
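A job that waits in the queue only to fail on a missing database wastes allocation time. The check below is a hypothetical preflight sketch that looks for the database paths used in the script above under a given data directory. Several of these paths are file prefixes rather than complete file names (for example uniref30/UniRef30_2021_03), so each entry is matched as a prefix.

```shell
# Hypothetical preflight check: confirm the AlphaFold 2.3 database paths used in
# the job script exist under a data directory. Entries are matched as prefixes
# because some flags (e.g., --uniref30_database_path) take a file prefix.
check_af2_databases() {
  local data_dir="$1" missing=0 rel
  for rel in \
      uniref90/uniref90.fasta \
      mgnify/mgy_clusters_2022_05.fa \
      bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
      uniref30/UniRef30_2021_03 \
      pdb70/pdb70 \
      pdb_mmcif/mmcif_files \
      pdb_mmcif/obsolete.dat; do
    # ls succeeds only if at least one path starting with this prefix exists
    if ! ls -d "$data_dir/$rel"* >/dev/null 2>&1; then
      echo "missing: $data_dir/$rel" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_af2_databases /software/alphafold-data-2.3 && sbatch alphafold2.sh submits only if every path is found (alphafold2.sh is a placeholder name for the job script above).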
AlphaFold 3 Job Script
The following example demonstrates how to run AlphaFold 3 using a .json input file (e.g., nipah_zmr.json) on a GPU node with 2 A100 GPUs.
#!/bin/bash
#SBATCH --job-name=alphafold3
#SBATCH --account=[your-accountname]
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --time=04:00:00
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:2
#SBATCH --constraint=a100
#SBATCH --mem=32G
module load apptainer
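cd "$SLURM_SUBMIT_DIR"
# Bind sources must exist on the host, so create the per-user /tmp directory if needed
mkdir -p "/tmp/$USER"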
# Path to the AlphaFold 3 database directory
DOWNLOAD_DATA_DIR=/software/alphafold3.0-el8-x86_64/databases
# Define bind paths
export BIND_PATHS="$DOWNLOAD_DATA_DIR,/software/alphafold3.0-el8-x86_64/params,/software/alphafold3.0-el8-x86_64/singularity,/tmp/$USER,/home/$USER,/scratch/midway3/$USER"
# Run the Singularity container
singularity exec --nv \
-B "$BIND_PATHS" \
--env CUDA_VISIBLE_DEVICES=0,1 \
--env NVIDIA_VISIBLE_DEVICES=0,1 \
/software/alphafold3.0-el8-x86_64/alphafold3.sif \
python /app/alphafold/run_alphafold.py \
--json_path=/home/$USER/nipah_zmr.json \
--db_dir=$DOWNLOAD_DATA_DIR \
--output_dir=/scratch/midway3/$USER/alphafold3_output \
--model_dir=/software/alphafold3.0-el8-x86_64/params \
--flash_attention_implementation=triton \
--run_data_pipeline=True \
--run_inference=True \
--jackhmmer_n_cpu=8 \
--nhmmer_n_cpu=8
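Singularity/Apptainer refuses to start a container when a bind source directory is missing on the host. The function below is a hypothetical sketch that checks each comma-separated entry of a bind list such as BIND_PATHS; entries written as src:dest are reduced to their host source path.

```shell
# Hypothetical preflight check: every host source directory in a comma-separated
# bind list (e.g., "$BIND_PATHS") must exist before singularity exec is called.
check_bind_paths() {
  local bind_list="$1" missing=0 path
  local IFS=','          # split the list on commas only
  for path in $bind_list; do
    path="${path%%:*}"   # "src:dest[:opts]" -> "src"
    if [ ! -d "$path" ]; then
      echo "bind source does not exist: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Placed in the job script after export BIND_PATHS=..., a line such as check_bind_paths "$BIND_PATHS" || exit 1 stops the job early with a readable message instead of a container start-up error.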
Input File Preparation
AlphaFold 2
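- Input format: .fasta file containing the protein sequence(s). With --model_preset=monomer, each file must contain exactly one sequence, which is why the example job script passes one file per target.
- Example: T1083.fasta contains
>T1083
SEQUENCE1
and T1084.fasta contains
>T1084
SEQUENCE2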
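Before submitting a monomer job, the one-sequence-per-file rule can be checked by counting header lines that start with ">". This is a minimal sketch; the function name check_monomer_fasta is hypothetical.

```shell
# Hypothetical check for --model_preset=monomer: each FASTA file should
# contain exactly one sequence, i.e. exactly one header line starting with ">".
check_monomer_fasta() {
  local f n status=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "$f: not readable" >&2
      status=1
      continue
    fi
    n=$(grep -c '^>' "$f")   # number of header lines
    if [ "$n" -ne 1 ]; then
      echo "$f: expected 1 sequence, found $n" >&2
      status=1
    fi
  done
  return "$status"
}
```

For the example job, check_monomer_fasta T1083.fasta T1084.fasta returns success only if both files are readable and each holds a single sequence.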
AlphaFold 3
- Input format: .json file containing the protein sequence(s) and metadata.
- Example: Download the example file nipah_zmr.json from this link and place it in your home directory (/home/$USER).
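For your own targets, a single-chain input file can be written from the shell. The sketch below assumes the top-level fields of the AlphaFold 3 input format ("name", "sequences", "modelSeeds", "dialect", "version"); the function name, the job name, and the sequence used with it are placeholders, and the contents of nipah_zmr.json may differ from this minimal layout.

```shell
# Hypothetical helper: write a minimal AlphaFold 3 input file with one protein
# chain (chain ID "A") and one model seed. Arguments: output path, job name, sequence.
write_af3_json() {
  local out="$1" name="$2" seq="$3"
  cat > "$out" <<EOF
{
  "name": "$name",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "$seq"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
EOF
}
```

For example, write_af3_json "/home/$USER/example.json" example_job MKTAYIAKQR creates a file that can be passed with --json_path=/home/$USER/example.json; python3 -m json.tool on the file is a quick syntax check.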
Troubleshooting
Common Errors and Solutions
- Error: Unknown backend: 'gpu' requested, but no platforms that are instances of gpu are present.
  - Solution: Ensure the job is running on a GPU-enabled node and that the CUDA_VISIBLE_DEVICES environment variable is set correctly.
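- Error: Failed to get mmCIF for <PDB_ID>.
  - Solution: Verify that the database directory is accessible, contains the required files, and is included in the container bind paths (BIND_PATHS). The shared databases under /software are managed by RCC; if files there are missing or unreadable, contact the RCC support team. For a database copy you maintain yourself, ensure proper permissions (replace [your-database-dir] with its path):
chmod 755 --recursive [your-database-dir]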
- Error: implementation='triton' for FlashAttention is unsupported on this GPU generation.
  - Solution: Switch to the xla implementation: --flash_attention_implementation=xla
- Error: CUDA version mismatch.
  - Solution: Ensure the NVIDIA driver and CUDA versions are compatible with AlphaFold 3. Updating the driver requires administrator access, so contact the RCC support team if an update is needed. For instance, if nvidia-smi on the node shows
NVIDIA-SMI 535.161.08    Driver Version: 535.161.08    CUDA Version: 12.2
    the GPU driver version is 535.161.08, and 12.2 is the highest CUDA version that driver supports. If a version of AlphaFold 3 relies on CUDA toolkit 12.x (where x > 2), make sure that driver 535.161.08 is also compatible with that CUDA 12.x version.
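For the "Unknown backend: 'gpu'" error, a quick first check inside the job is whether CUDA_VISIBLE_DEVICES is set and lists as many GPUs as requested with --gres. Slurm typically sets this variable for GPU allocations. The function below is a hypothetical diagnostic sketch, not part of AlphaFold.

```shell
# Hypothetical diagnostic: confirm CUDA_VISIBLE_DEVICES is set and lists the
# expected number of GPUs (e.g., 2 for --gres=gpu:2).
check_visible_gpus() {
  local expected="$1" devices="${CUDA_VISIBLE_DEVICES:-}" count
  if [ -z "$devices" ]; then
    echo "CUDA_VISIBLE_DEVICES is unset or empty; is this a GPU job?" >&2
    return 1
  fi
  # Count comma-separated entries, e.g. "0,1" -> 2
  count=$(printf '%s\n' "$devices" | tr ',' '\n' | grep -c .)
  if [ "$count" -ne "$expected" ]; then
    echo "expected $expected GPU(s), CUDA_VISIBLE_DEVICES=$devices" >&2
    return 1
  fi
}
```

Adding check_visible_gpus 2 || exit 1 near the top of a job script that requests --gres=gpu:2 makes a missing GPU allocation visible in the job log before AlphaFold starts.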
Additional Notes
- Resource Allocation: Adjust the --gres=gpu and --mem parameters in the job script based on the size of the input data.
- Output Directory: Results will be saved in the directory specified by the --output_dir parameter.
For more information, visit the AlphaFold GitHub repository or contact the RCC support team.