DiffDock is a molecular docking tool that uses a diffusion model to predict protein-ligand interactions. Unlike traditional docking methods, it takes a generative approach to predicting how small molecules bind to proteins, with high accuracy and speed. The model has shown great promise for drug discovery and high-throughput virtual screening.
For a deeper understanding, explore the original paper 👉 "DiffDock: Diffusion steps, twists and turns for molecular docking".
A key challenge with DiffDock is that the default rbgcsail/diffdock:latest Docker image pulls model checkpoints from Hugging Face or other remote sources at runtime.
This causes significant delays when running inference, because the models are downloaded again each time a new container is spun up.
To resolve this, we can pre-download the models and bundle them into a custom Docker image, so that new containers start with the checkpoints already in place.
Let’s walk through the process step by step.
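Before building a custom image, we need to determine which model files DiffDock downloads and where it stores them. Start a shell in the stock image and inspect the cache directories: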
docker run --rm -it rbgcsail/diffdock:latest bash
ls -lh ~/.cache/huggingface
ls -lh ~/.torch/hub/checkpoints
In the case of DiffDock, the models typically reside in the following locations:
./hub/checkpoints/ (for ESM models from Hugging Face)
./workdir/v1.1/score_model/
./workdir/v1.1/confidence_model/
Take note of these files, as they will be manually downloaded and included in the custom image.
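To make that note-taking easier, a small shell helper can list the checkpoint-like files under a directory together with their sizes. This is only a minimal sketch: `list_checkpoints` is a hypothetical name, not part of DiffDock, and it assumes the files of interest end in `.pt` or `.yml`, as the ones in this walkthrough do.

```shell
# list_checkpoints (hypothetical helper, not part of DiffDock):
# print "<size in bytes><TAB><path>" for every *.pt or *.yml file under a directory.
list_checkpoints() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    find "$dir" -type f \( -name '*.pt' -o -name '*.yml' \) | sort |
    while IFS= read -r f; do
        # wc -c may pad its output with spaces on some systems; strip them.
        bytes=$(wc -c < "$f" | tr -d ' ')
        printf '%s\t%s\n' "$bytes" "$f"
    done
}
```

Pasted into the container shell after a test inference, `list_checkpoints /home/appuser` would show which files were downloaded and roughly how large the final image will become.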
From a test run, we identified the following key models to cache:
Confidence Model (confidence_model):
best_model_epoch75.pt
model_parameters.yml
Score Model (score_model):
best_ema_inference_epoch_model.pt
model_parameters.yml
ESM Models (in hub/checkpoints):
esm2_t33_650M_UR50D.pt
esm2_t33_650M_UR50D-contact-regression.pt
esm2_t36_3B_UR50D.pt
esm2_t36_3B_UR50D-contact-regression.pt
esmfold_3B_v1.pt
You will organize these models locally as follows:
.
├── confidence_model/
├── score_model/
└── esm_models/
These will later be copied into the container.
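Before building, it can help to confirm that every expected file is actually present in this layout, since a missing checkpoint would only show up later as a download at runtime. The sketch below assumes the file names identified in the test run above and the three local folder names from the tree; `check_models` and `EXPECTED_FILES` are hypothetical names introduced for illustration.

```shell
# Expected files, relative to the build directory.
# Assumption: these match the files found in your own test run.
EXPECTED_FILES="
confidence_model/best_model_epoch75.pt
confidence_model/model_parameters.yml
score_model/best_ema_inference_epoch_model.pt
score_model/model_parameters.yml
esm_models/esm2_t33_650M_UR50D.pt
esm_models/esm2_t33_650M_UR50D-contact-regression.pt
esm_models/esm2_t36_3B_UR50D.pt
esm_models/esm2_t36_3B_UR50D-contact-regression.pt
esm_models/esmfold_3B_v1.pt
"

# check_models (hypothetical helper): report each missing or empty file
# and return non-zero if at least one is absent.
check_models() {
    root="${1:-.}"
    missing=0
    for rel in $EXPECTED_FILES; do
        if [ ! -s "$root/$rel" ]; then
            echo "missing or empty: $root/$rel" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Call `check_models .` from the directory that holds the three folders; a non-zero exit status means at least one file still needs to be downloaded.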
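In this step, we'll create a Dockerfile that starts from the rbgcsail/diffdock:latest image, copies the pre-downloaded models into the directories where DiffDock expects them, and sets TORCH_HOME so the ESM models are found locally. Here's an example of the Dockerfile: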
# Use the base DiffDock image
FROM rbgcsail/diffdock:latest
WORKDIR /home/appuser/
# Create model directories
RUN mkdir -p ./workdir/v1.1/confidence_model \
./workdir/v1.1/score_model \
./hub/checkpoints
# Copy model checkpoints and assign ownership directly
COPY --chown=appuser:appuser confidence_model/ ./workdir/v1.1/confidence_model/
COPY --chown=appuser:appuser score_model/ ./workdir/v1.1/score_model/
COPY --chown=appuser:appuser esm_models/ ./hub/checkpoints/
# Set environment variable to avoid external downloads
# The TORCH_HOME will look for the hub/checkpoints directory
# and the ESM models will be found there
ENV TORCH_HOME=/home/appuser/
CMD ["bash"]
Once your models and Dockerfile are in the same directory, build the image:
docker build -t diffdock-with-models .
This can take a few minutes, depending on your system and the size of your models. Feel free to grab a coffee while you wait! ☕
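Once the build finishes, a quick sanity check is to list the model directories inside the new image. The sketch below is one way to do that, under a few assumptions: the image is tagged diffdock-with-models as in the build command, the paths match the Dockerfile above, and the image lets you override its default command with `ls`. `verify_image` and the `DOCKER` override variable are hypothetical names introduced here.

```shell
# DOCKER can be overridden (for example DOCKER=echo) to print the command
# instead of running it, which is handy for a dry run.
DOCKER="${DOCKER:-docker}"

# verify_image (hypothetical helper): list the bundled model directories
# inside the image. Fails if any of the directories is missing.
verify_image() {
    image="${1:-diffdock-with-models}"
    $DOCKER run --rm "$image" ls -lh \
        /home/appuser/workdir/v1.1/confidence_model \
        /home/appuser/workdir/v1.1/score_model \
        /home/appuser/hub/checkpoints
}
```

Running `verify_image` with no arguments checks the image built above; if all three directories list the expected files, the checkpoints were copied into place.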