Parallelization Benchmarks

Explore FDS performance benchmarks on the Inductiva.AI Cloud HPC platform. Compare scalability and cost efficiency across configurations.

Fire Dynamics Simulator (FDS) simulations benefit greatly from parallelization. FDS supports two parallel computing methods: MPI (Message Passing Interface) and OpenMP.

These benchmarks explore the performance scaling of FDS simulations using both MPI and OpenMP. All simulations were run via the Inductiva API on Google Cloud Platform (GCP) using c4-standard machines with hyperthreading enabled.
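For reference, the snippet below is a minimal sketch of how a single FDS case can be submitted through the Inductiva Python API. The general pattern (reserve a machine group, run the simulator on it, download outputs, terminate) follows the API's documented usage, but exact class and parameter names such as sim_config_filename are assumptions that may vary between API versions.

```python
import inductiva

# Reserve a dedicated c4-standard machine on GCP.
machine_group = inductiva.resources.MachineGroup(machine_type="c4-standard-8")
machine_group.start()

# Submit one of the benchmark cases and wait for it to finish.
fds = inductiva.simulators.FDS()
task = fds.run(
    input_dir="FDS_Input_Files",
    sim_config_filename="strong_scaling_test_008.fds",
    on=machine_group,
)
task.wait()
task.download_outputs()

# Release the machine so billing stops.
machine_group.terminate()
```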

The "Time to Beat" column in the MPI benchmark table shows reference runtimes obtained from the official FDS repository, serving as a baseline to compare against the Inductiva cloud-based simulations.

MPI Benchmark

To demonstrate the impact of MPI-based parallelization, we replicated the MPI Strong Scaling benchmark from the official FDS repository, designed to measure how effectively simulation time decreases as more MPI processes are used.

The folder FDS_Input_Files contains simple input cases that run for 100 time steps. Each case uses a different number of meshes, for example:

  • N=001 → single mesh, run with 1 MPI process (file strong_scaling_test_001.fds)
  • N=008 → 8 meshes, run with 8 MPI processes (file strong_scaling_test_008.fds)
  • N=016 → 16 meshes, run with 16 MPI processes (file strong_scaling_test_016.fds)

The total number of grid cells is kept constant across all cases. Ideally, increasing the number of MPI processes (and hence meshes) should reduce the simulation runtime.

Each simulation was run three times, and both the runtime and the cost were averaged. For each case, we selected the c4-standard machine with the fewest vCPUs capable of fitting the simulation, following the rule sketched below.
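The selection rule itself fits in a few lines of Python. The helper below is purely illustrative; the list of available sizes is assumed from GCP's published c4-standard family.

```python
# Illustrative helper: pick the smallest c4-standard machine whose vCPU
# count fits the requested number of MPI processes.
# The available sizes are assumed from GCP's c4-standard family.
C4_STANDARD_VCPUS = [2, 4, 8, 16, 32, 48, 96, 192, 288]

def smallest_c4_standard(n_mpi_processes: int) -> str:
    for n_vcpus in C4_STANDARD_VCPUS:
        if n_vcpus >= n_mpi_processes:
            return f"c4-standard-{n_vcpus}"
    raise ValueError(f"no c4-standard machine fits {n_mpi_processes} processes")

print(smallest_c4_standard(1))   # c4-standard-2
print(smallest_c4_standard(64))  # c4-standard-96, as in the table below
```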

Below are the results for each configuration, including the average machine cost and the reference "Time to Beat" runtime.

| Machine Type    | MPI Slots | MPI Processes | Avg Time (s) | Avg Cost ($) | Time to Beat (s) |
|-----------------|-----------|---------------|--------------|--------------|------------------|
| c4-standard-2   | 2         | 1             | 1360.49      | 0.044        | 1399.00          |
| c4-standard-8   | 8         | 8             | 332.64       | 0.043        | 192.10           |
| c4-standard-32  | 32        | 32            | 116.80       | 0.063        | 62.64            |
| c4-standard-96  | 96        | 64            | 67.04        | 0.117        | 41.54            |
| c4-standard-96  | 96        | 96            | 57.75        | 0.104        | 24.63            |
| c4-standard-192 | 192       | 192           | 37.41        | 0.160        | 14.42            |
| c4-standard-288 | 288       | 288           | 26.39        | 0.167        | 9.80             |

As expected, simulation time decreases steadily as the number of MPI processes increases, demonstrating effective strong scaling. Against the FDS reference runtimes, our single-process run comes in slightly faster, while the multi-process configurations remain above their respective "Time to Beat" values.
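To put numbers on the scaling, speedup and parallel efficiency can be computed directly from the averaged runtimes in the table, using the single-process run as the baseline:

```python
# Strong-scaling speedup (T1 / TN) and parallel efficiency (speedup / N),
# computed from the averaged runtimes reported in the table above.
T1 = 1360.49  # average runtime with 1 MPI process (c4-standard-2)

avg_times = {8: 332.64, 32: 116.80, 64: 67.04,
             96: 57.75, 192: 37.41, 288: 26.39}

for n_procs, t_n in avg_times.items():
    speedup = T1 / t_n
    efficiency = speedup / n_procs
    print(f"{n_procs:3d} processes: {speedup:5.1f}x speedup, "
          f"{efficiency:5.1%} efficiency")
```

Efficiency naturally drops as the process count grows (from roughly 51% at 8 processes to about 18% at 288), which is the expected pattern for strong scaling with a fixed total cell count.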

OpenMP Benchmark

To demonstrate the effect of OpenMP parallelization, we ran the 8-mesh MPI case with an increasing number of OpenMP threads, keeping the number of MPI processes fixed at 8. Each case was run on the c4-standard machine whose vCPU count matches MPI processes × OpenMP threads.
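Outside the Inductiva platform, this hybrid setup maps onto the standard way of launching FDS: the MPI process count is passed to mpiexec, and the per-process thread count is set through the OMP_NUM_THREADS environment variable. A minimal local sketch, assuming fds and mpiexec are on the PATH:

```python
import os
import subprocess

# Hybrid run: 8 MPI processes (one per mesh), 4 OpenMP threads each.
# FDS reads the thread count from the OMP_NUM_THREADS environment variable.
env = dict(os.environ, OMP_NUM_THREADS="4")

subprocess.run(
    ["mpiexec", "-n", "8", "fds", "strong_scaling_test_008.fds"],
    env=env,
    check=True,
)
```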

| MPI Processes | OMP Threads | Machine Type    | Avg Time (s) | Avg Cost ($) |
|---------------|-------------|-----------------|--------------|--------------|
| 8             | 1           | c4-standard-8   | 322.60       | 0.042        |
| 8             | 2           | c4-standard-16  | 192.86       | 0.051        |
| 8             | 4           | c4-standard-32  | 139.78       | 0.074        |
| 8             | 6           | c4-standard-48  | 123.16       | 0.099        |
| 8             | 12          | c4-standard-96  | 95.44        | 0.156        |
| 8             | 24          | c4-standard-192 | 84.54        | 0.293        |

Increasing the number of OpenMP threads results in reduced simulation time, showcasing the benefits of combining MPI and OpenMP in hybrid parallelization setups. However, cost tends to increase with larger machine sizes, highlighting a trade-off between time and expense.
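That trade-off can be read straight off the table: relative to the single-thread baseline, each configuration buys a speedup multiplier at a cost multiplier, with clearly diminishing returns.

```python
# Speedup vs. cost multiplier for each OpenMP configuration, relative to
# the 1-thread baseline (all values taken from the table above).
base_time, base_cost = 322.60, 0.042  # 8 MPI x 1 OMP on c4-standard-8

rows = [(2, 192.86, 0.051), (4, 139.78, 0.074), (6, 123.16, 0.099),
        (12, 95.44, 0.156), (24, 84.54, 0.293)]

for n_threads, avg_time, avg_cost in rows:
    print(f"{n_threads:2d} threads: {base_time / avg_time:4.2f}x faster "
          f"at {avg_cost / base_cost:4.2f}x the cost")
```

At 24 threads, for example, the run is roughly 3.8x faster but costs about 7x more than the single-thread configuration.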