# Flow #5: Testing Graph Neural Simulators

**Pedro Ferro Pereira**, Main author

**David Carvalho**, Co-author | Editor

**Ivan Pombo**, Reviewer

**Fábio Cruz**, Technical Reviewer

Quoting Etta James:

At Last!

In the previous posts of the *Flow* series, we outlined how a Deep Learning model can be built to learn fluid dynamics in simple (yet rich) physical scenarios.

In this post, it’s time to understand how the Encoder-Processor-Decoder (E-P-D) model introduced in the previous post fits in the simulation world. More importantly, we will see how a Graph Neural Simulator (GNS) relying on this model fares on unseen situations.

So, let’s flow!

To put our DL simulator to the test, we will focus on three different experiments:

- Train a model which can replicate the simulation dynamics of a **single** instance of *the water cube* scenario.
- Train a model on a dataset composed of *several* instances of the water cube scenario to **generalise** to new, unseen instances.
- Use the best model found in the latter experiment to simulate instances from a **different** scenario: *the dam break*.

## The water cube scenario

The go-to scenario we will simulate first is the *water cube scenario*:

Video 1: Dynamics of a water cube scenario, simulated with \(5290\) SPH particles. Here, the block is left to fall with gravity - *splash!* Credits: Pedro Pereira / Inductiva

In this scenario, a cubic block of water with given *dimensions* is **dropped** inside a unit cube tank at an *initial position* and with an *initial velocity*. The block then moves exclusively under the effect of gravity — first by splashing onto the walls and floor and then by eventually halting at the bottom of the tank.

This scenario is simple and intuitive enough to model. More importantly, it still involves enough complexity for the DL model to learn with a handful of free parameters.

## Embracing a single simulation

Before committing to a full-scale machine learning experiment, with train, validation and test sets, let us first focus on *debugging* the proposed model.

We train the model to **overfit** a single simulation. To generate it, we randomly sample the initial position, initial velocity and cube dimensions, and then run SPH for \(N_t = 496\) timeframes.

The goal is for the model to *replicate* this simulated data! If it cannot, generalising the model may be a doomed task from the get-go!

### Message-passing (how many times?)

For the model to learn the dynamics, the timeframe graphs \(\mathcal{G}^t\) need to be fed to it as input.

Recall that the *processor* in our *E-P-D* DL model is a stack of \(M\) message-passing blocks that aggregate features coming from neighbouring nodes. Hence, in terms of performance, we expect that the more the merrier.

Well, let’s see how the model performs for different numbers of layers \(M\).

As a metric, we use the mean absolute error (MAE) between the ground truth, i.e. the simulated acceleration \(\mathbf{a}^t\), and the model prediction \(\bar{\mathbf{a}}^t\). The MAE is averaged over all timeframes, particles and acceleration components. Indeed, it shows an interesting behaviour:

Fig. 1: Effect of the number of message-passing steps on the model performance. A trained MLP with the same number of parameters as the model with \(M=10\) is used as a benchmark. Credits: Pedro Pereira / Inductiva
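For readers who want to reproduce the metric, here is a minimal sketch of the MAE described above. It assumes the accelerations are stored as nested lists indexed as `[timeframe][particle][component]`; the function name and the toy numbers are ours, not taken from the actual experiment.

```python
def mean_absolute_error(true_acc, pred_acc):
    """MAE over all timeframes, particles and acceleration components.

    Both inputs are nested lists indexed as [timeframe][particle][component]
    (an assumed layout, chosen for this illustration).
    """
    total, count = 0.0, 0
    for frame_true, frame_pred in zip(true_acc, pred_acc):
        for a_true, a_pred in zip(frame_true, frame_pred):
            for c_true, c_pred in zip(a_true, a_pred):
                total += abs(c_true - c_pred)
                count += 1
    return total / count


# Toy data: 2 timeframes, 2 particles, 3 components (made-up numbers).
true_acc = [[[0.0, 0.0, -9.8], [0.0, 0.0, -9.8]],
            [[1.0, 0.0, -9.8], [0.0, 2.0, -9.8]]]
pred_acc = [[[0.0, 0.0, -9.8], [0.0, 0.0, -9.2]],
            [[1.0, 0.6, -9.8], [0.0, 2.0, -9.8]]]

mae = mean_absolute_error(true_acc, pred_acc)
```

Because every component of every particle contributes equally, a handful of badly predicted particles can hide behind a small average, which is why a per-particle analysis is also worthwhile.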

As expected, **increasing** the number of message-passing steps reduces the error **substantially**.
Looking at Fig. 1 we see that adding a single layer from \(M=1\) reduces the error by 66%!
Eventually, this behaviour *saturates* once feature information is given enough steps to travel through the entire graph.

Let us be *cautious* though. The decrease in error for larger \(M\) may be due to the larger number of parameters in the model and **not** due to this architectural feature.

To check this, we have also trained a multi-layer perceptron (MLP) with 2 hidden layers and 540 neurons per layer (virtually the same number of parameters as the model with \(M=10\)) to serve as a *benchmark*.

Well, a single message-passing layer is *enough* to outperform the benchmark, even though the MLP has about 7 times more parameters than that model!

When \(M = 0\), the model has no Processor and the input is propagated directly from the Encoder to the Decoder. More importantly, nodes do **not** *exchange* information with their neighbours.

*Message passing* seems to provide a fast route to decrease this error metric.
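To build some intuition for why the number of steps \(M\) matters, here is a deliberately simplified sketch. It is not the E-P-D processor (which uses learned update functions); it is plain sum aggregation on a hypothetical 4-node chain graph, showing that information starting at one node reaches nodes at most \(M\) hops away after \(M\) steps.

```python
def message_passing(features, edges, num_steps):
    """Apply `num_steps` rounds of sum aggregation over undirected edges."""
    neighbours = {i: [] for i in range(len(features))}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    for _ in range(num_steps):
        # Each node adds the sum of its neighbours' current features to its own.
        features = [
            f + sum(features[j] for j in neighbours[i])
            for i, f in enumerate(features)
        ]
    return features


def reached(features):
    """Indices of the nodes that have received some information."""
    return [i for i, f in enumerate(features) if f != 0]


# Hypothetical chain graph 0 - 1 - 2 - 3, with a signal only on node 0.
chain = [(0, 1), (1, 2), (2, 3)]
start = [1.0, 0.0, 0.0, 0.0]

# Map each number of steps M to the nodes the signal has reached.
reach_by_steps = {m: reached(message_passing(start, chain, m)) for m in range(4)}
```

With zero steps the signal never leaves node 0, and each extra step extends its reach by one hop until the whole chain is covered. This mirrors both the \(M = 0\) case and the saturation seen for large \(M\).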

But now the question is raised:

Did the model *actually* learn the dynamics?

### A physics-friendly analysis

Since the loss we are computing comes from the average of *all* acceleration components of all particles at all timeframes, it is hard to establish if the model is actually *learning* the physics at the particle level and not just minimizing the ensemble loss.

To verify the dynamics, we instead track the acceleration error norm \(\epsilon^t_i = \left\| \mathbf{a}^t_i - \bar{\mathbf{a}}^t_i \right\|\) of each particle \(i\) as a function of time, and see how it spreads over the ensemble of particles:

Fig. 2: Acceleration error as a function of simulation time. The acceleration error \(\epsilon^t_i\) (top), the ground truth accelerations \(\left\| \mathbf{a}^t_i \right\|\) (middle) and the relative error \(\epsilon^t_i / \left\| \mathbf{a}^t_i \right\|\) (bottom), at each timeframe \(t\). Credits: Pedro Pereira / Inductiva

The error plots reveal the different stages of the water cube scenario. Unsurprisingly, the model struggles whenever more complex dynamics occur – when the fluid hits the ground.
The absolute error starts small for **all** particles when the water block is in free fall. Once particles start colliding with the tank wall the error increases. Then, once the particles halt at the bottom of the tank, the error again decreases. Yet, notice that in this stage the relative error still increases slightly. This is explained by a fast decrease in the ground truth acceleration at the end of the simulation.
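The quantities plotted in Fig. 2 can be computed per particle along these lines. This is a sketch for a single timeframe, assuming accelerations are given as lists of 3D vectors; the function name and the sample values are illustrative only.

```python
import math


def acceleration_errors(true_frame, pred_frame):
    """Per-particle (absolute, relative) acceleration errors for one timeframe.

    `true_frame` and `pred_frame` are lists of 3D acceleration vectors,
    one per particle (an assumed layout).
    """
    errors = []
    for a_true, a_pred in zip(true_frame, pred_frame):
        eps = math.dist(a_true, a_pred)   # ||a - a_bar||
        norm = math.hypot(*a_true)        # ||a||
        # The relative error is undefined when the true acceleration vanishes.
        rel = eps / norm if norm > 0 else float("nan")
        errors.append((eps, rel))
    return errors


# Made-up accelerations for two particles.
true_frame = [[0.0, 0.0, -10.0], [3.0, 4.0, 0.0]]
pred_frame = [[0.0, 0.0, -9.0], [3.0, 4.0, 0.0]]
errs = acceleration_errors(true_frame, pred_frame)
```

Note how the relative error is sensitive to small ground truth accelerations: as \(\left\| \mathbf{a}^t_i \right\|\) shrinks, the same absolute error yields a larger ratio.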

A further notable analysis is to check if the model can also replicate the particle trajectories:

Fig. 3: Position error of particle trajectories as a function of time. The plot shows the position error \(\frac{1}{N} \sum_i \left\| \mathbf{r}^t_i - \bar{\mathbf{r}}^t_i \right\|\), averaged across all \(N\) particles, as a function of time \(t\). Credits: Pedro Pereira / Inductiva

The answer seems to be negative: the tank is relatively small, and after \(1\)s the average distance between predicted and ground truth particle positions is already \(0.3\)m. Yet, this is not demotivating, since the model is meant to predict the next acceleration. With this in mind, we *flow* into further analysis.
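Why can positions drift even when accelerations are predicted reasonably well? The toy sketch below integrates a single free-falling particle twice: once with the exact acceleration and once with a hypothetical 2% bias. The semi-implicit Euler integrator, the timestep and the bias are all assumptions made for illustration; the post does not specify how the rollout is integrated.

```python
import math


def rollout(position, velocity, acceleration_fn, dt, num_steps):
    """Integrate one particle with semi-implicit Euler steps (assumed scheme)."""
    trajectory = [position]
    for _ in range(num_steps):
        acc = acceleration_fn(position, velocity)
        velocity = [v + a * dt for v, a in zip(velocity, acc)]
        position = [p + v * dt for p, v in zip(position, velocity)]
        trajectory.append(position)
    return trajectory


def mean_position_error(true_positions, pred_positions):
    """Average Euclidean distance between matching particles in one timeframe."""
    dists = [math.dist(r, r_bar) for r, r_bar in zip(true_positions, pred_positions)]
    return sum(dists) / len(dists)


g = 9.81


def exact(pos, vel):
    return [0.0, 0.0, -g]


def biased(pos, vel):
    # Hypothetical model whose acceleration is off by 2 %.
    return [0.0, 0.0, -1.02 * g]


true_traj = rollout([0.5, 0.5, 0.8], [0.0, 0.0, 0.0], exact, dt=0.01, num_steps=30)
pred_traj = rollout([0.5, 0.5, 0.8], [0.0, 0.0, 0.0], biased, dt=0.01, num_steps=30)

# Position error at each timeframe (a single particle, so the mean is trivial).
errors = [mean_position_error([r], [r_bar]) for r, r_bar in zip(true_traj, pred_traj)]
```

Even with a constant, small acceleration error, the position error grows at every step because each predicted position is built on top of the previous one.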

#### Checking for energy

Let’s keep focusing on the dynamic analysis. We can also verify whether the mechanical energy of the system matches what is expected from the physical situation. For the water cube scenario, this is a straightforward quantity to compute: \(E_{\rm M} = \sum_{i=1}^{n} \frac{1}{2} m_i \left\| \mathbf{v}_i \right\|^2 + \sum_{i=1}^{n} m_i g z_i\), where \(m_i\), \(\mathbf{v}_i\) and \(z_i\) are the mass, velocity and vertical coordinate of particle \(i\), respectively, and \(g\) is the gravitational acceleration.
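As a sketch, the mechanical energy of one timeframe and the relative error between two energies could be computed as follows. We assume that the vertical coordinate is the third component of each position vector and that \(g = 9.81\) m/s² by default; both are conventions chosen for this example.

```python
def mechanical_energy(masses, velocities, positions, g=9.81):
    """Kinetic plus gravitational potential energy of a set of particles.

    Assumes the vertical coordinate z is the third component of each position.
    """
    kinetic = sum(
        0.5 * m * sum(c * c for c in v) for m, v in zip(masses, velocities)
    )
    potential = sum(m * g * r[2] for m, r in zip(masses, positions))
    return kinetic + potential


def relative_error(e_true, e_pred):
    """Relative deviation of the predicted energy from the ground truth."""
    return abs(e_pred - e_true) / abs(e_true)


# Two made-up particles; g is set to 10 only to keep the numbers round.
energy = mechanical_energy(
    masses=[1.0, 2.0],
    velocities=[[0.0, 0.0, -2.0], [1.0, 0.0, 0.0]],
    positions=[[0.0, 0.0, 1.0], [0.0, 0.0, 0.5]],
    g=10.0,
)
```

Evaluating `mechanical_energy` at every timeframe for both the SPH and the predicted rollouts, and passing the pairs to `relative_error`, gives the kind of comparison discussed in this section.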

The model prediction closely follows the ground truth:

Fig. 4: The comparison of the average mechanical energy for both the ground truth and predicted simulations (top) shows a remarkably good agreement. This is further shown by plotting the relative error between the two (bottom), which is never seen to exceed \(6 \%\). Credits: Pedro Pereira / Inductiva

The plot of the relative error of the mechanical energy shows again that divergence starts at the moment of collision. Nevertheless, the mechanical energies are very similar throughout the simulation with the maximum error not exceeding \(6 \%\).

Even though it isn’t a rigorous scientific comparison, we can render the ground truth and predicted simulations together to see if we can spot the differences:

Video 2: Comparison of rendered timeframes from the ground truth and predicted simulation. The left shows particle positions for four timeframes of the ground truth simulation. The right shows the corresponding positions of the predicted simulation. Credits: Pedro Pereira / Inductiva

It becomes very hard to tell which one was obtained from SPH and which from our model. This is great for some visual applications, like video games!

## Generalising (within a water cube scenario)

The ability to *generalise* to other realistic scenarios is the second part of our experiment.

Now, we train the model over a distribution of \(10\) simulations. The initial position, velocity and dimensions of the water block in each simulation were chosen randomly within specified ranges to satisfy the distribution in Fig. 5.

The scatter plot in Fig. 5 shows how the train, validation and test set simulations are distributed in terms of initial velocity, in m/s, and size of the block, in terms of the number of particles. The simulation used in the overfit experiment is also plotted for comparison and test simulations are numbered to easily reference them.

Fig. 5: Distribution of dataset simulations. Models are trained on the 10 simulations in blue and validated on the 2 simulations in orange. The best model is tested on the 4 simulations in green, three of which are outside the training distribution. The single simulation overfitted in the first experiment is also shown for comparison. Credits: Pedro Pereira / Inductiva

The goal of this experiment is to understand if by learning on the training set (in blue), the model can replicate simulations with a larger number of particles (test simulations 2 and 4) or more chaotic simulations (test simulations 3 and 4).

For this purpose, it is sufficient to mimic the previous analysis for the most general case, **test simulation 4**.

Fig. 6: Acceleration error as a function of simulation time for test case \(4\). The acceleration error (top), the ground truth accelerations (middle) and the relative error (bottom), at each timeframe \(t\). Credits: Pedro Pereira / Inductiva

The acceleration error is considerably high, especially in the initial time frames where the block collides with the wall more abruptly since the initial velocity was much higher. Although it is not particularly impressive by today’s machine learning standards to have predictions wrong by 50%, the model may still perform well in predicting the full fluid flow.

So, let’s move to mechanical energy analysis:

Fig. 7: The comparison of the mechanical energy between the ground truth and the predicted simulations (top) for test case 4. Credits: Pedro Pereira / Inductiva

Finally, Figure 7 shows the average mechanical energy for both the predicted and ground truth fluid flow. Note how, overall, the energy is higher since test simulation 4 had a much higher initial velocity. Although not as aligned as before, there is still a decent correspondence between the two lines, with the relative error never surpassing 6%, again! It is therefore plausible to conclude that the model was capable of generalising to this simulation.

To finish the generalisation analysis, we turn again to a visual comparison, which puts the SPH and the GNS simulations on an equal footing to the eye.

Fig. 8: Comparison of rendered time frames from the ground truth and predicted test simulation 4. Credits: Pedro Pereira / Inductiva

## Generalising (to a dam break scenario)

### The dam break scenario

The dam break scenario is introduced here because it is a common benchmark used to validate Computational Fluid Dynamics (CFD) codes. First, it is a fairly simple free-surface flow to simulate (see Fig. 9).

Fig. 9: Dam break scenario. Credits: Pedro Pereira / Inductiva

Second, experimental data is available, published by Martin and Moyce [1], to validate how accurately the code simulates the measured flow.

To match it, we set up a virtual scenario with the same parameters, *i.e.*, the length of the box and the dimensions of the dam break, and ran an SPH simulation.

To validate the deep learning-based simulator, the same setup was simulated by the GNS. Note that the E-P-D model was not retrained on this scenario: we take the same model trained on the dataset of 10 water cube simulations presented above and use it to infer particle accelerations in this different setting.

Fig. 10: Dam break scenario validation. Comparison between experimental data, SPlisHSPlasH’s simulation and the Graph Network Simulator’s prediction. Credits: Pedro Pereira / Inductiva

Results were normalised with respect to characteristic quantities of the domain, i.e. distance is normalised by \(L\) (\(x \rightarrow x/L\)) and time is scaled by \(\sqrt{2g/L}\) (\(t \rightarrow t\sqrt{2g/L}\)). Presenting the results in a dimensionless manner makes them generic for arbitrary water column dimensions.
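The normalisation itself is a one-liner. The helper below is a sketch: its name is ours, `L` stands for the characteristic length used above, and the numbers in the example are arbitrary.

```python
import math


def normalise(x, t, L, g=9.81):
    """Return the dimensionless distance x/L and time t*sqrt(2g/L)."""
    return x / L, t * math.sqrt(2.0 * g / L)


# Arbitrary values; g is set to 9 only to keep the result round.
x_star, t_star = normalise(x=1.0, t=2.0, L=2.0, g=9.0)
```

Applying the same transformation to the experimental data, the SPH simulation and the GNS prediction puts all three curves on common dimensionless axes.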

Validating the GNS against an established CFD benchmark strengthens its credibility as a proof of concept in the field of fluid simulation. Sanchez-Gonzalez et al. [2] (a DeepMind paper), who proposed the GNS, focus on validating the model on several of their own scenarios, many of which are “toy” examples, but do not assess its performance in real-life use cases. Here, their work is extended by performing a physics-based analysis, which includes the dam break scenario.

That’s all for now, folks!

The *flow* of this series was the subject of Pedro’s master thesis and we would like to thank him again for all the work we did at Inductiva. Thanks for everything Pedro Pereira!!

If Graph Neural Networks set you thinking about other possible applications, drop us a line at contact@inductiva.ai… and stay tuned for the future!

### 🌊🌊🌊 **At last… flow until we are back!** 🌊🌊🌊

## References & Remarks

[1] J. C. Martin and W. J. Moyce, “Part IV. An experimental study of the collapse of liquid columns on a rigid horizontal plane,” Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, vol. 244, no. 882, pp. 312–324, 1952.

[2] A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. Battaglia, “Learning to simulate complex physics with graph networks,” in International Conference on Machine Learning. PMLR, 2020, pp. 8459–8468.
