Playing With Runtime Fluid Simulation

This post details my continuing relationship with runtime fluid sim and compute shaders, and sadly is so far incomplete. However, I want to be less precious about what I post on my blog this year, and embrace “failure” as much as “success”.

It’s frustrating to feel like I’m chasing “giants” in the field of computer graphics, and I do need to continually remind myself that my math and graphics programming parts of my journey are still relatively new and unpractised compared to some of the other aspects of my craft. Although if nothing else, writing this post, is helpful for my own understanding even if this does rot in the bowls of internet.

So enough feeling sorry for myself, lets go!

Haven’t you done this before?

My last ventures into this field of simulation did not amount to a “true” a fluid sim; there’s no notion of pressure, or (in)compressibility here. My previous attempt was a simpler mass-spring system which only really diffuses velocity across a 2D simulation space.

Gif showing my previous mass-spring system visualised with Unity gizmos and re-computed surface normals

A mass-spring system, is a relatively basic form of simulation where masses (representing particles or points in space) are connected by springs. This system primarily handles the diffusion of velocity across a 2D space, simulating basic interactions and movements.

While effective for simpler applications and computationally less demanding, it lacks the complexity of advanced fluid dynamics simulations. These more sophisticated simulations, which I aim to explore, involve more complex aspects such as pressure dynamics, incompressibility, and turbulent flow, offering a more realistic representation of fluid behaviour.

One of the factors which drives my continued curiosity, is the long-term goal of being able to simulate, or fake, wind simulations in game worlds. Although from a mass spring system you can infer flow from the rate of change across the simulation, it would not be suitable as it does not address pressure. As pressure makes wind, herein lies the motivation to look further into the rabbit hole.

Where to Begin?

Whilst not implementing “true” fluid simulations before, I’m familiar with some of the terminology and concepts as they’ve surfaced in Googles of days-past. The Navier-Stokes Equations are a corner stone of fluid simulation, there are also other mathematical terms such as The Jacobian which relate to solving real time fluid simulations.

To start from scratch however, one needs to be more then familiar with these terms, and whilst a complete understanding isn’t required, I’m not well versed enough in calculus – and other advanced mathematical concepts – to dive in with my own solution.

Me and My Neighborhood, by Watt Flanders

Wyatt Flanders has possibly one of the most succinct 2D fluid simulation implementations I’ve seen, both with a simple pdf explainer with accompanying shader toy example. This seems as good a place to begin as any.

ShaderToy vs Unity Implementation

As much as there’s an example out there, ShaderToy’s focused WebGL-powered renderer is quite different from Unity’s much larger and more generic game engine. There are a number of differences between the two which I’ll need to understand to complete the porting of this fluid simulation example.

I’m glazing over some aspects of differences in HLSL vs OpenGL however, as these are relatively minor.


My previous experience in Unity compute shader simulations has already taught me the limitation with temporal processing and how Unity dispatches and renders with the GPU. That is, you cannot reliably read from, and write to, the same data set at the same time.

From the previous post, this example shows the simulation breaking down and turning into noise when trying to render “in-place”. The algorithm for the above simulation requires each thread of the compute shader dispatch to sample from neighbouring pixels of it’s own simulation domain. The issue arises with Unity (and most modern renderers) is with their non deterministic processing order. So at a given point in a dispatch, there’s no guarantee a thread’s neighbours have been processed yet and thus updated their respective pixels in the render buffer.

This lack of continuity is an issue for any temporal processing, and is a reason why I’ll be using a double-buffered approach again with my conversion of the above.

This highlights one difference with Wyatt’s ShaderToy, it appears to not need to double-buffer. On some more searching, it appears that ShaderToy kind of double-buffers by virtue of presenting the previous frames buffer as an input to the current frames execution (source). This isn’t a big problem, but it’s a little more legwork I need to do, and another discrepancy I’ll need to account for when debugging.


Another implementation difference will be on the c# and “dispatch” side of things. ShaderToy handles continuous rendering and binding of data to the shader’s execution (as well as buffering mentioned above). However Unity only has a very generic framework for dispatching compute shaders, so I’ll begin with a very similar approach to my previous outings.

This approach is to have a C# behaviour which binds resources, updates parameters, and dispatches my compute shader every FixedUpdate() step. This ensures a consistent framerate is used until I make the simulation frame-rate independent. Once the compute shader has been dispatched, I swap the source and target render buffers as part of my double-buffered implementation.

Through this c# behaviour, I’ll also have an interface for interactively modifying shader uniform parameters in real time, providing mouse input for adding energy to the simulation, and outputting debug information.

Co-ordinate Space

A final point of differences here, is how texture buffers are sampled between the two target renderers. ShaderToy’s mainImage() function provides a pixel co-ordinate which needs to be transformed into texel space for the buffer reading and writing. To be more specific, this is the centre of the pixel, and thus the whole co ordinate is offset by 0.5, there’s a very good reason to do this.

However with the compute shader implementation, I’ll be dispatching a thread per pixel of my in/out buffers, so no co-ordinate transformation is required as I can use direct indexing of data using the thread ID.

Implementation Snags

This isn’t an attempt at a GameBoy emulator, simply a failed experiment. The exact reason for the failure escapes me now…

Given that the algorithm I’m following is really quite simple, and I’m not going to go into the math or the principles of compute-shader programming broadly, I’m mainly going to focus on the issue I’ve had so far. As I’ve already mentioned, I’ve not “solved” this yet in Unity, but I’ll make the git repo available when I do in case it’s of interest..

Render Target Type

I wasn’t organised to get specific screengrabs of the many permutations of issue I had here, but the moral of the story is, don’t use RenderTextureFormat.Default for this. With any degree of consideration this isn’t a ground-breaking discovery, but it’s worth understanding why this might be an issue.

Wyatt’s example encodes the following data into the output buffer;

  • XY is energy (or velocity)
  • Z is disordered energy
  • W is ordered (internal) energy

There are two main factors to consider when encoding the data above are, “does the data need to be signed (can the value go negative)?” and “what degree of precision is needed?”.

XY being effectively a direction vector means it our texture format will need to be signed. In some of my experiments, I had simulations biased and solving diagonally upwards because I couldn’t store values going the opposite direction in my texture = they were clamped at 0.

The issue of precision arises where certain RenderTextureFormats use different levels of precision on certain channels – the more pronounced when an alpha channel is 1-bit and this on/off… This breaks the simulation rather quickly.

A less subtle but still important issue here, is when the texture is limited to 8-bits per channel, it can’t exceed a 0-1 range. It’s possible with this implementation to use 8-bit values, using these vectors as normalised velocities, but for now its easier to keep things simple.

As for the Z and W components, I’ve not yet worked out what the ideal value range is for these, or if they ever need to exceed 0, they should never fall below 0 though.

For now, I’ve settled on RenderTextureFormat.ARGBFloat as it’s a safe format and more then enough for this experiment with 32-bits of precision per channel. If texture memory becomes an issue theoretically, there are likely alternate ways to encode this data in more memory-friendly layouts. But for now this isn’t an issue.

The Field() Function

The difference in buffer sampling in the Field() function has also tripped me up, lets take a quick look at it;

#define LOOKUP(COORD) texture(iChannel0,(COORD)/iResolution.xy)

vec4 Field (vec2 position) {
// Rule 1 : All My Energy translates with my ordered Energy
vec2 velocityGuess = LOOKUP (position).xy;
vec2 positionGuess = position - velocityGuess;
return LOOKUP (positionGuess);

The above snippet from Wyatt’s implementation shows the Field function in his ShaderToy example. This function performs an approximation of a backward Advection Step in this sim – in simpler English, this approach uses current velocity information to trace fluid elements backward in time, allowing the simulation to “move” these elements, without actually needing to move them. The link above explains this more thoroughly.

Failed experiment owing to unsigned texture usage, and incorrect advection look-up co-ordinates.

The first aspect of this which tripped me up was the delta between lookups was far too big, and thus the simulations would be unstable and unpredictable, and sometimes just “burn out” really fast. By applying a uniform _timeStep scalar to the positionGuess, and interactively changed this until things behaved, I managed to iron out this specific problem.

The second problematic aspect which I’m not 100% I have a good solution for, relates to how the data is discretised into a grid.

texture(iChannel0,(COORD)/iResolution.xy) is the ShaderToy’s texture sample. For a RWBuffer<float4> in a compute shader, we have to directly index the texture by ID.
_Lookup[uint2(coord.x, coord.y)] is what I have (where _Lookup is the input buffer). The key part here is casting a float2 position into a unit2 lookup index. This cast truncates each component, thus removing the fraction. This loss of fraction appears to bias the simulation in one direction.

To address this, I’ve rounded – round(positionGuess) – the sample of the position guess, so the cast to uint2 doesn’t lose velocity in our sim. The issue here however is that we likely gain velocity. I’ve yet to fully understand the implications of this, but this is the best I have for now.

Where Things Currently Are

I’ve not been particularly scientific in my documentation of progress. The following examples are for illustration, and cannot be used for accurate debugging as I don’t exactly recall the state of the simulation and algorithm at each point.

The above gif shows one of the more successful examples of the disordered energy of my simulation so far. It has a fluid feel to it, but it’s clearly not right. The fluid velocities seem massively biased to right (which I assume was the fractional loss on the advection in the Field() function), and some other general odd behaviours.

In an effort to debug things, I’ve stepped the resolution down to see if I can better trace the journey of the fluid through individual cells. Ideally this disordered energy should gradually diffuse into consistent shaders of grey, however I’m left with a chess-board-like final state. It seems like a rounding issue with my Field() function (since using round()…), but I’ve yet been able to confirm this to a satisfactory degree.

Thankfully with this low resolution, it becomes possible to print out data in a readable way. Above image is for demonstration purposes only as I’ve not yet been able to learn anything from this beyond the obvious (it’s not right).

At a high resolution, a view of my ordered energy; I can also get some some pretty impressive (/s) zebra stripes. These wild oscillations of energy are what I’ve taken to calling “discontinuities” in my simulation, and are part of my future plans for debugging.

Conclusion So Far

At this point, the code from Wyatt’s ShaderToy have largely been copied verbatim (save necessary changes), and I still don’t have as stable-a simulation as I’ve been copying. So at this point, after a handful of 2-hour stints, I’ve come to the conclusion I need a much better understanding of the problem as a whole.

This NASA page from the Glenn Research Centre seems to point at some more generic pitfalls of this type of simulation, and offers next steps in my journey. I’ve also earmarked this blog post from Shahriar Shahrabi as it seems like another accessible route into this study with a much more domain-appropriate explainer for fluid simulation.

But until then, I need to draw a line under this before returning to work from a 3-week break, in case I don’t return for a little while again.