It’s supremely satisfying when a somewhat abstract and dry problem in Mathematics can be visualised in awestriking ways.

For these coming winter festivities 🎄, and to commemorate the 378th birthday of Sir Isaac Newton (who was born on the 25th of December of 1643), we bring you one such case.

We invite you to come gaze (and learn) about some beautiful Newton fractals.

For you to understand what these are about, you will need to grasp two easy concepts: that of a fractal and that of a root of a function.

Let’s start!

What are fractals?

For those of you who are not baby boomers or never marvelled at some Psychedelic Rock posters and artwork, maybe a little introduction is necessary.

Fractals are objects that can encapsulate a never-ending pattern (sometimes even repeating itself indefinitely) as we zoom in or out. This is usually achieved by repeating some action recursively forever. For this reason, fractals can have infinite detail – there just isn’t enough resolution to approximate them.

Fractal dimensions

The name “fractal” comes from the fact that these objects are described by a non-integer dimension \(d\). We expect “normal” spaces to scale by an integer.

What does this mean? Take the 2-dimensional plane \(\mathbb{R}^2\): for a length \(l\), the associated measure in this space is the area \(A=l^2\). In this perspective, the dimension of this space tells us how this measure (in this case the area) scales with respect to \(l\).

Its precisely this relationshsip that allows us to term this space 2-dimensional - namely by what the exponent is: \(d=2\). Analogously, \(\mathbb{R}^3\) has an associated measure the volume \(V = l^3\) and hence \(d=3\).

In more tangible terms, this fact implies that if, say, we double \(l\), the associated measure increase as \(2^d\). Thus, if we, say, double the side of a square, its area will increase \(2^2 = 4\)-fold and its volume \(2^3=8\)-fold.

Dimensions in fractal spaces are way less intuitive. Let’s have a look at a very famous fractal: the Sierpiński triangle. It is formed by, starting from the initial configuration of an equilateral triangle, finding the midpoints of its sides and, with them, inscribing a new equilateral triangle inside.

The evolution of this procedure for only 4 steps can be seen below. As this construction is recursively performed, more equilateral triangles are formed and it doesn’t take many steps until we have a very complex and rough output.

Fig. 1: The iteration which generates the Sierpinski triangle. Credits: Wikipedia

Using the previous scaling argument, we see that if we double the length of the base of the initial triangle, this construction will create a structure with now three times the area of the initial triangle.

Equivalently, this is to say that the dimension \(d\) of the fractal space must satisfy \(2^d = 3\), which implies \(d = \log(3)/\log(2) \approx 1.58496250072\) – an irrational number but most importantly a number between integers. Oh oh oh!

To converge or not to converge - that is the question!

For a structure to be “stable” as we apply these recursive rules, some notion of convergence needs to take place. What do we exactly mean by this?

Here, convergence means that the result of applying some rule gets closer to some value. For example, if our rule is \(x_{n+1} = (1/2) x_n\), we see that this will rapidly converge to 0 - start with a number \(x_0 = x\); with this, \(x_1 = (1/2) x_0\) and \(x_2 = (1/2)x_1 = (1/2)^2 x_0\).

In general \(x_n = (1/2)^n x_0\) and, in the limit of an infinity-large \(n\), we have \(\lim_{x \to 0} x_n = 0\). This map thus converges.

On the other side, some maps never really settle to any particular value as \(n\) increases. Take the rule \(x_{n+1} = (-1)^n x_n\).

Then, for a fixed number \(x\), this map will output sign-flipping versions of \(x\) every 2 terms forever: \(x_0 = x\) and \(x_1 = (-1)^0 x_0 = x\); then a 2-cycle of negative values show up: \(x_2 = (-1)^1 x_1 = -x\) and \(x_3 = (-1)^2 x_2 = -x\).

This sequence will thus oscillate forever (\(x,x,-x,-x,x,x,-x,-x,...\)), never getting closer to a value. For that reason we say this map does not converge.

The Mandelbrot set

A very famous fractal (perhaps the most famous them all) exploits this idea - the Mandelbrot set - formalized by Robert W. Brooks and Peter Matelski in 1978 but popularized by Benoit Mandelbrot 2 years later.

This set is computed simply by looking at a sequence of complex numbers generated by the iterative rule \(z_{n+1} = z_{n}^2 + c\), starting from \(z_0=0\). Depending on the value of the constant \(c\), this sequence will either converge or not. If it does not diverge, \(c\) belongs to the Mandelbrot set.

For example: the number \(c=0\) naturally belongs to the set. Starting with \(z_0=0\), we have that \(z_1 = 0^2 + 0 = 0\) and it’s clear that this will remain so: \(z_n\) = 0 - the sequence converges.

On the contrary: \(c=2\) is out of the Mandelbrot set. Again, with \(z_0=0\), we now have that \(z_1 = 0^2 + 2\), \(z_2 = 2^2 + 2 = 6\) and \(z_3 = 6^2 + 2 = 38\) etc. Clearly this is not bounded and thus grows forever.

So, if we take points on the complex point representing each possible value of \(c\), we can paint 2 colours given whether the iterative rule converges (or not).
A big central blob, with a fractal-like structure with an infinitely-rough boundary, will be found where no divergence is found e.g. due to being stuck in some cycle or indefinitely growing.

Fig. 2: The Mandelbrot set, graphically shown in the black region. Source: Wikipedia

If we also consider how fast the convergence takes place, additional visualization patterns can be retrieved. This can be done by colouring each complex point with a representative colour of how long the iterative rule takes to converge. Mesmerising graphs can be generated, where we zoom in indefinitely over this set:

Fig. 3: Zooming in the Mandelbrot set. Source: Wikipedia

What is a root of a function?

On a seemingly unrelated side, we need to understand what a root of a function is. Think of a real-valued function \(f: S \rightarrow \mathbb{R}\) which depends on a variable \(x\), itself living in some space \(S\). A root \(x_{r}\) (also called a zero) is any particular point in \(S\) for which \(f(x_{r}) = 0\).

Root solving is important because that is what allows many types of practical and theoretical problems to be solved. For instance, if we want to know find which values of the variables \(x,y,z, \ldots\) make some function \(f(x,y,z, \ldots)\) equal some value \(c\), we need to solve the condition \(f(x,y,z, \ldots) - c = 0\) i.e. to find the roots of the function \(p(x,y,z, \ldots) = f(x,y,z, \ldots) - c\).

For many reasons (sometimes very deep), it is not possible to find closed-form, analytical solutions for an arbitrary function \(f\).

We have long memorized the quadratic formula that gives the two possible roots of a quadratic polynomial \(p(x) = ax^2 + bx + c\) as \(x_c = (-b \pm \sqrt{b^2 - 4ac})/(2a)\).

Due to important developments started in the XVIth century by Gerolamo Cardano and Niccolò Fontana Tartaglia, a formula was found for a general cubic polynomial \(p(x) = ax^3 + bx^2 + cx + d\) (even though memorizing it is hardly doing yourself a favour as it is barbarically long.)

It turns out that these cases are the exception and not the rule. A paper about the theory of solvability by radicals submitted to the Paris Academy of Sciences in 1830 (and ultimately rejected a year later) ultimately proved that no such formula can be found for the quintic (polynomial of degree 5).

The author was noone other than Évariste Galois, a then 18-year old prodigy, whom we needlessly lost at age 22 in a duel. Galois theory would forever change our we think of this problem of finding polynomial roots and set it on a firm mathematical footing.

The consequences are tremendous however: an arbitrary polynomial \(p(x) = a_0 + a_1 x^1 + \ldots + a_N x^N\), belonging to perhaps the most tangible class of functions we have at hand, does not allow us to use elementary operations to express its roots, despite its smooth behaviour!

Therefore, we must resort to numerical algorithms to approximate their roots, oftentimes iteratively until the convergence is satisfactory i.e. for a sufficiently large number of iterations so the root estimate is very close to its actual value.

Finding the roots of functions by the Newton-Raphson method

Perhaps the simplest one of these root-finding algorithms is the Newton-Raphson method (NRM) - independently devised by Sir Isaac Newton and Joseph Raphson.

Contrary to what we could assume, the modern implementation of this method is based on Raphson’s (clearer) work and not Newton’s. This is curious in itself, given that the concept of the derivative of a function, a vital part of the method, was developed by Newton himself.

Newton’s method was compiled in 1671 in his Method of Fluxions but wouldn’t be published until 1736 – a whopping 50-year delay considering Raphson’s publishing of his only noteworthy treaty (where his method is presented) – the Analysis Aequationum Universalis,

The reasoning behind the NRM is surprisingly simple. We compute a sequence of root estimates \(x_{n}\) in such a way that when \(n\) is large enough, \(x_{n}\) is very close to the target root \(x_{c}\).

But how do we calculate \(x_n\)?

Fig. 4: A sketch of the Newton-Raphson method, used to estimate the derivative of a function. Credits: David Carvalho / Inductiva

Let’s take a function \(f(x)\) of a single variable \(x\). To find a root, we wish to apply the NRM, meaning that an arbitrary value \(x_0\) must first be given.
Now, it would be amazingly lucky to find an excellent root estimate by chance i.e. if it turned out that \(f(x_0) \approx 0\).

Consequently, we want to refine that choice i.e. finding some \(x_1\) for which \(f(x_1)\) is closer to 0 than \(f(x_0)\).For that, the NRM guesses the root estimate as the root of the linear approximation of the function i.e. it approximates the function in question as being a straight line and calculates its root.

Therefore, \(x_1\) is taken as the value where the tangent line of \(f\) at \(x_0\) crosses the \(x\) axis.

You can see that in the figure above that the function \(f(x) = x^2\) is concave and so it doesn’t really matter where on the right side of the root we start the NRM: it will always get better as we repeat this procedure.

Unfortunately this method has caveats: sometimes it never converges, it completely overshoots (for instance if \(f(x)= \sqrt{x}\)) or is not defined whenever we reach stationary points (i.e. points \(x\) for which \(f'(x)=0\)).

The ideal scenario however is verifying that applying this procedure recursively \(N\) times i.e. \(x_0 \mapsto x_1 \mapsto x_2 \mapsto \ldots \mapsto x_N\) leads to \(f(x_N) \approx 0\) and that \(N\) can be taken arbitrarily far.

Fortunately for us, we consider polynomial functions for which the NRM is known to almost always converge. Here, “almost” means that the algorithm will not converge only in an infinitesimally-small subset. As we will soon see, this is exactly where the boundary of the Newton fractal lies.

For our purpose, the most important factor of variability is the choice of the initial guess \(x_0\) (sometimes called the seed value), which can impact both the convergence and the convergence speed, depending on the local curvature of the function.

This connection is exactly what the Newton fractal exploits.

Understanding roots of functions by gazing at fractals

Now comes the fun part!

In order for the connection between the fractal construction (whose graph lives on the 2-dimensional plane) and the NRM to work, we need our variable \(x\) to live also live on that plane.

A good way is to achieve this is to consider our single variable to be complex. Therefore, our fractal will be generated with a fixed function \(f: \mathbb{C} \rightarrow \mathbb{C}\) of a complex variable \(z\).

This can be any differentiable function we’d like. However, we want to focus on polynomials of degree \(N_r\). Why exactly? We still don’t know which roots they have but, by the Fundamental Theorem of Algebra, we know there are at most \(N_r\) complex roots.

Now, let us assign a colour to each of the possible \(N_{r}\) roots of that function. If we now think of each point on the complex plane as the seed value i.e. the first element \(z_0\) of the iteration \(z_0 \rightarrow z_1 \rightarrow z_2 \rightarrow \ldots \rightarrow z_n\), we can colour that point according to the root the sequence (generated by the NRM) converges to.

Applying this procedure for all complex numbers will generate a mesmerizing fractal, where relatively large single-coloured regions eventually weave in stunningly beautiful ways. We just generated our first Newton fractal!

Here’s what is obtained for a variety of polynomials from degree 3 to 53 after iterating the NRM 75 times (in some cases, you may need to scroll to see the full polynomial):

$$P(z) = -1 + z^3$$

$$P(z) = 1 - 0.5 z^2 - 0.25 z^3 + 0.5 z^4$$

$$P(z) = -1.36 -1.38 z -0.02 z^2 +0.37 z^3 + 0.73 z^4 + 0.28 z^5 +1.3z^6 + 1.76 z^7 -1.13 z^8$$

$$P(z) = -16 + 15 z^4 + z^8$$

$$P(z) = -1.9 + 0.93 z + 0.97 z^2 + 0.79 z^3 - 0.8 z^4 - 1.71 z^5 - 1.04 z^6 + 2.01 z^7 -0.92 z^8 - 1.31 z^9 -0.4 z^{10} + 0.5z^{11} -1.57 z^{12} + 0.58 z^{12} -0.64 z^{13} -0.79 z^{14}$$

$$P(z) = -1.9 + 0.93 z + 0.97 z^2 + 0.79 z^3 -0.8 z^4 -1.71 z^5 -1.04 z^6 + 2.01 z^7 -0.92 z^8 -1.31 z^9 -0.4 z^{10} + 0.5 z^{11} -1.57 z^{12} -0.58 z^{13} -0.64 z^{14} -0.79 z^{15}$$

$$P(z) = 0.69 -0.88 z + 0.39 z^2 + 0.2 z^3 -1.57 z^4 + 0.02 z^5 + 0.11 z^6 +1.57 z^7 -0.24 z^8 -1.41 z^9 +0.45 z^{10} -1.34 z^{11} + 0.98 z^{12} + 0.06 z^{13} + 0.48 z^{14} -0.74 z^{15} -0.95 z^{16} -3.02 z^{17} -1.07 z^{18} + 0.41 z^{19} + 0.84 z^{20}$$

$$P(z) = 16 + z^{2} + 15 z^{4} -z^{6} + z^{8} + 16z^{9} + z^{11} + 15z^{13} -z^{15} + z^{17} + 16z^{18} + z^{20} + 15z^{22} -1z^{24} + z^{26} + 16z^{27} + z^{29} + z^{31} -10z^{32}$$

\begin{equation*} P(z) = 100 + 5 z^2 +15 z^4 -1 z^6 -5 z^7 + z^8 \\ -16 z^9 + z^{11} + 15 z^{13} -z^{15} +z^{17} + 16 z^{18}+ z^{20}+15z^{22} -1z^{24} +z^{26}-100z^{27} + z^{28} + z^{29} + z^{31} - 10 z^{32} \end{equation*}

$$P(z) = 16 + 15 z^{4} + z^{8} + 16z^{9} + 15z^{13} + z^{17} +16 z^{18} + 15 z^{22} + z^{26} + 16 z^{27} +15z^{31} + z^{35} +16 z^{36} + 15z^{40} + z^{44} + 16z^{45} + 15z^{49} -z^{53}$$

Fig. 5: A bunch of different polynomials and their respective Newton fractals. Credits: David Carvalho & LuĂ­s Sarmento / Inductiva

So all we are doing to obtain a Newton fractal, in a nutshell:

  1. Fixing a function \(f(z)\)
  2. Applying the NRM to all points \(z\) on the plane a fixed number of iterations \(N\) i.e. calculating the sequence \((x_1, ..., x_N)\) for all points given by the rule in the figure (which depends on \(f\))
  3. Colouring each point on the plane according to which root \(x_N\) is to.

In the fractals above, we chose to find a window where all \(N_r\) roots can be found (you can count the number of white dots). However, more exotic and non-ending patterns would be found if we focused on smaller regions.

To recap: we are seeing to which root the NRM will make each seed value, represented on the plane, go to. If two different points are painted say, blue, this means that we can use either point as \(z_0\) and the NMR will converge to the same root (namely the one we chose to colour blue).

Of course there is no information on how fast the algorithm will take for either seed value but we can be sure both will eventually converge to that root (and not any other).

Why such complex behaviour?

Naturally, we expect points close to the root values to converge to that root. You can see that, for the most part, big regions around the roots (the white cirles) can be found.

But for some regions of the plane something amazing happens. If a particular point converges to a certain root, then straying away from that point by an infinitesimal amount yields a completely different root!

This happens exactly when a region contains the boundary of the fractal. If you draw a circle on the plane, it can either be in the interior of some single-coloured region, meaning that all points enclosed by that circle all converge to the same root, or it can contain points on the boundary, meaning that there will be points converging to any other possible roots.

This property, which comes entirely from the algebraic properties of the iteration instruction and function, reveals itself geometrically in a fractal-like boundary: the line separating any regions must touch all other regions at the same time.

Now, in order to generate all the fractal figures we have showed, we wrote a script in Python using differentiable TensorFlow code. But a question still lingers.

So, what has all this to do with Machine Learning?

To which we reply that, well, that’s what we are going to show you in 2022!

The field of pure Mathematics has a lot to profit from the advances of Machine Learning and we at Inductiva are keen to unleash their potential.

Stay tuned for our updates and keep checking our blog!


[1] The Wikipedia article on Newton fractals (including the fractals of some stunning non-polynomial functions).
[2] The video that inspired the generation of this code by us. We seriously recommend watching this first part, for a pedagogical exposition of the topic of Newton fractals for polynomial functions.
[3] In the second part, a surprising connection is established between Newton fractals and holomorphic dynamics.
[4] For a more exploratory study of these objects, this is a very pedagogical and accessible read (alongside some more stunning visuals).

🎄 Merry Christmas and a Happy New Year! 🎄

From our team at Inductiva, we hope these Christmas-y looking fractals will inspire you in these cozy winter times and energise you for the year ahead.

Recent posts from our blog

Hugo Penedones

Hugo Penedones

LuĂ­s Sarmento

LuĂ­s Sarmento

The Inductiva API v0.4 release brings MPI clusters, the latest Google Cloud CPUs, two new simulators, a lighter Python package, a CLI interface, a template engine and totally revamped documentation. Get started in minutes!

Sofia Guerreiro

Sofia Guerreiro

Cristiana Carpinteiro

Cristiana Carpinteiro

In this series of blog posts we will explore a specific case of the use of AI in the pharmaceutical industry - using Graph Neural Networks for predicting binding affinity. But for now, let’s start by understanding the problem of drug discovery and some fundamental concepts like binding affinity.