# Newton Fractals (oh-oh-oh!)

David Carvalho

Itâ€™s supremely satisfying when a somewhat abstract and dry problem in Mathematics can be visualised in awestriking ways.

For these coming winter festivities đźŽ„, and to commemorate the 378^{th} birthday of
Sir Isaac Newton (who was born on the 25^{th} of December of 1643), we bring you one
such case.

We invite you to come gaze (and learn) about some beautiful
*Newton fractals*.

For you to understand what these are about, you will need to grasp two easy concepts: that of a fractal and that of a root of a function.

Letâ€™s start!

## What are fractals?

For those of you who are not baby boomers or never marvelled at some Psychedelic Rock posters and artwork, maybe a little introduction is necessary.

Fractals are objects that can encapsulate a never-ending pattern (sometimes even
repeating itself indefinitely) as we zoom in or out. This is usually achieved by
repeating some action recursively **forever**. For this reason, fractals can
have infinite detail â€“ *there just isnâ€™t enough resolution to approximate them*.

### Fractal dimensions

The name â€śfractalâ€ť comes from the fact that these objects are described by a non-integer dimension \(d\). We expect â€śnormalâ€ť spaces to scale by an integer.

What does this mean? Take the 2-dimensional plane \(\mathbb{R}^2\): for a length \(l\), the associated measure in this space is the area \(A=l^2\). In this perspective, the dimension of this space tells us how this measure (in this case the area) scales with respect to \(l\).

Its precisely this relationshsip that allows us to term this space 2-dimensional - namely by what the exponent is: \(d=2\). Analogously, \(\mathbb{R}^3\) has an associated measure the volume \(V = l^3\) and hence \(d=3\). In more tangible terms, this fact implies that if, say, we double \(l\), the associated measure increase as \(2^d\). Thus, if we, say, double the side of a square, its area will increase \(2^2 = 4\)-fold and its volume \(2^3=8\)-fold.

Dimensions in fractal spaces are way less intuitive. Letâ€™s have a look at a very famous fractal: the SierpiĹ„ski triangle. It is formed by, starting from the initial configuration of an equilateral triangle, finding the midpoints of its sides and, with them, inscribing a new equilateral triangle inside.

The evolution of this procedure for only 4 steps can be seen below. As this construction is recursively performed, more equilateral triangles are formed and it doesnâ€™t take many steps until we have a very complex and rough output.

Using the previous scaling argument, we see that if we *double* the length of the
base of the initial triangle, this construction will create a structure with now
*three* times the area of the initial triangle.

Equivalently, this is to say that the dimension \(d\) of the fractal space must
satisfy \(2^d = 3\), which implies \(d = \log(3)/\log(2) \approx 1.58496250072\) â€“
an irrational number but most importantly a number *between* integers. Oh oh oh!

### To converge or not to converge - that is the question!

For a structure to be â€śstableâ€ť as we apply these recursive rules, some notion of convergence needs to take place. What do we exactly mean by this?

Here, convergence means that the result of applying some rule gets closer to some
value. For example, if our rule is \(x_{n+1} = (1/2) x_n\), we see that this will rapidly
converge to 0 - start with a number \(x_0 = x\); then \(x_1 = (1/2) x_0\) and
\(x_2 = (1/2)x_1 = (1/2)^2 x_0\). In general \(x_n = (1/2)^n x_0\) and, in the limit of an
infinity-large \(n\), we have \(\lim_{x \to 0} x_n = 0\). This map thus *converges*.

On the other side, some maps never really settle to any particular value as \(n\) increases. Take the rule \(x_{n+1} = (-1)^n x_n\).

Then, for a fixed number \(x\), this map will output sign-flipping versions of \(x\)
every 2 terms *forever*: \(x_0 = x\) and \(x_1 = (-1)^0 x_0 = x\); then a 2-cycle of negative values show up:
\(x_2 = (-1)^1 x_1 = -x\) and \(x_3 = (-1)^2 x_2 = -x\).

This sequence will thus oscillate forever (\(x,x,-x,-x,x,x,-x,-x,...\)),
never getting closer to a value. For that reason we say this map does *not* converge.

### The Mandelbrot set

A very famous fractal (perhaps the most famous them all)
exploits this idea - the *Mandelbrot set* - formalized by Robert W. Brooks and
Peter Matelski in 1978 but popularized by Benoit Mandelbrot 2 years later.

This set is computed simply by looking at a sequence of complex numbers generated by the iterative rule \(z_{n+1} = z_{n}^2 + c\), starting from \(z_0=0\). Depending on the value of the constant \(c\), this sequence will either converge or not. If it does not diverge, \(c\) belongs to the Mandelbrot set.

For example: the number \(c=0\) naturally belongs to the set. Starting with \(z_0=0\), we have that \(z_1 = 0^2 + 0 = 0\) and itâ€™s clear that this will remain so: \(z_n\) = 0 - the sequence converges.

On the contrary: \(c=2\) is out of the Mandelbrot set. Again, with \(z_0=0\), we now have that \(z_1 = 0^2 + 2\), \(z_2 = 2^2 + 2 = 6\) and \(z_3 = 6^2 + 2 = 38\) etc. Clearly this is not bounded and thus grows forever.

So, if we take points on the complex point representing each possible value of \(c\), we can paint 2 colours given whether the iterative rule converges (or not). A big central blob, with a fractal-like structure with an infinitely-rough boundary, will be found where no divergence is found e.g. due to being stuck in some cycle or indefinitely growing.

The Mandelbrot set, graphically shown in the black region. Source: Wikipedia

If we also consider how fast the convergence takes place, additional
visualization patterns can be retrieved. This can be done by colouring each
complex point with a representative colour of how long the iterative rule
takes to converge. Mesmerising graphs can be generated, where we zoom in *indefinitely* over this
set:

Zooming in the Mandelbrot set. Source: Wikipedia

## What is a root of a function?

On a seemingly unrelated side, we need to understand what a root of a function
is. Think of a real-valued function \(f: S \rightarrow \mathbb{R}\) which depends
on a variable \(x\), itself living in some space \(S\). A root \(x_{r}\)
(also called a zero) is *any* particular point in \(S\) for which \(f(x_{r}) = 0\).

Root solving is important because that is what allows many types of practical and theoretical problems to be solved. For instance, if we want to know find which values of the variables \(x,y,z, \ldots\) make some function \(f(x,y,z, \ldots)\) equal some value \(c\), we need to solve the condition \(f(x,y,z, \ldots) - c = 0\) i.e. to find the roots of the function \(p(x,y,z, \ldots) = f(x,y,z, \ldots) - c\).

For many reasons (sometimes very deep), it is not possible to find closed-form, analytical solutions for an arbitrary function \(f\).

We have long memorized the quadratic formula that gives the two possible roots of a quadratic polynomial \(p(x) = ax^2 + bx + c\) as \(x_c = (-b \pm \sqrt{b^2 - 4ac})/(2a)\).

Due to important developments started in the XVI^{th} century by Gerolamo
Cardano and NiccolĂ˛ Fontana Tartaglia, a formula was found for a general
cubic polynomial \(p(x) = ax^3 + bx^2 + cx + d\) (even though memorizing it
is hardly doing yourself a favour as it is barbarically long.)

It turns out that these cases are the exception and not the rule. A paper about the theory of solvability by radicals submitted to the Paris Academy of Sciences in 1830 (and ultimately rejected a year later) ultimately proved that no such formula can be found for the quintic (polynomial of degree 5).

The author was noone other than Ă‰variste Galois, a then 18-year old prodigy, whom we needlessly lost at age 22 in a duel. Galois theory would forever change our we think of this problem of finding polynomial roots and set it on a firm mathematical footing.

The consequences are tremendous however: an arbitrary polynomial \(p(x) = a_0 + a_1 x^1 + \ldots + a_N x^N\), belonging to perhaps the most tangible class of functions we have at hand, does not allow us to use elementary operations to express its roots, despite its smooth behaviour!

Therefore, we must resort to numerical algorithms to approximate their roots, oftentimes iteratively until the convergence is satisfactory i.e. for a sufficiently large number of iterations so the root estimate is very close to its actual value.

## Finding the roots of functions by the Newton-Raphson method

Perhaps the simplest one of these root-finding algorithms is the Newton-Raphson method (NRM) - independently devised by Sir Isaac Newton and Joseph Raphson.

Contrary to what we could assume, the modern implementation of this method is based on Raphsonâ€™s (clearer) work and not Newtonâ€™s. This is curious in itself, given that the concept of the derivative of a function, a vital part of the method, was developed by Newton himself.

Newtonâ€™s method was compiled in 1671 in his *Method of Fluxions* but wouldnâ€™t
be published until 1736 â€“ a whopping 50-year delay considering Raphsonâ€™s
publishing of his only noteworthy treaty (where his method is presented) â€“ the
*Analysis Aequationum Universalis*,

The reasoning behind the NRM is surprisingly simple. We compute a sequence of root estimates \(x_{n}\) in such a way that when \(n\) is large enough, \(x_{n}\) is very close to the target root \(x_{c}\).

But how do we calculate \(x_n\)?

Letâ€™s take a function \(f(x)\) of a single variable \(x\). To find a root, we wish to apply the NRM, meaning that an arbitrary value \(x_0\) must first be given. Now, it would be amazingly lucky to find an excellent root estimate by chance i.e. if it turned out that \(f(x_0) \approx 0\).

Consequently, we want to refine that choice i.e. finding some \(x_1\) for which
\(f(x_1)\) is closer to 0 than \(f(x_0)\).For that, the NRM guesses the root estimate
as the root of the *linear approximation* of the function i.e. it approximates the
function in question as being a straight line and calculates its root.

Therefore, \(x_1\) is taken as the value where the tangent line of \(f\) at \(x_0\) crosses the \(x\) axis.

You can see that in the figure above that the function \(f(x) = x^2\) is concave and
so it doesnâ€™t really matter where on the right side of the root we start the NRM:
it will always get **better** as we repeat this procedure.

Unfortunately this method has caveats: sometimes it never converges, it completely overshoots (for instance if \(f(x)= \sqrt{x}\)) or is not defined whenever we reach stationary points (i.e. points \(x\) for which \(f'(x)=0\)).

The ideal scenario however is verifying that applying this procedure recursively \(N\) times i.e. \(x_0 \mapsto x_1 \mapsto x_2 \mapsto \ldots \mapsto x_N\) leads to \(f(x_N) \approx 0\) and that \(N\) can be taken arbitrarily far.

Fortunately for us, we consider polynomial functions for which the NRM is known
to *almost always* converge. Here, â€śalmostâ€ť means that the algorithm will not
converge only in an infinitesimally-small subset. As we will soon see, this is
exactly where the boundary of the Newton fractal lies.

For our purpose, the most important factor of variability is the choice of the initial guess \(x_0\) (sometimes called the seed value), which can impact both the convergence and the convergence speed, depending on the local curvature of the function.

This connection is exactly what the Newton fractal exploits.

## Understanding roots of functions by gazing at fractals

Now comes the fun part!

In order for the connection between the fractal construction (whose graph lives on the 2-dimensional plane) and the NRM to work, we need our variable \(x\) to live also live on that plane.

A good way is to achieve this is to consider our single variable to be complex. Therefore, our fractal will be generated with a fixed function \(f: \mathbb{C} \rightarrow \mathbb{C}\) of a complex variable \(z\).

This can be any differentiable function weâ€™d like. However, we want to focus on polynomials of degree \(N_r\). Why exactly? We still donâ€™t know which roots they have but, by the Fundamental Theorem of Algebra, we know there are at most \(N_r\) complex roots.

Now, let us assign a colour to each of the possible \(N_{r}\) roots of that function. If we now think of each point on the complex plane as the seed value i.e. the first element \(z_0\) of the iteration \(z_0 \rightarrow z_1 \rightarrow z_2 \rightarrow \ldots \rightarrow z_n\), we can colour that point according to the root the sequence (generated by the NRM) converges to.

Applying this procedure for all complex numbers will generate a mesmerizing fractal, where relatively large single-coloured regions eventually weave in stunningly beautiful ways. We just generated our first Newton fractal!

Hereâ€™s what is obtained for a variety of polynomials from degree 3 to 53 after iterating the NRM 75 times (in some cases, you may need to scroll to see the full polynomial):

$$P(z) = -1 + z^3$$

$$P(z) = 1 - 0.5 z^2 - 0.25 z^3 + 0.5 z^4$$

$$P(z) = -1.36 -1.38 z -0.02 z^2 +0.37 z^3 + 0.73 z^4 + 0.28 z^5 +1.3z^6 + 1.76 z^7 -1.13 z^8$$

$$P(z) = -16 + 15 z^4 + z^8$$

$$P(z) = -1.9 + 0.93 z + 0.97 z^2 + 0.79 z^3 - 0.8 z^4 - 1.71 z^5 - 1.04 z^6 + 2.01 z^7 -0.92 z^8 - 1.31 z^9 -0.4 z^{10} + 0.5z^{11} -1.57 z^{12} + 0.58 z^{12} -0.64 z^{13} -0.79 z^{14}$$

$$P(z) = -1.9 + 0.93 z + 0.97 z^2 + 0.79 z^3 -0.8 z^4 -1.71 z^5 -1.04 z^6 + 2.01 z^7 -0.92 z^8 -1.31 z^9 -0.4 z^{10} + 0.5 z^{11} -1.57 z^{12} -0.58 z^{13} -0.64 z^{14} -0.79 z^{15}$$

$$P(z) = 0.69 -0.88 z + 0.39 z^2 + 0.2 z^3 -1.57 z^4 + 0.02 z^5 + 0.11 z^6 +1.57 z^7 -0.24 z^8 -1.41 z^9 +0.45 z^{10} -1.34 z^{11} + 0.98 z^{12} + 0.06 z^{13} + 0.48 z^{14} -0.74 z^{15} -0.95 z^{16} -3.02 z^{17} -1.07 z^{18} + 0.41 z^{19} + 0.84 z^{20}$$

$$P(z) = 16 + z^{2} + 15 z^{4} -z^{6} + z^{8} + 16z^{9} + z^{11} + 15z^{13} -z^{15} + z^{17} + 16z^{18} + z^{20} + 15z^{22} -1z^{24} + z^{26} + 16z^{27} + z^{29} + z^{31} -10z^{32}$$

\begin{equation*} P(z) = 100 + 5 z^2 +15 z^4 -1 z^6 -5 z^7 + z^8 \\ -16 z^9 + z^{11} + 15 z^{13} -z^{15} +z^{17} + 16 z^{18}+ z^{20}+15z^{22} -1z^{24} +z^{26}-100z^{27} + z^{28} + z^{29} + z^{31} - 10 z^{32} \end{equation*}

$$P(z) = 16 + 15 z^{4} + z^{8} + 16z^{9} + 15z^{13} + z^{17} +16 z^{18} + 15 z^{22} + z^{26} + 16 z^{27} +15z^{31} + z^{35} +16 z^{36} + 15z^{40} + z^{44} + 16z^{45} + 15z^{49} -z^{53}$$

So all we are doing to obtain a Newton fractal, in a nutshell:

- Fixing a function \(f(z)\)
- Applying the NRM to all points \(z\) on the plane a fixed number of iterations \(N\) i.e. calculating the sequence \((x_1, ..., x_N)\) for all points given by the rule in the figure (which depends on \(f\))
- Colouring each point on the plane according to which root \(x_N\) is to.

In the fractals above, we chose to find a window where all \(N_r\) roots can be found (you can count the number of white dots). However, more exotic and non-ending patterns would be found if we focused on smaller regions.

To recap: we are seeing to which root the NRM will make each seed value, represented on the plane, go to. If two different points are painted say, blue, this means that we can use either point as \(z_0\) and the NMR will converge to the same root (namely the one we chose to colour blue). Of course there is no information on how fast the algorithm will take for either seed value but we can be sure both will eventually converge to that root (and not any other root).

### Why such complex behaviour?

Naturally, we expect points close to the root values to converge to that root. You can see that, for the most part, big regions around the roots (the white cirles) can be found.

But for some regions of the plane something amazing happens. If a particular point converges to a certain root, then straying away from that point by an infinitesimal amount yields a completely different root!

This happens *exactly* when a region contains the boundary of the fractal.
If you draw a circle on the plane, it can either be in the interior of some
single-coloured region, meaning that **all** points enclosed by that circle all
converge to the same root, or it can contain points on the boundary, meaning
that there will be points converging to any other possible roots.

This property, which comes entirely from the algebraic properties of the
iteration instruction and function, reveals itself geometrically in a
fractal-like boundary: the line separating any regions must touch *all* other
regions *at the same time*.

Now, in order to generate all the fractal figures we have showed, we wrote a script in Python using differentiable TensorFlow code. But a question still lingers.

## So, what has all this to do with Machine Learning?

To which we reply that, well, **thatâ€™s what we are going to show you in 2022!**

The field of pure Mathematics has a lot to profit from the advances of Machine Learning and we at Inductiva are keen to unleash their potential.

Stay tuned for our updates and keep checking our blog!

## References

[1] The Wikipedia article on
Newton fractals (including the fractals of some stunning non-polynomial functions).

[2] The video that
inspired the generation of this code by us. We seriously recommend watching this
first part, for a pedagogical exposition of the topic of Newton fractals for
polynomial functions.

[3] In the second part, a surprising
connection is established between Newton fractals and holomorphic dynamics.

[4] For a more
exploratory study of these objects, this is a very pedagogical and accessible
read (alongside some more stunning visuals).

## đźŽ„ Merry Christmas and a Happy New Year! đźŽ„

From our team at Inductiva, we hope these Christmas-y looking fractals will inspire you in these cozy winter times and energise you for the year ahead.

## Recent posts from our blog

InĂŞs GuimarĂŁes

In a series spanning various posts, we dive deep into the topic of Hadamard matrices and how Machine Learning can help us better understand these objects both theoretically and within the context of applications.