Newton’s method

Newton’s method, sometimes called the Newton-Raphson method, is a surprisingly simple and effective method for finding solutions to an equation of the form

$$f(x) = 0,$$

where $f$ is a differentiable function.

Newton’s method is an iterative method, which means that we pick some starting point $x_0$, and we follow some procedure to find $x_1$, which is closer to the real root $x^*$ that satisfies $f(x^*) = 0$. Then, we apply this procedure again and again, until we find some $x_n$ that approximates the real root with satisfactory accuracy.

I will illustrate the method on a real-valued function $f : \mathbb{R} \to \mathbb{R}$. First, we pick a starting point $x_0$. We then compute the linearized approximation $\hat{f}$ to $f$ at this point (this is just the first-order Taylor expansion at $x_0$):

$$\hat{f}(x) = f(x_0) + f'(x_0)(x - x_0).$$
Now, we find the root of this approximation by solving for $x$ in

$$f(x_0) + f'(x_0)(x - x_0) = 0,$$
and we find that $\hat{f}(x) = 0$ implies $x = x_0 - \frac{f(x_0)}{f'(x_0)}$. So, we have a new, supposedly better, approximation $x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$ of the real root $x^*$. Applying this iteratively gives:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)},$$
which is the basic scheme behind Newton’s method.
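The iteration above can be sketched in a few lines of Python (a minimal sketch; the function names, tolerance, and iteration cap are illustrative choices, not part of the method itself):

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: iterate x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:   # stop once the residual is small enough
            return x
        x = x - fx / df(x)
    return x

# Example: find sqrt(2) as the positive root of f(x) = x^2 - 2.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)  # close to 1.41421356...
```

Note that the sketch stops on a small residual $|f(x_n)|$; stopping on a small step $|x_{n+1} - x_n|$ is an equally common choice.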

We haven’t proved anything about the convergence of this method, and indeed, in general, the method might diverge. Note that it’s also possible to use an approximation of higher order, e.g. the second-order Taylor expansion

$$\hat{f}(x) = f(x_0) + f'(x_0)(x - x_0) + \tfrac{1}{2} f''(x_0)(x - x_0)^2.$$
However, using this method requires solving a more complicated equation: instead of a simple linear equation we end up with a quadratic one.
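A sketch of this second-order variant, under the assumption that one resolves the ambiguity of the quadratic by taking the root of the quadratic model closest to the current iterate (and falls back to the ordinary Newton step when the quadratic has no real root):

```python
import math

def newton_second_order(f, df, ddf, x0, tol=1e-12, max_iter=50):
    """Step to the nearest root of the quadratic model
    f(x_n) + f'(x_n) d + 0.5 f''(x_n) d^2 = 0 in the step d."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        a, b, c = 0.5 * ddf(x), df(x), fx
        disc = b * b - 4 * a * c
        if a == 0 or disc < 0:
            d = -c / b  # no real root of the quadratic: ordinary Newton step
        else:
            r1 = (-b + math.sqrt(disc)) / (2 * a)
            r2 = (-b - math.sqrt(disc)) / (2 * a)
            d = r1 if abs(r1) < abs(r2) else r2  # smallest step in magnitude
        x = x + d
    return x

# Example: solve cos(x) = x, starting from x0 = 1.
root = newton_second_order(lambda x: math.cos(x) - x,
                           lambda x: -math.sin(x) - 1,
                           lambda x: -math.cos(x),
                           x0=1.0)
print(root)
```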

Further, this method also works for multivariate functions $f : \mathbb{R}^n \to \mathbb{R}^n$, with the restriction that the Jacobian matrix $J_f(x_n)$ should be invertible. We can simply replace the factor $1 / f'(x_n)$ by $J_f(x_n)^{-1}$:

$$x_{n+1} = x_n - J_f(x_n)^{-1} f(x_n).$$
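A minimal sketch for the case $n = 2$, inverting the $2 \times 2$ Jacobian by hand so no linear-algebra library is needed (the example system is my own; in practice one solves $J_f(x_n) \, s = -f(x_n)$ rather than forming the inverse):

```python
def newton_2d(F, J, x0, y0, tol=1e-12, max_iter=50):
    """Newton's method for F: R^2 -> R^2; each step solves J(x) s = -F(x)."""
    x, y = x0, y0
    for _ in range(max_iter):
        f1, f2 = F(x, y)
        if abs(f1) < tol and abs(f2) < tol:
            return x, y
        (a, b), (c, d) = J(x, y)
        det = a * d - b * c  # assumed nonzero: the Jacobian must be invertible
        # Cramer's rule for the 2x2 system J s = -F
        sx = (-f1 * d + f2 * b) / det
        sy = (-f2 * a + f1 * c) / det
        x, y = x + sx, y + sy
    return x, y

# Example system: x^2 + y^2 = 4 and x * y = 1.
F = lambda x, y: (x * x + y * y - 4, x * y - 1)
J = lambda x, y: ((2 * x, 2 * y), (y, x))
x, y = newton_2d(F, J, 2.0, 0.0)
print(x, y)
```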
Convergence analysis

I will only consider the case where $f : \mathbb{R} \to \mathbb{R}$. I believe the general case where $f : \mathbb{R}^n \to \mathbb{R}^n$ is messier, but the idea is essentially the same.

Take $x^*$ to be the root of $f$, i.e. $f(x^*) = 0$. Suppose that we have an estimate $x_n$ with error $\epsilon_n = x_n - x^*$. We then have

$$\epsilon_{n+1} = x_{n+1} - x^* = \epsilon_n - \frac{f(x_n)}{f'(x_n)}.$$
Using the Taylor expansion of $f$ at $x^*$, the fact that $f(x^*) = 0$, and asymptotic notation gives:

$$f(x_n) = f'(x^*)\,\epsilon_n + \tfrac{1}{2} f''(x^*)\,\epsilon_n^2 + O(\epsilon_n^3).$$
From the first-order Taylor expansion of $f'$ at $x^*$ we find

$$f'(x_n) = f'(x^*) + f''(x^*)\,\epsilon_n + O(\epsilon_n^2).$$
From the above two equalities we find

$$\epsilon_{n+1} = \epsilon_n - \frac{f'(x^*)\,\epsilon_n + \tfrac{1}{2} f''(x^*)\,\epsilon_n^2 + O(\epsilon_n^3)}{f'(x^*) + f''(x^*)\,\epsilon_n + O(\epsilon_n^2)} = \frac{f''(x^*)}{2 f'(x^*)}\,\epsilon_n^2 + O(\epsilon_n^3).$$
So, if $\epsilon_n$ is sufficiently small, each iteration of Newton’s method reduces the error to a constant times the square of the error of the previous iteration. We say that Newton’s method has quadratic convergence.

In practice, this is also observed. For example, applying Newton’s method to $f(x) = x^2 - 2$ with $x_0 = 1$, so that the iterates converge to $\sqrt{2}$ and the error is $\epsilon_n = x_n - \sqrt{2}$:

| $n$ | $x_n$ | $\epsilon_n$ |
|--|--|--|
| 0 | 1 | 0.414213562373095 |
| 1 | 1.5 | 0.085786437626905 |
| 2 | 1.416666666666667 | 0.002453104293572 |
| 3 | 1.414215686274510 | 0.000002123901415 |
| 4 | 1.414213562374690 | 0.000000000001595 |

The number of leading zeros in the error roughly doubles each iteration, leading to 11 correct decimal digits after just four iterations.
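We can also check the predicted error constant numerically. The iterates in the table above match $f(x) = x^2 - 2$ with $x_0 = 1$, for which the theory predicts $\epsilon_{n+1} / \epsilon_n^2 \to f''(\sqrt{2}) / (2 f'(\sqrt{2})) = 1 / (2\sqrt{2}) \approx 0.3536$ (a quick sanity check, not a proof):

```python
import math

# Newton iterates for f(x) = x^2 - 2 starting at x0 = 1.
xs = [1.0]
for _ in range(4):
    x = xs[-1]
    xs.append(x - (x * x - 2) / (2 * x))

# Error of each iterate, and the ratio eps_{n+1} / eps_n^2.
eps = [abs(x - math.sqrt(2)) for x in xs]
for n in range(len(eps) - 1):
    print(n, eps[n + 1] / eps[n] ** 2)
# The ratios approach f''/(2 f') at the root, i.e. 1/(2*sqrt(2)) ~ 0.3536.
```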

Picking initial values

One problem with Newton’s method is that it’s sensitive to the initial value. Sometimes it’s possible to make an educated guess, but often not. If this is a problem, one might consider using a homotopy method. This description is based on the one given in the book "Numerical Methods in Scientific Computing", by J. van Kan, A. Segal, and F. Vermolen. In this method, one picks an equation $g(x) = 0$ with a known solution $x_0$, and considers the problem

$$h(x, \theta) = \theta f(x) + (1 - \theta)\, g(x) = 0.$$
And now one increases $\theta$ from $0$ to $1$ in small steps. At every step, Newton’s method is used to find the solution of $h(x, \theta) = 0$. The solution found this way is then used as the initial estimate for the next step.
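A minimal sketch of this continuation scheme, assuming a hard problem $f(x) = x^3 - x - 2 = 0$ and an easy problem $g(x) = x - 1 = 0$ with known solution $x = 1$ (both chosen for illustration, not taken from the book):

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Basic Newton iteration x <- x - f(x)/f'(x)."""
    x = x0
    for _ in range(max_iter):
        if abs(f(x)) < tol:
            return x
        x = x - f(x) / df(x)
    return x

f = lambda x: x ** 3 - x - 2        # target problem f(x) = 0
df = lambda x: 3 * x ** 2 - 1
g = lambda x: x - 1.0               # easy problem with known solution x = 1
dg = lambda x: 1.0

x = 1.0                             # start from the known solution of g
steps = 10
for k in range(1, steps + 1):
    t = k / steps                   # increase theta from 0 to 1 in small steps
    h = lambda x, t=t: t * f(x) + (1 - t) * g(x)
    dh = lambda x, t=t: t * df(x) + (1 - t) * dg(x)
    x = newton(h, dh, x)            # previous solution seeds the next solve
print(x)                            # a root of x^3 - x - 2
```

At $\theta = 1$ the homotopy $h(x, \theta)$ reduces to the original problem $f(x) = 0$, so the final iterate is the desired root.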