Gaussian quadrature and orthogonal polynomials

The proofs here are based on the lecture notes from a course in approximation theory [1]. I think most of the results and proofs are standard, so I expect most of the texts on this subject to contain similar results and proofs.

Suppose $\mu(x)$ is a nonnegative weight function and we want to compute the integral
$$ \int f(x)\ \mu(x)\ \text{d}x $$

for many different functions $f$.

Depending on what $f$ and $\mu$ are, we might be able to evaluate the integral analytically (i.e. by integrating and obtaining an exact answer). However, in computational contexts it is often both more efficient and simpler to use a quadrature rule.

Definition 1: A quadrature rule is a collection of weights $w_1, w_2, ..., w_n$ and positions $x_1, x_2, ..., x_n$ such that
$$ \sum_{k = 1}^n w_k f(x_k) \approx \int f(x)\ \mu(x)\ \text{d}x $$

Since the approximation given by the quadrature rule depends only on the values of $f$ at $x_1, x_2, ..., x_n$, the approximation will be better for smoother functions: when $f$ does not fluctuate too wildly, its behavior is captured well by its values at a few positions.

In practice, we often require the quadrature rule to be exact for some set $S$ of functions. Often, we take $S$ to be the set of polynomials up to some maximum degree. We call these the exactness conditions. We can write the exactness conditions as a finite number of equations by taking a basis $f_1, f_2, ...$ of $S$. By the linearity of the integral, a rule that is exact for every basis function is also exact for any linear combination of $f_1, f_2, ...$, which means that the quadrature rule is exact for any $f \in S$.

In the following, we will denote the space of polynomials with degree up to $n$ as $\mathbb{P}_n$.

Quadrature rules, the easy way

It is not too complicated to derive quadrature rules that are exact for any polynomial $p \in \mathbb{P}_d$ for a given $d$. Given a function $f$ and $d + 1$ distinct points $x_0, x_1, ..., x_d$ we can use Lagrange interpolation to construct a polynomial $f_d$ of degree at most $d$ that interpolates $f$ at $x_0, x_1, ..., x_d$. That is,
$$ f_d(x_k) = f(x_k)\ \text{for $k = 0, 1, ..., d$} $$

The polynomial $f_d$ is defined as $f_d(x) = \sum_{k = 0}^d f(x_k) l_k(x)$, where $l_k$ are the Lagrange polynomials defined by
$$ l_k(x) = \frac{(x - x_0)(x - x_1) \cdots (x - x_{k - 1}) (x - x_{k + 1}) \cdots (x - x_d)}{(x_k - x_0)(x_k - x_1) \cdots (x_k - x_{k - 1}) (x_k - x_{k + 1}) \cdots (x_k - x_d)} $$

We can now integrate the interpolating polynomial $f_d$ of $f$ as
$$ \int f_d(x)\ \mu(x)\ \text{d}x = \int \sum_{k = 0}^d f(x_k) l_k(x)\ \mu(x)\ \text{d}x $$

Bringing the summation and the factors $f(x_k)$ outside of the integral, we see
$$ \int f_d(x)\ \mu(x)\ \text{d}x = \sum_{k = 0}^d f(x_k) \int l_k(x)\ \mu(x)\ \text{d}x $$

So if we set $w_k = \int l_k(x)\ \mu(x)\ \text{d}x$, we have
$$ \int f_d(x)\ \mu(x)\ \text{d}x = \sum_{k = 0}^d w_k f(x_k) $$

Now, since $f_d$ interpolates $f$, we expect that $f_d \approx f$, so that
$$ \int f(x)\ \mu(x)\ \text{d}x \approx \int f_d(x)\ \mu(x)\ \text{d}x = \sum_{k = 0}^d w_k f(x_k) $$

So we have found a quadrature rule that is exact for any $f \in \mathbb{P}_d$: for such an $f$, the unique interpolating polynomial of degree at most $d$ is $f$ itself, so the approximation above becomes an equality.
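These weights can be computed directly for a concrete weight function. The sketch below assumes $\mu(x) = 1$ on $[-1, 1]$ (the helper name `interpolatory_weights` is made up for illustration); it builds each Lagrange polynomial $l_k$ as a `numpy.poly1d` and integrates it exactly:

```python
import numpy as np

def interpolatory_weights(nodes):
    """Quadrature weights w_k = integral of l_k(x) over [-1, 1].

    Assumes the weight function mu(x) = 1 on [-1, 1]; a different mu
    would require different integrals of the Lagrange polynomials.
    """
    weights = []
    for k, xk in enumerate(nodes):
        # Build l_k(x) = prod over j != k of (x - x_j) / (x_k - x_j).
        lk = np.poly1d([1.0])
        for j, xj in enumerate(nodes):
            if j != k:
                lk = lk * np.poly1d([1.0, -xj]) / (xk - xj)
        # Integrate l_k exactly: w_k = L_k(1) - L_k(-1) where L_k' = l_k.
        Lk = lk.integ()
        weights.append(Lk(1.0) - Lk(-1.0))
    return np.array(weights)

w = interpolatory_weights([-1.0, 0.0, 1.0])
```

For the nodes $-1, 0, 1$ this returns the weights $1/3, 4/3, 1/3$, which is Simpson's rule on $[-1, 1]$.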

It is also possible to find the weights in a more computational way. If we want the quadrature rule to be exact for a function space of dimension $d$, we can pick a basis of $d$ functions, pick $d$ distinct positions $x_1, x_2, ..., x_d$, write out the exactness conditions, and solve the resulting linear system of equations for the weights.

For example, suppose that we want the quadrature rule to be exact for any $p \in \mathbb{P}_d$. We pick $d + 1$ basis functions $b_0, b_1, ..., b_d$ (for example $b_k(x) = x^k$ would work) and distinct positions $x_0, x_1, ..., x_d$. Then the exactness requirements become
$$ \sum_{j = 0}^d w_j b_k(x_j) = \int b_k(x)\ \mu(x)\ \text{d}x\ \text{for $k = 0, 1, ..., d$} $$

which is a linear system of $d + 1$ equations in the $d + 1$ unknowns $w_0, w_1, ..., w_d$. In particular, if we use $b_k(x) = x^k$ the resulting system is a Vandermonde system.
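As a sketch of this approach (the function name is illustrative), the solver below takes the moments $\int b_k(x)\ \mu(x)\ \text{d}x$ for the monomial basis $b_k(x) = x^k$ as input, so any weight function can be used:

```python
import numpy as np

def weights_from_moments(nodes, moments):
    """Solve the exactness conditions sum_j w_j x_j^k = m_k for k = 0..d.

    `moments[k]` must be the integral of x^k mu(x) dx; the moments are
    taken as input so any weight function mu can be plugged in.
    """
    # Row k of the Vandermonde system is [x_0^k, x_1^k, ..., x_d^k].
    V = np.vander(nodes, increasing=True).T
    return np.linalg.solve(V, moments)

# Example with mu(x) = 1 on [-1, 1]: the moments of x^k are
# 2 / (k + 1) for even k and 0 for odd k.
nodes = np.array([-1.0, 0.0, 1.0])
moments = np.array([2.0, 0.0, 2.0 / 3.0])
w = weights_from_moments(nodes, moments)
```

This reproduces the weights $1/3, 4/3, 1/3$ of Simpson's rule on $[-1, 1]$. Note that Vandermonde systems become badly conditioned as $d$ grows, so for large $d$ a different basis is advisable.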

For now, we have regarded the positions $x_0, x_1, ..., x_d$ as given. It is usually a good choice to pick them as Chebyshev nodes, since this ensures that the polynomial $f_d$ that interpolates $f$ is actually close to $f$: Chebyshev nodes minimize the worst-case size of the node polynomial $(x - x_0)(x - x_1) \cdots (x - x_d)$, which suppresses the oscillatory behavior of $f_d$ near the endpoints.
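For the interval $[-1, 1]$ the Chebyshev nodes have a simple closed form, being the roots of the Chebyshev polynomial $T_n$:

```python
import numpy as np

def chebyshev_nodes(n):
    """The n Chebyshev nodes on (-1, 1): x_k = cos((2k + 1) pi / (2n)),
    the roots of the Chebyshev polynomial T_n."""
    k = np.arange(n)
    return np.cos((2 * k + 1) * np.pi / (2 * n))
```

The nodes cluster near the endpoints $\pm 1$, which is exactly what tames the oscillations of the interpolant there.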

If we change our point of view and interpret the positions $x_1, x_2, ..., x_d$ as unknowns too, we can interpret the exactness requirements as a nonlinear system in $2d$ unknowns. If we expect this system to behave like a non-singular linear one (which is not a given, but let's be optimistic), we would expect to be able to solve a system of $2d$ equations with it. This is by analogy with a linear system, where a non-singular system has a unique solution if there are as many unknowns as equations.

So, this would mean that for each $n$ there exists a quadrature rule with $n$ points that is exact for $\mathbb{P}_{2n - 1}$. Such quadrature rules use roughly half the number of points of the quadrature rules derived via the method presented earlier, at the cost of losing the freedom to choose the positions $x_1, x_2, ..., x_n$.

It is possible to derive quadrature rules like this using orthogonal polynomials.

Orthogonal polynomials

Consider the following inner product on a space of functions
$$ \left< f, g \right> = \int f(x) g(x)\ \mu(x)\ \text{d}x $$

Like before, $\mu(x)$ is a nonnegative weight function. This inner product makes the function space into an inner product space.

The usual notions of orthogonality and orthonormality apply.

Definition 2: A sequence of polynomials $p_0, p_1, ...$ is orthogonal if

  1. $\text{deg}(p_k) = k$
  2. $\left< p_i, p_j \right> = 0$ whenever $i \neq j$

Definition 3: A sequence of polynomials $p_0, p_1, ...$ is orthonormal if it is orthogonal and $\left< p_k, p_k \right> = 1$ for every $k = 0, 1, ...$

We will use the following lemmas, which are easy to understand and prove.

Lemma 4: If $p_0, p_1, ...$ is a sequence of orthogonal polynomials and $p$ is a polynomial with $\text{deg}(p) < n$, then
$$ \left< p, p_n \right> = 0 $$

Proof: The polynomials $p_0, p_1, ..., p_{n - 1}$ form a basis of $\mathbb{P}_{n - 1}$. Since $p \in \mathbb{P}_{n - 1}$, we can write $p(x) = \sum_{k = 0}^{n - 1} c_k p_k(x)$. It follows that $\left< p, p_n \right> = \left< \sum_{k = 0}^{n - 1} c_k p_k, p_n \right> = \sum_{k = 0}^{n - 1} c_k \left< p_k, p_n \right>$. Since $p_0, p_1, ...$ is an orthogonal sequence of polynomials and $k < n$ in every term, all terms in the summation are zero, so it follows that $\left< p, p_n \right> = 0$.

$\square$

The following lemma states that polynomials can be easily expressed as a linear combination of orthonormal polynomials.

Lemma 5: If $p_0, p_1, ...$ is a sequence of orthonormal polynomials and $p$ is a polynomial with $\text{deg}(p) = n$, then
$$ p(x) = \sum_{k = 0}^n \left< p, p_k \right> p_k(x) $$

Proof: Since $p_0, p_1, ..., p_n$ is a basis of $\mathbb{P}_n$ and $p \in \mathbb{P}_n$, we can write $p$ as $p(x) = \sum_{k = 0}^n c_k p_k(x)$ for some $c_0, c_1, ..., c_n \in \mathbb{R}$. Taking the inner product of $p$ and $p_k$, we see
$$ \left< p, p_k \right> = \left< \sum_{j = 0}^n c_j p_j, p_k \right> = \sum_{j = 0}^n c_j \left< p_j, p_k \right> = c_k $$

So $c_k = \left< p, p_k \right>$. Substituting this in $p(x) = \sum_{k = 0}^n c_k p_k(x)$ gives the required result.

$\square$

A sequence of orthonormal polynomials can be computed by using the Gram-Schmidt method, but it is more efficient to use the following result.

Theorem 6: If $p_0, p_1, ...$ is a sequence of orthonormal polynomials, then
$$ \left< p_n, x p_{n + 1} \right> p_{n + 1}(x) = x p_n(x) - \left< p_n, x p_{n - 1} \right> p_{n - 1}(x) - \left< p_n, x p_n \right> p_n(x) $$

Proof: From lemma 5 we have $x p_n(x) = \sum_{k = 0}^{n + 1} \left< x p_n, p_k \right> p_k(x)$. Now
$$ \left< x p_n, p_k \right> = \int x p_n(x) p_k(x)\ \mu(x)\ \text{d}x = \left< p_n, x p_k \right> $$

So we have
$$ x p_n = \sum_{k = 0}^{n + 1} \left< p_n, x p_k \right> p_k $$

By lemma 4, $\left< p_n, x p_k \right>$ is zero when $\text{deg}(x p_k) = k + 1 < n$, so $k = n - 1, n, n + 1$ are the only $k \leq n + 1$ for which $\left< p_n, x p_k \right>$ can be nonzero. So we have
$$ x p_n = \left< p_n, x p_{n - 1} \right> p_{n - 1}(x) + \left< p_n, x p_n \right> p_n(x) + \left< p_n, x p_{n + 1} \right> p_{n + 1}(x) $$

Rearranging the terms gives the required result.

$\square$

Using theorem 6, we can compute the orthonormal polynomial $p_{n + 1}$ from the previous polynomials as follows. First, we compute a polynomial $q_{n + 1}$ which is orthogonal to $p_0, p_1, ..., p_n$ as
$$ q_{n + 1}(x) = x p_n(x) - \left< p_n, x p_{n - 1} \right> p_{n - 1}(x) - \left< p_n, x p_n \right> p_n(x) $$

Then, we compute $p_{n + 1}$ by normalizing $q_{n + 1}$:
$$ p_{n + 1}(x) = \frac{q_{n + 1}(x)}{\sqrt{\left< q_{n + 1}, q_{n + 1} \right>}} $$

There is the following nice result about the roots of orthogonal polynomials.

Lemma 7: Let $p_0, p_1, ...$ be a sequence of orthogonal polynomials with respect to the inner product $\left< f, g \right> = \int f(x) g(x)\ \mu(x)\ \text{d}x$, and let the support of $\mu$ be contained in $[a, b]$. Then $p_n$ has $n$ distinct real roots $x_1, x_2, ..., x_n \in (a, b)$.

Proof: Let $x_1, x_2, ..., x_m$ be the zeroes of $p_n$ that have odd multiplicity and lie in $(a, b)$. Define $q(x) = (x - x_1)(x - x_2) \cdots (x - x_m)$. Now $q(x) p_n(x)$ does not change sign on $[a, b]$, so $\left< q, p_n \right> \neq 0$. If we had $m < n$, lemma 4 would give $\left< q, p_n \right> = 0$, so it follows that $\text{deg}(q) = m = n$, which implies that $p_n$ has $n$ distinct roots in $(a, b)$.

$\square$

Now we are ready to prove the following theorem.

Theorem 8: Let $p_0, p_1, ...$ be a sequence of orthogonal polynomials with respect to the inner product $\left< f, g \right> = \int f(x) g(x)\ \mu(x)\ \text{d}x$ and let $x_1, x_2, ..., x_n$ be the zeroes of $p_n$. Define $w_k = \int l_k(x)\ \mu(x)\ \text{d}x$ for $k = 1, 2, ..., n$, where $l_1, l_2, ..., l_n$ are the Lagrange polynomials for $x_1, x_2, ..., x_n$. Then
$$ \sum_{k = 1}^n w_k f(x_k) = \int f(x)\ \mu(x)\ \text{d}x $$

for every $f \in \mathbb{P}_{2n - 1}$.

Proof: Let $f_n \in \mathbb{P}_{n - 1}$ be the polynomial of degree at most $n - 1$ that interpolates $f$ at $x_1, x_2, ..., x_n$. If we can show that $\int (f(x) - f_n(x))\ \mu(x)\ \text{d}x = 0$, it follows that $\int f(x)\ \mu(x)\ \text{d}x = \int f_n(x)\ \mu(x)\ \text{d}x$.

First, we note that $f_n(x)$ assumes the same values as $f(x)$ for $x = x_1, x_2, ..., x_n$. So $f(x) - f_n(x)$ has zeroes at $x_1, x_2, ..., x_n$, which means it is divisible by $(x - x_1)(x - x_2) \cdots (x - x_n)$, and this product is a scalar multiple of $p_n$. Since $\text{deg}(f - f_n) \leq 2n - 1$, we can write $f(x) - f_n(x) = p_n(x) q(x)$ for some $q \in \mathbb{P}_{n - 1}$.

Now, we can write the integral as $\int (f(x) - f_n(x))\ \mu(x)\ \text{d}x = \int p_n(x) q(x)\ \mu(x)\ \text{d}x = \left< p_n, q \right>$. Since $\text{deg}(q) < n$ we have $\left< p_n, q \right> = 0$ by lemma 4. So we see that $\int f(x)\ \mu(x)\ \text{d}x = \int f_n(x)\ \mu(x)\ \text{d}x$. It was shown earlier that $\int f_n(x)\ \mu(x)\ \text{d}x = \sum_{k = 1}^n w_k f(x_k)$, so it follows that
$$ \int f(x)\ \mu(x)\ \text{d}x = \int f_n(x)\ \mu(x)\ \text{d}x = \sum_{k = 1}^n w_k f(x_k) $$

$\square$
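Theorem 8 is what standard Gauss–Legendre routines implement. As a numerical check of the claim (for the Legendre case $\mu(x) = 1$ on $[-1, 1]$), the snippet below takes the $n$-point rule from `numpy.polynomial.legendre.leggauss` and verifies that it integrates every monomial up to degree $2n - 1$ exactly:

```python
import numpy as np

# Nodes are the roots of the Legendre polynomial P_n; weights as in theorem 8.
n = 5
nodes, weights = np.polynomial.legendre.leggauss(n)

# An n-point Gaussian rule must be exact on P_{2n-1}: check all monomials x^k.
for k in range(2 * n):
    exact = 0.0 if k % 2 == 1 else 2.0 / (k + 1)  # integral of x^k over [-1, 1]
    approx = np.sum(weights * nodes ** k)
    assert abs(approx - exact) < 1e-12
```

With $n = 5$ points the rule is exact up to degree $9$, twice what the $5$-point interpolatory rules of the earlier section achieve.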

Sometimes, we want to compute the integral
$$ \int p(x) q(x)\ \mu(x)\ \text{d}x $$

for many different $p$ and $q$. Using theorems 6 and 8 and a root-finding method, we can compute a different quadrature rule for every $q$. The inconvenient thing about this method is that the quadrature points $x_1, x_2, ...$ will be different for every $q$.

In [2], it is proposed to instead pick one set of quadrature points and re-use it for every $q$. This means we have to use more quadrature points per $q$, but since we can use the same set of quadrature points we only have to evaluate $p$ on this one set, which is more efficient in the end. It also means that we can use the ‘simple’ way of computing quadrature rules outlined in the section ‘Quadrature rules, the easy way’.

References

[1] Approximation Theory. Walter Groenevelt. Lecture notes WI4415, TU Delft, 2016.

[2] Matrix-free weighted quadrature for a computationally efficient isogeometric k-method. Giancarlo Sangalli and Mattia Tani, 2017. https://arxiv.org/abs/1712.08565