Calculating the pseudoinverse
Defining the pseudoinverse
Our journey starts with the following beautiful theorem about the invertibility of the linear operators corresponding to matrices.
Theorem: Let $A \in \mathbb{C}^{m \times n}$ be a matrix. The function $x \mapsto Ax$ is an invertible mapping from $\text{col}(A^*)$ to $\text{col}(A)$.
Note: When $A$ has real entries, $A^* = A^T$ and $\text{col}(A^*) = \text{row}(A)$.
Proof: It suffices to show that the map is injective and surjective. It is easy to see that the map is surjective: if $b \in \text{col}(A)$, then, by definition, we can write $b = c_1 a_1 + \cdots + c_n a_n$ where $c_1, \dots, c_n$ are coefficients and $a_1, \dots, a_n$ are the columns of $A$. So letting $x = (c_1, \dots, c_n)^T$ we see that $Ax = b$. (Strictly speaking we also need a preimage inside $\text{col}(A^*)$: writing $x = x_r + x_0$ with $x_r \in \text{col}(A^*)$ and $x_0 \perp \text{col}(A^*)$, we have $\langle Ax_0, y \rangle = \langle x_0, A^*y \rangle = 0$ for every $y$, so $Ax_0 = 0$ and $Ax_r = b$.)
To show that the map is injective we need to show that $Ax_1 = Ax_2$ with $x_1, x_2 \in \text{col}(A^*)$ implies $x_1 = x_2$. Suppose that $Ax_1 = Ax_2$ and $x_1, x_2 \in \text{col}(A^*)$. Let $x = x_1 - x_2$. Then $x$ is a linear combination of vectors in $\text{col}(A^*)$, so $x \in \text{col}(A^*)$. Now since $Ax_1 = Ax_2$ it follows that $Ax = 0$. The $i$th entry of $Ax$ is the inner product of the $i$th column of $A^*$ and $x$. Since these inner products are all zero, $x \perp \text{col}(A^*)$ by definition. Since both $x \in \text{col}(A^*)$ and $x \perp \text{col}(A^*)$ it follows that $\langle x, x \rangle = 0$, so $x = 0$ and $x_1 = x_2$.
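To see the theorem in action, here is a small numerical sketch (not part of the proof; the random rank-2 matrix, the seed and the tolerance are purely illustrative choices). It takes an orthonormal basis of $\text{col}(A^*)$, pushes it through $A$, and checks that the images are linearly independent and lie in $\text{col}(A)$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 5, 4, 2
# A random rank-2 matrix: product of an m x r and an r x n factor.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Orthonormal bases of col(A) and col(A*) taken from the SVD.
U, s, Vh = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
col_A = U[:, :rank]                 # orthonormal basis of col(A)
col_Astar = Vh[:rank, :].conj().T   # orthonormal basis of col(A*)

# x -> Ax maps the basis of col(A*) to r linearly independent vectors ...
images = A @ col_Astar
print(np.linalg.matrix_rank(images) == rank)                   # True
# ... that lie in col(A), so the restricted map is a bijection
# between the two r-dimensional spaces.
print(np.allclose(col_A @ (col_A.conj().T @ images), images))  # True
```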
Of course, when the matrix $A$ is invertible, the inverse of $x \mapsto Ax$ is simply $y \mapsto A^{-1}y$. However, this is the least interesting case: the remarkable thing about the theorem is that the restricted map is invertible even when $A$ itself is not invertible. Of course, the trick is the restriction of the domain and range of the map, as the next example illustrates.
Example: Suppose that $A$ is the null matrix (every entry is zero). Then $Ax = 0$ for every $x$, so it seems that this map cannot possibly be invertible. However, the column spaces of $A$ and $A^*$ are both equal to $\{0\}$. Since the restricted map is a function from $\{0\}$ to $\{0\}$, the inverse is simply the map $0 \mapsto 0$.
Now let's consider an example that is slightly more interesting.
Example: Consider the matrix $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$. This is clearly not an invertible matrix. The column space of $A$ is $\text{span}\{(1,1)^T\}$, and the column space of $A^*$ is also $\text{span}\{(1,1)^T\}$. We see that $A$ maps $(t,t)^T$ to $(2t,2t)^T$. The inverse mapping from $\text{col}(A)$ to $\text{col}(A^*)$ is then $y \mapsto \tfrac{1}{2}y$.
So, for a given matrix $A$ the inverse of this restricted map exists. However, the inverse is only defined on $\text{col}(A)$. We would like to find a pseudoinverse $A^\dagger$ that is an extension of this inverse to all of $\mathbb{C}^m$.
For $b \in \text{col}(A)$, we have
$$A^\dagger b = x, \quad \text{where } x \in \text{col}(A^*) \text{ is the unique vector such that } Ax = b.$$
This defines $A^\dagger b$ for $b \in \text{col}(A)$, but in order to uniquely define $A^\dagger$ we need to define it on $\text{col}(A)^\perp$ as well. For $b \in \text{col}(A)^\perp$, the simplest condition would be
$$A^\dagger b = 0.$$
Now, we can write any $b \in \mathbb{C}^m$ as $b = b_1 + b_2$ with $b_1 \in \text{col}(A)$ and $b_2 \in \text{col}(A)^\perp$. So we can evaluate $A^\dagger b$ as $A^\dagger b = A^\dagger b_1 + A^\dagger b_2 = A^\dagger b_1$. This means these two conditions indeed uniquely define a linear operator $A^\dagger : \mathbb{C}^m \to \mathbb{C}^n$.
Multiplying the first condition by $A$ on the left makes it a bit nicer: it becomes $AA^\dagger b = b$, where we keep the requirement that $A^\dagger b \in \text{col}(A^*)$. Now, we can define the pseudoinverse:
Definition: Let $A \in \mathbb{C}^{m \times n}$ be a matrix. The pseudoinverse $A^\dagger \in \mathbb{C}^{n \times m}$ is the matrix defined by the following two properties:
$$A A^\dagger b = b \ \text{ and } \ A^\dagger b \in \text{col}(A^*) \quad \text{for all } b \in \text{col}(A),$$
$$A^\dagger b = 0 \quad \text{for all } b \in \text{col}(A)^\perp.$$
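As a sanity check, `np.linalg.pinv` computes the Moore–Penrose pseudoinverse, which satisfies exactly these two properties. The following sketch (with an arbitrary rank-2 matrix and explicit tolerances of my own choosing) verifies both of them numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5x4 of rank 2
A_pinv = np.linalg.pinv(A, rcond=1e-10)   # cutoff for numerically-zero singular values

U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
Ucol = U[:, :r]                      # orthonormal basis of col(A)
Vrow = Vh[:r, :].conj().T            # orthonormal basis of col(A*)

# First property: for b in col(A), A A^+ b = b and A^+ b lies in col(A*).
b = A @ rng.standard_normal(4)
x = A_pinv @ b
print(np.allclose(A @ x, b))                          # True
print(np.allclose(Vrow @ (Vrow.conj().T @ x), x))     # True: x is in col(A*)

# Second property: for b orthogonal to col(A), A^+ b = 0.
w = rng.standard_normal(5)
b_perp = w - Ucol @ (Ucol.conj().T @ w)
print(np.allclose(A_pinv @ b_perp, 0))                # True
```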
With this definition, we are ready to prove some results that help us understand why the pseudoinverse is such a useful tool to 'solve' linear systems that do not have an exact solution.
Definition: For any vector $x \in \mathbb{C}^n$, the Euclidean norm of $x$, denoted $\|x\|$, is defined by
$$\|x\| = \sqrt{\langle x, x \rangle} = \sqrt{\sum_{i=1}^n |x_i|^2}.$$
Theorem: Let $A \in \mathbb{C}^{m \times n}$ be a matrix and $b \in \mathbb{C}^m$. Then $x = A^\dagger b$ minimizes $\|Ax - b\|$. Moreover, over all minimizers of $\|Ax - b\|$, $A^\dagger b$ is the one that minimizes $\|x\|$.
Proof: Consider $\|Ax - b\|^2$. Write $b = b_1 + b_2$ with $b_1 \in \text{col}(A)$, $b_2 \in \text{col}(A)^\perp$. Then $Ax - b = (Ax - b_1) - b_2$. Since $Ax - b_1 \in \text{col}(A)$ and $b_2 \in \text{col}(A)^\perp$, $Ax - b_1$ and $b_2$ are orthogonal to each other, so we have $\|Ax - b\|^2 = \|Ax - b_1\|^2 + \|b_2\|^2$. In particular, we have $\|Ax - b\|^2 \geq \|b_2\|^2$ for every $x$.
Moreover, if $x = A^\dagger b$ then $Ax = b_1$, because $A^\dagger b_2 = 0$ by definition of the pseudoinverse and $AA^\dagger b_1 = b_1$ since $b_1 \in \text{col}(A)$. We have $\|Ax - b\|^2 = \|Ax - b_1\|^2 + \|b_2\|^2$, so this reduces to $\|Ax - b\|^2 = \|b_2\|^2$ when $x = A^\dagger b$. Since $\|Ax - b\|^2 \geq \|b_2\|^2$ for every $x$, we see that $A^\dagger b$ minimizes $\|Ax - b\|$.
Now, $\|Ax - b\|$ is minimized by exactly the vectors $x$ of the form $x = A^\dagger b + z$ where $z \in \text{null}(A)$. From the definition of the pseudoinverse it follows that $A^\dagger b \in \text{col}(A^*)$, so $A^\dagger b$ and $z$ are orthogonal, so $\|x\|^2 = \|A^\dagger b\|^2 + \|z\|^2$. It follows that $\|x\|$ is minimized when $z = 0$, so when $x = A^\dagger b$.
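A quick numerical illustration of both claims, with a random rank-deficient matrix (the specific matrix, seed and cutoff are illustrative only); `np.linalg.lstsq` returns the same minimum-norm least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # rank-deficient 6x4
b = rng.standard_normal(6)                                      # generic b, not in col(A)

x_dagger = np.linalg.pinv(A, rcond=1e-10) @ b

# Same minimum-norm least-squares solution as numpy's lstsq.
x_lstsq = np.linalg.lstsq(A, b, rcond=1e-10)[0]
print(np.allclose(x_dagger, x_lstsq))                            # True

# Any x = x_dagger + z with z in null(A) has the same residual but a larger norm.
z = np.linalg.svd(A)[2][-1, :]                                   # a null-space direction
x_other = x_dagger + z
print(np.allclose(np.linalg.norm(A @ x_other - b),
                  np.linalg.norm(A @ x_dagger - b)))             # True
print(np.linalg.norm(x_other) > np.linalg.norm(x_dagger))        # True
```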
Properties of the pseudoinverse
Lemma: Let $A^\dagger$ be the pseudoinverse of $A$. Then $AA^\dagger$ is the orthogonal projection onto $\text{col}(A)$ and $A^\dagger A$ is the orthogonal projection onto $\text{col}(A^*)$.
Proof: By definition of the pseudoinverse we have $AA^\dagger b = b$ for $b \in \text{col}(A)$ and $AA^\dagger b = 0$ for $b \in \text{col}(A)^\perp$, so clearly $AA^\dagger$ is the orthogonal projection onto $\text{col}(A)$. Now take any $x \in \mathbb{C}^n$. Write $x = x_1 + x_2$ with $x_1 \in \text{col}(A^*)$ and $x_2 \in \text{null}(A) = \text{col}(A^*)^\perp$. Then $Ax = Ax_1$, and it then follows from the definition of the pseudoinverse that $A^\dagger A x = x_1$, since $x_1$ is the unique vector in $\text{col}(A^*)$ that $A$ maps to $Ax_1$. So $A^\dagger A$ is the orthogonal projection onto $\text{col}(A^*)$.
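The lemma is easy to confirm numerically; in the sketch below (random rank-deficient matrix, illustrative only) both products are Hermitian, idempotent, and fix the relevant subspaces:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2
A_pinv = np.linalg.pinv(A, rcond=1e-10)

P = A @ A_pinv          # should be the orthogonal projection onto col(A)
Q = A_pinv @ A          # should be the orthogonal projection onto col(A*)

# Orthogonal projections are idempotent and Hermitian ...
print(np.allclose(P @ P, P), np.allclose(P, P.conj().T))   # True True
print(np.allclose(Q @ Q, Q), np.allclose(Q, Q.conj().T))   # True True

# ... and they fix the spaces they project onto.
x, y = rng.standard_normal(4), rng.standard_normal(5)
print(np.allclose(P @ (A @ x), A @ x))                     # True: P fixes col(A)
print(np.allclose(Q @ (A.conj().T @ y), A.conj().T @ y))   # True: Q fixes col(A*)
```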
Now we consider some cases for which the pseudoinverse is easy to calculate.
Lemma: If $A$ is invertible, then $A^\dagger = A^{-1}$.
Proof: We simply check that the inverse satisfies the defining properties of the pseudoinverse. Since $A$ is invertible we have $\text{col}(A) = \mathbb{C}^n$ and $\text{col}(A^*) = \mathbb{C}^n$. We have $AA^{-1}b = b$ for $b \in \text{col}(A) = \mathbb{C}^n$ by the definition of the inverse. We also have $A^{-1}b \in \text{col}(A^*) = \mathbb{C}^n$ for all $b$. The second property holds vacuously, since $\text{col}(A)^\perp = \{0\}$.
Lemma: If $A$ has orthonormal rows or orthonormal columns, then $A^\dagger = A^*$.
Proof: Again we simply check that the defining properties of the pseudoinverse are satisfied. If $A$ has orthonormal rows or orthonormal columns, then $AA^*$ is the orthogonal projection onto $\text{col}(A)$, so $AA^*b = b$ for $b \in \text{col}(A)$ and $A^*b = 0$ for $b \in \text{col}(A)^\perp$. Since $A^*b \in \text{col}(A^*)$ always holds, both properties are satisfied.
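Both of the last two lemmas can be spot-checked numerically; in the sketch below the orthonormal columns come from a QR factorization, which is just one convenient way to produce them:

```python
import numpy as np

rng = np.random.default_rng(4)

# Invertible matrix: the pseudoinverse is the ordinary inverse.
B = rng.standard_normal((4, 4))
print(np.allclose(np.linalg.pinv(B), np.linalg.inv(B)))    # True

# Orthonormal columns (thin QR of a random 5x3 matrix): Q^+ = Q^*.
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
print(np.allclose(np.linalg.pinv(Q), Q.conj().T))          # True
# A matrix with orthonormal rows, such as Q^*, works the same way.
print(np.allclose(np.linalg.pinv(Q.conj().T), Q))          # True
```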
Lemma: Let $A \in \mathbb{C}^{m \times n}$. Then
$$(A^*)^\dagger = (A^\dagger)^*, \qquad \overline{A}^{\,\dagger} = \overline{A^\dagger}, \qquad (A^T)^\dagger = (A^\dagger)^T.$$
Proof: To show that $(A^*)^\dagger = (A^\dagger)^*$ we consider the inner products $\langle (A^*)^\dagger x, y \rangle$ and $\langle (A^\dagger)^* x, y \rangle$ for arbitrary vectors $x \in \mathbb{C}^n$ and $y \in \mathbb{C}^m$ and show that they are the same.
Write $x = x_1 + x_2$ with $x_1 \in \text{col}(A^*)$ and $x_2 \in \text{col}(A^*)^\perp$, and $y = y_1 + y_2$ with $y_1 \in \text{col}(A)$ and $y_2 \in \text{col}(A)^\perp$.
Then $\langle (A^*)^\dagger x, y \rangle = \langle (A^*)^\dagger x_1, y_1 \rangle$ and $\langle (A^\dagger)^* x, y \rangle = \langle x, A^\dagger y \rangle = \langle x_1, A^\dagger y_1 \rangle$ (here we use that $(A^*)^\dagger x_2 = 0$ and $A^\dagger y_2 = 0$ by the second defining property, and that $(A^*)^\dagger x_1 \in \text{col}(A)$ and $A^\dagger y_1 \in \text{col}(A^*)$ by the first).
So $\langle (A^*)^\dagger x, y \rangle = \langle (A^*)^\dagger x_1, AA^\dagger y_1 \rangle = \langle A^*(A^*)^\dagger x_1, A^\dagger y_1 \rangle = \langle x_1, A^\dagger y_1 \rangle = \langle (A^\dagger)^* x, y \rangle$, which proves the first identity.
To see that $\overline{A}^{\,\dagger} = \overline{A^\dagger}$ we take the defining properties of the pseudoinverse for $A$, and take the complex conjugate. From this we see that $\overline{A}\,\overline{A^\dagger}\,b = b$ for $b \in \text{col}(\overline{A})$ and $\overline{A^\dagger}\,b = 0$ for $b \in \text{col}(\overline{A})^\perp$ (and $\overline{A^\dagger}\,b \in \text{col}(\overline{A}^*)$). So $\overline{A}^{\,\dagger} = \overline{A^\dagger}$.
To see that $(A^T)^\dagger = (A^\dagger)^T$ note that $A^T = \overline{A^*}$, so $(A^T)^\dagger = \overline{(A^*)^\dagger} = \overline{(A^\dagger)^*} = (A^\dagger)^T$.
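These three identities are easy to check numerically as well, for instance on a random complex matrix (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
pinv = np.linalg.pinv

print(np.allclose(pinv(A.conj().T), pinv(A).conj().T))   # (A*)^+   = (A^+)*
print(np.allclose(pinv(A.conj()),   pinv(A).conj()))     # conj(A)^+ = conj(A^+)
print(np.allclose(pinv(A.T),        pinv(A).T))          # (A^T)^+  = (A^+)^T
```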
Lemma: If $V$ has orthonormal rows, then $(AV)^\dagger = V^*A^\dagger$; if $U$ has orthonormal columns, then $(UA)^\dagger = A^\dagger U^*$.
Proof: First consider the case that $V$ has orthonormal rows. Then $VV^* = I$, and from the lemma on matrices with orthonormal rows or columns we know that $V^\dagger = V^*$, so the claim is that $(AV)^\dagger = V^\dagger A^\dagger$. Yet again, we check the defining properties of the pseudoinverse for the candidate $V^*A^\dagger$.
Using $VV^* = I$, $\text{col}(AV) = \text{col}(A)$, and $A^\dagger b \in \text{col}(A^*)$ (so that $V^*A^\dagger b \in \text{col}((AV)^*)$), the first property now reduces to $AA^\dagger b = b$ for $b \in \text{col}(A)$, and the second property reduces to $A^\dagger b = 0$ for $b \in \text{col}(A)^\perp$. These are true by definition of the pseudoinverse of $A$. The case that $U$ has orthonormal columns is proved in the same way.
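A numerical spot-check of both cases of the lemma, with the orthonormal factors produced by QR (an arbitrary choice of construction):

```python
import numpy as np

def pinv(M):
    # Explicit cutoff so that numerically-zero singular values are dropped.
    return np.linalg.pinv(M, rcond=1e-10)

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 5))   # 4x5, rank 3
U, _ = np.linalg.qr(rng.standard_normal((6, 4)))                # 6x4, orthonormal columns
V = np.linalg.qr(rng.standard_normal((7, 5)))[0].conj().T       # 5x7, orthonormal rows

print(np.allclose(pinv(U @ A), pinv(A) @ U.conj().T))   # (UA)^+ = A^+ U^*
print(np.allclose(pinv(A @ V), V.conj().T @ pinv(A)))   # (AV)^+ = V^* A^+
```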
Calculating the pseudoinverse
Now, we want to calculate the pseudoinverse. From the theorem in the first section, we know that there is an invertible matrix "hidden inside" any matrix $A$: the map $x \mapsto Ax$ restricted to $\text{col}(A^*)$ is an invertible map onto $\text{col}(A)$.
Now, the idea is to do a change of basis so that, instead of operating on the standard basis of $\mathbb{C}^n$, $A$ operates on a basis of $\text{col}(A^*)$ and maps it to a basis of $\text{col}(A)$. It will be convenient to use orthonormal bases here, since this will simplify computation.
Call the matrix after the change of basis $M$. We change the basis of the range to an orthonormal basis of $\text{col}(A)$, and the basis of the domain to an orthonormal basis of $\text{col}(A^*)$.
For the range, we calculate a change-of-basis matrix $U \in \mathbb{C}^{m \times r}$ whose columns form an orthonormal basis of $\text{col}(A)$. For the domain we calculate a change-of-basis matrix $V \in \mathbb{C}^{n \times r}$ whose columns form an orthonormal basis of $\text{col}(A^*)$. (These matrices aren't square; strictly speaking they are projections followed by a change of basis.)
We end up with the matrix $M = U^*AV$. This matrix is an $r$-by-$r$ matrix, where $r$ is the rank of $A$. Since $U$, $A$ and $V$ all have rank $r$, and the columns of $U$ and $V$ span $\text{col}(A)$ and $\text{col}(A^*)$ respectively, $M$ has rank $r$ as well: it represents the invertible map from the first theorem in the new bases. So $M$ is invertible.
If we take $M$ and multiply it on the left by $U$ and on the right by $V^*$ we get $UMV^* = UU^*AVV^*$. Now, it can be checked easily that $UU^*$ and $VV^*$ are the orthogonal projections onto $\text{col}(A)$ and $\text{col}(A^*)$, respectively. Since $UU^*AVV^*x = Ax$ for every $x$ (the columns of $A$ lie in $\text{col}(A)$, and $x - VV^*x \in \text{null}(A)$), we have $UMV^* = A$.
We have proved the following theorem:
Theorem: Any matrix $A \in \mathbb{C}^{m \times n}$ of rank $r$ has a decomposition $A = UMV^*$, where $M \in \mathbb{C}^{r \times r}$ is invertible, the columns of $U \in \mathbb{C}^{m \times r}$ form an orthonormal basis of $\text{col}(A)$, and the columns of $V \in \mathbb{C}^{n \times r}$ form an orthonormal basis of $\text{col}(A^*)$.
Note that in general the decomposition is not unique, since an orthonormal basis of a nonzero space is never unique (even in dimension one, a basis vector can be multiplied by a phase, or a sign in the real case).
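Here is one way to build such a decomposition numerically: take orthonormal bases of $\text{col}(A)$ and $\text{col}(A^*)$ (below they come from the SVD, but any orthonormal bases of these two spaces would do, e.g. from Gram–Schmidt) and set $M = U^*AV$. This is an illustrative sketch, not the only possible construction:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))    # rank 2

# Orthonormal bases of col(A) and col(A*), here taken from the SVD.
W, s, Zh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
U = W[:, :r]                 # columns: orthonormal basis of col(A)
V = Zh[:r, :].conj().T       # columns: orthonormal basis of col(A*)

M = U.conj().T @ A @ V       # the r x r matrix "hidden inside" A
print(np.linalg.matrix_rank(M) == r)            # True: M is invertible
print(np.allclose(U @ M @ V.conj().T, A))       # True: A = U M V^*
```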
It is relatively easy to compute the pseudoinverse for a matrix that is decomposed in the form of theorem 12:
Theorem: If a matrix $A$ satisfies $A = UMV^*$, where $M \in \mathbb{C}^{r \times r}$ is invertible and $U \in \mathbb{C}^{m \times r}$ and $V \in \mathbb{C}^{n \times r}$ have orthonormal columns, then
$$A^\dagger = VM^{-1}U^*.$$
Proof: We want to compute $(UMV^*)^\dagger$. Since $M$ is invertible, the lemma on invertible matrices gives $M^\dagger = M^{-1}$. Also, $U$ has orthonormal columns and $V^*$ has orthonormal rows, so we can use the previous lemma to see that $(UMV^*)^\dagger = (MV^*)^\dagger U^*$ and $(MV^*)^\dagger = VM^\dagger$. It follows that $A^\dagger = VM^\dagger U^* = VM^{-1}U^*$.
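Continuing the numerical sketch from the decomposition above, the theorem then gives the pseudoinverse directly, and it agrees with `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))    # rank 2

W, s, Zh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
U, V = W[:, :r], Zh[:r, :].conj().T
M = U.conj().T @ A @ V

A_dagger = V @ np.linalg.inv(M) @ U.conj().T                   # V M^{-1} U^*
print(np.allclose(A_dagger, np.linalg.pinv(A, rcond=1e-10)))   # True
```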
Special cases
In many special cases, we do not need to compute a full decomposition in the form that is used by theorem 12. We already saw some easy cases in the second section. Here, we consider some more special cases for which it is easy to calculate the pseudoinverse.
Theorem: If $P$ is a projection onto a space $W$, and $P$ is also the orthogonal projection onto that same space, then $P^\dagger = P$.
Proof: As usual, check that the defining properties of the pseudoinverse hold. We see that $PPb = b$ for $b \in W = \text{col}(P)$ (with $Pb = b \in \text{col}(P^*) = W$) and $Pb = 0$ for $b \in W^\perp$, so indeed $P^\dagger = P$.
Theorem: If $A$ is full rank, the minimum-norm solution of
$$\min_x \|Ax - b\|$$
equals $A^\dagger b$ and can be computed without a decomposition. Moreover, if $A$ has linearly independent rows then $A^\dagger = A^*(AA^*)^{-1}$, and if $A$ has linearly independent columns then $A^\dagger = (A^*A)^{-1}A^*$.
Proof: Suppose $A$ has linearly independent columns; then $A^*A$ is invertible and $\text{col}(A^*) = \mathbb{C}^n$. As usual by now, we check that the defining properties of the pseudoinverse hold when we substitute $(A^*A)^{-1}A^*$ for $A^\dagger$. If $b \in \text{col}(A)$ we can write $b = Ax$ for some $x$. It follows that $A(A^*A)^{-1}A^*b = A(A^*A)^{-1}A^*Ax = Ax = b$ for $b \in \text{col}(A)$. If $b \in \text{col}(A)^\perp$ it follows that $A^*b = 0$, so $(A^*A)^{-1}A^*b = 0$ for $b \in \text{col}(A)^\perp$.
If $A$ has linearly independent rows, then $A^*$ has linearly independent columns, so $(A^*)^\dagger = (AA^*)^{-1}A$. By the lemma relating $(A^*)^\dagger$ and $(A^\dagger)^*$ it now follows that $A^\dagger = ((A^*)^\dagger)^* = A^*(AA^*)^{-1}$.
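Both formulas are easy to use (or verify) numerically; for real matrices $A^* = A^T$. An illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(9)
pinv = np.linalg.pinv

tall = rng.standard_normal((6, 3))    # linearly independent columns
wide = rng.standard_normal((3, 6))    # linearly independent rows

# A^+ = (A* A)^{-1} A*  when the columns are linearly independent.
print(np.allclose(pinv(tall), np.linalg.solve(tall.T @ tall, tall.T)))   # True
# A^+ = A* (A A*)^{-1}  when the rows are linearly independent.
print(np.allclose(pinv(wide), wide.T @ np.linalg.inv(wide @ wide.T)))    # True
```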