Calculating the pseudoinverse
Defining the pseudoinverse
Our journey starts with the following beautiful theorem about the invertibility of the linear operators corresponding to matrices.
Theorem: Let $A \in \mathbb{C}^{m \times n}$ be a matrix. The function $x \mapsto Ax$ is an invertible mapping from $\text{col}(A^*)$ to $\text{col}(A)$.
Note: When $A$ has real entries, $A^* = A^T$ and $\text{col}(A^*) = \text{row}(A)$.
Proof: It suffices to show that the map is injective and surjective. It is easy to see that the map is surjective: if $b \in \text{col}(A)$, then, by definition, we can write $b = c_1 a_1 + \cdots + c_n a_n$ where $c_1, \dots, c_n$ are coefficients and $a_1, \dots, a_n$ are the columns of $A$. So letting $x = (c_1, \dots, c_n)^T$ we see that $Ax = b$. (Strictly speaking we also need a preimage inside $\text{col}(A^*)$: writing $x = x_r + x_0$ with $x_r \in \text{col}(A^*)$ and $x_0 \perp \text{col}(A^*)$, we have $\langle Ax_0, y \rangle = \langle x_0, A^*y \rangle = 0$ for every $y$, so $Ax_0 = 0$ and $Ax_r = b$.)
To show that the map is injective we need to show that $Ax_1 = Ax_2$ with $x_1, x_2 \in \text{col}(A^*)$ implies $x_1 = x_2$. Suppose that $Ax_1 = Ax_2$ and $x_1, x_2 \in \text{col}(A^*)$. Let $x = x_1 - x_2$. Then $x$ is a linear combination of vectors in $\text{col}(A^*)$, so $x \in \text{col}(A^*)$. Now since $Ax_1 = Ax_2$ it follows that $Ax = 0$. The $i$th entry of $Ax$ is the inner product of the $i$th column of $A^*$ and $x$. Since these inner products are all zero, $x \perp \text{col}(A^*)$ by definition. Since both $x \in \text{col}(A^*)$ and $x \perp \text{col}(A^*)$ it follows that $\langle x, x \rangle = 0$, so $x = 0$ and $x_1 = x_2$.
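To see the theorem in action, here is a small numerical sketch (not part of the proof; the random rank-2 matrix, the seed and the tolerance are purely illustrative choices). It takes an orthonormal basis of $\text{col}(A^*)$, pushes it through $A$, and checks that the images are linearly independent and lie in $\text{col}(A)$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 5, 4, 2
# A random rank-2 matrix: product of an m x r and an r x n factor.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Orthonormal bases of col(A) and col(A*) taken from the SVD.
U, s, Vh = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
col_A = U[:, :rank]                 # orthonormal basis of col(A)
col_Astar = Vh[:rank, :].conj().T   # orthonormal basis of col(A*)

# x -> Ax maps the basis of col(A*) to r linearly independent vectors ...
images = A @ col_Astar
print(np.linalg.matrix_rank(images) == rank)                   # True
# ... that lie in col(A), so the restricted map is a bijection
# between the two r-dimensional spaces.
print(np.allclose(col_A @ (col_A.conj().T @ images), images))  # True
```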
Of course, when the matrix $A$ is invertible, the inverse of $x \mapsto Ax$ is simply $y \mapsto A^{-1}y$. However, this is the least interesting case: the remarkable thing about the theorem is that the restricted map is invertible even when $A$ itself is not invertible. Of course, the trick is the restriction of the domain and range of the map, as the next example illustrates.
Example: Suppose that $A$ is the null matrix (every entry is zero). Then $Ax = 0$ for every $x$, so it seems that this map cannot possibly be invertible. However, the column spaces of $A$ and $A^*$ are both equal to $\{0\}$. Since the restricted map is a function from $\{0\}$ to $\{0\}$, the inverse is simply the map $0 \mapsto 0$.
Now let's consider an example that is slightly more interesting.
Example: Consider the matrix $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$. This is clearly not an invertible matrix. The column space of $A$ is $\text{span}\{(1,1)^T\}$, and the column space of $A^*$ is also $\text{span}\{(1,1)^T\}$. We see that $A$ maps $(t,t)^T$ to $(2t,2t)^T$. The inverse mapping from $\text{col}(A)$ to $\text{col}(A^*)$ is then $y \mapsto \tfrac{1}{2}y$.
So, for a given matrix $A$ the inverse of this restricted map exists. However, the inverse is only defined on $\text{col}(A)$. We would like to find a pseudoinverse $A^\dagger$ that is an extension of this inverse to all of $\mathbb{C}^m$.
For $b \in \text{col}(A)$, we have
$$A^\dagger b = x, \quad \text{where } x \in \text{col}(A^*) \text{ is the unique vector such that } Ax = b.$$
This defines $A^\dagger b$ for $b \in \text{col}(A)$, but in order to uniquely define $A^\dagger$ we need to define it on $\text{col}(A)^\perp$ as well. For $b \in \text{col}(A)^\perp$, the simplest condition would be
$$A^\dagger b = 0.$$
Now, we can write any $b \in \mathbb{C}^m$ as $b = b_1 + b_2$ with $b_1 \in \text{col}(A)$ and $b_2 \in \text{col}(A)^\perp$. So we can evaluate $A^\dagger b$ as $A^\dagger b = A^\dagger b_1 + A^\dagger b_2 = A^\dagger b_1$. This means these two conditions indeed uniquely define a linear operator $A^\dagger : \mathbb{C}^m \to \mathbb{C}^n$.
Multiplying the first condition by $A$ on the left makes it a bit nicer: it becomes $AA^\dagger b = b$, where we keep the requirement that $A^\dagger b \in \text{col}(A^*)$. Now, we can define the pseudoinverse:
Definition: Let $A \in \mathbb{C}^{m \times n}$ be a matrix. The pseudoinverse $A^\dagger \in \mathbb{C}^{n \times m}$ is the matrix defined by the following two properties:
$$A A^\dagger b = b \ \text{ and } \ A^\dagger b \in \text{col}(A^*) \quad \text{for all } b \in \text{col}(A),$$
$$A^\dagger b = 0 \quad \text{for all } b \in \text{col}(A)^\perp.$$
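As a sanity check, `np.linalg.pinv` computes the Moore–Penrose pseudoinverse, which satisfies exactly these two properties. The following sketch (with an arbitrary rank-2 matrix and explicit tolerances of my own choosing) verifies both of them numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5x4 of rank 2
A_pinv = np.linalg.pinv(A, rcond=1e-10)   # cutoff for numerically-zero singular values

U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
Ucol = U[:, :r]                      # orthonormal basis of col(A)
Vrow = Vh[:r, :].conj().T            # orthonormal basis of col(A*)

# First property: for b in col(A), A A^+ b = b and A^+ b lies in col(A*).
b = A @ rng.standard_normal(4)
x = A_pinv @ b
print(np.allclose(A @ x, b))                          # True
print(np.allclose(Vrow @ (Vrow.conj().T @ x), x))     # True: x is in col(A*)

# Second property: for b orthogonal to col(A), A^+ b = 0.
w = rng.standard_normal(5)
b_perp = w - Ucol @ (Ucol.conj().T @ w)
print(np.allclose(A_pinv @ b_perp, 0))                # True
```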
With this definition, we are ready to prove some results that help us understand why the pseudoinverse is such a useful tool to 'solve' linear systems that do not have an exact solution.
Definition: For any vector $x \in \mathbb{C}^n$, the Euclidean norm of $x$, denoted $\|x\|$, is defined by
$$\|x\| = \sqrt{\langle x, x \rangle} = \sqrt{\sum_{i=1}^n |x_i|^2}.$$
Theorem: Let $A \in \mathbb{C}^{m \times n}$ be a matrix and $b \in \mathbb{C}^m$. Then $x = A^\dagger b$ minimizes $\|Ax - b\|$. Moreover, over all minimizers of $\|Ax - b\|$, $A^\dagger b$ is the one that minimizes $\|x\|$.
Proof: Consider $\|Ax - b\|^2$. Write $b = b_1 + b_2$ with $b_1 \in \text{col}(A)$, $b_2 \in \text{col}(A)^\perp$. Then $Ax - b = (Ax - b_1) - b_2$. Since $Ax - b_1 \in \text{col}(A)$ and $b_2 \in \text{col}(A)^\perp$, $Ax - b_1$ and $b_2$ are orthogonal to each other, so we have $\|Ax - b\|^2 = \|Ax - b_1\|^2 + \|b_2\|^2$. In particular, we have $\|Ax - b\|^2 \geq \|b_2\|^2$ for every $x$.
Moreover, if $x = A^\dagger b$ then $Ax = b_1$, because $A^\dagger b_2 = 0$ by definition of the pseudoinverse and $AA^\dagger b_1 = b_1$ since $b_1 \in \text{col}(A)$. We have $\|Ax - b\|^2 = \|Ax - b_1\|^2 + \|b_2\|^2$, so this reduces to $\|Ax - b\|^2 = \|b_2\|^2$ when $x = A^\dagger b$. Since $\|Ax - b\|^2 \geq \|b_2\|^2$ for every $x$, we see that $A^\dagger b$ minimizes $\|Ax - b\|$.
Now, $\|Ax - b\|$ is minimized by exactly the vectors $x$ of the form $x = A^\dagger b + z$ where $z \in \text{null}(A)$. From the definition of the pseudoinverse it follows that $A^\dagger b \in \text{col}(A^*)$, so $A^\dagger b$ and $z$ are orthogonal, so $\|x\|^2 = \|A^\dagger b\|^2 + \|z\|^2$. It follows that $\|x\|$ is minimized when $z = 0$, so when $x = A^\dagger b$.
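A quick numerical illustration of both claims, with a random rank-deficient matrix (the specific matrix, seed and cutoff are illustrative only); `np.linalg.lstsq` returns the same minimum-norm least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # rank-deficient 6x4
b = rng.standard_normal(6)                                      # generic b, not in col(A)

x_dagger = np.linalg.pinv(A, rcond=1e-10) @ b

# Same minimum-norm least-squares solution as numpy's lstsq.
x_lstsq = np.linalg.lstsq(A, b, rcond=1e-10)[0]
print(np.allclose(x_dagger, x_lstsq))                            # True

# Any x = x_dagger + z with z in null(A) has the same residual but a larger norm.
z = np.linalg.svd(A)[2][-1, :]                                   # a null-space direction
x_other = x_dagger + z
print(np.allclose(np.linalg.norm(A @ x_other - b),
                  np.linalg.norm(A @ x_dagger - b)))             # True
print(np.linalg.norm(x_other) > np.linalg.norm(x_dagger))        # True
```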
Properties of the pseudoinverse
Lemma: Let $A^\dagger$ be the pseudoinverse of $A$. Then $AA^\dagger$ is the orthogonal projection onto $\text{col}(A)$ and $A^\dagger A$ is the orthogonal projection onto $\text{col}(A^*)$.
Proof: By definition of the pseudoinverse we have $AA^\dagger b = b$ for $b \in \text{col}(A)$ and $AA^\dagger b = 0$ for $b \in \text{col}(A)^\perp$, so clearly $AA^\dagger$ is the orthogonal projection onto $\text{col}(A)$. Now take any $x \in \mathbb{C}^n$. Write $x = x_1 + x_2$ with $x_1 \in \text{col}(A^*)$ and $x_2 \in \text{null}(A) = \text{col}(A^*)^\perp$. Then $Ax = Ax_1$, and it then follows from the definition of the pseudoinverse that $A^\dagger A x = x_1$, since $x_1$ is the unique vector in $\text{col}(A^*)$ that $A$ maps to $Ax_1$. So $A^\dagger A$ is the orthogonal projection onto $\text{col}(A^*)$.
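The lemma is easy to confirm numerically; in the sketch below (random rank-deficient matrix, illustrative only) both products are Hermitian, idempotent, and fix the relevant subspaces:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2
A_pinv = np.linalg.pinv(A, rcond=1e-10)

P = A @ A_pinv          # should be the orthogonal projection onto col(A)
Q = A_pinv @ A          # should be the orthogonal projection onto col(A*)

# Orthogonal projections are idempotent and Hermitian ...
print(np.allclose(P @ P, P), np.allclose(P, P.conj().T))   # True True
print(np.allclose(Q @ Q, Q), np.allclose(Q, Q.conj().T))   # True True

# ... and they fix the spaces they project onto.
x, y = rng.standard_normal(4), rng.standard_normal(5)
print(np.allclose(P @ (A @ x), A @ x))                     # True: P fixes col(A)
print(np.allclose(Q @ (A.conj().T @ y), A.conj().T @ y))   # True: Q fixes col(A*)
```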
Now we consider some cases for which the pseudoinverse is easy to calculate.
Lemma: If $A$ is invertible, then $A^\dagger = A^{-1}$.
Proof: We simply check that the inverse satisfies the defining properties of the pseudoinverse. Since $A$ is invertible we have $\text{col}(A) = \mathbb{C}^n$ and $\text{col}(A^*) = \mathbb{C}^n$. We have $AA^{-1}b = b$ for $b \in \text{col}(A) = \mathbb{C}^n$ by the definition of the inverse. We also have $A^{-1}b \in \text{col}(A^*) = \mathbb{C}^n$ for all $b$. The second property holds vacuously, since $\text{col}(A)^\perp = \{0\}$.
Lemma: If $A$ has orthonormal rows or orthonormal columns, then $A^\dagger = A^*$.
Proof: Again we simply check that the defining properties of the pseudoinverse are satisfied. If $A$ has orthonormal rows or orthonormal columns, then $AA^*$ is the orthogonal projection onto $\text{col}(A)$, so $AA^*b = b$ for $b \in \text{col}(A)$ and $A^*b = 0$ for $b \in \text{col}(A)^\perp$. Since $A^*b \in \text{col}(A^*)$ always holds, both properties are satisfied.
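Both of the last two lemmas can be spot-checked numerically; in the sketch below the orthonormal columns come from a QR factorization, which is just one convenient way to produce them:

```python
import numpy as np

rng = np.random.default_rng(4)

# Invertible matrix: the pseudoinverse is the ordinary inverse.
B = rng.standard_normal((4, 4))
print(np.allclose(np.linalg.pinv(B), np.linalg.inv(B)))    # True

# Orthonormal columns (thin QR of a random 5x3 matrix): Q^+ = Q^*.
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
print(np.allclose(np.linalg.pinv(Q), Q.conj().T))          # True
# A matrix with orthonormal rows, such as Q^*, works the same way.
print(np.allclose(np.linalg.pinv(Q.conj().T), Q))          # True
```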
Lemma: Let $A \in \mathbb{C}^{m \times n}$. Then
$$(A^*)^\dagger = (A^\dagger)^*, \qquad \overline{A}^{\,\dagger} = \overline{A^\dagger}, \qquad (A^T)^\dagger = (A^\dagger)^T.$$
Proof: To show that $(A^*)^\dagger = (A^\dagger)^*$ we consider the inner products $\langle (A^*)^\dagger x, y \rangle$ and $\langle (A^\dagger)^* x, y \rangle$ for arbitrary vectors $x \in \mathbb{C}^n$ and $y \in \mathbb{C}^m$ and show that they are the same.
Write $x = x_1 + x_2$ with $x_1 \in \text{col}(A^*)$ and $x_2 \in \text{col}(A^*)^\perp$, and $y = y_1 + y_2$ with $y_1 \in \text{col}(A)$ and $y_2 \in \text{col}(A)^\perp$.
Then $\langle (A^*)^\dagger x, y \rangle = \langle (A^*)^\dagger x_1, y_1 \rangle$ and $\langle (A^\dagger)^* x, y \rangle = \langle x, A^\dagger y \rangle = \langle x_1, A^\dagger y_1 \rangle$ (here we use that $(A^*)^\dagger x_2 = 0$ and $A^\dagger y_2 = 0$ by the second defining property, and that $(A^*)^\dagger x_1 \in \text{col}(A)$ and $A^\dagger y_1 \in \text{col}(A^*)$ by the first).
So $\langle (A^*)^\dagger x, y \rangle = \langle (A^*)^\dagger x_1, AA^\dagger y_1 \rangle = \langle A^*(A^*)^\dagger x_1, A^\dagger y_1 \rangle = \langle x_1, A^\dagger y_1 \rangle = \langle (A^\dagger)^* x, y \rangle$, which proves the first identity.
To see that $\overline{A}^{\,\dagger} = \overline{A^\dagger}$ we take the defining properties of the pseudoinverse for $A$, and take the complex conjugate. From this we see that $\overline{A}\,\overline{A^\dagger}\,b = b$ for $b \in \text{col}(\overline{A})$ and $\overline{A^\dagger}\,b = 0$ for $b \in \text{col}(\overline{A})^\perp$ (and $\overline{A^\dagger}\,b \in \text{col}(\overline{A}^*)$). So $\overline{A}^{\,\dagger} = \overline{A^\dagger}$.
To see that $(A^T)^\dagger = (A^\dagger)^T$ note that $A^T = \overline{A^*}$, so $(A^T)^\dagger = \overline{(A^*)^\dagger} = \overline{(A^\dagger)^*} = (A^\dagger)^T$.
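These three identities are easy to check numerically as well, for instance on a random complex matrix (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
pinv = np.linalg.pinv

print(np.allclose(pinv(A.conj().T), pinv(A).conj().T))   # (A*)^+   = (A^+)*
print(np.allclose(pinv(A.conj()),   pinv(A).conj()))     # conj(A)^+ = conj(A^+)
print(np.allclose(pinv(A.T),        pinv(A).T))          # (A^T)^+  = (A^+)^T
```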
Lemma: If $V$ has orthonormal rows, then $(AV)^\dagger = V^*A^\dagger$; if $U$ has orthonormal columns, then $(UA)^\dagger = A^\dagger U^*$.
Proof: First consider the case that $V$ has orthonormal rows. Then $VV^* = I$, and from the lemma on matrices with orthonormal rows or columns we know that $V^\dagger = V^*$, so the claim is that $(AV)^\dagger = V^\dagger A^\dagger$. Yet again, we check the defining properties of the pseudoinverse for the candidate $V^*A^\dagger$.
Using $VV^* = I$, $\text{col}(AV) = \text{col}(A)$, and $A^\dagger b \in \text{col}(A^*)$ (so that $V^*A^\dagger b \in \text{col}((AV)^*)$), the first property now reduces to $AA^\dagger b = b$ for $b \in \text{col}(A)$, and the second property reduces to $A^\dagger b = 0$ for $b \in \text{col}(A)^\perp$. These are true by definition of the pseudoinverse of $A$. The case that $U$ has orthonormal columns is proved in the same way.
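A numerical spot-check of both cases of the lemma, with the orthonormal factors produced by QR (an arbitrary choice of construction):

```python
import numpy as np

def pinv(M):
    # Explicit cutoff so that numerically-zero singular values are dropped.
    return np.linalg.pinv(M, rcond=1e-10)

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 5))   # 4x5, rank 3
U, _ = np.linalg.qr(rng.standard_normal((6, 4)))                # 6x4, orthonormal columns
V = np.linalg.qr(rng.standard_normal((7, 5)))[0].conj().T       # 5x7, orthonormal rows

print(np.allclose(pinv(U @ A), pinv(A) @ U.conj().T))   # (UA)^+ = A^+ U^*
print(np.allclose(pinv(A @ V), V.conj().T @ pinv(A)))   # (AV)^+ = V^* A^+
```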
Calculating the pseudoinverse
Now, we want to calculate the pseudoinverse. From the theorem in the first section, we know that there is an invertible matrix "hidden inside" any matrix $A$: the map $x \mapsto Ax$ restricted to $\text{col}(A^*)$ is an invertible map onto $\text{col}(A)$.
Now, the idea is to do a change of basis so that, instead of operating on the standard basis of $\mathbb{C}^n$, $A$ operates on a basis of $\text{col}(A^*)$ and maps it to a basis of $\text{col}(A)$. It will be convenient to use orthonormal bases here, since this will simplify computation.
Call the matrix after the change of basis $M$. We change the basis of the range to an orthonormal basis of $\text{col}(A)$, and the basis of the domain to an orthonormal basis of $\text{col}(A^*)$.
For the range, we calculate a change-of-basis matrix $U \in \mathbb{C}^{m \times r}$ whose columns form an orthonormal basis of $\text{col}(A)$. For the domain we calculate a change-of-basis matrix $V \in \mathbb{C}^{n \times r}$ whose columns form an orthonormal basis of $\text{col}(A^*)$. (These matrices aren't square; strictly speaking they are projections followed by a change of basis.)
We end up with the matrix $M = U^*AV$. This matrix is an $r$-by-$r$ matrix, where $r$ is the rank of $A$. Since $U$, $A$ and $V$ all have rank $r$, and the columns of $U$ and $V$ span $\text{col}(A)$ and $\text{col}(A^*)$ respectively, $M$ has rank $r$ as well: it represents the invertible map from the first theorem in the new bases. So $M$ is invertible.
If we take $M$ and multiply it on the left by $U$ and on the right by $V^*$ we get $UMV^* = UU^*AVV^*$. Now, it can be checked easily that $UU^*$ and $VV^*$ are the orthogonal projections onto $\text{col}(A)$ and $\text{col}(A^*)$, respectively. Since $UU^*AVV^*x = Ax$ for every $x$ (the columns of $A$ lie in $\text{col}(A)$, and $x - VV^*x \in \text{null}(A)$), we have $UMV^* = A$.
We have proved the following theorem:
Theorem: Any matrix $A \in \mathbb{C}^{m \times n}$ of rank $r$ has a decomposition $A = UMV^*$, where $M \in \mathbb{C}^{r \times r}$ is invertible, the columns of $U \in \mathbb{C}^{m \times r}$ form an orthonormal basis of $\text{col}(A)$, and the columns of $V \in \mathbb{C}^{n \times r}$ form an orthonormal basis of $\text{col}(A^*)$.
Note that in general the decomposition is not unique, since an orthonormal basis of a nonzero space is never unique (even in dimension one, a basis vector can be multiplied by a phase, or a sign in the real case).
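Here is one way to build such a decomposition numerically: take orthonormal bases of $\text{col}(A)$ and $\text{col}(A^*)$ (below they come from the SVD, but any orthonormal bases of these two spaces would do, e.g. from Gram–Schmidt) and set $M = U^*AV$. This is an illustrative sketch, not the only possible construction:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))    # rank 2

# Orthonormal bases of col(A) and col(A*), here taken from the SVD.
W, s, Zh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
U = W[:, :r]                 # columns: orthonormal basis of col(A)
V = Zh[:r, :].conj().T       # columns: orthonormal basis of col(A*)

M = U.conj().T @ A @ V       # the r x r matrix "hidden inside" A
print(np.linalg.matrix_rank(M) == r)            # True: M is invertible
print(np.allclose(U @ M @ V.conj().T, A))       # True: A = U M V^*
```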
It is relatively easy to compute the pseudoinverse for a matrix that is decomposed in the form of theorem 12:
Theorem: If a matrix $A$ satisfies $A = UMV^*$, where $M \in \mathbb{C}^{r \times r}$ is invertible and $U \in \mathbb{C}^{m \times r}$ and $V \in \mathbb{C}^{n \times r}$ have orthonormal columns, then
$$A^\dagger = VM^{-1}U^*.$$
Proof: We want to compute $(UMV^*)^\dagger$. Since $M$ is invertible, the lemma on invertible matrices gives $M^\dagger = M^{-1}$. Also, $U$ has orthonormal columns and $V^*$ has orthonormal rows, so we can use the previous lemma to see that $(UMV^*)^\dagger = (MV^*)^\dagger U^*$ and $(MV^*)^\dagger = VM^\dagger$. It follows that $A^\dagger = VM^\dagger U^* = VM^{-1}U^*$.
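Continuing the numerical sketch from the decomposition above, the theorem then gives the pseudoinverse directly, and it agrees with `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))    # rank 2

W, s, Zh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
U, V = W[:, :r], Zh[:r, :].conj().T
M = U.conj().T @ A @ V

A_dagger = V @ np.linalg.inv(M) @ U.conj().T                   # V M^{-1} U^*
print(np.allclose(A_dagger, np.linalg.pinv(A, rcond=1e-10)))   # True
```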
Special cases
In many special cases, we do not need to compute a full decomposition in the form that is used by theorem 12. We already saw some easy cases in the second section. Here, we consider some more special cases for which it is easy to calculate the pseudoinverse.
Theorem: If $P$ is a projection onto a space $W$, and $P$ is also the orthogonal projection onto that same space, then $P^\dagger = P$.
Proof: As usual, check that the defining properties of the pseudoinverse hold. We see that $PPb = b$ for $b \in W = \text{col}(P)$ (with $Pb = b \in \text{col}(P^*) = W$) and $Pb = 0$ for $b \in W^\perp$, so indeed $P^\dagger = P$.
Theorem: If $A$ is full rank, the minimum-norm solution of
$$\min_x \|Ax - b\|$$
equals $A^\dagger b$ and can be computed without a decomposition. Moreover, if $A$ has linearly independent rows then $A^\dagger = A^*(AA^*)^{-1}$, and if $A$ has linearly independent columns then $A^\dagger = (A^*A)^{-1}A^*$.
Proof: Suppose $A$ has linearly independent columns; then $A^*A$ is invertible and $\text{col}(A^*) = \mathbb{C}^n$. As usual by now, we check that the defining properties of the pseudoinverse hold when we substitute $(A^*A)^{-1}A^*$ for $A^\dagger$. If $b \in \text{col}(A)$ we can write $b = Ax$ for some $x$. It follows that $A(A^*A)^{-1}A^*b = A(A^*A)^{-1}A^*Ax = Ax = b$ for $b \in \text{col}(A)$. If $b \in \text{col}(A)^\perp$ it follows that $A^*b = 0$, so $(A^*A)^{-1}A^*b = 0$ for $b \in \text{col}(A)^\perp$.
If $A$ has linearly independent rows, then $A^*$ has linearly independent columns, so $(A^*)^\dagger = (AA^*)^{-1}A$. By the lemma relating $(A^*)^\dagger$ and $(A^\dagger)^*$ it now follows that $A^\dagger = ((A^*)^\dagger)^* = A^*(AA^*)^{-1}$.
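Both formulas are easy to use (or verify) numerically; for real matrices $A^* = A^T$. An illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(9)
pinv = np.linalg.pinv

tall = rng.standard_normal((6, 3))    # linearly independent columns
wide = rng.standard_normal((3, 6))    # linearly independent rows

# A^+ = (A* A)^{-1} A*  when the columns are linearly independent.
print(np.allclose(pinv(tall), np.linalg.solve(tall.T @ tall, tall.T)))   # True
# A^+ = A* (A A*)^{-1}  when the rows are linearly independent.
print(np.allclose(pinv(wide), wide.T @ np.linalg.inv(wide @ wide.T)))    # True
```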