Matrix Inverse

Oct 12, 2020

6 min read

Mar 11, 2022 15:59 UTC

...for a nonsingular matrix. We talk about left and right inverses, *the* matrix inverse and orthogonal matrices.

Linear system

We have matrix $\boldsymbol{A}: m \times n$, and vectors $\boldsymbol{b}: m \times 1$ and $\boldsymbol{x}: n \times 1$ where $\boldsymbol{b}$ is given and $\boldsymbol{x}$ is unknown. We use the equation

$$ \boldsymbol{Ax} = \boldsymbol{b} $$

We’re interested in when does a solution $\boldsymbol{x}$ exist, and how do we find it.

A linear system is consistent when there exists at least one $\boldsymbol{x}$ satisfying the equation. We can see that the following statements are equivalent:

A linear system is consistent.
$\boldsymbol{b} \in \mathcal{C}(\boldsymbol{A})$.
$\mathcal{C}([\boldsymbol{A}, \boldsymbol{b}]) = \mathcal{C}(\boldsymbol{A})$.
$r([\boldsymbol{A}, \boldsymbol{b}]) = r(\boldsymbol{A})$.

If $\boldsymbol{A}$ has full row rank $m$, then $\mathcal{C}(\boldsymbol{A}) = \mathbb{R}^m$, and $\boldsymbol{Ax} = \boldsymbol{b}$ is consistent for any $\boldsymbol{b}$.

The next fact is important in statistical contexts. The linear system $\boldsymbol{A}’\boldsymbol{Ax} = \boldsymbol{A}’\boldsymbol{b}$ is consistent for any $\boldsymbol{A}$, $\boldsymbol{b}$. This is because $\boldsymbol{A}’\boldsymbol{b} \in \mathcal{C}(\boldsymbol{A}’) = \mathcal{C}(\boldsymbol{A}’\boldsymbol{A})$.

If $\boldsymbol{A}$ has full column rank, $\boldsymbol{A}’\boldsymbol{A}$ is nonsingular, and $\boldsymbol{x}$ is unique. This is an assumption in our regression analyses, which is there’s no redundancy in our predictor variables.

Matrix inverse

The inverse of a matrix only makes sense when the matrix is square. Before talking about it, we introduce two other concepts for non-square matrices.

Left and right inverses

If $\boldsymbol{A}: m \times n$, the right inverse $\boldsymbol{R}$ is the $n \times m$ matrix such that

$$ \boldsymbol{AR} = \boldsymbol{I}_m $$

The right inverse exists when $\boldsymbol{A}$ has full row rank, i.e. $r(\boldsymbol{A}) = m$ ($m \leq n$). This is because if $r(\boldsymbol{A}) = m$, then $\boldsymbol{Ax} = \boldsymbol{b}$ is consistent for any $\boldsymbol{b}$. Let

$$ \boldsymbol{b}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}_{m \times 1}, \boldsymbol{b}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \cdots, \boldsymbol{b}_m = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} $$

The solution for $\boldsymbol{Ab}_1 = \boldsymbol{x}$ is $\boldsymbol{x}_1$, and similarly we have $\boldsymbol{x}_2, \cdots, \boldsymbol{x}_m$. If we put $\boldsymbol{x}_1, \cdots, \boldsymbol{x}_m$ into an $n \times m$ matrix

$$ \boldsymbol{R} = [\boldsymbol{x}_1, \cdots, \boldsymbol{x}_m], $$

we have $\boldsymbol{AR} = \boldsymbol{A}[\boldsymbol{x}_1, \cdots, \boldsymbol{x}_m] = \boldsymbol{I}_m$. From the proof we can also see that $\boldsymbol{R}$ is not unique.

Similarly, the left inverse $\boldsymbol{L}: n \times m$ is the matrix that satisfies $\boldsymbol{LA} = \boldsymbol{I}_n$. $\boldsymbol{L}$ exists when $\boldsymbol{A}$ has full column rank, i.e. $r(\boldsymbol{A}) = n (m \geq n)$.

Inverse

If $\boldsymbol{A}$ is non-singular, then its right inverse and left inverse exist, and they are the same, which we call the inverse of $\boldsymbol{A}$ denoted by $\boldsymbol{A}^{-1}$. $\boldsymbol{A}^{-1}$ is unique. To prove this, note that

$$ \boldsymbol{L} = \boldsymbol{LI}_n = \boldsymbol{LAR} = \boldsymbol{I}_n \boldsymbol{R} = \boldsymbol{R} $$

Properties

If $\boldsymbol{A}$ has full column rank $n$¹, and there exists $\boldsymbol{L}$ such that $\boldsymbol{LA} = \boldsymbol{I}_n$ ($n \leq m$), then

$\boldsymbol{B} = \boldsymbol{C} \Longleftrightarrow \boldsymbol{AB} = \boldsymbol{AC}$
This is because
$$ \boldsymbol{AB} = \boldsymbol{AC} \Rightarrow \boldsymbol{LAB} = \boldsymbol{LAC} \Rightarrow \boldsymbol{I}_n\boldsymbol{B} = \boldsymbol{I}_n \boldsymbol{C} $$
For any $\boldsymbol{B}$, $r(\boldsymbol{AB}) = r(\boldsymbol{B})$. Pre-multiplying a full column rank matrix does not change the rank.
For any $\boldsymbol{B}$, $\mathcal{R}(\boldsymbol{AB}) = \mathcal{R}(\boldsymbol{B})$ when $\boldsymbol{A}$ has full column rank.

If $\boldsymbol{A}$ has full column rank $n$, then $\boldsymbol{A}’\boldsymbol{A}$ is non-singular and vice versa. Similarly if $\boldsymbol{A}$ has full row rank, $\boldsymbol{AA}’$ is non-singular.

Inverse of non-singular matrix

Suppose $\boldsymbol{A}$ is an $m \times m$ non-singular matrix.

If $m = 2$, we have
$$ \boldsymbol{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \boldsymbol{A}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} $$
Check that $\boldsymbol{AA}^{-1} = \boldsymbol{I}_2$ to make sure.
If $\boldsymbol{A}$ is diagonal and all diagonals are non-zero,
$$ \boldsymbol{A}^{-1} = diag \left(d_1^{-1}, \cdots, d_m^{-1} \right) $$
$\boldsymbol{Ax} = \boldsymbol{b}$ has a unique solution $\boldsymbol{x} = \boldsymbol{A}^{-1}\boldsymbol{b}$.
$(k\boldsymbol{A})^{-1} = \frac{1}{k}\boldsymbol{A}^{-1}$.
$\boldsymbol{A}’$ is also non-singular, and its inverse is
$$ (\boldsymbol{A}’)^{-1} = (\boldsymbol{A}^{-1})' $$
This is because if we take the transpose of both sides of $\boldsymbol{AA}^{-1} = \boldsymbol{I}_m$, we get $(\boldsymbol{A}^{-1})’\boldsymbol{A}’ = \boldsymbol{I}_m$. Since $\boldsymbol{A}’$ is non-singular, $(\boldsymbol{A}^{-1})’$ is the inverse of $\boldsymbol{A}’$.
$\boldsymbol{A}^{-1}$ is also non-singular, i.e. $r(\boldsymbol{A}^{-1}) = m$. This is because
$$ m = r(\boldsymbol{AA}^{-1}) \leq r(\boldsymbol{A}^{-1}) \leq m $$
If $\boldsymbol{B}$ is also an $m \times m$ non-singular matrix, then $\boldsymbol{AB}$ is also non-singular, and $(\boldsymbol{AB})^{-1} = \boldsymbol{B}^{-1}\boldsymbol{A}^{-1}$. To check this claim,
$$ (\boldsymbol{AB})(\boldsymbol{B}^{-1}\boldsymbol{A}^{-1}) = \boldsymbol{AI}_m\boldsymbol{A}^{-1} = \boldsymbol{AA}^{-1} = \boldsymbol{I}_m $$
This could be generalized to the product of $k$ non-singular matrices: $(\boldsymbol{A}_1\boldsymbol{A}_2 \cdots \boldsymbol{A}_k)^{-1} = \boldsymbol{A}_k^{-1} \cdots \boldsymbol{A}_1^{-1}$. This is similar to the transpose of the product of matrices.
If $\boldsymbol{A}: m \times n$ and $r(\boldsymbol{A}) = r$; $\boldsymbol{B}: m \times m$ is non-singular, then $r(\boldsymbol{BA}) = r$. Similarly if $\boldsymbol{C}: n \times n$ is non-singular, $r(\boldsymbol{AC}) = r$. In fact, $\boldsymbol{B}$ only needs to have full column rank, and $\boldsymbol{C}$ only needs to have full row rank.

Orthonormal matrices

An orthogonal matrix, or orthonormal matrix to be more precise, is a square matrix whose columns and rows are orthonormal. One way to express this is

$$ \boldsymbol{AA}’ = \boldsymbol{A}’\boldsymbol{A} = \boldsymbol{I}_n $$

By definition, the columns have to be orthonormal because

$$ \begin{gathered} \boldsymbol{a}_i’\boldsymbol{a}_i = 1 = \|\boldsymbol{a}_i \|^2 \\ \boldsymbol{a}_i’\boldsymbol{a}_j = 0 \end{gathered} $$

Equivalently, $\boldsymbol{A}$ is orthogonal if $\boldsymbol{A}^{-1} = \boldsymbol{A}’$.

Examples

In general, any $2 \times 2$ orthogonal matrix $\boldsymbol{P}$ is

$$ \boldsymbol{P} = \begin{pmatrix} \sin\theta & -\cos\theta \\ \cos\theta & \sin\theta \end{pmatrix} $$

If we think of the matrix as an operator on a vector $\boldsymbol{u}$, the vector $\boldsymbol{v} = \boldsymbol{Pu}$ can be interpreted as a rotation of $\boldsymbol{u}$ by $\theta$ degrees. This is why $\boldsymbol{P}$ is sometimes called the rotation matrix. Note that this operation applies to higher-order matrices as well. For example, in PCA we can enforce zero-correlation by rotation.

Illustration of rotating a vector in $\mathbb{R}^2$ space.

The identity matrix is also obviously orthogonal. We can also permute the columns and the resulting matrix is still orthogonal. Geometrically, this is permuting the coordinate axes.

Properties

The product of two orthogonal matrices is orthogonal.
To prove this, let $\boldsymbol{P}$ and $\boldsymbol{Q}$ be two orthogonal matrices. We then have
$$ (\boldsymbol{PQ})(\boldsymbol{PQ})’ = \boldsymbol{PQ}\boldsymbol{Q}’\boldsymbol{P}’ = \boldsymbol{PP}’ = \boldsymbol{I} $$
For any vector $\boldsymbol{x} \in \mathbb{R}^n$, $\| \boldsymbol{Px} \|^2 = \|\boldsymbol{x}\|^2$. Pre-multiplying an orthogonal matrix to a vector doesn’t change the norm of the vector because
$$ (\boldsymbol{Px})’(\boldsymbol{Px}) = \boldsymbol{x}’\boldsymbol{P}’\boldsymbol{Px} = \boldsymbol{x}’\boldsymbol{x} = \|\boldsymbol{x}\|^2 $$
For any vector $\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^n$, their inner product is not changed by pre-multiplying an orthogonal matrix
$$ (\boldsymbol{Px})’(\boldsymbol{Py}) = \boldsymbol{x}’\boldsymbol{y} $$

We can make similar arguments if $\boldsymbol{A}$ has full row rank. Just consider switching the orders. ↩︎

Aug 31	Vector Space	8 min read
Nov 04	Quadratic Form	15 min read
Oct 26	Determinant	6 min read
Oct 23	Projection Matrix	4 min read
Oct 21	Generalized Inverse	4 min read