01. Matrix Computation

Matrix Algebra

Matrix — The mother of all data structures. The nonmathematical uses of the word matrix reflect its Latin origins in mater, or mother… The word has two meanings — a representation of a linear mapping and the basis for all our existence.

Linear Systems

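Linear algebra studies the properties of systems of linear equations of the form $A x = b$.

In the row picture, $A x = b$ describes the intersection of $n$ planes (hyperplanes); in the column picture, it asks which combination of the column vectors of $A$ produces $b$. The column picture is usually the more useful way to look at the problem.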
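The two pictures can be checked numerically. The following is a minimal NumPy sketch; the $2 \times 2$ system is a made-up example, not one from the text.

```python
import numpy as np

# Hypothetical 2x2 system, chosen only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)  # solves A x = b without forming the inverse

# Column picture: b is the combination x[0] * A[:, 0] + x[1] * A[:, 1].
combo = x[0] * A[:, 0] + x[1] * A[:, 1]

# Row picture: x lies on every line (hyperplane) A[i, :] @ x = b[i].
on_each_row = [bool(np.isclose(A[i, :] @ x, b[i])) for i in range(A.shape[0])]
```

Here `combo` reproduces `b`, and every entry of `on_each_row` is true.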

Vector Products

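Consider two vectors.

$$ x = \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix}, \quad y = \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \end{bmatrix} $$

By convention, a vector means a column vector.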

Inner product (dot product): scalar

$$ x^{T} y = \begin{bmatrix} x_{1} & x_{2} & x_{3} \end{bmatrix} \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \end{bmatrix} = x_{1} y_{1} + x_{2} y_{2} + x_{3} y_{3} = \sum_{i = 1}^{3} x_{i} y_{i} = y^{T} x $$

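Outer product: matrix

$$ x y^{T} = \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix} \begin{bmatrix} y_{1} & y_{2} & y_{3} \end{bmatrix} = \begin{bmatrix} x_{1} y_{1} & x_{1} y_{2} & x_{1} y_{3} \\ x_{2} y_{1} & x_{2} y_{2} & x_{2} y_{3} \\ x_{3} y_{1} & x_{3} y_{2} & x_{3} y_{3} \end{bmatrix} $$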

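Elementwise product (Hadamard product): vector

$$ x \odot y = \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix} \odot \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \end{bmatrix} = \begin{bmatrix} x_{1} y_{1} \\ x_{2} y_{2} \\ x_{3} y_{3} \end{bmatrix} $$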
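The three products map directly onto NumPy operations. A small sketch with illustrative values for `x` and `y`:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

inner = x @ y            # scalar: 1*4 + 2*5 + 3*6
outer = np.outer(x, y)   # 3x3 matrix whose (i, j) entry is x_i * y_j
hadamard = x * y         # vector of elementwise products
```

Note that the trace of the outer product equals the inner product, since both sum the terms $x_{i} y_{i}$.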

Matrix Multiplication

Let $A \in R^{m \times p}$ and $B \in R^{p \times n}$. Then $C = A B$ is given by

$$ c_{ij} = \sum_{k = 1}^{p} a_{ik} b_{kj} = A(i, :) \, B(:, j) $$

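Everyone knows this formula. One way to read it: stack the row vectors of $A$ into a column and line up the column vectors of $B$ in a row; $C$ is then the outer product of these two "vectors of vectors", with each elementwise multiplication replaced by an inner product.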

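It can also be written as follows.

$$ C = A B = \sum_{k = 1}^{p} A(:, k) \, B(k, :) $$

In other words, line up the column vectors of $A$ in a row and stack the row vectors of $B$ into a column; $C$ is then the inner product of these two "vectors of vectors", with each elementwise multiplication replaced by an outer product.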

Since fast, efficient matrix multiplication is a major factor in training speed, this topic is worth studying in depth.
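Both readings of $C = A B$ can be compared against NumPy's built-in product. This is a sketch for checking the formulas on random matrices of arbitrary illustrative size, not an efficient implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, n = 3, 4, 2
A = rng.standard_normal((m, p))
B = rng.standard_normal((p, n))

# View 1: each entry c_ij is the inner product of row i of A and column j of B.
C_inner = np.array([[A[i, :] @ B[:, j] for j in range(n)] for i in range(m)])

# View 2: C is the sum of p rank-one outer products A(:, k) B(k, :).
C_outer = sum(np.outer(A[:, k], B[k, :]) for k in range(p))

C = A @ B  # optimized matrix product
```

In practice `A @ B` dispatches to an optimized routine, so the explicit loops above are only useful for understanding.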

Determinant and Positive Definite

Determinant of a Matrix

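The determinant of $A$ is the signed volume of the parallelepiped $P$ in $n$-dimensional space spanned by the row vectors of $A$; its absolute value is the volume itself.

If a matrix has to be summarized by a single number, the determinant is probably the most commonly used one. (A negative determinant means that the orientation of the space is flipped.)

There are many identities involving determinants, but in my experience remembering the following is enough unless you plan to work with more advanced mathematics.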

A square matrix $A$ has an inverse matrix $A^{-1}$ if and only if $\det(A) \neq 0$.
If $A$ is triangular, then $\det(A) = a_{11} a_{22} \cdots a_{nn}$. In particular, $\det(I_{n}) = 1$.
$\det(A B) = \det(A) \det(B)$
$\operatorname{tr}(A B) = \operatorname{tr}(B A)$
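These identities are easy to check numerically. A sketch on random $4 \times 4$ matrices (the size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
T = np.triu(A)  # upper-triangular part of A

det_triangular = np.linalg.det(T)
diag_product = np.prod(np.diag(T))  # product of the diagonal entries

det_product_ok = bool(np.isclose(np.linalg.det(A @ B),
                                 np.linalg.det(A) * np.linalg.det(B)))
trace_ok = bool(np.isclose(np.trace(A @ B), np.trace(B @ A)))
```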

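Some people write code that explicitly computes an inverse matrix inside a program or algorithm. This is slower and less numerically stable than solving the linear system directly, and it is likely to break down on large or ill-conditioned matrices, so avoid it.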
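As an illustration of the alternative, the sketch below solves $A x = b$ with `np.linalg.solve` and, only for comparison, with an explicit inverse. The matrix is a hypothetical, deliberately well-conditioned example, so both answers agree here; the difference shows up on harder problems.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
# The diagonal shift keeps this illustrative matrix well conditioned.
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)

x_solve = np.linalg.solve(A, b)  # factorize A and solve, no inverse formed
x_inv = np.linalg.inv(A) @ b     # explicit inverse, shown only for comparison

residual_solve = np.linalg.norm(A @ x_solve - b)
residual_inv = np.linalg.norm(A @ x_inv - b)
```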

Symmetric Positive Definite (SPD) Matrix

Symmetric positive definiteness (SPD) is a property that plays an important role in the optimization material covered later.

An SPD matrix is defined as follows.

Symmetric: $A = A^{T}$
Positive definite (or positive semi-definite): $x^{T} A x > 0$ (or $x^{T} A x \geq 0$) for all nonzero $x \in R^{n}$, denoted by $A \succ 0$ (or $A \succeq 0$).

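If $C \in R^{n \times n}$ has full rank and $A = C^{T} C$, then $A$ is SPD. Symmetry is immediate, and because $C$ has full rank, $C x \neq 0$ for every nonzero $x$, so

$$ x^{T} A x = x^{T} C^{T} C x = \| C x \|_{2}^{2} > 0 $$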

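Note that a covariance matrix is always symmetric positive semi-definite, and it is SPD when the mean-centered data matrix $X$ has full column rank.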

$$ C = \frac{1}{N - 1} X^{T} X = \frac{1}{N - 1} \sum_{j = 1}^{N} x_{j} x_{j}^{T} $$

where $x_{j} = [x_{j1} \dots x_{jp}]^{T}$ is the $j$-th mean-centered sample, i.e. the $j$-th row of $X$ written as a column.
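The covariance formula can be compared against `np.cov`. A sketch on hypothetical random data, assuming rows are samples and columns are features:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 50, 3
data = rng.standard_normal((N, p))  # hypothetical data, one sample per row
X = data - data.mean(axis=0)        # center each column

C = X.T @ X / (N - 1)                                    # matrix form
C_sum = sum(np.outer(x_j, x_j) for x_j in X) / (N - 1)   # sum of outer products

eigenvalues = np.linalg.eigvalsh(C)  # real eigenvalues of a symmetric matrix
```

With far more samples than features, the centered data has full column rank, so all eigenvalues come out strictly positive.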

The Cholesky Factorization

The Cholesky factorization is an important property of SPD matrices: every SPD matrix factors uniquely as $R^{T} R$, where $R$ is an upper-triangular matrix with positive diagonal entries.

Theorem (Cholesky factorization): Every SPD matrix $A = (a_{ij}) \in R^{n \times n}$ has a unique Cholesky factorization
$A = R^{T} R, \quad r_{ii} > 0$
where $R = (r_{ij})$ is an $n \times n$ upper-triangular matrix with positive diagonal entries.

This $R$ is sometimes written as $A^{\frac{1}{2}}$, although it generally differs from the symmetric matrix square root of $A$.
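A short sketch of the factorization in NumPy. Note that `np.linalg.cholesky` returns the lower-triangular factor $L$ with $A = L L^{T}$ by default, so the upper-triangular $R$ of the theorem is its transpose. The test matrix is a constructed example.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
A = M.T @ M + 1e-3 * np.eye(4)  # SPD by construction

L = np.linalg.cholesky(A)  # lower-triangular factor, A = L L^T
R = L.T                    # upper-triangular factor, A = R^T R
```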

Tests for Positive Definiteness

A symmetric matrix $A$ can be tested for positive definiteness in the following ways.

All the eigenvalues of $A$ satisfy $\lambda_{i} > 0$.
All the upper-left submatrices $A_{k}$ (leading principal submatrices) have positive determinants.
A $2 \times 2$ matrix $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$ is positive definite when $a > 0$ and $a c - b^{2} > 0$.
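The tests above can be sketched as small helper functions. The function names are hypothetical, and the third helper (attempting a Cholesky factorization) is not one of the tests listed above; it is added here as a common practical check.

```python
import numpy as np

def is_pd_by_eigenvalues(A):
    """All eigenvalues of the symmetric matrix A are positive."""
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

def is_pd_by_leading_minors(A):
    """All upper-left k x k submatrices have positive determinants."""
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

def is_pd_by_cholesky(A):
    """Cholesky succeeds only for (numerically) positive definite matrices."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

A_pd = np.array([[2.0, -1.0], [-1.0, 2.0]])  # a = 2 > 0, ac - b^2 = 3 > 0
A_not = np.array([[1.0, 2.0], [2.0, 1.0]])   # ac - b^2 = -3 < 0
```

All three helpers accept `A_pd` and reject `A_not`, matching the $2 \times 2$ criterion.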

Linear Algebra

Linear Dependency and Basis

Vectors $v_{1}, v_{2}, \dots, v_{k}$ are linearly independent if the only coefficients satisfying $c_{1} v_{1} + \dots + c_{k} v_{k} = 0$ are $c_{1} = \dots = c_{k} = 0$. Otherwise they are linearly dependent.
- If the $v_{i}$ are linearly dependent, then at least one of them, say $v_{j}$, can be written as a linear combination of the others $(v_{1}, \dots, v_{j - 1}, v_{j + 1}, \dots, v_{k})$.
If every vector $v$ in a vector space $V$ can be written as a linear combination of the $v_{i}$, the $v_{i}$ are said to span $V$.
The set $\{v_{i}\}$ is called a basis of $V$ if the following conditions hold.
1. The $v_{i}$ are linearly independent.
2. $\{v_{i}\}$ spans the space $V$.
The number of vectors in a basis of $V$ is called the dimension of $V$.
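One numerical way to check linear independence is to stack the vectors as columns and compare the matrix rank with the number of vectors. A sketch, where `are_independent` is a hypothetical helper and the vectors are illustrative:

```python
import numpy as np

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = np.array([1.0, 1.0, 0.0])  # v3 = v1 + v2, so {v1, v2, v3} is dependent
v4 = np.array([0.0, 0.0, 1.0])

def are_independent(*vectors):
    """True if the given vectors are linearly independent (rank test)."""
    V = np.column_stack(vectors)
    return bool(np.linalg.matrix_rank(V) == len(vectors))

dependent_case = are_independent(v1, v2, v3)  # rank 2 < 3 vectors
basis_case = are_independent(v1, v2, v4)      # three independent vectors in R^3
```

Because `v1`, `v2`, `v4` are independent and there are three of them in $R^{3}$, they form a basis, consistent with the dimension being 3.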

Norms

Let $S$ be a vector space with elements $x$.
A real-valued function $\| x \|$ on $S$ is called a norm if it satisfies the following conditions:
1. $\| x \| \geq 0$ for any $x \in S$
2. $\| x \| = 0$ if and only if $x = 0$
3. $\| \alpha x \| = | \alpha | \, \| x \|$, where $\alpha$ is an arbitrary scalar
4. $\| x + y \| \leq \| x \| + \| y \|$ (triangle inequality)

When defining a new norm, always check that it satisfies the triangle inequality.

Vector Norms

Vector $p$-norm ($p \geq 1$): $\| x \|_{p} = \left( \sum_{i = 1}^{n} | x_{i} |^{p} \right)^{1/p}$
Manhattan: $\| x \|_{1} = \sum_{1 \leq i \leq n} | x_{i} |$
Euclidean: $\| x \|_{2} = \sqrt{x^{T} x}$
Chebyshev: $\| x \|_{\infty} = \max_{1 \leq i \leq n} | x_{i} |$
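These norms correspond to the `ord` argument of `np.linalg.norm`. A small sketch with illustrative vectors; `p_norm` is a hypothetical helper that spells out the general formula.

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])
y = np.array([1.0, 2.0, -2.0])

def p_norm(v, p):
    """Vector p-norm written out from the definition (p >= 1)."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

l1 = np.linalg.norm(x, 1)          # Manhattan: |3| + |-4| + |0|
l2 = np.linalg.norm(x, 2)          # Euclidean: sqrt(9 + 16)
linf = np.linalg.norm(x, np.inf)   # Chebyshev: largest absolute entry

# Triangle inequality for each of the three norms.
triangle_ok = all(
    np.linalg.norm(x + y, p) <= np.linalg.norm(x, p) + np.linalg.norm(y, p)
    for p in (1, 2, np.inf)
)
```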

Matrix Norms

Matrix $p$-norm

$$ \| A \|_{p} = \sup_{x \neq 0} \frac{\| A x \|_{p}}{\| x \|_{p}} $$

Frobenius norm

$$ \| A \|_{F} = \left( \sum_{i = 1}^{m} \sum_{j = 1}^{n} | a_{ij} |^{2} \right)^{1/2} = \sqrt{\operatorname{tr}(A^{T} A)} $$
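Both matrix norms are available through `np.linalg.norm`. The sketch below checks the Frobenius identity and uses random directions as a crude lower bound on the supremum for $p = 2$, where the induced norm equals the largest singular value. The matrix is an arbitrary example.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

fro = np.linalg.norm(A, 'fro')
fro_from_trace = np.sqrt(np.trace(A.T @ A))  # sqrt(1 + 4 + 9 + 16)

two_norm = np.linalg.norm(A, 2)                     # induced 2-norm
sigma_max = np.linalg.svd(A, compute_uv=False)[0]   # largest singular value

# Ratios ||A x||_2 / ||x||_2 over random directions never exceed the sup.
rng = np.random.default_rng(5)
xs = rng.standard_normal((2, 1000))
ratios = np.linalg.norm(A @ xs, axis=0) / np.linalg.norm(xs, axis=0)
```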

Matrix Operation on Vectors

Linear Transformations

If we know how a linear transformation acts on the basis vectors of a space (that is, we know each $A x_{i}$), then we know how it acts on the entire space.

Linearity: If $x = c_{1} x_{1} + ... + c_{n} x_{n}$ , then $A x = c_{1} (A x_{1}) + ... + c_{n} (A x_{n})$ .

Commonly used linear transformations include scaling, rotation, identity, projection, and reflection.
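As a concrete case, the sketch below applies a 90-degree rotation (an illustrative choice) to a vector in two ways: directly, and by combining the images of the standard basis vectors.

```python
import numpy as np

theta = np.pi / 2  # rotate by 90 degrees
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

e1, e2 = np.eye(2)       # standard basis vectors [1, 0] and [0, 1]
c1, c2 = 2.0, -3.0
x = c1 * e1 + c2 * e2    # x = [2, -3]

direct = rotation @ x
from_basis = c1 * (rotation @ e1) + c2 * (rotation @ e2)
```

Both computations give the same vector, which is exactly the linearity property.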

Projection Using Inner Products

Goal: project $x$ onto $a$.

$p = (x^{T} a) a = (a^{T} x) a = a (a^{T} x) = (a a^{T}) x = P_{a} x$ when $a$ is a unit vector
$P_{a} = a a^{T}$ if $\| a \|^{2} = a^{T} a = 1$
$P_{a} = \frac{a a^{T}}{a^{T} a}$ in general
- Here, $P_{a}$ is called the projection matrix.
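A minimal sketch of the general formula, using a non-unit vector `a` chosen for easy arithmetic:

```python
import numpy as np

a = np.array([3.0, 4.0])  # not a unit vector: a^T a = 25
x = np.array([2.0, 1.0])

P_a = np.outer(a, a) / (a @ a)  # general formula a a^T / (a^T a)
p = P_a @ x                     # projection of x onto a

residual = x - p                # the part of x orthogonal to a
```

Here $a^{T} x = 10$, so $p = \frac{10}{25} a$, and the residual is orthogonal to $a$.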

Least Squares

Least Squares Solution

Theorem (Least Squares Solution): The least squares solution $\bar{x}$ to
$\min_{x} \| A x - b \|_{2}, \quad A \in R^{m \times n}, \; m > n$
satisfies the following normal equation: $A^{T} A \bar{x} = A^{T} b$

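As the accompanying figure shows, the least squares problem is the same as projecting $b$ onto the column space of $A$, the set of all vectors $A x$.

This idea underlies many of the dimension reduction techniques that come later.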

If $A^{T} A$ is invertible, then $\bar{x} = (A^{T} A)^{-1} A^{T} b$
If $p$ is the projection of $b$ onto the column space of $A$, then $p = A \bar{x} = P b = A (A^{T} A)^{-1} A^{T} b$,
where $P$ is an orthogonal projection matrix given by $A (A^{T} A)^{-1} A^{T}$
$P \in R^{n \times n}$ is said to be a projection if $P^{2} = P$.
$P \in R^{n \times n}$ is an orthogonal projection if $P^{2} = P$ and $P = P^{T}$.
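The normal equation and the projection matrix can be sketched as follows on a random tall matrix (sizes are arbitrary). `np.linalg.lstsq` is included as the library solver for comparison; the normal-equation route is shown because it mirrors the formulas, not because it is the most robust choice. For $A \in R^{m \times n}$, the matrix $P$ built here is $m \times m$.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 8, 3
A = rng.standard_normal((m, n))  # tall random matrix, full column rank here
b = rng.standard_normal(m)

# Normal equation A^T A x = A^T b, solved without an explicit inverse.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least squares solver for comparison.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# P = A (A^T A)^{-1} A^T, again via solve rather than inv.
P = A @ np.linalg.solve(A.T @ A, A.T)
p = P @ b

orthogonality = A.T @ (b - p)  # residual is orthogonal to the columns of A
```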

Orthogonal Matrix

If the column vectors and row vectors of a square matrix $Q$ are orthogonal unit vectors (orthonormal vectors), i.e. $Q^{T} Q = Q Q^{T} = I$, then $Q$ is called an orthogonal matrix.

Orthogonal matrices have the following useful properties.

$Q^{T} = Q^{- 1}$
$∥ Q x ∥ = ∥ x ∥$
$(Q x)^{T} (Q y) = x^{T} y$

A transformation by an orthogonal matrix preserves lengths and inner products.
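These properties can be checked on an orthogonal matrix obtained from a QR factorization of a random square matrix, which is one convenient way (among others) to generate an example $Q$.

```python
import numpy as np

rng = np.random.default_rng(7)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # Q has orthonormal columns
x = rng.standard_normal(4)
y = rng.standard_normal(4)

inverse_is_transpose = bool(np.allclose(np.linalg.inv(Q), Q.T))
length_preserved = bool(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))
inner_preserved = bool(np.isclose((Q @ x) @ (Q @ y), x @ y))
```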

Theorem (Orthogonal Matrix): If the columns of $Q_{r} = [q_{1}, \dots, q_{r}] \in R^{n \times r}$ form an orthonormal basis for a subspace $S$, then the least squares problem $\min_{x} \| Q_{r} x - b \|_{2}$ becomes easy because $Q_{r}^{T} Q_{r} = I_{r}$:
$Q_{r}^{T} Q_{r} \bar{x} = Q_{r}^{T} b \Rightarrow \bar{x} = Q_{r}^{T} b.$
The projection of $b$ onto the column space $S = \operatorname{span}\{q_{1}, \dots, q_{r}\}$ and the unique orthogonal projection matrix onto $S$ are
$p = P_{S} b = Q_{r} \bar{x} = Q_{r} Q_{r}^{T} b, \quad P_{S} = Q_{r} Q_{r}^{T} = \sum_{i = 1}^{r} q_{i} q_{i}^{T}$

If the columns of $Q = [q_{1}, \dots, q_{n}] \in R^{n \times n}$ form an orthonormal basis, then $b$ can be written as follows.

$$ b = x_{1} q_{1} + \dots + x_{n} q_{n} = Q x, \quad x = Q^{T} b $$

$$ \Rightarrow b = Q Q^{T} b = (q_{1}^{T} b) q_{1} + \dots + (q_{n}^{T} b) q_{n} $$

This is the technique of expressing a vector in a different basis, and it too is a foundation for feature transformation techniques, including dimension reduction.
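The change of basis, and the projection obtained by keeping only the first $r$ basis vectors, can be sketched as follows. The orthonormal basis comes from a QR factorization of a random matrix, and $r = 2$ is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(8)
n, r = 4, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthonormal basis of R^4
b = rng.standard_normal(n)

x = Q.T @ b  # coordinates of b in the basis q_1, ..., q_n
b_rebuilt = sum((Q[:, i] @ b) * Q[:, i] for i in range(n))

Q_r = Q[:, :r]       # orthonormal basis of an r-dimensional subspace S
P_S = Q_r @ Q_r.T    # orthogonal projection matrix onto S
p = P_S @ b          # keeps only the first r coordinates of b

x_bar = np.linalg.lstsq(Q_r, b, rcond=None)[0]  # least squares solution
```

Dropping the remaining $n - r$ coordinates is the basic move behind dimension reduction: `p` is the closest point to `b` inside $S$.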

Roh Donghyun
