Theory of Unconstained Optimization
Optimality Conditions
Lemma: Suppose that is differentiable at . If there is a vector such that , then is a descent direction of at .
- sketch of proof: for .
์ถํ ๋์ฌ Gradient Descent ๋๋ SGD์ ๊ธฐ๋ณธ ์๋ฆฌ์ด๋ค.
Theorem: First-Order Necessary Optimality Conditions (FONC) Suppose is differentiable at . If is a local minimum, then .
Theorem: Second-Order Optimality Conditions Suppose is twice differentiable at .
[Necessary] If is a local minimum, then and is positive semidefinite.
[Sufficient] If and is positive definite, then is a strict local minimum.
Determining Local Optima
-
Find the critical points of by solving the system of simultneous equations: .
-
Let .
-
Then
-
and implies that has a local maximum at the point .
-
and implies that has a local minimum at the point .
-
implies that has neither a local maximum nor a local minimum at the point , it has instead a saddle point.
-
implies that the test is inconclusive, so some other technique must be used to solve the problem.
-
Line Search Strategy
Line Search
Line search๋ numerical analysis์์ (๊ทผ์ฌ) ํด๋ฅผ ์ฐพ๋ ๊ธฐ๋ฒ์ผ๋ก ๋ณต์กํ diffrentiable function์ ๋ํด ์ ์ ํ๊ฒ ์ฌ์ฉ๋ ์ ์๋ค.
๊ธฐ๋ณธ์ ์ผ๋ก ์ฃผ์ด์ง ์ ์์ search direction ๋ฅผ ๊ณ์ฐํ๊ณ , ํด๋น ๋ฐฉํฅ์ผ๋ก positive scalar์ธ step length ๋งํผ ์ด๋ํ์ฌ ์๋ก์ด ์ ์ ์ฐพ๋๋ค.
๋ฐ๋ผ์, line search method๋ ์ ์ ํ search direction๊ณผ step length๋ฅผ ์ ํํ๋ ๊ฒ์ด ์ค์ํ๋ค.
Step length๋ ์ผ๋ฐ์ ์ธ learning algorithm์์ learning rate์ ๊ฐ๋ค.
๋ descent direction, i.e. , ์ผ๋ก ์ ํ๋ ๊ฒ์ด ํฉ๋ฆฌ์ ์ด๋ฉฐ ๋ง์ line search methods์์, ๋ ๋ค์๊ณผ ๊ฐ์ form์ ๊ฐ๋๋ค.
where is a symmetric and nonsingular matrix.
The Wolfe Conditions
The Wolfe condition์ inexact line search ๋ฐฉ๋ฒ์์ step length๋ฅผ ๊ฒฐ์ ํ๊ธฐ ์ํ ๊ธฐ์ค์ ์ ๊ณตํ๋ค.
- [Armijo condition : ]
์๋ก์ด ์ ์์์ ํจ์ ๊ฐ์ด ํ์ฌ ์ ์์์ ๊ฐ๋ณด๋ค ์ถฉ๋ถํ ๊ฐ์ํด์ผ ํ๋ค. - [curvature condition : ]
์๋ก์ด ์ ์์์ ๊ฒฝ์ฌ๊ฐ ์๋ ๊ฒฝ์ฌ์ ๋ฐฐ ์ด์์ด์ด์ผ ํ๋ค. (๋๋ฌด ๋ฉ๋ฆฌ ๊ฐ์ง ์๊ณ , ๋๋ฌด ํํํ ๊ตฌ์ญ์ ์์นํ์ง ์๋๋ก ํ๋ค.)
Line Search Methods
- Steepest Descent
- The rate-of-convergence is linear.
- Global convergence if satisfies the Wolfe conditions.
์ฐ๋ฆฌ๊ฐ ํํ ๋งํ๋ Gradient Descent ๋ฐฉ๋ฒ์ด๋ค.
-
Quasi-Newton Method
- The rate-of-convergence is superlinear.
- Global convergence if satisfies the Wolfe conditions.
- The BFGS method is the most popular.
-
Newtonโs Method
- The rate-of-convergence is quadrtic.
- Local convergence.
