Theory of Unconstained Optimization

Optimality Conditions

Lemma: Suppose that is differentiable at . If there is a vector such that , then is a descent direction of at .

  • sketch of proof: for .

์ถ”ํ›„ ๋‚˜์˜ฌ Gradient Descent ๋˜๋Š” SGD์˜ ๊ธฐ๋ณธ ์›๋ฆฌ์ด๋‹ค.

Theorem: First-Order Necessary Optimality Conditions (FONC) Suppose is differentiable at . If is a local minimum, then .

Theorem: Second-Order Optimality Conditions Suppose is twice differentiable at .
[Necessary] If is a local minimum, then and is positive semidefinite.
[Sufficient] If and is positive definite, then is a strict local minimum.

Determining Local Optima

  1. Find the critical points of by solving the system of simultneous equations: .

  2. Let .

  3. Then

    1. and implies that has a local maximum at the point .

    2. and implies that has a local minimum at the point .

    3. implies that has neither a local maximum nor a local minimum at the point , it has instead a saddle point.

    4. implies that the test is inconclusive, so some other technique must be used to solve the problem.

Line Search Strategy

Line search๋Š” numerical analysis์—์„œ (๊ทผ์‚ฌ) ํ•ด๋ฅผ ์ฐพ๋Š” ๊ธฐ๋ฒ•์œผ๋กœ ๋ณต์žกํ•œ diffrentiable function์— ๋Œ€ํ•ด ์ ์ ˆํ•˜๊ฒŒ ์‚ฌ์šฉ๋  ์ˆ˜ ์žˆ๋‹ค.

๊ธฐ๋ณธ์ ์œผ๋กœ ์ฃผ์–ด์ง„ ์  ์—์„œ search direction ๋ฅผ ๊ณ„์‚ฐํ•˜๊ณ , ํ•ด๋‹น ๋ฐฉํ–ฅ์œผ๋กœ positive scalar์ธ step length ๋งŒํผ ์ด๋™ํ•˜์—ฌ ์ƒˆ๋กœ์šด ์  ์„ ์ฐพ๋Š”๋‹ค.

๋”ฐ๋ผ์„œ, line search method๋Š” ์ ์ ˆํ•œ search direction๊ณผ step length๋ฅผ ์„ ํƒํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•˜๋‹ค.

Step length๋Š” ์ผ๋ฐ˜์ ์ธ learning algorithm์—์„œ learning rate์™€ ๊ฐ™๋‹ค.

๋Š” descent direction, i.e. , ์œผ๋กœ ์ •ํ•˜๋Š” ๊ฒƒ์ด ํ•ฉ๋ฆฌ์ ์ด๋ฉฐ ๋งŽ์€ line search methods์—์„œ, ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ form์„ ๊ฐ–๋Š”๋‹ค.

where is a symmetric and nonsingular matrix.

The Wolfe Conditions

The Wolfe condition์€ inexact line search ๋ฐฉ๋ฒ•์—์„œ step length๋ฅผ ๊ฒฐ์ •ํ•˜๊ธฐ ์œ„ํ•œ ๊ธฐ์ค€์„ ์ œ๊ณตํ•œ๋‹ค.

  • [Armijo condition : ]
    ์ƒˆ๋กœ์šด ์ ์—์„œ์˜ ํ•จ์ˆ˜ ๊ฐ’์ด ํ˜„์žฌ ์ ์—์„œ์˜ ๊ฐ’๋ณด๋‹ค ์ถฉ๋ถ„ํžˆ ๊ฐ์†Œํ•ด์•ผ ํ•œ๋‹ค.
  • [curvature condition : ]
    ์ƒˆ๋กœ์šด ์ ์—์„œ์˜ ๊ฒฝ์‚ฌ๊ฐ€ ์›๋ž˜ ๊ฒฝ์‚ฌ์˜ ๋ฐฐ ์ด์ƒ์ด์–ด์•ผ ํ•œ๋‹ค. (๋„ˆ๋ฌด ๋ฉ€๋ฆฌ ๊ฐ€์ง€ ์•Š๊ณ , ๋„ˆ๋ฌด ํ‰ํ‰ํ•œ ๊ตฌ์—ญ์— ์œ„์น˜ํ•˜์ง€ ์•Š๋„๋ก ํ•œ๋‹ค.)

Line Search Methods

  • Steepest Descent
    • The rate-of-convergence is linear.
    • Global convergence if satisfies the Wolfe conditions.

์šฐ๋ฆฌ๊ฐ€ ํ”ํžˆ ๋งํ•˜๋Š” Gradient Descent ๋ฐฉ๋ฒ•์ด๋‹ค.

  • Quasi-Newton Method

    • The rate-of-convergence is superlinear.
    • Global convergence if satisfies the Wolfe conditions.
    • The BFGS method is the most popular.
  • Newtonโ€™s Method

    • The rate-of-convergence is quadrtic.
    • Local convergence.