Distributions

Logistic Sigmoid

Softplus Function

  • softened version of $x^+ = \max(0, x)$
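As a rough illustration of the "softened" behaviour, the sketch below compares softplus with $\max(0, x)$. It assumes the standard definitions $\sigma(x) = 1 / (1 + e^{-x})$ for the logistic sigmoid and $\zeta(x) = \log(1 + e^{x})$ for softplus; the function names and the overflow-avoiding rearrangements are illustrative choices, not something taken from these notes.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-x)), arranged so exp() never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0 here, so exp(x) is in (0, 1)
    return z / (1.0 + z)


def softplus(x: float) -> float:
    """Softplus log(1 + exp(x)), rewritten as max(0, x) + log1p(exp(-|x|))."""
    return max(0.0, x) + math.log1p(math.exp(-abs(x)))


# Softplus tracks max(0, x) away from 0 and smooths the kink at 0.
for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"x={x:6.1f}  max(0,x)={max(0.0, x):8.4f}  softplus={softplus(x):8.4f}")
```

Far from zero the two columns nearly coincide; the largest gap is at $x = 0$, where softplus equals $\log 2$.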

Properties of Sigmoid

Information Theory

Entropy

Information theory is the study of quantifying information. Its most basic idea is that an event with a lower probability of occurring is more informative.

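The information quantity, or self-information, of an event $x$ can be read as the minimum number of bits needed to encode the event, given the probability $P(x)$ that it occurs. It is defined as

$$I(x) = -\log P(x).$$

The lower $P(x)$ is, the larger $I(x)$ becomes. With a base-2 logarithm the unit is bits; with the natural logarithm, which matches the value 0.693 quoted for the Bernoulli example, the unit is nats.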
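A minimal sketch of this relationship, assuming the usual definition $I(x) = -\log P(x)$ with the natural logarithm (so values are in nats). The helper name `self_information` is hypothetical.

```python
import math


def self_information(p: float) -> float:
    """Self-information -log(p) in nats for an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return math.log(1.0 / p)  # same as -log(p), avoids printing -0.0 for p = 1


# Rarer events carry more information; a certain event carries none.
for p in (1.0, 0.5, 0.1, 0.01):
    print(f"P(x)={p:<5}  I(x)={self_information(p):.4f} nats")
```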

Entropy is the expectation of self-information, and is defined as

$$H(P) = \mathbb{E}_{x \sim P}[I(x)] = -\mathbb{E}_{x \sim P}[\log P(x)].$$

Because it is the expected information quantity of an event drawn from a distribution, it can be viewed as a measure of how uncertain that distribution is.

For example, consider a Bernoulli distribution with parameter $p = P(x = 1)$. Its entropy is

$$H = -(1 - p)\log(1 - p) - p \log p.$$

If the distribution is deterministic, that is, $p = 0$ or $p = 1$, then $H$ is 0 (taking $0 \log 0 = 0$).

In contrast, in the most uncertain case, $p = 0.5$, $H$ attains its largest value, $\log 2 \approx 0.693$ (natural logarithm).
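The two cases above can be checked numerically. This is a small sketch assuming the natural logarithm and the convention $0 \log 0 = 0$; `bernoulli_entropy` is an illustrative name.

```python
import math


def bernoulli_entropy(p: float) -> float:
    """Entropy in nats of a Bernoulli distribution with P(x = 1) = p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")

    def term(q: float) -> float:
        # Convention: 0 * log 0 is treated as 0.
        return 0.0 if q == 0.0 else -q * math.log(q)

    return term(p) + term(1.0 - p)


# Deterministic cases give 0; p = 0.5 gives the maximum, log 2.
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"p={p:.1f}  H={bernoulli_entropy(p):.3f}")
```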

KL Divergence

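The Kullback–Leibler divergence, or KL divergence, is an asymmetric measure of the difference between two distributions $P$ and $Q$:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \mathbb{E}_{x \sim P}\left[\log \frac{P(x)}{Q(x)}\right].$$

  • $D_{\mathrm{KL}}(P \,\|\, Q) = 0$ if and only if $P$ and $Q$ are the same.

  • Asymmetric: $D_{\mathrm{KL}}(P \,\|\, Q) \neq D_{\mathrm{KL}}(Q \,\|\, P)$ in general.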
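Both properties can be seen on a small discrete example. The sketch below assumes the standard discrete form $D_{\mathrm{KL}}(P \,\|\, Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$ in nats; the distributions `P` and `Q` are made-up values for illustration.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue  # 0 * log 0 is treated as 0
        if qx == 0.0:
            return math.inf  # P puts mass where Q has none
        total += px * math.log(px / qx)
    return total


P = [0.9, 0.1]  # hypothetical distributions
Q = [0.5, 0.5]

print("D_KL(P || P) =", kl_divergence(P, P))  # identical distributions
print("D_KL(P || Q) =", round(kl_divergence(P, Q), 4))
print("D_KL(Q || P) =", round(kl_divergence(Q, P), 4))
```

The last two printed values differ, which is why the order of the arguments matters when choosing which direction to minimize.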

Cross-entropy

Suppose the quantity we care about is the distribution $P$ of the data. In general, we do not know $P$. We therefore build a model that imitates $P$ and use it to estimate $P$. Let $Q$ denote the model's distribution (over its outputs).

Cross-entropy measures how well the model distribution $Q$ estimates the data distribution $P$, and is defined as

$$H(P, Q) = -\mathbb{E}_{x \sim P}[\log Q(x)].$$

Cross-entropy can be computed from the given data, which follow $P$, together with the distribution $Q$ that we assume. Training the model to minimize it therefore yields a better estimate of $P$.
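In practice the expectation over $P$ is replaced by an average over samples. The following sketch estimates the cross-entropy as the mean of $-\log Q(x_i)$ over data drawn from $P$; the two-outcome distribution `true_p`, the candidate models `q_good` and `q_bad`, and the sample size are all hypothetical values chosen for illustration.

```python
import math
import random
from typing import Dict, Sequence


def empirical_cross_entropy(samples: Sequence[str], q: Dict[str, float]) -> float:
    """Average of -log Q(x) over the samples (a Monte Carlo estimate of H(P, Q))."""
    return -sum(math.log(q[x]) for x in samples) / len(samples)


random.seed(0)
true_p = {"a": 0.7, "b": 0.3}  # unknown in practice; used here only to draw data
samples = random.choices(list(true_p), weights=list(true_p.values()), k=10_000)

q_good = {"a": 0.7, "b": 0.3}  # model close to P
q_bad = {"a": 0.3, "b": 0.7}   # model far from P

print("cross-entropy, good model:", round(empirical_cross_entropy(samples, q_good), 3))
print("cross-entropy, bad model: ", round(empirical_cross_entropy(samples, q_bad), 3))
```

Only samples and $Q$ appear in the computation, so the estimate is available even though $P$ itself is not.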

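Minimizing the cross-entropy with respect to $Q$ is equivalent to minimizing the KL divergence $D_{\mathrm{KL}}(P \,\|\, Q)$, because $H(P, Q) = H(P) + D_{\mathrm{KL}}(P \,\|\, Q)$ and $H(P)$ does not depend on $Q$.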
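The equivalence can be checked numerically through the identity $H(P, Q) = H(P) + D_{\mathrm{KL}}(P \,\|\, Q)$. This is a sketch for discrete distributions with strictly positive probabilities; `P` and the candidate models are made-up values.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """H(P) = -sum_x P(x) log P(x), in nats (all entries assumed > 0)."""
    return -sum(px * math.log(px) for px in p)


def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """H(P, Q) = -sum_x P(x) log Q(x), in nats."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) = sum_x P(x) log(P(x) / Q(x)), in nats."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q))


P = [0.7, 0.2, 0.1]  # hypothetical data distribution
candidates = {
    "uniform": [1 / 3, 1 / 3, 1 / 3],
    "closer": [0.6, 0.3, 0.1],
    "exact": [0.7, 0.2, 0.1],
}

for name, Q in candidates.items():
    ce = cross_entropy(P, Q)
    kl = kl_divergence(P, Q)
    # The two columns on the right should agree up to rounding error.
    print(f"{name:8s} H(P,Q)={ce:.4f}  H(P)+KL={entropy(P) + kl:.4f}  KL={kl:.4f}")
```

Since $H(P)$ is the same constant in every row, ranking the candidates by cross-entropy or by KL divergence gives the same order.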