# The Beauty of Algorithms: Big Gains from a Small Incremental Variance Algorithm

## Statistical definition of variance

Let the sample $X$ consist of $N$ values:

$x_1, x_2, ... , x_N$

The mean of the sample $X$ is straightforward to compute:

$\overline{X} = \frac 1 N \sum_{i=1}^N x_i$

The variance is the mean squared deviation from $\overline{X}$:

$\sigma_X^2 = \frac 1 N \sum_{i=1}^N (x_i - \overline{X})^2$
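As a quick worked example of the two definitions (the sample values are chosen here purely for illustration and do not come from the original post), take the sample $2, 4, 4, 4, 5, 5, 7, 9$ with $N = 8$:

$\overline{X} = \frac 1 8 (2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) = \frac {40} 8 = 5$

$\sigma_X^2 = \frac 1 8 (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) = \frac {32} 8 = 4$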

## Deriving the incremental variance

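Let the historical samples be:

$h_1, h_2, ... , h_M$

and let the newly arrived (delta) samples be:

$a_1, a_2, ... , a_N$

The mean and variance of the historical samples are:

$\overline{H} = \frac 1 M \sum_{i=1}^M h_i$

$\sigma_H^2 = \frac 1 M \sum_{i=1}^M (h_i - \overline{H})^2$

and those of the delta samples are:

$\overline{A} = \frac 1 N \sum_{j=1}^N a_j$

$\sigma_A^2 = \frac 1 N \sum_{j=1}^N (a_j - \overline{A})^2$

Appending the delta samples to the historical ones gives the combined sample $X$:

$h_1, h_2, ... , h_M, a_1, a_2, ... , a_N$

Its mean and variance can be expressed in terms of the two parts: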

\begin{align} \overline{X} &= \frac 1 {M + N} \left[ \sum_{i=1}^M h_i + \sum_{j=1}^N a_j \right] \nonumber \\\\ &= \frac { M\overline{H} + N\overline{A}} {M + N} \nonumber \end{align}

\begin{align} \sigma^2 &= \frac 1 {M + N} \left[\sum_{i=1}^M \left(h_i - \overline{X}\right)^2 + \sum_{j=1}^N \left(a_j - \overline{X}\right)^2 \right] \nonumber \\\\ &= \frac 1 {M + N} \left[ \sum_{i=1}^M \left((h_i - \overline{H}) - (\overline{X} - \overline{H})\right)^2 + \sum_{j=1}^N \left((a_j - \overline{A}) - (\overline{X} - \overline{A})\right)^2 \right] \nonumber \\\\ &= \frac 1 {M + N} [ \sum_{i=1}^M \left((h_i - \overline{H})^2 - 2(h_i - \overline{H})(\overline{X} - \overline{H}) + (\overline{X} - \overline{H})^2\right) \nonumber \\\\ & + \sum_{j=1}^N \left((a_j - \overline{A})^2 - 2(a_j - \overline{A})(\overline{X} - \overline{A}) + (\overline{X} - \overline{A})^2\right) ] \nonumber \\\\ &= \frac 1 {M + N} [ M\sigma_H^2 + M(\overline{X} - \overline{H})^2 - 2(\overline{X} - \overline{H})(\sum_{i=1}^M h_i - M\overline{H}) \nonumber \\\\ &+ N\sigma_A^2 + N(\overline{X} - \overline{A})^2 - 2(\overline{X} - \overline{A})(\sum_{j=1}^N a_j - N\overline{A}) ] \nonumber \\\\ &= \frac 1 {M + N} \left[ M\sigma_H^2 + M(\overline{X} - \overline{H})^2 + N\sigma_A^2 + N(\overline{X} - \overline{A})^2 \right] \nonumber \\\\ &= \frac { M\left[\sigma_H^2 + \left(\overline{X} - \overline{H}\right)^2\right] + N\left[\sigma_A^2 + \left(\overline{X} - \overline{A}\right)^2\right] } {M + N} \nonumber \end{align}
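The cross terms drop out of the derivation because $\sum_{i=1}^M h_i = M\overline{H}$ and $\sum_{j=1}^N a_j = N\overline{A}$ by the definition of the mean. As a numeric sanity check of the final formula (with illustrative values that are not from the original post), take the historical samples $1, 2, 3$ and the delta samples $4, 6$, so that $M = 3$ and $N = 2$:

$\overline{H} = 2$, $\sigma_H^2 = \frac 2 3$, $\overline{A} = 5$, $\sigma_A^2 = 1$

$\overline{X} = \frac {3 \times 2 + 2 \times 5} {3 + 2} = 3.2$

$\sigma^2 = \frac { 3\left[\frac 2 3 + (3.2 - 2)^2\right] + 2\left[1 + (3.2 - 5)^2\right] } {3 + 2} = \frac {6.32 + 8.48} 5 = 2.96$

Computing the variance of $1, 2, 3, 4, 6$ directly from the definition gives the same value:

$\sigma^2 = \frac 1 5 (4.84 + 1.44 + 0.04 + 0.64 + 7.84) = \frac {14.8} 5 = 2.96$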

## Implementing the incremental variance

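```scala
// Summary statistics of a sample: count, sum and (population) variance.
case class Measures(n: Int, sum: Double, variance: Double) {
  // Mean of the sample; defined as 0 for an empty sample to avoid 0.0 / 0 = NaN.
  def avg: Double = if (n == 0) 0d else sum / n

  // Merges the measures of newly arrived samples into this one
  // without revisiting the individual sample values.
  def appendDelta(delta: Measures): Measures = {
    val newN = this.n + delta.n
    val newSum = this.sum + delta.sum
    val newAvg = if (newN == 0) 0d else newSum / newN

    // Contribution of one part: n * (variance + (newAvg - avg)^2)
    def partial(m: Measures): Double = {
      val deltaAvg = newAvg - m.avg
      m.n * (m.variance + deltaAvg * deltaAvg)
    }

    val newVariance =
      if (newN == 0) 0d else (partial(this) + partial(delta)) / newN

    Measures(newN, newSum, newVariance)
  }
}
```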


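`Measures` holds the sample count, the sum (from which the mean is derived), and the variance, which together are everything needed to update the variance incrementally. It also carries the responsibility for the incremental variance algorithm itself.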
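As a rough illustration of how `appendDelta` might be used when samples arrive in several batches, the sketch below folds a sequence of batches into a single running `Measures`. It repeats a compact copy of `Measures` so that it compiles on its own. The `BatchFoldSketch` object, the `fromBatch` and `foldBatches` helpers, and the sample batches are hypothetical names and data introduced only for this example.

```scala
object BatchFoldSketch {
  // Compact copy of Measures so that this sketch is self-contained.
  final case class Measures(n: Int, sum: Double, variance: Double) {
    def avg: Double = if (n == 0) 0d else sum / n

    def appendDelta(delta: Measures): Measures = {
      val newN = n + delta.n
      if (newN == 0) this
      else {
        val newSum = sum + delta.sum
        val newAvg = newSum / newN
        // Contribution of one part: n * (variance + (newAvg - avg)^2)
        def partial(m: Measures): Double = {
          val d = newAvg - m.avg
          m.n * (m.variance + d * d)
        }
        Measures(newN, newSum, (partial(this) + partial(delta)) / newN)
      }
    }
  }

  // Hypothetical helper: measures of one batch, computed from the definition.
  def fromBatch(values: Seq[Double]): Measures =
    if (values.isEmpty) Measures(0, 0d, 0d)
    else {
      val n = values.length
      val sum = values.sum
      val avg = sum / n
      val variance = values.map(v => (v - avg) * (v - avg)).sum / n
      Measures(n, sum, variance)
    }

  // Fold the batches left to right, starting from the empty Measures.
  def foldBatches(batches: Seq[Seq[Double]]): Measures =
    batches.map(fromBatch).foldLeft(Measures(0, 0d, 0d))(_.appendDelta(_))

  def main(args: Array[String]): Unit = {
    // Illustrative data only; the last batch is deliberately empty.
    val batches = Seq(Seq(1d, 2d, 3d), Seq(4d, 6d), Seq.empty[Double])
    val merged = foldBatches(batches)
    println(s"n=${merged.n}, avg=${merged.avg}, variance=${merged.variance}")
  }
}
```

Only the three numbers in the running `Measures` are kept between batches, so under these assumptions each new batch costs work proportional to its own size rather than to the size of the whole history.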

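```scala
// A batch of raw sample values.
case class Samples(values: Seq[Double]) {
  // Count, sum and variance of the batch; an empty batch yields all zeros.
  def measures: Measures = {
    if (values == null || values.isEmpty)
      Measures(0, 0d, 0d)
    else
      Measures(values.length, values.sum, variance)
  }

  // Population variance, computed directly from the definition.
  private def variance: Double = {
    val n = values.length
    val avg = values.sum / n
    values.foldLeft(0d) { case (sum, sample) =>
      sum + (sample - avg) * (sample - avg)
    } / n
  }
}
```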


`Samples` computes the statistics for a batch of sample values directly from the statistical definition, with no incremental algorithm involved.

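```scala
object DeltaVarianceUtils {
  def main(args: Array[String]): Unit = {
    // Implicit view so that an Array[Double] can be used where Samples is expected.
    implicit val arrayToSamples: Array[Double] => Samples =
      (values: Array[Double]) => Samples(values.toSeq)

    val historicalSamples = Array(1.5d, 3.4d, 7.8d, 11.6d)
    val deltaSamples = Array(9.4d, 4.2d, 35.6d, 77.9d)

    // Direct computation over all samples at once.
    println("Variance: "
      + (historicalSamples ++ deltaSamples).measures.variance
    )
    // Incremental computation: merge the two sets of measures.
    println("Variance calculated by delta algorithm: "
      + historicalSamples.measures.appendDelta(deltaSamples.measures).variance
    )
  }
}
```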


```
Variance: 598.2168750000002
Variance calculated by delta algorithm: 598.2168750000001
```
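The two printed values differ only in the last digit, which is the kind of discrepancy floating-point rounding produces when the same quantity is accumulated in a different order. When checking the incremental result against the direct one, it is therefore usually safer to compare within a tolerance than to test for exact equality. A minimal sketch follows; the `ToleranceCheckSketch` object, the `approxEqual` helper and the default tolerance of `1e-12` are assumptions made for this example and are not part of the original code.

```scala
object ToleranceCheckSketch {
  // Hypothetical helper: true when a and b agree up to a relative tolerance
  // (the scale is floored at 1 so that values near zero use an absolute tolerance).
  def approxEqual(a: Double, b: Double, relTol: Double = 1e-12): Boolean = {
    val scale = math.max(math.max(math.abs(a), math.abs(b)), 1d)
    math.abs(a - b) <= relTol * scale
  }

  def main(args: Array[String]): Unit = {
    val direct = 598.2168750000002      // value printed by the direct computation
    val incremental = 598.2168750000001 // value printed by the incremental computation
    println(direct == incremental)            // exact comparison
    println(approxEqual(direct, incremental)) // comparison within the tolerance
  }
}
```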


## The big payoff

Posted 2015-07-06 07:24 by 一码