Gini Coefficient

1. Lorenz Curve

The Lorenz curve is used to describe the inequality in the distribution of a quantity (usually income or wealth in economics, or size or reproductive output in ecology).

The Lorenz curve is a popular method to display concentrations graphically.

Assumptions:

  • \(n\) observations \(x_1, x_2, \cdots, x_n\) of a variable \(X\).
  • All the observations are non-negative (with a positive sum).
  • If the data are ungrouped, the sum of all the observations is \(\sum_{i=1}^n x_i = n \bar{x}\).
  • Order the data: \(0 \leq x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}\).

The Lorenz curve is then plotted with \(u_i\) on the \(x\)-axis

\[u_{i}=\frac{i}{n}, \qquad i=0, \ldots, n \]

and \(v_i\) on the \(y\)-axis

\[v_{i}=\frac{\sum \limits_{j=1}^{i} x_{(j)}}{\sum \limits_{j=1}^{n} x_{(j)}}, \qquad i=1, \ldots, n ; \quad \text{where} \quad v_{0}:=0 \]

where the numerator item \(\sum \limits_{j=1}^{i} x_{(j)}\) is the cumulative total of observations up to the \(i\)th observation.

Properties:

  • \(v_i\) describes the contribution of all values \(\leq x_{(i)}\) relative to the sum of all values.
  • The point \((u_i, v_i)\) says that the smallest \(u_i\times 100 \%\) of observations account for \(v_i\times 100 \%\) of the sum of all \(x_i\).
  • If all \(x_i\) are identical, the Lorenz curve is a straight diagonal line, also known as the identity line or line of equality.
  • If the \(x_i\) are of different sizes, the Lorenz curve falls below the line of equality.
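The definitions of \(u_i\) and \(v_i\) and the last two properties can be checked with a short R sketch. The helper name `lorenz_points` and the two sample vectors are illustrative assumptions, not part of any package.

```r
# Hypothetical helper (not from any package): Lorenz curve points for ungrouped data.
lorenz_points <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0, all(x >= 0), sum(x) > 0)
  x_sorted <- sort(x)                           # x_(1) <= ... <= x_(n)
  n <- length(x_sorted)
  u <- (0:n) / n                                # u_i = i / n, i = 0, ..., n
  v <- c(0, cumsum(x_sorted)) / sum(x_sorted)   # v_0 = 0, v_i = cumulative share
  data.frame(u = u, v = v)
}

equal   <- lorenz_points(c(4, 4, 4, 4))    # identical values (made-up data)
unequal <- lorenz_points(c(1, 2, 3, 10))   # different sizes (made-up data)

# Identical values lie on the line of equality (v equals u);
# values of different sizes lie on or below it (v <= u).
isTRUE(all.equal(equal$v, equal$u))
all(unequal$v <= unequal$u)
```

Both checks should evaluate to TRUE; plotting `unequal$u` against `unequal$v` gives the Lorenz curve itself.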

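For grouped data with \(k\) classes, we describe the contribution of each class and approximate the values in class \(j\) by its mid-point \(a_j\). With \(n_j\) the absolute frequency and \(f_j = n_j / n\) the relative frequency of class \(j\), we have: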

\[\begin{align*} \tilde{u}_{i} &=\sum_{j=1}^{i} f_{j}, \quad i=1,2, \ldots, k ; \quad \text{where} \quad \tilde{u}_{0}:=0 \\ \tilde{v}_{i} &= \frac{\sum \limits_{j=1}^{i} f_{j} a_{j}}{\sum \limits_{j=1}^{k} f_{j} a_{j}} =\frac{\sum \limits_{j=1}^{i} n_{j} a_{j}}{n \bar{x}}, \quad i=1,2, \ldots, k ; \quad \text{where} \quad \tilde{v}_{0}:=0 . \end{align*} \]
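A minimal R sketch of the grouped formulas follows. The helper name `lorenz_points_grouped` and the class data (mid-points 1 to 5 with frequencies 1, 2, 3, 2, 1) are assumptions chosen for illustration.

```r
# Hypothetical helper: Lorenz curve points for grouped data.
# a: class mid-points a_j, n_j: absolute class frequencies (assumed inputs).
lorenz_points_grouped <- function(a, n_j) {
  stopifnot(length(a) == length(n_j), all(a >= 0), all(n_j >= 0), sum(n_j * a) > 0)
  ord <- order(a)                           # classes in increasing order of mid-point
  a <- a[ord]
  n_j <- n_j[ord]
  f <- n_j / sum(n_j)                       # relative frequencies f_j
  u <- c(0, cumsum(f))                      # u~_i = f_1 + ... + f_i, with u~_0 = 0
  v <- c(0, cumsum(f * a)) / sum(f * a)     # v~_i = cumulative share, with v~_0 = 0
  data.frame(u = u, v = v)
}

# Illustrative classes: mid-points 1..5 with frequencies 1, 2, 3, 2, 1
pts <- lorenz_points_grouped(a = 1:5, n_j = c(1, 2, 3, 2, 1))
pts
```

Unlike the ungrouped case, the steps along the \(x\)-axis are the class shares \(f_j\), so they are generally not equally spaced.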

2. Gini Coefficient

\[G = 2 \cdot F \]

where \(F\) is the area between the Lorenz curve and the bisecting line (i.e., the diagonal line).

Properties:

  • \(0 \leq G \leq 1\)
  • \(G=0\): no concentration
  • \(G=1\): perfect (i.e. extreme) concentration

Specifically, for ungrouped data (Heumann et al., 2016, p. 61):

\[G=1-\frac{1}{n} \sum_{i=1}^{n}\left(v_{i-1}+v_{i}\right) \]
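One way to see where this expression comes from, sketched under the assumption that the Lorenz curve is the piecewise-linear path through the points \((u_i, v_i)\): each segment has width \(1/n\), so the area under the curve is a sum of trapezoids, and \(F\) is the area under the diagonal (\(1/2\)) minus that sum.

\[\begin{align*} F &= \frac{1}{2} - \sum_{i=1}^{n} \frac{1}{n} \cdot \frac{v_{i-1}+v_{i}}{2} \\ G &= 2 \cdot F = 1-\frac{1}{n} \sum_{i=1}^{n}\left(v_{i-1}+v_{i}\right) \end{align*} \]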

It follows that

\[0 \leq G \leq \frac{n-1}{n} \]
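A brief check of both bounds, using the formula for \(G\) above. If all observations are equal, then \(v_i = i/n\); in the most unequal case a single observation holds the entire sum, so \(v_i = 0\) for \(i < n\) and \(v_n = 1\):

\[\begin{align*} \text{all equal:} \quad & \sum_{i=1}^{n}\left(v_{i-1}+v_{i}\right) = \frac{1}{n}\sum_{i=1}^{n}(2i-1) = n & \Rightarrow \quad G &= 1 - \frac{n}{n} = 0 \\ \text{one holds all:} \quad & \sum_{i=1}^{n}\left(v_{i-1}+v_{i}\right) = v_{n-1} + v_{n} = 1 & \Rightarrow \quad G &= 1 - \frac{1}{n} = \frac{n-1}{n} \end{align*} \]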

Thus, a standardized Gini coefficient may be preferred:

\[G^{+} = \frac{n}{n-1}G \]

which takes a maximum value of 1.

Example of Computing Gini Coefficient

Suppose that, in an area, the income distribution of the population is:

  • $1, 1 person
  • $2, 2 people
  • $3, 3 people
  • $4, 2 people
  • $5, 1 person

We compare three cases: the distribution above (case 1), one person at each income level (case 2), and five people of whom one holds the entire income (case 3):

\[\begin{array}{lll} \text{Case 1:} & x^1 = 1,2,2,3,3,3,4,4,5 & (n = 9) \\ \text{Case 2:} & x^2 = 1,2,3,4,5 & (n = 5) \\ \text{Case 3:} & x^3 = 0,0,0,0,5 & (n = 5) \end{array} \]

We can compute

\[\begin{array}{lll} G^1 = 0.214 & G^2 = 0.267 & G^3 = 0.800 \\ G^{+1} = 0.241 & G^{+2} = 0.333 & G^{+3} = 1.000 \end{array} \]
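As a check, the case with one person at each income level from 1 to 5 (so \(n = 5\) and a total income of 15) can be worked by hand from the formula for \(G\):

\[\begin{align*} (v_0, v_1, \ldots, v_5) &= \left(0, \tfrac{1}{15}, \tfrac{3}{15}, \tfrac{6}{15}, \tfrac{10}{15}, 1\right) \\ \sum_{i=1}^{5}\left(v_{i-1}+v_{i}\right) &= \frac{1 + 4 + 9 + 16 + 25}{15} = \frac{11}{3} \\ G &= 1 - \frac{1}{5} \cdot \frac{11}{3} = \frac{4}{15} \approx 0.267, \qquad G^{+} = \frac{5}{4} \cdot \frac{4}{15} = \frac{1}{3} \approx 0.333 \end{align*} \]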

3. Lorenz Asymmetry Coefficient

The Lorenz asymmetry coefficient (LAC) is a summary statistic of the Lorenz curve that measures the degree of asymmetry of the curve.

\[S=F(\mu )+L(\mu) \]

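where \(\mu\) is the mean, \(F(\mu)\) is the share of observations and \(L(\mu)\) the share of the total at the point \(\mu\), both defined as for the Lorenz curve (this \(F\) is not the area \(F\) used for the Gini coefficient). For ungrouped data: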

\[F(\mu )={\frac {m+\delta }{n}}, \qquad L(\mu )={\frac {L_{m}+\delta \, x_{(m+1)}}{L_{n}}} \]

where

\[\delta ={\frac {\mu -x_{(m)}}{x_{(m+1)}-x_{(m)}}}, \qquad L_{i}=\sum _{j=1}^{i}x_{(j)} \]

and \(m\) is the number of individuals with a size less than \(\mu\) (i.e., \(x_{(m)} < \mu < x_{(m+1)}\)).

Properties:

  • If \(S > 1\), the point where the Lorenz curve is parallel to the line of equality is above the axis of symmetry.
  • If \(S < 1\), the point where the Lorenz curve is parallel to the line of equality is below the axis of symmetry.
  • If \(S = 1\), the Lorenz curve is symmetric, e.g., when the data arise from a log-normal distribution.
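The formulas for \(F(\mu)\), \(L(\mu)\) and \(\delta\) translate fairly directly into R. The sketch below uses a hypothetical helper name, `lorenz_asymmetry`, handles ungrouped data only, and assumes that no observation equals the mean.

```r
# Hypothetical helper: Lorenz asymmetry coefficient S = F(mu) + L(mu)
# for ungrouped data, assuming no observation is exactly equal to the mean.
lorenz_asymmetry <- function(x) {
  x <- sort(x)
  n <- length(x)
  mu <- mean(x)
  stopifnot(n >= 2, all(x >= 0), !any(x == mu))
  m <- sum(x < mu)                            # number of observations below the mean
  delta <- (mu - x[m]) / (x[m + 1] - x[m])    # position of mu between x_(m) and x_(m+1)
  L_cum <- cumsum(x)                          # L_i = x_(1) + ... + x_(i)
  F_mu <- (m + delta) / n                     # share of observations at mu
  L_mu <- (L_cum[m] + delta * x[m + 1]) / L_cum[n]   # share of the total at mu
  F_mu + L_mu
}

s <- lorenz_asymmetry(c(20, 14, 59, 9, 36, 23, 3))   # illustrative data
s
```

A value above 1 would indicate that the point of parallelism lies above the axis of symmetry, in line with the first property.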

Note:

  • The above formulas assume that none of the data values are equal to \(\mu\); strictly speaking, we assume that the data sizes are continuously distributed, so that \({P(x_{i}=\mu ) = 0}\).

  • Otherwise, if one or more \(x_{i}=\mu\), then a section of the Lorenz curve is parallel to the diagonal, and \(S\) has to be defined as an interval instead of a number.

The interval can be defined as follows:

\[\left[{\frac {m}{n}}+{\frac {L_{m}}{L_{n}}}\ , \ {\frac {m+a}{n}}+{\frac {L_{m+a}}{L_{n}}}\right] \]

where \(a\) is the number of data values that are equal to \(\mu\).
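A worked example, for the data \(x = 5, 10, 15, 15, 15, 20, 25\): here \(n = 7\), \(\mu = 15\), \(m = 2\), \(a = 3\), \(L_2 = 15\), \(L_5 = 60\) and \(L_7 = 105\), with both shares of the total taken relative to \(L_n = L_7\).

\[\begin{align*} \text{lower end:} \quad & \frac{2}{7} + \frac{15}{105} = \frac{3}{7} \approx 0.4286 \\ \text{upper end:} \quad & \frac{5}{7} + \frac{60}{105} = \frac{9}{7} \approx 1.2857 \\ \text{midpoint:} \quad & \frac{1}{2}\left(\frac{3}{7} + \frac{9}{7}\right) = \frac{6}{7} \approx 0.8571 \end{align*} \]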

4. Implementation

R package ineq (Measuring Inequality); see the package documentation.

install.packages("ineq")

Gini Coefficient

ineq(x, parameter = NULL, type = "Gini", na.rm = TRUE)
Gini(x, corr = FALSE, na.rm = TRUE)
  • x : a numeric vector of non-negative values
  • corr : logical; whether or not a finite sample correction should be applied
    • corr=FALSE : computes according to the formula of \(G\)
    • corr=TRUE : computes according to the formula of \(G^+\)
  • na.rm : logical; whether to remove missing values (NAs) before the computation

Lorenz Curve

Lc(x, n = rep(1,length(x)), plot = FALSE)
  • n : vector of frequencies of \(x\); must be the same length as x.

Return

  • p : vector of percentages (x-axis)
  • L : vector with values of the ordinary Lorenz curve (y-axis)
  • L.general : vector with values of the generalized Lorenz curve

Lorenz Asymmetry Coefficient

Lasym(x, n = rep(1, length(x)), interval = FALSE, na.rm = TRUE)
  • interval : logical. In the case where there are observations exactly equal to the mean, either an interval of asymmetry coefficients can be returned or their midpoint.

Example

Hand-written implementation of the Gini coefficient

gini.coef = function(x, freq=NA, norm=FALSE){
  
  # sort the values in increasing order
  x.sort.ix = order(x)
  x.sort = x[x.sort.ix]
  
  if (is.numeric(freq)) {
    # grouped data: x holds the class values, freq their frequencies
    freq.sort = freq[x.sort.ix] # reorder the frequencies along with x
    n = sum(freq.sort)          # total number of observations
    u = c(0, freq.sort)         # x-axis
    u = cumsum(u) / sum(u)
    v = c(0, x.sort*freq.sort)  # y-axis
    v = cumsum(v) / sum(v)
  }
  else {
    # ungrouped data: every observation has weight 1/n
    n = length(x)
    u = 0:n
    u = u/n                # x-axis
    v = c(0, x.sort)
    v = cumsum(v) / sum(v) # y-axis
  }
  
  # G = 1 - sum of (u_i - u_{i-1}) * (v_{i-1} + v_i);
  # for ungrouped data each width u_i - u_{i-1} equals 1/n
  k = length(v)
  gini = 1 - sum(diff(u) * (v[-k] + v[-1]))
  
  # standardized Gini coefficient
  if (norm){ gini = n / (n-1) * gini }
  
  res = list(gini=gini, v=v, u=u)
  return(res)
}

Example of computing Gini coefficient

library(ineq)  # provides Gini(), Lc() and Lasym()
x <- c(20, 14, 59, 9, 36, 23, 3)
gini.coef(x, norm=FALSE)$gini
gini.coef(x, norm=TRUE)$gini
Gini(x, corr=FALSE)
Gini(x, corr=TRUE)
# output
[1] 0.402439
[1] 0.4695122
[1] 0.402439
[1] 0.4695122

Remark: I have not found a function in the ineq package for computing the Gini coefficient of grouped data.

Example of computing Lorenz curve

x <- c(20, 14, 59,  9, 36, 23,  3)
f <- c(10, 20, 35, 45, 10,  5, 25)
lc.res = Lc(x, n=f, plot=FALSE)
res = gini.coef(x, freq=f)
sum(abs(res$u - lc.res$p))
sum(abs(res$v - lc.res$L))
# output
[1] 0
[1] 0

Example of computing Lorenz Asymmetry Coefficient

Lasym(x, n=f, interval=FALSE)
# output
[1] 1.489699

x = c(5, 10, 15, 15, 15, 20, 25)
Lasym(x, interval=FALSE)
Lasym(x, interval=TRUE)
# output
[1] 0.8571429
[1] 0.4285714 1.2857143

Reference

Wikipedia, Lorenz curve, website

Wikipedia, Lorenz coefficient, website

Heumann, C., Schomaker, M., & Shalabh. (2016). Subsection 3.4 Measures of Concentration, In Introduction to Statistics and Data Analysis (pp. 60–62). Springer International Publishing.

Inequality Measurement, website

posted @ 2022-05-06 23:22  veager