Discrete Choice Models - Utility Functions and Binary Choice Models


This note is mainly transcribed from the lecture notes "Modal Split Modeling: Discrete Choice Models" of CE5205 Transportation Planning (Meng, 2022).

Discrete Choice Models Series:

  • Discrete Choice Models - Utility Functions and Binary Choice Models, site

  • Discrete Choice Models - Multinomial Choice Models, site

  • Discrete Choice Models - Nested and Mixed Logit Models, site

  • Discrete Choice Models - Implement in Python, site

  • Discrete Choice Models - Implement in R, site

1 Random Utility Functions

1.1 Utility in Economics

Definition: Utility is a measure of the satisfaction gained from the consumption of a "package" of goods/services drawn from a choice set. It is a measure of happiness or satisfaction.

Given this measure, one may speak meaningfully of increasing or decreasing utility, and thereby explain economic behavior in terms of attempts to increase one's utility.

1.2 Utility Function Defined in Economics

While preferences are the conventional foundation of microeconomics, it is convenient to represent preferences with a utility function and to reason about them indirectly through it.

Let \(X\) be the choice set, the set of all mutually-exclusive packages the consumer could conceivably consume.

The consumer's utility function \(U: X \rightarrow \mathbb{R}\) ranks each package in the choice set.

The consumer's choice is determined by the utility function. If \(U(x) \geq U(y)\), then the consumer weakly prefers \(x\) to \(y\).

1.3 Utility function for a traveler \(n\) in choosing mode \(i\): \(U_{in}\)

The utility function is used to formulate the attractiveness of a travel mode.

The utility function is derived from characteristics/features of a travel mode and those of the individual traveler.

Assumption on the travel mode choice of an individual traveler: traveler \(n\) is assumed to select, from a set of travel modes (i.e., the choice set), the mode that produces the greatest utility.

Random utility function: the utility of a travel mode is formulated as a random variable (from the perspective of the analyst or modeler, because of uncertainty in modeling and travelers' perception errors about utility).

2. Specification and Parameter Estimation of Random Utility Function

2.1 Random Utility Maximization Based Mode Choice Models

Based on the random utility, the probability of travel mode \(i\) being selected by traveler \(n\) from choice set \(C_n\) is given by:

\[\Pr(i) = \Pr (U_{in} \geq U_{jn}, \forall j \in C_n) \]

where \(U_{in}\) is random utility of travel mode \(i \in C_n\) for traveler \(n\). Generally, \(U_{in}\) includes two parts:

  • one is the deterministic component \(V_{in}\), and

  • the other is the random term (or error term) \(\varepsilon_{in}\),

that is:

\[U_{in} = V_{in} + \varepsilon_{in} \]

(1) Binary Choice Model

Choice set \(C_n\) contains exactly two travel modes, denoted by \(C_n = \{i , j \}\)

  • Example: travel mode \(i\) might be the option of driving to work and travel mode \(j\) would be using transit (public transport)

Probability of person \(n\) choosing travel mode \(i\) is:

\[\Pr{}_n(i) = \Pr(U_{in} \geq U_{jn}) \]

Probability of choosing alternative travel mode \(j\) is:

\[\Pr{}_n(j) = \Pr(U_{jn} > U_{in}) = 1 - \Pr{}_n(i) \]

Two Propositions of Binary Choice Model:

  • Proposition 1: Adding the same constant to all the utilities does not affect the choice probabilities, even though it shifts the functions \(V_{in}\) and \(V_{jn}\):

\[\Pr{}_n(i) = \Pr \left(U_{in} \geq U_{jn} \right) = \Pr \left(\alpha + U_{in} \geq \alpha + U_{jn} \right) \]

  • Proposition 2: Relative nature of the utilities. Only the differences in utilities of travel modes matter.

    \[\begin{align*} \Pr{}_n(i) &= \Pr \left(U_{in} \geq U_{jn} \right) \\ &= \Pr \left(V_{in} + \varepsilon_{in} \geq V_{jn} + \varepsilon_{jn} \right) \\ &= \Pr \left(\varepsilon_{jn} - \varepsilon_{in} \leq V_{in} - V_{jn} \right) \end{align*} \]

    thus, only \(V_{in}-V_{jn}\) and \(\varepsilon_{jn} - \varepsilon_{in}\) matter.
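The two propositions can be checked numerically. Below is a minimal sketch, assuming i.i.d. standard Gumbel errors and arbitrary illustrative utility values (both assumptions are mine, not from the text): it estimates \(\Pr{}_n(i)\) by simulation and shows that a common shift of both utilities leaves the probability unchanged.

```python
import math
import random

def gumbel(rng):
    # Standard Gumbel draw via inverse CDF: F^{-1}(u) = -ln(-ln u)
    return -math.log(-math.log(rng.random()))

def choice_prob(v_i, v_j, n_draws=200_000, seed=0):
    """Estimate Pr(i) by simulation: draw errors for each alternative
    and count how often alternative i attains the higher utility."""
    rng = random.Random(seed)
    wins = sum(v_i + gumbel(rng) >= v_j + gumbel(rng) for _ in range(n_draws))
    return wins / n_draws

# Proposition 1: shifting both utilities by the same constant (here +10)
# changes nothing, because only the difference V_i - V_j matters.
p0 = choice_prob(1.0, 0.5)
p1 = choice_prob(1.0 + 10.0, 0.5 + 10.0)
```

With i.i.d. Gumbel errors the difference is logistic, so the simulated `p0` should be close to \(1/(1+e^{-0.5})\approx 0.62\), and `p0 == p1` exactly (the same error draws are reused, and the shift cancels in the comparison).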

(2) Multinomial Choice Models

Choice set \(C_n\) includes more than two travel modes.

Probability of traveler \(n\) choosing travel mode \(i\) is calculated by:

\[\Pr{}_{n}(i) = \Pr(U_{in} \geq U_{jn}, \forall j \neq i \, \text{ and } \, j \in C_n) \text{ and } \sum_{i \in C_{n}} \Pr {}_n ({i})=1 \]

2.2 Determining Random Utility Function

(1) Three Basic Steps to Determine a Random Utility Function

\[U_{in} = V_{in} + \varepsilon_{in} \]

Step 1: Separation of the utility function \(U_{in}\) into deterministic and random components.

Step 2: Function specification of the deterministic component \(V_{in}\).

Step 3: Distribution specification of the random term \(\varepsilon_{in}\).

(2) Deterministic and Random Components

\[\begin{align*} U_{in} &= V_{in} + \varepsilon_{in} \\ U_{jn} &= V_{jn} + \varepsilon_{jn} \end{align*} \]

where:

  • \(V_{in}\) and \(V_{jn}\) are called the systematic (or deterministic) components of the utilities of travel modes \(i\) and \(j\)

  • \(\varepsilon_{in}\) and \(\varepsilon_{jn}\) are the random components and are called the disturbances (or error terms).

2.3 Function Specification of Deterministic Component

  • Two types of variables/attributes enter the deterministic components \(V_{in}\) and \(V_{jn}\):

  • Mode attributes \(\boldsymbol{z}_{in}\): for any individual \(n\), travel mode (i.e., alternative) \(i\) is characterized by a vector of attributes \(\boldsymbol{z}_{in}\)

    • \(\boldsymbol{z}_{in}\) includes mode-specific variables, such as travel time, travel cost, comfort, convenience and safety.
  • Socioeconomic attributes \(\boldsymbol{s}_n\): individual traveler \(n\) is characterized by another vector of (socioeconomic) attributes \(\boldsymbol{s}_n\)

    • \(\boldsymbol{s}_n\) includes traveler-related socioeconomic variables such as income, auto ownership, household size and age.
  • Some variables enter the utility of only one (or a subset of) alternatives; these are called alternative-specific variables (e.g., transit fare enters only the transit utility).

Remark: If a given variable does not vary over alternatives (e.g., travel modes), i.e., alternative-specific socioeconomic variables, then we can include it in the utility function of at most \(J-1\) alternatives, where \(J\) is the total number of alternatives.

(1) Generic Function Expression

\[V_{in} = V(\boldsymbol{x}_{in}), \qquad V_{jn} = V(\boldsymbol{x}_{jn}) \]

where:

\[\boldsymbol{x}_{in} = \mathbf{h}(\boldsymbol{z}_{in}, \boldsymbol{s}_n), \qquad \boldsymbol{x}_{jn} = \mathbf{h}(\boldsymbol{z}_{jn}, \boldsymbol{s}_n) \]

and \(\mathbf{h}\) is a vector-valued function.

(2) Linear Utility Function

Suppose both utilities share the same vector of parameters \(\boldsymbol{\beta}\), for notational convenience:

\[\begin{align*} &\text{For } i = A \, \text{(auto)}: & V_{in}&= \beta_{1} x_{in1} + \beta_{2} x_{in2} + \beta_{3} x_{in3} + \cdots + \beta_{K} x_{inK} \\ &&&= \boldsymbol{\beta}^{\top} \boldsymbol{x}_{in} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{An} \\ &\text{For } j = T \, \text{(transit)}: & V_{jn} &= \beta_{1} x_{jn1} + \beta_{2} x_{jn2} + \beta_{3} x_{jn3} + \cdots + \beta_{K} x_{jnK} \\ &&&= \boldsymbol{\beta}^{\top} \boldsymbol{x}_{jn} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{Tn} \end{align*} \]

By appropriately defining the various elements in \(\boldsymbol{x}\), we can give the deterministic component of a utility function.

Example (Ben-Akiva & Lerman 1985, Table 2, p. 78):

\[\begin{array}{c|cc|c} \hline \text{Coefficients} & \text{Auto utility } V_{An} & \text{Transit utility } V_{Bn} & \text{Variable type} \\ \hline \beta_1 & 1 & 0 & \text{Alternative-specific constant} \\ \beta_2 & \text{in-vehicle time} & \text{in-vehicle time} & \text{Generic} \\ \beta_3 & \text{out-vehicle time} & \text{out-vehicle time} & \text{Generic} \\ \beta_4 & \text{auto out-of-pocket cost} & 0 & \text{Alternative-specific}\\ \beta_5 & 0 & \text{transit fare} & \text{Alternative-specific} \\ \beta_6 & \text{household auto ownership} & 0 & \text{Alternative-specific socioeconomic} \\ \hline \end{array} \]

In formulation:

\[\begin{array}{lllll} V_{An} & = \beta_1 + \beta_2 x_{An,2} + \beta_3 x_{An,3} + \beta_4 x_{An,4} + {\qquad \quad \ } + \beta_6 x_{An,6} \\ V_{Bn} & = {\qquad \ \,} \beta_2 x_{Bn,2} + \beta_3 x_{Bn,3} {\qquad \qquad \ \ } + \beta_5 x_{Bn,5} \end{array} \]

  • Alternative-specific constant: a constant that enters the utility of only one alternative (here \(\beta_1\) in \(V_{An}\)).

  • Generic variable: enters the utilities of all alternatives with the same coefficient (here in-vehicle and out-of-vehicle time).

  • Alternative-specific variable: enters the utility of only one alternative (here auto out-of-pocket cost and transit fare).

  • Alternative-specific socioeconomic variable: a traveler characteristic that does not vary over alternatives, so it can enter at most \(J-1\) utilities (here household auto ownership in \(V_{An}\)).
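The specification table can be sketched in code. The coefficient values below are hypothetical placeholders of my own (the table only fixes *which* variables enter each utility, not their magnitudes):

```python
# Hypothetical coefficients; signs follow intuition (time and cost reduce utility).
beta = {"asc_auto": 1.5, "ivt": -0.05, "ovt": -0.10,
        "auto_cost": -0.4, "fare": -0.3, "auto_own": 0.8}

def v_auto(ivt, ovt, cost, autos_owned):
    """Deterministic auto utility V_An per the specification table:
    constant + generic times + auto cost + auto-ownership term."""
    return (beta["asc_auto"] + beta["ivt"] * ivt + beta["ovt"] * ovt
            + beta["auto_cost"] * cost + beta["auto_own"] * autos_owned)

def v_transit(ivt, ovt, fare):
    """Deterministic transit utility V_Bn: no constant, no socioeconomic term."""
    return beta["ivt"] * ivt + beta["ovt"] * ovt + beta["fare"] * fare
```

Note how the alternative-specific socioeconomic variable (auto ownership) appears in only one of the two functions, consistent with the \(J-1\) rule in the remark above.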

2.4 Distribution Specification of Random Terms

For binary mode choice models, the distribution specification is done by considering only the difference \(\varepsilon_{jn} - \varepsilon_{in}\) rather than each term, \(\varepsilon_{jn}\) and \(\varepsilon_{in}\), separately.

\[\begin{align*} U_{in} &= V_{in} + \varepsilon_{in} \\ U_{jn} &= V_{jn} + \varepsilon_{jn} \end{align*} \]

In general, we will assume that all random terms have zero mean. Any nonzero means would be "absorbed" into the deterministic component of the utility function, without affecting the corresponding choice probabilities.

Without an assumption on the distribution of the random terms \(\varepsilon\), it is not possible to develop a binary mode choice model.

Basically, varying the assumptions about the distributions of \(\varepsilon_{in}\) and \(\varepsilon_{jn}\) (or equivalently, about their difference) leads to different choice models.

3. Three Binary Choice Models

By making assumptions on the distribution of the two random terms and then solving for the probabilities \(\Pr{}_n(i)\) and \(\Pr{}_n(j)=1-\Pr{}_n(i)\), the following binary choice models can be developed:

  • Binary Linear Probability Model

  • Binary Probit Model

  • Binary Logit Model

3.1 Binary Linear Probability Model

Assumptions:

  • The difference of the random terms, \(\varepsilon_{jn} - \varepsilon_{in}\), is uniformly distributed between two fixed values \(-L\) and \(L\), with \(L>0\).
  • Let \(\varepsilon_{n} = \varepsilon_{jn} - \varepsilon_{in}\) and denote its probability density function by \(f(y)\).

Calculation of Probability \(\Pr_{n}(i)\):

\[\begin{align*} \Pr{}_{n}(i) &= \Pr(U_{in} \geq U_{jn}) = \Pr(V_{in} + \varepsilon_{in} \geq V_{jn} + \varepsilon_{jn}) \\ &= \Pr(\varepsilon_{jn} - \varepsilon_{in} \leq V_{in} - V_{jn}) \\ &= \Pr(\varepsilon_{n} \leq V_{in} - V_{jn}) \\ &= \int^{V_{in}-V_{jn}}_{-\infty} f(y) \, \mathrm{d} y \end{align*} \]

Uniform Distribution:

\[f(y) = \begin{cases} 1/(2L), & -L \leq y \leq L \\ 0, & \text{otherwise} \end{cases} \]

Mode Choice Probabilities:

\[\Pr{}_n(i) = \begin{cases} 0, & \text{If } V_{in} - V_{jn} < -L \\ \displaystyle \int_{-L}^{V_{in}-V_{jn}}f(y) \, \mathrm{d} y = \frac{V_{in}-V_{jn}+L}{2L}, & \text{If } -L \leq V_{in} - V_{jn} \leq L \\ 1, & \text{If } V_{in} - V_{jn} > L \end{cases} \]

When \(V\) is linear in its variables, we have:

\[V_{in} - V_{jn} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{in} - \boldsymbol{\beta}^{\top} \boldsymbol{x}_{jn} = \boldsymbol{\beta}^{\top} (\boldsymbol{x}_{in} - \boldsymbol{x}_{jn}) \]
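The piecewise formula above translates directly into code; a minimal sketch:

```python
def linear_prob(v_i, v_j, L):
    """Binary linear probability model: Pr_n(i) as a piecewise-linear
    function of the utility difference, for uniform errors on [-L, L]."""
    d = v_i - v_j
    if d < -L:
        return 0.0          # mode i never chosen
    if d > L:
        return 1.0          # mode i always chosen
    return (d + L) / (2 * L)  # linear in the difference between -L and L
```

Equal utilities give probability 1/2, and any difference beyond \(\pm L\) pins the probability at 0 or 1, which is the model's well-known drawback.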

3.2 Binary Probit Model

Assumptions:

  • Suppose that \(\varepsilon_{in}\) and \(\varepsilon_{jn}\) are both normally distributed (not necessarily independently) with zero means, variances \(\sigma_i^2\) and \(\sigma_j^2\), and covariance \(\sigma_{ij}\):

    \[\varepsilon_{in} \sim \mathcal{N}(0, \sigma_i^2), \qquad \varepsilon_{jn} \sim \mathcal{N}(0, \sigma_j^2) \]

  • Under the above assumptions, the difference \(\varepsilon_{jn} - \varepsilon_{in}\) is also normally distributed with mean zero and variance \(\sigma^2 = \sigma_i^2 + \sigma_j^2 - 2 \sigma_{ij}\):

    \[\varepsilon_{jn} - \varepsilon_{in} \sim \mathcal{N}(0, \sigma^2) \]

Calculation of probability \(\Pr{}_n(i)\)

\[\begin{align*} \Pr{}_{n}(i) &= \Pr(V_{in} + \varepsilon_{in} \geq V_{jn} + \varepsilon_{jn}) = \Pr(\varepsilon_{jn} - \varepsilon_{in} \leq V_{in} - V_{jn}) \\ \\ &= \int_{y=-\infty}^{V_{in}-V_{jn}} \frac{1}{\sqrt{2 \pi} \sigma} \exp \left[ -\frac{1}{2} \left(\frac{y}{\sigma} \right)^2 \right] \, \mathrm{d} y, \sigma > 0 \\ \\ &= \frac{1}{\sqrt{2 \pi}} \int_{u=-\infty}^{\frac{V_{in}-V_{jn}}{\sigma}} \exp \left( -\frac{1}{2} u^2 \right) \, \mathrm{d} u = \Phi \left(\frac{V_{in}-V_{jn}}{\sigma} \right) \end{align*} \]

where \(\Phi(\cdot)\) denotes the standardized cumulative normal distribution.

Case: when \(V\) is linear in its variables, we have

\[V_{in} - V_{jn} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{in} - \boldsymbol{\beta}^{\top} \boldsymbol{x}_{jn} = \boldsymbol{\beta}^{\top} (\boldsymbol{x}_{in} - \boldsymbol{x}_{jn}) \]

Thus, we have:

\[\begin{align*} \Pr{}_n(i) &= \Phi \left[\frac{\boldsymbol{\beta}^{\top} \left( \boldsymbol{x}_{in} - \boldsymbol{x}_{jn} \right)}{\sigma} \right] \\ \Pr{}_n(j) &= 1 - \Pr{}_n(i) \end{align*} \]

\(1/\sigma\) can be regarded as the scale of the utility function.

  • Comments on the binary probit model:

Although the binary probit model is intuitively reasonable and there is at least some theoretical justification for its distributional assumption on \(\varepsilon_{in}\) and \(\varepsilon_{jn}\), it has the unfortunate property of lacking a closed form (i.e., an explicit expression): the choice probability must be expressed as an integral, which makes calibration difficult.
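Despite the integral form, \(\Phi(\cdot)\) is easy to evaluate numerically, e.g., through the error function; a minimal sketch:

```python
import math

def std_normal_cdf(x):
    # Phi(x) via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit_prob(v_i, v_j, sigma=1.0):
    """Binary probit: Pr_n(i) = Phi((V_in - V_jn) / sigma)."""
    return std_normal_cdf((v_i - v_j) / sigma)
```

Equal deterministic utilities give probability 1/2, and increasing \(\sigma\) (i.e., shrinking the scale \(1/\sigma\)) flattens the curve toward 1/2.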

3.3 Binary Logit Model

Assumptions:

  • \(\varepsilon_n = \varepsilon_{jn} - \varepsilon_{in}\) follows a logistic distribution

  • with zero mean, i.e., \(\mathrm{E}[\varepsilon_n] = 0\) (location parameter \(\eta=0\)), and

  • variance \(\mathrm{var}[\varepsilon_n] = \dfrac{\pi^2}{3 \mu^2}\).

  • Note: assuming that \(\varepsilon_n = \varepsilon_{jn} - \varepsilon_{in}\) follows a logistic distribution is equivalent to assuming that \(\varepsilon_{in}\) and \(\varepsilon_{jn}\) are independent and identically Gumbel-distributed.

We have the following CDF and PDF about logistic distribution:

\[\begin{align*} & \text{CDF:} \quad F(y) = \frac{1}{1+\exp(-\mu y)}\\ & \text{PDF:} \quad f(y) = \frac{\mu \exp(-\mu y)}{[1+\exp(-\mu y)]^2} \end{align*} \]

where \(\mu>0\) is a positive scale parameter.

Calculation of probability \(\Pr{}_n(i)\):

\[\begin{align*} \Pr{}_{n}(i) &= \Pr(V_{in} + \varepsilon_{in} \geq V_{jn} + \varepsilon_{jn}) = \Pr(\varepsilon_{jn} - \varepsilon_{in} \leq V_{in} - V_{jn}) = \Pr(\varepsilon_{n} \leq V_{in} - V_{jn}) \\ &= \frac{1}{1+\exp[-\mu(V_{in}-V_{jn})]} = \frac{\exp(\mu V_{in})}{\exp(\mu V_{in}) + \exp(\mu V_{jn})} \end{align*} \]


Case: when \(V\) is linear in its variables, we have

\[V_{in} - V_{jn} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{in} - \boldsymbol{\beta}^{\top} \boldsymbol{x}_{jn} = \boldsymbol{\beta}^{\top} (\boldsymbol{x}_{in} - \boldsymbol{x}_{jn}) \]

Thus, we have:

\[\begin{align*} \Pr{}_n(i) &= \frac{1}{1+\exp[-\mu \boldsymbol{\beta}^{\top}(\boldsymbol{x}_{in}-\boldsymbol{x}_{jn})]} \\ \Pr{}_n(j) &= 1 - \Pr{}_n(i) \end{align*} \]

where \(\mu\) is the scale parameter; normally we set \(\mu=1.0\).
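The closed-form logit probability is a one-liner; a minimal sketch (utility values in the tests are illustrative):

```python
import math

def logit_prob(v_i, v_j, mu=1.0):
    """Binary logit: Pr_n(i) = 1 / (1 + exp(-mu * (V_in - V_jn)))."""
    return 1.0 / (1.0 + math.exp(-mu * (v_i - v_j)))
```

By construction the two probabilities sum to one, and only the difference \(V_{in}-V_{jn}\) matters, consistent with Proposition 2.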

3.4 Extreme Cases of Linear, Probit and Logit Models

For the binary logit model,

  • When \(\mu \to \infty\) :

    \[\Pr{}_n(i) = \begin{cases} 1, & \text{If } V_{in} - V_{jn} > 0 \\ 0, & \text{If } V_{in} - V_{jn} < 0 \end{cases} \]

  • When \(\mu \to 0\) :

    \[\Pr{}_n(i) = \Pr{}_n(j) = \frac{1}{2} \]

The deterministic limit exists for both the binary probit model (\(\sigma \to 0\)) and the binary linear probability model (\(L \to 0\)).

The equal-probability limits for these models correspond to \(\sigma \to \infty\) and \(L \to \infty\), respectively.
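These limits can be verified numerically for the logit model; a small sketch with illustrative utility values:

```python
import math

def logit_prob(v_i, v_j, mu):
    # Binary logit probability with explicit scale parameter mu
    return 1.0 / (1.0 + math.exp(-mu * (v_i - v_j)))

# mu large: the choice is effectively deterministic (higher V always wins).
# mu near zero: both modes approach probability 1/2.
p_det = logit_prob(1.0, 0.0, mu=50.0)
p_coin = logit_prob(1.0, 0.0, mu=1e-6)
```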

3.5 Comparison

4. Maximum Likelihood Estimation

4.0 Lemmas

The variance-covariance matrix of an ML estimator \(\hat{\boldsymbol{\theta}}^{\text{ML}}\), is calculated by the inverse of the Fisher Information matrix \(\mathcal{I} (\boldsymbol{\theta})\):

\[\text{VAR} \big[ \hat{\boldsymbol{\theta}}^{\text{ML}} \, \big] = \big( \mathcal{I} (\boldsymbol{\theta}) \big)^{-1} = \big( -\mathbb{E} \left[ \nabla^2 LL(\boldsymbol{\theta}) \right] \big)^{-1} \]

where the Fisher information matrix \(\mathcal{I} (\boldsymbol{\theta})\) is the negative of the expected value of the Hessian matrix \(H(\boldsymbol{\theta})\) of the log-likelihood function, i.e.,

\[H(\boldsymbol{\theta}) = \nabla^2 LL(\boldsymbol{\theta}), \quad \text{where} \quad h_{ij} = \frac{\partial^2 LL(\boldsymbol{\theta})}{\partial \theta_i \partial \theta_j} \]

4.1 Likelihood Function

Likelihood function for the binary choice model with the sample of \(N\) travelers

\[L(\beta_1, \beta_2, \cdots, \beta_K) = \prod_{n=1}^{N} \Pr{}_n(i)^{y_{n,i}} \, \cdot \, \Pr{}_n(j)^{y_{n,j}} \]

where

\[y_{n,i} = \begin{cases} 1, & \text{If traveler $n$ selects mode $i$} \\ 0, & \text{Otherwise} \end{cases}, \qquad y_{n,j} = \begin{cases} 1, & \text{If traveler $n$ selects mode $j$} \\ 0, & \text{Otherwise} \end{cases} \]

and \(\beta_{1}, \beta_{2}, \beta_{3}, \cdots, \beta_{K}\) are the parameters of the utility functions, for example:

\[\begin{align*} V_{in} &= \beta_{1} x_{in1} + \beta_{2} x_{in2} + \beta_{3} x_{in3} + \cdots + \beta_{K} x_{inK} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{in}, \quad i \in C_{n} \\ V_{jn} &= \beta_{1} x_{jn1} + \beta_{2} x_{jn2} + \beta_{3} x_{jn3} + \cdots + \beta_{K} x_{jnK} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{jn}, \quad j \in C_{n} \end{align*} \]

The logarithm of Likelihood Functions, i.e., Log-Likelihood Functions

\[\begin{aligned} LL(\beta_1, \beta_2, \cdots, \beta_K) & \triangleq \ln [L(\beta_1, \beta_2, \cdots, \beta_K)] \\ &= \sum_{n=1}^{N} \left[ y_{n,i} \, \ln \Pr{}_n(i) + y_{n,j} \, \ln \Pr{}_n(j) \right] \\ &= \sum_{n=1}^{N} \Big\{ y_{n,i} \, \ln \Pr{}_n(i) + (1-y_{n,i}) \, \ln \left[ 1 - \Pr{}_n(i) \right] \Big\} \end{aligned} \]

Note that \(y_{n,i} + y_{n,j} = 1\) and \(\Pr{}_n(i) + \Pr{}_n(j) = 1\). The negative of the above formulation is also named the binary cross-entropy loss function (see sklearn).
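The log-likelihood can be computed directly from the choice probabilities and choice indicators; a minimal sketch with illustrative values:

```python
import math

def log_likelihood(probs_i, y_i):
    """Binary log-likelihood: sum over travelers of
    y ln Pr(i) + (1 - y) ln(1 - Pr(i)).
    Its negative (averaged over N) is the binary cross-entropy loss."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for p, y in zip(probs_i, y_i))
```

A model that assigns higher probability to the chosen modes attains a higher (less negative) log-likelihood, which is exactly what maximum likelihood estimation exploits.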

We can solve for the maximum of \(LL(\beta_1, \beta_2, \cdots, \beta_K)\) by differentiating it with respect to each of the \(\beta\)s and setting the partial derivatives equal to zero, i.e.,

\[\frac{\partial LL}{\partial \beta_k} = \sum_{n=1}^{N} \left\{ {y_{n,i} \frac{\partial \Pr{}_n(i) / \partial \beta_k}{\Pr{}_n(i)}} + {y_{n,j} \frac{\partial \Pr{}_n(j) / \partial \beta_k}{\Pr{}_n(j)}} \right \} = 0 \]

If an optimum of \(LL(\beta_1, \beta_2, \cdots, \beta_K)\) exists, it must satisfy these necessary (i.e., first-order) conditions. In many cases of practical interest we can show that the likelihood function is globally concave, so that if a solution to the first-order conditions exists, it is unique.

4.2 Solving coefficients \(\hat{\boldsymbol{\beta}}\)

The Newton-Raphson algorithm is used to seek the optimal solution of the maximum likelihood estimation:

Step 0: Initialization \(\hat{\boldsymbol{\beta}}^{(0)} = \left[ \beta_{1}^{(0)}, \beta_{2}^{(0)}, \cdots, \beta_{K}^{(0)} \right]^{\top}\), e.g., \(\hat{\boldsymbol{\beta}}^{(0)}=\boldsymbol{0}\)

Step 1: Linearize the function \(\nabla LL(\boldsymbol{\beta})\) around \(\hat{\boldsymbol{\beta}}^{(t)}\). The approximate first-order conditions
are given by:

\[\nabla LL (\hat{\boldsymbol{\beta}}^{(t)}) + \nabla^2 LL (\hat{\boldsymbol{\beta}}^{(t)}) (\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}^{(t)}) = 0 \]

Step 2: Solve and update

\[\hat{\boldsymbol{\beta}}^{(t+1)} = \hat{\boldsymbol{\beta}}^{(t)} - \left[ \nabla^2 LL (\hat{\boldsymbol{\beta}}^{(t)}) \right]^{-1} \, \nabla LL (\hat{\boldsymbol{\beta}}^{(t)}) \]

Step 3: Check the stop criterion. If the following conditions are satisfied,

\[\left\|\hat{\boldsymbol{\beta}}^{(t+1)} - \hat{\boldsymbol{\beta}}^{(t)} \right\|_2 < \delta \qquad \text{or} \qquad \left| \frac{\hat{\beta}^{(t+1)}_k - \hat{\beta}^{(t)}_k}{\hat{\beta}^{(t)}_k} \right| < \delta, \quad \forall k \]

then the iterations are stopped; otherwise, return to Step 1.

(1) Example: Solving Binary Logit Model

For the binary logit model, we have:

\[\begin{aligned} \Pr{}_n(i) &= \frac{1}{1+\exp \left( - \boldsymbol{\beta}^{\top}\boldsymbol{x}_n \right)} \\ \Pr{}_n(j) &= 1 - \Pr{}_n(i) = \frac{ \exp \left(-\boldsymbol{\beta}^{\top} \boldsymbol{x}_n \right)}{1+\exp \left(- \boldsymbol{\beta}^{\top}\boldsymbol{x}_n \right) } \end{aligned} \]

where we denote \(\boldsymbol{x}_n = \boldsymbol{x}_{in}-\boldsymbol{x}_{jn}\).

Thus, we have

\[\begin{aligned} \frac{\partial \Pr{}_n(i) }{ \partial \beta_k } &= \frac{\partial \Pr{}_n(i) }{ \partial \left[ \exp \left( - \boldsymbol{\beta}^{\top} \boldsymbol{x}_n \right) \right] } \cdot \frac{ \partial \left[ \exp \left( - \boldsymbol{\beta}^{\top} \boldsymbol{x}_n \right) \right] }{ \partial \left( - \boldsymbol{\beta}^{\top} \boldsymbol{x}_n \right) } \cdot \frac{ \partial \left( - \boldsymbol{\beta}^{\top} \boldsymbol{x}_n \right)}{ \partial \beta_k } \\ & =- \frac{1}{\left[1 + \exp \left( -\boldsymbol{\beta}^{\top} \boldsymbol{x}_n \right) \right]^2} \cdot \exp \left( - \boldsymbol{\beta}^{\top} \boldsymbol{x}_n \right) \cdot (- x_{nk}) \\ &= \Pr{}_n(i) \cdot \Pr{}_n(j) \cdot x_{nk} \end{aligned} \]

and

\[\begin{aligned} \frac{\partial \Pr{}_n(j) }{ \partial \beta_k } &= \frac{\partial \Pr{}_n(j) }{ \partial \left[ \exp \left( - \boldsymbol{\beta}^{\top} \boldsymbol{x}_n \right) \right] } \cdot \frac{ \partial \left[ \exp \left( - \boldsymbol{\beta}^{\top} \boldsymbol{x}_n \right) \right] }{ \partial \left( - \boldsymbol{\beta}^{\top} \boldsymbol{x}_n \right) } \cdot \frac{ \partial \left( - \boldsymbol{\beta}^{\top} \boldsymbol{x}_n \right) }{ \partial \beta_k } \\ &= \frac{1}{\left[1 + \exp \left( -\boldsymbol{\beta}^{\top} \boldsymbol{x}_n \right) \right]^2} \cdot \exp \left( - \boldsymbol{\beta}^{\top} \boldsymbol{x}_n \right) \cdot (- x_{nk}) \\ &= - \Pr{}_n(i) \cdot \Pr{}_n(j) \cdot x_{nk} \end{aligned} \]

Thus,

\[\begin{aligned} \frac{\partial LL}{\partial \beta_k} &= \sum_{n=1}^{N} \Big[ y_{n,i} \cdot \Pr{}_n(j) \cdot x_{nk} - y_{n,j} \cdot \Pr{}_n(i) \cdot x_{nk} \Big] \\ &= \sum_{n=1}^{N} \Big[ y_{n,i} \cdot ( 1 - \Pr{}_n(i) ) \cdot x_{nk} - (1 - y_{n,i}) \cdot \Pr{}_n(i) \cdot x_{nk} \Big] \\ &= \sum_{n=1}^{N} \Big\{ \big[ y_{n,i} - \Pr{}_n(i) \big] \, x_{nk} \Big\} \end{aligned} \]

The second derivatives can be solved as:

\[\begin{aligned} \frac{\partial^2 LL}{\partial \beta_k \, \partial \beta_l} &= \frac{\partial}{\partial \beta_l} \left( \frac{\partial LL}{\partial \beta_k} \right) \\ &= \sum_{n=1}^{N} \left[ - x_{nk} \cdot \frac{\partial \Pr{}_n(i)}{\partial \beta_l} \right] \\ &= - \sum_{n=1}^{N} \Big[ \Pr{}_n(i) \cdot \Pr{}_n(j) \cdot x_{nk} \cdot x_{nl} \Big] \end{aligned} \]

Proof: the log-likelihood function is concave

Lemma: Let \(A\in\mathbb{R}^{n \times m}\) with \(n>m\); then the matrix \(A^{\top}A\) is positive semidefinite. If \(\text{rank}(A)=m\) (i.e., \(A\) has full column rank), then \(A^{\top}A\) is positive definite.

We have

\[\nabla^2 LL(\boldsymbol{\beta}) = - \mathbf{A}^{\top} \mathbf{A} \preceq \mathbf{0} \]

is negative semidefinite, where the entry of \(\mathbf{A}\) is \(a_{nk} = x_{nk} \big[ \Pr{}_n(i) \, \Pr{}_n(j) \big]^{1/2}\)
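Putting the pieces together, the Newton-Raphson steps with the derived gradient and Hessian can be sketched as follows. The synthetic data and "true" coefficients are hypothetical, used only to check that the estimator approximately recovers them:

```python
import math
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting; returns x with A x = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def newton_logit(X, y, tol=1e-8, max_iter=50):
    """Newton-Raphson MLE for the binary logit model, using the derived
    gradient sum_n (y_n - P_n) x_n and Hessian -sum_n P_n (1 - P_n) x_n x_n^T,
    where x_n = x_in - x_jn and P_n = Pr_n(i)."""
    K = len(X[0])
    beta = [0.0] * K                      # Step 0: start from beta = 0
    for _ in range(max_iter):
        P = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xn))))
             for xn in X]
        grad = [sum((yn - pn) * xn[k] for xn, yn, pn in zip(X, y, P))
                for k in range(K)]
        neg_hess = [[sum(pn * (1.0 - pn) * xn[k] * xn[l]
                         for xn, pn in zip(X, P)) for l in range(K)]
                    for k in range(K)]
        step = solve(neg_hess, grad)      # beta^{t+1} = beta^t - H^{-1} grad
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break                         # Step 3: stop criterion met
    return beta

# Hypothetical synthetic data: draw difference vectors and logit choices.
rng = random.Random(1)
X = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(4000)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(1.0 * a - 0.5 * b))) else 0
     for a, b in X]
beta_hat = newton_logit(X, y)             # should be near [1.0, -0.5]
```

Because the log-likelihood is globally concave (as shown above), the Newton iterations converge to the unique maximizer when one exists.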

(2) Example: Solving Binary Probit Model

5. Hypothesis testing

5.1 Asymptotic t-Test

5.2 Confidence Region for Several Parameters Simultaneously

5.3 Likelihood Ratio Test

5.4 Goodness-of-Fit Measures

5.5 Test of Generic Attributes

5.6 Tests of Non-Nested Hypotheses

5.7 Tests of Nonlinear Specifications


References

Ben-Akiva, M. E., & Lerman, S. R. (1985). Discrete choice analysis: Theory and application to travel demand. MIT Press.

Meng, Q. (2022). Lecture notes on Modal Split Modeling: Discrete Choice Models, CE5205 Transportation Planning.

Train, K. (2009). Discrete choice methods with simulation (2nd ed.). Cambridge University Press.

Ortúzar, J. de D., & Willumsen, L. G. (2011). Modelling transport (4th ed.). Wiley.

posted @ 2022-03-11 10:19 veager