Discrete Choice Models - Utility Functions and Binary Choice Models
This note is mainly transcribed from the lecture notes of "Modal Split Modeling: Discrete Choice Models" of CE5205 Transportation Planning
Discrete Choice Models Series:
- Discrete Choice Models - Utility Functions and Binary Choice Models
- Discrete Choice Models - Multinomial Choice Models
- Discrete Choice Models - Nested and Mixed Logit Models
- Discrete Choice Models - Implement in Python
- Discrete Choice Models - Implement in R
1. Random Utility Functions
1.1 Utility in Economics
Definition: Utility is a measure of the satisfaction (or happiness) gained from the consumption of a "package" of goods/services, i.e., one element of the choice set.
Given this measure, one may speak meaningfully of increasing or decreasing utility, and thereby explain economic behavior in terms of attempts to increase one's utility.
1.2 Utility Function Defined in Economics
While preferences are the conventional foundation of microeconomics, it is convenient to represent preferences with a utility function and reason indirectly about preferences with utility functions.
Let \(X\) be the choice set, the set of all mutually-exclusive packages the consumer could conceivably consume.
The consumer's utility function \(U: X \rightarrow \mathbb{R}\) ranks each package in the choice set.
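The consumer's choice is determined by the utility function. If \(U(x) > U(y)\), then the consumer strictly prefers \(x\) to \(y\); if \(U(x) \geq U(y)\), the consumer weakly prefers \(x\) to \(y\), i.e., likes \(x\) at least as much as \(y\).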
1.3 Utility function for a traveler \(n\) in choosing mode \(i\): \(U_{in}\)
The utility function is used to formulate the attractiveness of a travel mode.
The utility function is derived from the characteristics / features of a travel mode and those of the individual.
Assumption on the travel mode choice of an individual traveler: traveler \(n\) is assumed to select, from a set of travel modes (i.e., the choice set), the travel mode that produces the greatest utility.
Random utility function: The utility function of a travel mode is formulated as a random variable (from the perspective of analyst or modeler, because of uncertainty in modeling and the perception errors of travelers on utility).
2. Specification and Parameter Estimation of Random Utility Function
2.1 Random Utility Maximization Based Mode Choice Models
Based on the random utility, the probability of travel mode \(i\) being selected by traveler \(n\) from choice set \(C_n\) is given by:
\[\Pr{}_n(i) = \Pr \left(U_{in} \geq U_{jn}, \ \forall j \in C_n \right) \]
where \(U_{in}\) is the random utility of travel mode \(i \in C_n\) for traveler \(n\). Generally, \(U_{in}\) includes two parts:
- one is the deterministic component \(V_{in}\), and
- the other is the random term (or error term) \(\varepsilon_{in}\),
that is:
\[U_{in} = V_{in} + \varepsilon_{in} \]
(1) Binary Choice Model
Choice set \(C_n\) contains exactly two travel modes, denoted by \(C_n = \{i , j \}\)
- Example: travel mode \(i\) might be the option of driving to work and travel mode \(j\) would be using transit (public transport)
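The probability of person \(n\) choosing travel mode \(i\) is:
\[\Pr{}_n(i) = \Pr \left(U_{in} \geq U_{jn} \right) \]
The probability of choosing the alternative travel mode \(j\) is:
\[\Pr{}_n(j) = 1 - \Pr{}_n(i) \]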
Two Propositions of Binary Choice Model:
- Proposition 1: Adding a constant \(c\) to all the utilities does not affect the choice probabilities, even though it shifts both \(V_{in}\) and \(V_{jn}\):\[\Pr \left(U_{in} + c \geq U_{jn} + c \right) = \Pr \left(U_{in} \geq U_{jn} \right) \]
- Proposition 2: Relative nature of the utilities. Only the differences in utilities of travel modes matter.\[\begin{align*} \Pr{}_n(i) &= \Pr \left(U_{in} \geq U_{jn} \right) \\ &= \Pr \left(V_{in} + \varepsilon_{in} \geq V_{jn} + \varepsilon_{jn} \right) \\ &= \Pr \left(\varepsilon_{jn} - \varepsilon_{in} \leq V_{in} - V_{jn} \right) \end{align*} \]thus, only \(V_{in}-V_{jn}\) and \(\varepsilon_{jn} - \varepsilon_{in}\) matter.
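The two propositions can be checked numerically. The sketch below is only an illustration: it assumes independent standard normal error terms (the text has not fixed a distribution at this point), and the function name `simulated_share` and all numeric values are hypothetical. It estimates \(\Pr{}_n(i)\) as the share of random draws in which \(U_{in} \geq U_{jn}\), reusing the same draws in every call.

```python
import random

def simulated_share(v_i, v_j, shift=0.0, n_draws=20000, seed=0):
    """Share of draws with U_i >= U_j after adding `shift` to both utilities."""
    rng = random.Random(seed)  # fixed seed, so every call sees the same error draws
    wins = 0
    for _ in range(n_draws):
        eps_i = rng.gauss(0.0, 1.0)  # assumed error distribution (illustration only)
        eps_j = rng.gauss(0.0, 1.0)
        u_i = v_i + shift + eps_i
        u_j = v_j + shift + eps_j
        wins += u_i >= u_j
    return wins / n_draws

base = simulated_share(v_i=-1.0, v_j=-1.5)
shifted = simulated_share(v_i=-1.0, v_j=-1.5, shift=10.0)  # Proposition 1
same_difference = simulated_share(v_i=4.0, v_j=3.5)        # Proposition 2
print(base, shifted, same_difference)
```

All three estimates should agree, because each call has the same difference \(V_{in} - V_{jn} = 0.5\) and the same error draws.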
(2) Multinomial Choice Models
Choice set \(C_n\) includes more than two travel modes.
The probability of traveler \(n\) choosing travel mode \(i\) is calculated by:
\[\Pr{}_n(i) = \Pr \left(U_{in} \geq U_{jn}, \ \forall j \in C_n, \ j \neq i \right) \]
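As a rough illustration of the multinomial case, the following sketch estimates each mode's choice probability as the share of simulated draws in which that mode has the highest utility. The mode names, the deterministic utilities and the standard normal error terms are all assumptions made for this example.

```python
import random

def simulated_shares(v, n_draws=20000, seed=1):
    """Estimate Pr_n(i) for every mode as the share of draws in which it has the highest utility."""
    rng = random.Random(seed)
    counts = {mode: 0 for mode in v}
    for _ in range(n_draws):
        # U_in = V_in + eps_in, with an assumed standard normal error term
        utilities = {mode: v_mode + rng.gauss(0.0, 1.0) for mode, v_mode in v.items()}
        best = max(utilities, key=utilities.get)
        counts[best] += 1
    return {mode: count / n_draws for mode, count in counts.items()}

# Hypothetical deterministic utilities for three modes
shares = simulated_shares({"car": 0.5, "bus": 0.0, "rail": -0.5})
print(shares)
```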
2.2 Determining Random Utility Function
(1) Three Basic Steps to Determine a Random Utility Function
Step 1: Separation of the utility function \(U_{in}\) into deterministic and random components.
Step 2: Function specification of the deterministic component \(V_{in}\).
Step 3: Distribution specification of the random term \(\varepsilon_{in}\).
(2) Deterministic and Random Components
\[U_{in} = V_{in} + \varepsilon_{in}, \qquad U_{jn} = V_{jn} + \varepsilon_{jn} \]
where:
- \(V_{in}\) and \(V_{jn}\) are called the systematic (or deterministic) components of the utilities of travel modes \(i\) and \(j\);
- \(\varepsilon_{in}\) and \(\varepsilon_{jn}\) are the random components and are called the disturbances (or error terms).
2.3 Function Specification of Deterministic Component
The following types of variables/attributes should be involved in the deterministic components \(V_{in}\) and \(V_{jn}\):
- Generic variables \(\boldsymbol{z}_{in}\): for any individual \(n\), travel mode (i.e., alternative) \(i\) can be characterized by a vector of attributes \(\boldsymbol{z}_{in}\), which includes mode-specific variables such as travel time, travel cost, comfort, convenience and safety.
- Alternative-specific socioeconomic variables \(\boldsymbol{s}_n\): individual traveler \(n\) is characterized by another vector of attributes \(\boldsymbol{s}_n\), which includes traveler-related socioeconomic variables such as income, auto ownership, household size and age.
- Alternative-specific variables, such as, only for
Remark: If a given variable does not vary over alternatives (e.g., travel modes), as is the case for alternative-specific socioeconomic variables, then we can include it in the utility functions of at most \(J-1\) alternatives, where \(J\) is the total number of alternatives. The reason is that only differences in utilities matter.
(1) Generic Function Expression
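\[V_{in} = V(\boldsymbol{x}_{in}), \qquad V_{jn} = V(\boldsymbol{x}_{jn}) \]
where:
\[\boldsymbol{x}_{in} = \mathbf{h}(\boldsymbol{z}_{in}, \boldsymbol{s}_n), \qquad \boldsymbol{x}_{jn} = \mathbf{h}(\boldsymbol{z}_{jn}, \boldsymbol{s}_n) \]
and \(\mathbf{h}\) is a vector-valued function.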
(2) Linear Utility Function
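Suppose, for notational convenience, that both utilities have the same vector of parameters \(\boldsymbol{\beta} = \left[ \beta_1, \beta_2, \cdots, \beta_K \right]^{\top}\):
\[V_{in} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{in} = \beta_1 x_{in1} + \beta_2 x_{in2} + \cdots + \beta_K x_{inK}, \qquad V_{jn} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{jn} = \beta_1 x_{jn1} + \beta_2 x_{jn2} + \cdots + \beta_K x_{jnK} \]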
By appropriately defining the various elements in \(\boldsymbol{x}\), we can give the deterministic component of a utility function.
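A minimal sketch of how one shared parameter vector can carry an alternative-specific constant, generic variables and an alternative-specific socioeconomic variable: entries of \(\boldsymbol{x}\) that do not apply to an alternative are simply set to zero. The coefficient values, the attribute values and the function name `linear_utility` are hypothetical and chosen only for illustration.

```python
def linear_utility(beta, x):
    """Deterministic utility V = beta' x for one alternative."""
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    return sum(b * x_k for b, x_k in zip(beta, x))

# Hypothetical coefficients:
# [car constant, travel time (min), travel cost, income (entering the car utility only)]
beta = [0.4, -0.05, -0.3, 0.01]

# Hypothetical attribute vectors for one traveler with income 50
x_car = [1.0, 20.0, 3.0, 50.0]      # constant and income switched on for car
x_transit = [0.0, 35.0, 1.5, 0.0]   # constant and income switched off for transit

v_car = linear_utility(beta, x_car)
v_transit = linear_utility(beta, x_transit)
print(v_car, v_transit)
```

Travel time and cost are generic here because they share one coefficient across both modes, while the constant and income appear in only one of the two utilities.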
Example: (Ben-Akiva & Lerman 1985, Table 2, p. 78)
In formulation:
-
Alternative-specific constant:
-
Generic variable:
-
Alternative-specific variable:
-
Alternative-specific socioeconomic variable:
2.4 Distribution Specification of Random Terms
For binary mode choice models, the distribution specification is done by considering only the difference \(\varepsilon_{jn} - \varepsilon_{in}\) rather than each term, \(\varepsilon_{jn}\) and \(\varepsilon_{in}\), separately.
In general, we will assume that all random terms have zero mean. If the random terms have nonzero means, these means can be "absorbed" into the deterministic components of the utility functions without affecting the corresponding choice probabilities.
If the distribution of the random term \(\varepsilon\) is not known, it is not possible to develop a binary mode choice model.
Basically, varying the assumptions about the distributions of \(\varepsilon_{in}\) and \(\varepsilon_{jn}\) (or equivalently, assumptions about their difference) leads to different choice models.
3. Three Binary Choice Models
By making some assumptions on the distribution of the two random terms and then solving for the probability that travel mode \(i\) is chosen, \(\Pr{}_n(i)\), with \(\Pr{}_n(j)=1-\Pr{}_n(i)\), the following binary choice models can be developed:
- Binary Linear Probability Model
- Binary Probit Model
- Binary Logit Model
3.1 Binary Linear Probability Model
Assumptions:
- The difference in the random terms, \(\varepsilon_{jn} - \varepsilon_{in}\), is uniformly distributed between two fixed values \(-L\) and \(L\), with \(L>0\)
- Let \(\varepsilon_{n} = \varepsilon_{jn} - \varepsilon_{in}\) and its probability density function \(f(y)\)
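Calculation of Probability \(\Pr_{n}(i)\):
\[\Pr{}_n(i) = \Pr \left(\varepsilon_n \leq V_{in} - V_{jn} \right) = \int_{-\infty}^{V_{in} - V_{jn}} f(y) \, \mathrm{d}y \]
Uniform Distribution:
\[f(y) = \begin{cases} \dfrac{1}{2L}, & \text{If } -L \leq y \leq L \\ 0, & \text{otherwise} \end{cases} \]
Mode Choice Probabilities:
\[\Pr{}_n(i) = \begin{cases} 0, & \text{If } V_{in} - V_{jn} < -L \\ \dfrac{V_{in} - V_{jn} + L}{2L}, & \text{If } -L \leq V_{in} - V_{jn} \leq L \\ 1, & \text{If } V_{in} - V_{jn} > L \end{cases} \]
When \(V\) is linear in its variables, we have:
\[\Pr{}_n(i) = \begin{cases} 0, & \text{If } \boldsymbol{\beta}^{\top} (\boldsymbol{x}_{in} - \boldsymbol{x}_{jn}) < -L \\ \dfrac{\boldsymbol{\beta}^{\top} (\boldsymbol{x}_{in} - \boldsymbol{x}_{jn}) + L}{2L}, & \text{If } -L \leq \boldsymbol{\beta}^{\top} (\boldsymbol{x}_{in} - \boldsymbol{x}_{jn}) \leq L \\ 1, & \text{If } \boldsymbol{\beta}^{\top} (\boldsymbol{x}_{in} - \boldsymbol{x}_{jn}) > L \end{cases} \]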
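A small sketch of the binary linear probability model, assuming \(\varepsilon_{jn} - \varepsilon_{in}\) is uniform on \([-L, L]\): the probability rises linearly with \(V_{in} - V_{jn}\) inside the interval and is clipped to 0 or 1 outside it. The function name is hypothetical.

```python
def linear_probability(v_i, v_j, L):
    """Pr_n(i) when eps_jn - eps_in is uniformly distributed on [-L, L]."""
    if L <= 0:
        raise ValueError("L must be positive")
    diff = v_i - v_j
    if diff < -L:
        return 0.0   # mode j is chosen with certainty
    if diff > L:
        return 1.0   # mode i is chosen with certainty
    return (diff + L) / (2.0 * L)

print(linear_probability(1.0, 0.0, L=2.0))   # inside the linear range
print(linear_probability(3.0, 0.0, L=2.0))   # clipped at 1
```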
3.2 Binary Probit Model
Assumptions:
- Suppose that \(\varepsilon_{in}\) and \(\varepsilon_{jn}\) are both normally distributed (not necessarily independently) with zero means and variances \(\sigma_i^2\) and \(\sigma_j^2\), and further have a covariance \(\sigma_{ij}\).
\[\varepsilon_{in} \sim \mathcal{N}(0, \sigma_i^2), \qquad \varepsilon_{jn} \sim \mathcal{N}(0, \sigma_j^2) \]
- Under the above assumptions, the term \(\varepsilon_{jn} - \varepsilon_{in}\) is also normally distributed with mean zero but with variance \(\sigma^2 = \sigma_i^2 + \sigma_j^2 - 2 \sigma_{ij}\)
\[\varepsilon_{jn} - \varepsilon_{in} \sim \mathcal{N}(0, \sigma^2) \]
Calculation of probability \(\Pr{}_n(i)\)
\[\Pr{}_n(i) = \Pr \left(\varepsilon_{jn} - \varepsilon_{in} \leq V_{in} - V_{jn} \right) = \Phi \left( \frac{V_{in} - V_{jn}}{\sigma} \right) \]
where \(\Phi(\cdot)\) denotes the standardized cumulative normal distribution.
Case: when \(V\) is linear in its variables, we have
\[V_{in} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{in}, \qquad V_{jn} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{jn} \]
Thus, we have:
\[\Pr{}_n(i) = \Phi \left( \frac{\boldsymbol{\beta}^{\top} (\boldsymbol{x}_{in} - \boldsymbol{x}_{jn})}{\sigma} \right) \]
\(1/\sigma\) can be regarded as the scale of the utility function.
- Comments on the binary probit model:
The binary probit model is intuitively reasonable, and there is at least some theoretical ground for its assumption about the distribution of \(\varepsilon_{in}\) and \(\varepsilon_{jn}\). However, it has the unfortunate property of not having a closed form (i.e., an explicit expression): the choice probability must be expressed as an integral, which makes the model more difficult to calibrate.
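Although the probit probability has no closed form, it can be evaluated numerically. The sketch below writes the standard normal CDF in terms of the error function from Python's `math` module, \(\Phi(z) = \tfrac{1}{2}\left[1 + \mathrm{erf}(z/\sqrt{2})\right]\); the function names and default parameter values are assumptions for illustration.

```python
import math

def standard_normal_cdf(z):
    """Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probability(v_i, v_j, sigma_i=1.0, sigma_j=1.0, sigma_ij=0.0):
    """Pr_n(i) = Phi((V_in - V_jn) / sigma), with sigma^2 = sigma_i^2 + sigma_j^2 - 2 sigma_ij."""
    variance = sigma_i**2 + sigma_j**2 - 2.0 * sigma_ij
    if variance <= 0:
        raise ValueError("the variance of the error difference must be positive")
    sigma = math.sqrt(variance)
    return standard_normal_cdf((v_i - v_j) / sigma)

print(probit_probability(1.0, 0.0))
```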
3.3 Binary Logit Model
Assumptions:
- \(\varepsilon_n = \varepsilon_{jn} - \varepsilon_{in}\) follows a logistic distribution, where
- the mean is zero, i.e., \(\mathrm{E}[\varepsilon_n] = 0\) (location parameter \(\eta=0\)), and
- the variance is \(\dfrac{\pi^2}{3 \mu^2}\), i.e., \(\mathrm{var}[\varepsilon_n] = \dfrac{\pi^2}{3 \mu^2}\).
- Note: a logistic \(\varepsilon_n = \varepsilon_{jn} - \varepsilon_{in}\) arises when \(\varepsilon_{in}\) and \(\varepsilon_{jn}\) are independent and identically Gumbel distributed with scale parameter \(\mu\), because the difference of two such variables follows a logistic distribution.
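The logistic distribution has the following CDF and PDF:
\[F(\varepsilon_n) = \frac{1}{1 + e^{-\mu \varepsilon_n}}, \qquad f(\varepsilon_n) = \frac{\mu e^{-\mu \varepsilon_n}}{\left( 1 + e^{-\mu \varepsilon_n} \right)^2} \]
where \(\mu>0\) is a positive scale parameter.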
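The link between Gumbel error terms and the logistic difference can be checked by simulation. The sketch below assumes Gumbel draws with location 0 and CDF \(\exp(-e^{-\mu \varepsilon})\), generated by inverting that CDF, and compares the empirical CDF of \(\varepsilon_{jn} - \varepsilon_{in}\) at one point with the logistic CDF \(1/(1+e^{-\mu x})\). Function names, sample size and seed are arbitrary choices.

```python
import math
import random

def gumbel_draw(rng, mu=1.0):
    """Draw from a Gumbel distribution (location 0, scale parameter mu) via the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u)) / mu

def logistic_cdf(x, mu=1.0):
    return 1.0 / (1.0 + math.exp(-mu * x))

def empirical_cdf_of_difference(x, mu=1.0, n_draws=50000, seed=2):
    """Share of simulated differences eps_j - eps_i that fall at or below x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        eps_i = gumbel_draw(rng, mu)
        eps_j = gumbel_draw(rng, mu)
        hits += (eps_j - eps_i) <= x
    return hits / n_draws

empirical = empirical_cdf_of_difference(0.5)
theoretical = logistic_cdf(0.5)
print(empirical, theoretical)
```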
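Calculation of probability \(\Pr{}_n(i)\):
\[\Pr{}_n(i) = \Pr \left(\varepsilon_n \leq V_{in} - V_{jn} \right) = \frac{1}{1 + e^{-\mu (V_{in} - V_{jn})}} = \frac{e^{\mu V_{in}}}{e^{\mu V_{in}} + e^{\mu V_{jn}}} \]
Case: when \(V\) is linear in its variables, we have
\[V_{in} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{in}, \qquad V_{jn} = \boldsymbol{\beta}^{\top} \boldsymbol{x}_{jn} \]
Thus, we have:
\[\Pr{}_n(i) = \frac{e^{\mu \boldsymbol{\beta}^{\top} \boldsymbol{x}_{in}}}{e^{\mu \boldsymbol{\beta}^{\top} \boldsymbol{x}_{in}} + e^{\mu \boldsymbol{\beta}^{\top} \boldsymbol{x}_{jn}}} = \frac{1}{1 + e^{-\mu \boldsymbol{\beta}^{\top} (\boldsymbol{x}_{in} - \boldsymbol{x}_{jn})}} \]
where \(\mu\) is the scale parameter. Normally we set \(\mu=1.0\), because \(\mu\) cannot be distinguished from the overall scale of \(\boldsymbol{\beta}\).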
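A minimal sketch of the binary logit probability \(\Pr{}_n(i) = 1 / \left(1 + e^{-\mu (V_{in} - V_{jn})}\right)\). The two-branch form is a common way to avoid floating-point overflow for large \(|\mu (V_{in} - V_{jn})|\); the function name is hypothetical.

```python
import math

def logit_probability(v_i, v_j, mu=1.0):
    """Binary logit probability of choosing mode i."""
    z = mu * (v_i - v_j)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)

print(logit_probability(1.0, 0.0))
print(logit_probability(0.0, 1.0))
```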
3.4 Extreme Cases of Linear, Probit and Logit Models
For the binary logit model,
- When \(\mu \to \infty\) :
\[\Pr{}_n(i) = \begin{cases} 1, & \text{If } V_{in} - V_{jn} > 0 \\ 0, & \text{If } V_{in} - V_{jn} < 0 \end{cases} \]
- When \(\mu \to 0\) :
\[\Pr{}_n(i) = \Pr{}_n(j) = \frac{1}{2} \]
The deterministic limit also exists for the binary probit model (\(\sigma \to 0\)) and the binary linear probability model (\(L \to 0\)).
The equal probability limits for these models correspond to the conditions \(\sigma \to \infty\) and \(L \to \infty\), respectively.
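These limits can be illustrated numerically by evaluating the three models at extreme parameter values. The sketch below uses a hypothetical utility difference \(V_{in} - V_{jn} = 0.5\); the function names and the specific "large" and "small" values are arbitrary choices.

```python
import math

def logit_probability(v_diff, mu):
    z = mu * v_diff
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def probit_probability(v_diff, sigma):
    return 0.5 * (1.0 + math.erf(v_diff / (sigma * math.sqrt(2.0))))

def linear_probability(v_diff, L):
    # Clipping (v_diff + L) / (2L) to [0, 1] reproduces the piecewise definition
    return min(1.0, max(0.0, (v_diff + L) / (2.0 * L)))

v_diff = 0.5  # hypothetical V_in - V_jn > 0

# Deterministic limit: mu -> infinity, sigma -> 0, L -> 0
deterministic = (logit_probability(v_diff, 1e3),
                 probit_probability(v_diff, 1e-3),
                 linear_probability(v_diff, 1e-3))

# Equal-probability limit: mu -> 0, sigma -> infinity, L -> infinity
equal_split = (logit_probability(v_diff, 1e-6),
               probit_probability(v_diff, 1e6),
               linear_probability(v_diff, 1e6))

print(deterministic)
print(equal_split)
```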
3.5 Comparison

4. Maximum Likelihood Estimation
4.0 Lemmas
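The (asymptotic) variance-covariance matrix of an ML estimator \(\hat{\boldsymbol{\theta}}^{\text{ML}}\) is calculated as the inverse of the Fisher information matrix \(\mathcal{I} (\boldsymbol{\theta})\):
\[\mathrm{Var} \left( \hat{\boldsymbol{\theta}}^{\text{ML}} \right) = \left[ \mathcal{I} (\boldsymbol{\theta}) \right]^{-1} \]
where the Fisher information matrix \(\mathcal{I} (\boldsymbol{\theta})\) is the negative of the expected value of the Hessian matrix of the log-likelihood function, i.e.,
\[\mathcal{I} (\boldsymbol{\theta}) = - \mathrm{E} \left[ \frac{\partial^2 LL(\boldsymbol{\theta})}{\partial \boldsymbol{\theta} \, \partial \boldsymbol{\theta}^{\top}} \right] \]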
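In practice, the expectation is often approximated by the Hessian evaluated at the estimate (the observed information); that substitution is an assumption of the sketch below, not something stated above. For a two-parameter model, the inverse can be written out explicitly. The Hessian values and the function name are hypothetical.

```python
import math

def covariance_from_hessian_2x2(hessian):
    """Invert the negative Hessian of a two-parameter log-likelihood."""
    (a, b), (c, d) = hessian
    info = [[-a, -b], [-c, -d]]  # information matrix = negative Hessian
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    if det <= 0:
        raise ValueError("the information matrix must be positive definite")
    # Explicit inverse of a 2x2 matrix
    return [[info[1][1] / det, -info[0][1] / det],
            [-info[1][0] / det, info[0][0] / det]]

hessian = [[-2.0, 0.5], [0.5, -3.0]]  # hypothetical Hessian at the ML estimate
cov = covariance_from_hessian_2x2(hessian)
std_errors = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]
print(cov, std_errors)
```

The square roots of the diagonal entries are the standard errors of the two estimated parameters.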
4.1 Likelihood Function
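The likelihood function for the binary choice model with a sample of \(N\) travelers is
\[\mathcal{L}(\beta_1, \beta_2, \cdots, \beta_K) = \prod_{n=1}^{N} \Pr{}_n(i)^{y_{n,i}} \, \Pr{}_n(j)^{y_{n,j}} \]
where
\[y_{n,i} = \begin{cases} 1, & \text{If traveler } n \text{ chose travel mode } i \\ 0, & \text{otherwise} \end{cases} \]
(\(y_{n,j}\) is defined in the same way for travel mode \(j\)), and \(\beta_{1}, \beta_{2}, \beta_{3}, \cdots, \beta_{K}\) are the parameters of the utility functions, for example:
\[V_{in} = \beta_1 x_{in1} + \beta_2 x_{in2} + \cdots + \beta_K x_{inK} \]
The logarithm of the likelihood function, i.e., the log-likelihood function, is
\[LL(\beta_1, \beta_2, \cdots, \beta_K) = \sum_{n=1}^{N} \left[ y_{n,i} \ln \Pr{}_n(i) + y_{n,j} \ln \Pr{}_n(j) \right] \]
Note that \(y_{n,i} + y_{n,j} = 1\) and \(\Pr{}_n(i) + \Pr{}_n(j) = 1\). The negative of the above formulation is also known as the binary cross-entropy loss function (see sklearn).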
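As a small worked example, the log-likelihood \(\sum_n \left[ y_{n,i} \ln \Pr{}_n(i) + y_{n,j} \ln \Pr{}_n(j) \right]\) can be evaluated directly from observed choices and model probabilities, using \(y_{n,j} = 1 - y_{n,i}\) and \(\Pr{}_n(j) = 1 - \Pr{}_n(i)\). The data below are hypothetical, and the probabilities must lie strictly between 0 and 1.

```python
import math

def log_likelihood(y_i, p_i):
    """Binary-choice log-likelihood from choices y_ni and probabilities Pr_n(i)."""
    total = 0.0
    for y, p in zip(y_i, p_i):
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

y_i = [1, 0, 1, 1]          # hypothetical choices: 1 if mode i was chosen
p_i = [0.8, 0.3, 0.6, 0.9]  # hypothetical model probabilities Pr_n(i)

ll = log_likelihood(y_i, p_i)
bce = -ll / len(y_i)  # negative log-likelihood averaged over observations
print(ll, bce)
```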
We can solve for the maximum of \(LL(\beta_1, \beta_2, \cdots, \beta_K)\) by differentiating it with respect to each of the \(\beta\)s and setting the partial derivatives equal to zero, i.e.,
\[\frac{\partial LL}{\partial \beta_k} = 0, \qquad k = 1, 2, \cdots, K \]
If the optimal value of \(LL(\beta_1, \beta_2, \cdots, \beta_K)\) exists, it must satisfy these necessary conditions (i.e., first-order conditions). In many cases of practical interest we can show that the log-likelihood function is globally concave, so that if a solution to the first-order conditions exists, it is unique.
4.2 Solving coefficients \(\hat{\boldsymbol{\beta}}\)
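The Newton-Raphson algorithm can be used to seek the optimal solution of the maximum likelihood estimation.
Step 0: Initialization \(\hat{\boldsymbol{\beta}}^{(0)} = \left[ \beta_{1}^{(0)}, \beta_{2}^{(0)}, \cdots, \beta_{K}^{(0)} \right]^{\top}\), e.g., \(\hat{\boldsymbol{\beta}}^{(0)}=\boldsymbol{0}\)
Step 1: Linearize the function \(\nabla LL(\boldsymbol{\beta})\) around \(\hat{\boldsymbol{\beta}}^{(t)}\). The approximate first-order conditions are given by:
\[\nabla LL \left( \hat{\boldsymbol{\beta}}^{(t)} \right) + \nabla^2 LL \left( \hat{\boldsymbol{\beta}}^{(t)} \right) \left( \boldsymbol{\beta} - \hat{\boldsymbol{\beta}}^{(t)} \right) = \boldsymbol{0} \]
Step 2: Solve and update
\[\hat{\boldsymbol{\beta}}^{(t+1)} = \hat{\boldsymbol{\beta}}^{(t)} - \left[ \nabla^2 LL \left( \hat{\boldsymbol{\beta}}^{(t)} \right) \right]^{-1} \nabla LL \left( \hat{\boldsymbol{\beta}}^{(t)} \right) \]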
Step 3: Check the stop criterion. If the following conditions are satisfied,
then the iterations are stopped, otherwise, continue Step 1.
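The steps above can be sketched for a single parameter, where the Newton-Raphson update reduces to \(\beta^{(t+1)} = \beta^{(t)} - LL'(\beta^{(t)}) / LL''(\beta^{(t)})\). The stop criterion used here (absolute step size below a tolerance) is an assumption for this example, and the toy concave objective \(f(\beta) = \ln \beta - \beta\) stands in for a log-likelihood.

```python
def newton_raphson(gradient, hessian, beta0, tol=1e-10, max_iter=50):
    """Maximize a concave function of one parameter with Newton-Raphson steps."""
    beta = beta0                                   # Step 0: initialization
    for _ in range(max_iter):
        step = -gradient(beta) / hessian(beta)     # Steps 1-2: solve the linearized condition
        beta += step
        if abs(step) < tol:                        # Step 3: assumed stop criterion
            return beta
    raise RuntimeError("Newton-Raphson did not converge")

# Toy concave objective f(b) = ln(b) - b, maximized at b = 1
beta_hat = newton_raphson(
    gradient=lambda b: 1.0 / b - 1.0,
    hessian=lambda b: -1.0 / b**2,
    beta0=0.5,
)
print(beta_hat)
```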
(1) Example: Solving Binary Logit Model
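For the binary logit model (with \(\mu = 1\)), we have:
\[\Pr{}_n(i) = \frac{1}{1 + e^{-\boldsymbol{\beta}^{\top} \boldsymbol{x}_n}}, \qquad \Pr{}_n(j) = 1 - \Pr{}_n(i) = \frac{1}{1 + e^{\boldsymbol{\beta}^{\top} \boldsymbol{x}_n}} \]
where we denote \(\boldsymbol{x}_n = \boldsymbol{x}_{in}-\boldsymbol{x}_{jn}\), with elements \(x_{nk}\), \(k = 1, \cdots, K\).
Thus, we have
\[\frac{\partial \Pr{}_n(i)}{\partial \beta_k} = \Pr{}_n(i) \Pr{}_n(j) \, x_{nk} \]
and
\[\frac{\partial \Pr{}_n(j)}{\partial \beta_k} = - \Pr{}_n(i) \Pr{}_n(j) \, x_{nk} \]
Thus,
\[\frac{\partial LL}{\partial \beta_k} = \sum_{n=1}^{N} \left[ y_{n,i} \Pr{}_n(j) - y_{n,j} \Pr{}_n(i) \right] x_{nk} = \sum_{n=1}^{N} \left[ y_{n,i} - \Pr{}_n(i) \right] x_{nk} \]
The second derivatives can be solved as:
\[\frac{\partial^2 LL}{\partial \beta_k \partial \beta_l} = - \sum_{n=1}^{N} \Pr{}_n(i) \left( 1 - \Pr{}_n(i) \right) x_{nk} x_{nl} \]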
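A minimal sketch of Newton-Raphson estimation for a two-parameter binary logit model, assuming the gradient \(\sum_n [y_{n,i} - \Pr{}_n(i)] x_{nk}\) and the Hessian \(-\sum_n \Pr{}_n(i)(1 - \Pr{}_n(i)) x_{nk} x_{nl}\) with \(\mu = 1\). The data set, the function names and the step-size stop criterion are hypothetical; the 2x2 linear system is solved with the explicit matrix inverse to keep the example free of external libraries.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gradient_and_hessian(beta, xs, ys):
    """Gradient and Hessian of the binary logit log-likelihood (two parameters)."""
    g = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        p = sigmoid(beta[0] * x[0] + beta[1] * x[1])  # Pr_n(i) with mu = 1
        w = p * (1.0 - p)
        for k in range(2):
            g[k] += (y - p) * x[k]
            for l in range(2):
                h[k][l] -= w * x[k] * x[l]
    return g, h

def fit_binary_logit(xs, ys, tol=1e-10, max_iter=50):
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        g, h = gradient_and_hessian(beta, xs, ys)
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        # Newton step -H^{-1} g, using the explicit inverse of a 2x2 matrix
        step = [-(h[1][1] * g[0] - h[0][1] * g[1]) / det,
                -(-h[1][0] * g[0] + h[0][0] * g[1]) / det]
        beta = [beta[0] + step[0], beta[1] + step[1]]
        if max(abs(s) for s in step) < tol:  # assumed stop criterion
            return beta
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical data: x_n = [1, travel time of mode i minus mode j], y_n = 1 if mode i was chosen
xs = [[1.0, t] for t in (-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0)]
ys = [1, 1, 0, 1, 0, 1, 0, 0]

beta_hat = fit_binary_logit(xs, ys)
grad_at_hat, _ = gradient_and_hessian(beta_hat, xs, ys)
print(beta_hat, grad_at_hat)
```

At the returned estimate the gradient is numerically zero, which is the first-order condition, and the travel time coefficient comes out negative for this data set.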
Proof: the log-likelihood function is concave
Lemma: Let \(A\in\mathbb{R}^{n \times m}\) with \(n>m\); then the matrix \(A^{\top}A\) is positive semidefinite. If \(\text{rank}(A)=m\) (i.e., \(A\) has full column rank), then \(A^{\top}A\) is positive definite.
We have
\[\nabla^2 LL(\boldsymbol{\beta}) = - \mathbf{A}^{\top} \mathbf{A} \]
which is negative semidefinite, where \(\mathbf{A} \in \mathbb{R}^{N \times K}\) has entries \(a_{nk} = x_{nk} \big[ \Pr{}_n(i) ( 1 - \Pr{}_n(i) ) \big]^{1/2}\)
(2) Example: Solving Binary Probit Model
5. Hypothesis testing
5.1 Asymptotic t-Test
5.2 Confidence Region for Several Parameters Simultaneously
5.3 Likelihood Ratio Test
5.4 Goodness-of-Fit Measures
5.5 Test of Generic Attributes
5.6 Tests of Non-Nested Hypotheses
5.7 Tests of Nonlinear Specifications
References
Ben-Akiva, M. E., & Lerman, S. R. (1985). Discrete choice analysis: Theory and application to travel demand. MIT Press.
Meng, Q. (2022). Lecture notes on Modal Split Modeling: Discrete Choice Models. CE5205 Transportation Planning.
Train, K. (2009). Discrete choice methods with simulation (2nd ed.). Cambridge University Press.
Ortúzar Salar, J. de D., & Willumsen, L. G. (2011). Modelling transport (4th ed.). Wiley.
