MATH2110 - STATISTICS 3
The University of Nottingham
SCHOOL OF MATHEMATICAL SCIENCES
SPRING SEMESTER 2025
Coursework 2
Deadline: 3pm, Friday 2/5/2025
Your neat, clearly-legible solutions should be submitted electronically as a pdf file via the MATH2110 Moodle page by the deadline indicated there. As this work is assessed, your submission must be entirely your own work (see the University’s policy on Academic Misconduct).
Submissions up to five working days late will be subject to a penalty of 5% of the maximum mark per working day.
Deadline extensions due to Support Plans and Extenuating Circumstances can be requested according to School and University policies, as applicable to this module. Because of these policies, solutions (where appropriate) and feedback cannot normally be released earlier than 10 working days after the main cohort submission deadline.
The page limit is 8 pages and the minimum font size is 11.
THE DATA
As a medical statistician of the 19th century, your task is to assess associations between the fertility of different Swiss regions and certain social parameters. The goal is to identify the most influential variables, select the best model, and make predictions using it. You have data for 47 regions with the following variables:
- Fertility, standardised fertility measure.
- Agriculture, percentage of males involved in agriculture as occupation.
- Examination, percentage of draftees receiving highest mark on army examination.
- Education, percentage of draftees with education beyond primary school.
- Catholic, percentage of Catholics.
- Infant.Mortality, normalised proportion of live births who live less than 1 year.
You can load the data by running the R command data(swiss). The only packages that may be used are “BayesFactor” and “MASS”.
THE TASKS
First divide the data into a training set (70% - 33 observations) and a test set (30% - 14 observations). All the fitting and selection should be done using exclusively the training set. To avoid having correlations during the train/test division, use the function sample() to randomly choose both groups.
All modelling should use Bayesian Normal linear models with the priors
$\beta \mid \sigma^2 \sim N(0, 100 I_p)$,
$\sigma^2 \sim IG(2, 2)$,
where $I_p$ is the $p \times p$ identity matrix and $IG$ denotes the inverse-gamma distribution.
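A minimal sketch of the random train/test division described above, using only base R. The seed value and the object names `train_idx`, `train` and `test` are illustrative assumptions, not part of the coursework specification.

```r
# Load the built-in swiss data set (47 regions, 6 variables).
data(swiss)

# Hypothetical seed, chosen here only so the split is reproducible.
set.seed(2110)

# Draw 33 row indices at random, without replacement, for the training set.
n <- nrow(swiss)
train_idx <- sample(n, size = 33)

# The training set gets the sampled rows; the test set gets the remaining 14.
train <- swiss[train_idx, ]
test <- swiss[-train_idx, ]
```

Because `sample()` draws without replacement by default, the two sets are disjoint and together cover all 47 regions.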
1. Consider the relationship between Examination and Fertility.
  - Perform an exploratory analysis of the relationship between Examination and Fertility.
  - Fit a Bayesian Normal linear model with Fertility as the dependent variable and Examination as the independent variable.
  - Write down the selected model posterior.
  - Sample 10 sets of parameters from the posterior distribution and plot the resulting linear model for each set of sampled parameters.
  [20 marks]
2. Consider the relationship between Catholic and Fertility.
  - Perform an exploratory analysis of the relationship between Catholic and Fertility.
  - Create a new variable $\text{Catholic.Transform} = (\text{Catholic} - \alpha)^2$ for a suitable choice of $0 \le \alpha \le 100$.
  - Fit a Bayesian Normal linear model with Fertility as the dependent variable and Catholic.Transform as the independent variable.
  - Write down the selected model posterior.
  - Using the posterior mean for the parameters of the linear model, consider the model fit.
  [25 marks]
3. Use Bayes Factors to determine which of the models in 1 and 2 best fits the data. [5 marks]
4. Consider general linear models for modelling Fertility as a function of the covariates.
  - Perform model selection to choose a model and justify your choice of model.
  - Write down the selected model posterior.
  - Draw samples from the corresponding posterior.
  - Present histograms (using function hist()) for the samples of each parameter.
  - Compute estimates of the parameters and compare them.
  - Make predictions for the Fertility values in the test set.
  - Compare these with the real values.
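As a hedged note on the stated priors: if the prior on $\beta$ is read literally, with covariance $100 I_p$ not scaled by $\sigma^2$, the joint posterior is not of a standard closed form, but both full conditional distributions are. The sketch below assumes notation not given in the handout: $y$ is the vector of $n$ training responses, $X$ is the $n \times p$ design matrix (including an intercept column if one is used), and $IG(a, b)$ has density proportional to $(\sigma^2)^{-(a+1)} \exp(-b/\sigma^2)$.

```latex
\begin{align*}
\beta \mid \sigma^2, y &\sim N(m, V), &
V &= \left(\tfrac{1}{\sigma^2} X^\top X + \tfrac{1}{100} I_p\right)^{-1}, &
m &= \tfrac{1}{\sigma^2} V X^\top y, \\
\sigma^2 \mid \beta, y &\sim IG\!\left(2 + \tfrac{n}{2},\; 2 + \tfrac{1}{2} \lVert y - X\beta \rVert^2\right).
\end{align*}
```

Under these assumptions, alternating draws from the two conditionals (a Gibbs sampler) is one possible way to obtain posterior samples. A different parameterisation of the inverse-gamma distribution would change the second line accordingly.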