Advanced statistical methods

Notes from an elective statistics course: Advanced statistical methods

A tools-oriented course, light on theory and heavy on practice. After completing it you can analyze data directly in R.

 

Course outline:

  1. Introduction to R
  2. Regression model in R
  3. Applied regression I
  4. Applied regression II
  5. Applied regression III
  6. Conditional logistic regression and propensity score method
  7. Inverse probability weighting and meta analysis
  8. Instrumental variable analysis

 

Course assessment criteria:

  1. Appropriate analytic method
  2. Accurate numerical results
  3. Clear presentation of the results and choice of methods
  4. Interpretation of the results relevant to the public health context

 

1 Introduction to R

  1. Use R to perform basic algebraic operations
  2. Work with variables, vectors and matrices in R
  3. Produce clear and well formatted graphs in R
  4. Install and load R packages for specific needs

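Basic data operations

Basic operators: + - * / ^

Basic arithmetic functions: sqrt, exp, log, abs, round

Help: ?, ??

Basic data functions: rep, seq, length, sum, mean, sd, median, min, max, var, sort, order, which, summary, sample, runif

Matrix operations: %*%, solve, t, colSums, colMeans, dim, cbind, rbind

Logical operators: ! & |

Checks: is.na, is.factor

Type conversion: as.factor

Data transformation: aggregate, the plyr package, melt, table, prop.table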
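A minimal sketch of how several of the functions listed above fit together. The vectors, the matrix and the small data frame are made up for illustration and are not course data.

```r
# Vectors and summary statistics
x <- seq(2, 20, by = 2)          # 2, 4, ..., 20
length(x); sum(x); mean(x); sd(x)
which(x > 15)                    # positions of elements greater than 15
order(c(3, 1, 2))                # permutation that sorts the vector

# Matrices
A <- rbind(c(2, 1), c(1, 3))     # 2 x 2 matrix built row by row
A_inv <- solve(A)                # matrix inverse
A %*% A_inv                      # identity matrix, up to rounding
t(A); dim(A); colSums(A); colMeans(A)

# Missing values and factors
v <- c(1, NA, 3)
is.na(v)
mean(v, na.rm = TRUE)            # drop the NA before averaging
g <- as.factor(c("a", "b", "a"))
is.factor(g)

# Group summaries and tables
df <- data.frame(group = c("a", "a", "b", "b"), value = c(1, 3, 5, 7))
group_means <- aggregate(value ~ group, data = df, FUN = mean)
prop.table(table(df$group))      # share of rows in each group
```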

Reading files

Saving files
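The notes do not say which functions the course used for file input and output. As an assumption, the sketch below uses the common base R pairs write.csv/read.csv and saveRDS/readRDS, writing to temporary files so that nothing on disk is overwritten.

```r
df <- data.frame(id = 1:3, bmi = c(20.9, 19.1, 23.9))

csv_path <- tempfile(fileext = ".csv")    # temporary file, for illustration only
write.csv(df, csv_path, row.names = FALSE)
df_csv <- read.csv(csv_path)

rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)                     # stores the R object, including column types
df_rds <- readRDS(rds_path)

str(df_csv)
```

CSV is readable by other programs, while the RDS format is specific to R and preserves factors and other column types.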

Plotting

Basic plotting

plot, pairs, hist, boxplot, points, lines, text, abline, polygon, legend, title, axis, par, windows, layout, pdf, dev.off
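A small sketch of how these base graphics calls combine into one figure written to a PDF file. The data are simulated and the output path is a temporary file chosen for illustration.

```r
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

out_file <- tempfile(fileext = ".pdf")   # hypothetical output path
pdf(out_file, width = 8, height = 4)     # open a PDF device
par(mfrow = c(1, 2))                     # two panels side by side

plot(x, y, pch = 16, xlab = "x", ylab = "y")
abline(lm(y ~ x), col = "red", lwd = 2)  # add the fitted regression line
legend("topleft", legend = "least-squares fit", col = "red", lwd = 2, bty = "n")
title(main = "Scatter plot")

hist(y, main = "Histogram of y", xlab = "y")

dev.off()                                # close the device so the file is written
```

Note that windows() opens a graphics window on Windows only; dev.new() is the portable way to open a new device.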

Advanced plotting

ggplot2

cowplot

 

2 Regression model in R

Generating random data:

runif - uniform distribution

rbinom

rnorm

sample

set.seed

cut

factor

relevel
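The functions above are typically combined to build a simulated data set for practice. The following is a sketch with made-up variable names and parameters, not the course's own example.

```r
set.seed(2020)                                   # make the draws reproducible
n <- 200
age   <- round(runif(n, min = 20, max = 80))     # uniform on [20, 80]
male  <- rbinom(n, size = 1, prob = 0.5)         # Bernoulli(0.5) indicator
bmi   <- rnorm(n, mean = 24, sd = 3)             # normal
group <- sample(c("A", "B", "C"), n, replace = TRUE)

# cut() turns a numeric variable into a factor of intervals
age_grp <- cut(age, breaks = c(20, 40, 60, 80), include.lowest = TRUE)

# factor() and relevel() control the reference category used by lm/glm
group <- relevel(factor(group), ref = "B")

table(age_grp)
levels(group)                                    # "B" now comes first
```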

Linear regression

simple linear regression

multiple linear regression

interactions

lm

summary

Residuals - the difference between the actual observed response values and the response values predicted by the model

Coefficients

[You must understand every quantity in the summary output and what it means.]

CI

confint

QUICK GUIDE: INTERPRETING SIMPLE LINEAR MODEL OUTPUT IN R
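A sketch of the three model types fitted with lm, together with the main pieces of the summary output. The data are simulated, and the variable names (y, x1, x2) are placeholders rather than course data.

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
y  <- 1 + 2 * x1 - 1 * x2 + 0.5 * x1 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

fit_simple   <- lm(y ~ x1, data = dat)        # simple linear regression
fit_multiple <- lm(y ~ x1 + x2, data = dat)   # multiple linear regression
fit_inter    <- lm(y ~ x1 * x2, data = dat)   # main effects plus interaction

s <- summary(fit_inter)
s$coefficients                     # estimate, standard error, t value, p-value
s$sigma                            # residual standard error
s$r.squared                        # proportion of variance explained
confint(fit_inter, level = 0.95)   # 95% confidence intervals
```

The formula y ~ x1 * x2 is shorthand for y ~ x1 + x2 + x1:x2, so the interaction model has four coefficients including the intercept.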

 

3 Applied regression I

Use a model suited to the data at hand

  • Apply Poisson and negative binomial regression models to count data
  • Identify and apply suitable model to overdispersed data

count data

  • Nonnegative
  • Positively skewed
  • Variance tends to increase with mean
  • Violates the homoscedasticity and normality assumptions

Generalized Linear Model (GLM)

maximum likelihood

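At first glance it looks odd to regress on 1: summary(glm(deaths ~ 1, data=horse, family=poisson)) fits an intercept-only model, so the exponentiated intercept is simply the mean count.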

Dispersion parameter for poisson family taken to be 1

Interpreting the glm summary output

Model checking

compare the observed event counts to data that we might have expected, under a Poisson(0.61) model

Formal model goodness-of-fit

residual deviance/df should not be too much bigger than 1
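The mean of 0.61 matches the classic Prussian cavalry horse-kick counts (200 corps-years with 109, 65, 22, 3 and 1 observations at 0 to 4 deaths). Assuming the course's horse data is that data set, which the notes do not confirm, the checks above can be sketched by rebuilding it from the frequency table.

```r
# Classic horse-kick frequencies (assumed to match the course's `horse` data)
deaths_value <- 0:4
observed     <- c(109, 65, 22, 3, 1)
horse <- data.frame(deaths = rep(deaths_value, times = observed))

fit0 <- glm(deaths ~ 1, data = horse, family = poisson)
lambda_hat <- unname(exp(coef(fit0)))   # intercept is log(mean), so this is 0.61

# Observed counts versus counts expected under Poisson(lambda_hat)
expected <- nrow(horse) * dpois(deaths_value, lambda_hat)
round(cbind(deaths = deaths_value, observed, expected), 1)

# Rough goodness-of-fit check: residual deviance per degree of freedom
deviance(fit0) / df.residual(fit0)
```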

A Poisson model with covariates in R

summary(glm(deaths~corps, data=horse, family=poisson))

Incidence rate ratios (IRR) / relative risks

Poisson regression with offsets
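A sketch of a Poisson model with a covariate and an offset, with coefficients turned into incidence rate ratios. The data are simulated; exposed and pyears are hypothetical variable names, and the intervals are Wald intervals from confint.default, which may differ slightly from the profile-likelihood intervals the course may have used.

```r
set.seed(42)
n <- 300
exposed <- rbinom(n, 1, 0.5)
pyears  <- runif(n, 50, 500)                 # person-years at risk (hypothetical)
rate    <- 0.02 * exp(log(1.5) * exposed)    # true rate ratio is 1.5
events  <- rpois(n, lambda = rate * pyears)
dat <- data.frame(events, exposed, pyears)

# offset(log(pyears)) models the rate, events / pyears, rather than the raw count
fit <- glm(events ~ exposed + offset(log(pyears)), data = dat, family = poisson)

irr <- exp(coef(fit))                        # incidence rate ratios
ci  <- exp(confint.default(fit))             # Wald 95% intervals on the ratio scale
round(cbind(IRR = irr, ci), 3)
```

The offset enters the linear predictor with its coefficient fixed at 1, which is what turns a model for counts into a model for rates.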

Overdispersion - Negative Binomial model

the variance (823.475) is much larger than the mean (28.41)

summary(glm.nb(y~1, data=epilepsy))

Comparing models

A lower AIC indicates a ‘better’ model
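A sketch of the Poisson versus negative binomial comparison on simulated overdispersed counts. The epilepsy data from the notes are not reproduced here; trt is a hypothetical covariate, and glm.nb comes from the MASS package.

```r
library(MASS)                                   # provides glm.nb

set.seed(7)
n <- 200
trt <- rbinom(n, 1, 0.5)
# Negative binomial counts: variance = mu + mu^2 / size, far above the mean
y <- rnbinom(n, mu = exp(3 + 0.3 * trt), size = 1)
dat <- data.frame(y, trt)

c(mean = mean(dat$y), variance = var(dat$y))    # variance much larger than mean

fit_pois <- glm(y ~ trt, data = dat, family = poisson)
fit_nb   <- glm.nb(y ~ trt, data = dat)

deviance(fit_pois) / df.residual(fit_pois)      # far above 1 signals overdispersion
AIC(fit_pois, fit_nb)                           # lower AIC is preferred
```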

 

4 Applied regression II

  1. Apply Poisson and negative binomial regression models to count data
  2. Identify and apply suitable model to overdispersed data
  3. Identify influential observations (points whose removal changes the fitted model substantially)
  4. Perform model diagnostics
  5. Understand and deal with multicollinearity

hatvalues(mvc.r.lm)

sort(round(cooks.distance(mvc.r.lm),2), decreasing=T)
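The two calls above can be tried on a small simulated example. Here fit stands in for the notes' mvc.r.lm, whose data are not shown, and one outlying high-leverage point is planted on purpose. The cut-offs used are common rules of thumb, not strict tests.

```r
set.seed(3)
dat <- data.frame(x = rnorm(30))
dat$y <- 1 + 2 * dat$x + rnorm(30)
dat$x[30] <- 6                          # plant one outlying, high-leverage point
dat$y[30] <- -5
fit <- lm(y ~ x, data = dat)            # stands in for the notes' mvc.r.lm

h <- hatvalues(fit)                     # leverage of each observation
d <- cooks.distance(fit)                # influence of each observation on the fit
head(sort(round(d, 2), decreasing = TRUE))

which(h > 2 * mean(h))                  # rule of thumb for high leverage
which(d > 4 / length(d))                # rule of thumb for large Cook's distance

# Refit without the most influential point and compare coefficients
worst <- unname(which.max(d))
refit <- lm(y ~ x, data = dat[-worst, ])
rbind(all_data = coef(fit), dropped = coef(refit))
```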

Model diagnostics

Estimation method and statistical tests are based on model assumptions

  • potential violated assumptions
  • extent of violation
  • Acknowledge limitation
  • alternative statistical model

Assumptions of linear regression model

  • Linearity
  • Homoscedasticity
  • Normality of the errors
  • Independence

Residual plot against fitted values

Q-Q Plot

P-P Plot

ACF plot
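A sketch of the four diagnostic plots for a linear model fitted to simulated data. The plots are written to a temporary PDF so the code also runs without a display; in an interactive session the pdf and dev.off lines can be dropped.

```r
set.seed(5)
x <- runif(100, 0, 10)
y <- 1 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)
res <- residuals(fit)

pdf(tempfile(fileext = ".pdf"))          # headless-safe device
par(mfrow = c(2, 2))

# Linearity and homoscedasticity: look for no trend and constant spread
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: Q-Q plot of standardized residuals
qqnorm(rstandard(fit)); qqline(rstandard(fit))

# Normality: P-P plot, empirical versus theoretical cumulative probabilities
p_emp  <- ppoints(length(res))
p_theo <- pnorm(sort(rstandard(fit)))
plot(p_theo, p_emp, xlab = "Theoretical probability", ylab = "Empirical probability")
abline(0, 1, lty = 2)

# Independence: autocorrelation of residuals (meaningful when rows are ordered)
acf(res, main = "ACF of residuals")
dev.off()
```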

Multicollinearity

VIF
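The variance inflation factor of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. The sketch below computes it by hand on simulated data; vif_manual is a hypothetical helper written for illustration, and in practice a packaged function such as vif from the car package is commonly used instead.

```r
set.seed(11)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)     # nearly a copy of x1, so strongly collinear
x3 <- rnorm(n)                    # unrelated to the others
y  <- 1 + x1 + x2 + x3 + rnorm(n)
dat <- data.frame(y, x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing x_j on the other predictors
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vif_values <- vif_manual(dat, c("x1", "x2", "x3"))
vif_values
```

Values near 1 indicate little collinearity. Thresholds of 5 or 10 are often quoted as warning levels, though these are conventions rather than formal tests.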

 

5 Applied regression III

  1. Identify and handle multicollinearity
  2. Account for confounding factors in regression model
  3. Assess potential effect modifiers in regression model
  4. Perform basic mediation analysis
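A sketch of confounding adjustment and effect modification, two of the objectives above, on simulated data. All variable names and effect sizes are invented for illustration, and mediation analysis is not covered here.

```r
set.seed(21)
n <- 2000
age      <- rnorm(n, 50, 10)                         # confounder
exposure <- rbinom(n, 1, plogis((age - 50) / 10))    # older people more often exposed
sex      <- rbinom(n, 1, 0.5)                        # candidate effect modifier
# True exposure effect: 1 when sex == 0 and 3 when sex == 1; age raises y too
y <- 0.2 * age + 1 * exposure + 2 * exposure * sex + rnorm(n)
dat <- data.frame(y, exposure, age, sex)

crude    <- lm(y ~ exposure, data = dat)             # ignores the confounder
adjusted <- lm(y ~ exposure + age, data = dat)       # adjusts for age
modified <- lm(y ~ exposure * sex + age, data = dat) # effect allowed to differ by sex

c(crude = coef(crude)[["exposure"]], adjusted = coef(adjusted)[["exposure"]])
summary(modified)$coefficients["exposure:sex", ]     # test of effect modification
```

The crude estimate is inflated because exposed people are older on average. A clearly non-zero interaction coefficient suggests that the exposure effect differs between the two sex groups.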


6 Conditional logistic regression and propensity score method

  1. Fit conditional logistic regression model to data from case control study
  2. Understand the assumptions of the propensity score method
  3. Interpret results from propensity score method
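A minimal sketch of conditional logistic regression using clogit from the survival package. Since the course data are not shown, it uses infert, a matched case-control data set that ships with R, in which stratum identifies each matched set.

```r
library(survival)   # provides clogit() and strata()

# infert: matched case-control data from the datasets package
fit <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)

# Odds ratios with Wald 95% confidence intervals
or_table <- cbind(OR = exp(coef(fit)), exp(confint(fit)))
round(or_table, 2)
```

The strata() term conditions on the matched sets, so no intercept or per-set effect is estimated.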


7 Inverse probability weighting and meta analysis

  1. Appreciate the use of inverse probability weighting
  2. Apply inverse probability weighting for analysis of missing data
  3. Perform meta analysis to obtain overall estimate of an intervention effect from multiple studies
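A sketch of both ideas on invented numbers. Part 1 reweights complete cases by the inverse of their estimated probability of being observed; part 2 pools hypothetical study estimates with inverse-variance weights (a fixed-effect meta-analysis). None of the values come from the course.

```r
# Part 1: inverse probability weighting for missing outcomes (simulated data)
set.seed(9)
n <- 5000
x <- rnorm(n)
y <- 2 + x + rnorm(n)                        # true mean of y is 2
observed <- rbinom(n, 1, plogis(0.5 + x))    # y is observed more often when x is large
dat <- data.frame(x, y = ifelse(observed == 1, y, NA), observed)

p_obs <- fitted(glm(observed ~ x, data = dat, family = binomial))
cc    <- dat$observed == 1                   # complete cases
w     <- 1 / p_obs[cc]                       # weight = 1 / P(observed | x)

naive_mean <- mean(dat$y[cc])                # biased: large-x rows are over-represented
ipw_mean   <- weighted.mean(dat$y[cc], w)    # reweighted toward the full sample
c(naive = naive_mean, ipw = ipw_mean)

# Part 2: fixed-effect meta-analysis of hypothetical log odds ratios
log_or <- c(-0.2, -0.5, -0.1)
se     <- c(0.20, 0.25, 0.10)
wt     <- 1 / se^2                           # more precise studies get more weight
pooled    <- sum(wt * log_or) / sum(wt)
pooled_se <- sqrt(1 / sum(wt))
exp(pooled + c(estimate = 0, lower = -1.96, upper = 1.96) * pooled_se)
```

The weighting only removes the bias if missingness depends on measured variables alone and the model for the probability of being observed is correctly specified.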


8 Instrumental variable analysis

  1. Estimate treatment effect using instrumental variable analysis for noncontrolled experiment
  2. Understand the assumptions of instrumental variable analysis
  3. Interpret results from instrumental variable analysis
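A sketch of instrumental variable estimation by two-stage least squares, done by hand with lm on simulated data. The instrument z, the unmeasured confounder u and the effect sizes are assumptions of the simulation.

```r
set.seed(13)
n <- 5000
z <- rnorm(n)                       # instrument: affects x, no direct path to y
u <- rnorm(n)                       # unmeasured confounder
x <- z + u + rnorm(n)               # treatment / exposure
y <- 2 * x + 2 * u + rnorm(n)       # true effect of x on y is 2

ols <- coef(lm(y ~ x))[["x"]]       # confounded by u

stage1 <- lm(x ~ z)                 # stage 1: predict x from the instrument
x_hat  <- fitted(stage1)
iv     <- coef(lm(y ~ x_hat))[["x_hat"]]   # stage 2: regress y on predicted x

summary(stage1)$fstatistic[["value"]]      # instrument strength (first-stage F)
c(ols = ols, iv = iv)
```

The point estimate from the manual second stage is the two-stage least squares estimate, but its standard errors from lm are not valid. A dedicated routine, for example ivreg in the AER package, reports corrected standard errors.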

 

Basic concepts:

RR

OR and β (estimated coefficients)
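A worked example of how RR, OR and β relate, using a hypothetical 2 x 2 table. In logistic regression the coefficient β is the log odds ratio, so exp(β) gives the OR.

```r
# Hypothetical 2 x 2 table: 100 exposed and 100 unexposed people
tab <- data.frame(exposed = c(1, 0), events = c(20, 10), nonevents = c(80, 90))

risk <- tab$events / (tab$events + tab$nonevents)   # 0.20 and 0.10
odds <- tab$events / tab$nonevents                  # 0.25 and 0.111...
relative_risk <- risk[1] / risk[2]                  # RR = 2
odds_ratio    <- odds[1] / odds[2]                  # OR = 2.25

# Logistic regression: beta is the log odds ratio, so exp(beta) reproduces the OR
fit_logit <- glm(cbind(events, nonevents) ~ exposed, data = tab, family = binomial)
or_from_beta <- exp(coef(fit_logit))[["exposed"]]

c(RR = relative_risk, OR = odds_ratio, exp_beta = or_from_beta)
```

The OR lies further from 1 than the RR here because the outcome is not rare; the two agree closely only when the outcome is rare.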

 

 

Final exam

An investigator conducted a retrospective analysis on the association between statin therapy and psychological disorders, based on a database of medical records. The analysis adjusted for potential confounders such as age, sex, BMI and comorbidity.

Variable names

  • id
  • male
  • age
  • bmi
  • comorbid.s, Charlson comorbidity index
  • statin, statin users
  • psych, psychological disorder
  id male age  bmi comorbid.s statin psych
1  1    0  54 20.9          1      0     0
2  2    0  42 19.1          0      0     0
3  3    1  46 23.9          1      1     0
4  4    1  58 23.5          0      0     1
5  5    1  43 28.7          1      1     0
6  6    1  46 26.6          0      1     0

Questions:

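(A) Carry out a standard regression analysis to estimate the effect of statin therapy on psychological disorder, adjusting for sex, age, BMI and comorbidity. Present the odds ratios with 95% confidence intervals for the variables in a table. [10%] (Note: a standard regression model; since psych is binary and odds ratios are requested, this means logistic regression rather than a linear model.)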

The investigator also decided to carry out a propensity score analysis. (Note: for the propensity score analysis, refer to Assignment 2.)
(B) Fit a propensity score model to predict statin use. You may consider main effects only (even when not all patient characteristics can be satisfactorily balanced). Present and interpret the model results. [8%]
(C) Based on your propensity score model, how well were the patient characteristics balanced across statin users and non-users with similar propensity scores? [6%]
(D) State the key assumptions of propensity score analysis and assess if they are satisfied. [6%]
(E) Do you think it is appropriate to use propensity score analysis in this setting? Briefly explain why. [4%]
(F) Estimate the effect of statin therapy (and the corresponding 95% CI) on psychological disorder and compare with the results in (A). [8%]
(G) Based on the results in (A) - (F), summarize and interpret the main findings from the analyses. [8%]

Solution approach:

1. Candidate models: standard linear regression; GLM (Poisson, NB); clogit; etc.
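One possible workflow for parts (A) to (F) is sketched below. The exam data are not available here, so the code simulates a data frame with the same column names; every coefficient in the simulation is invented. It uses main effects only and stratifies on propensity score quintiles, which is one of several options (matching and weighting are others) and may differ from the approach in Assignment 2.

```r
set.seed(2020)
n <- 1000
dat <- data.frame(id = 1:n,
                  male = rbinom(n, 1, 0.5),
                  age = round(rnorm(n, 50, 8)),
                  bmi = round(rnorm(n, 24, 3), 1),
                  comorbid.s = rpois(n, 0.8))
# Simulated statin use and outcome (NOT the exam data)
dat$statin <- rbinom(n, 1, plogis(-3 + 0.04 * dat$age + 0.03 * dat$bmi +
                                    0.3 * dat$comorbid.s))
dat$psych  <- rbinom(n, 1, plogis(-2 + 0.3 * dat$statin + 0.2 * dat$comorbid.s))

# (A) Logistic regression: odds ratios with Wald 95% CIs
fit_a <- glm(psych ~ statin + male + age + bmi + comorbid.s,
             data = dat, family = binomial)
or_a  <- cbind(OR = exp(coef(fit_a)), exp(confint.default(fit_a)))
round(or_a, 2)

# (B) Propensity score model: probability of statin use given covariates
ps_fit <- glm(statin ~ male + age + bmi + comorbid.s, data = dat, family = binomial)
dat$ps <- fitted(ps_fit)

# (C) Stratify on propensity score quintiles, compare covariate means within strata
dat$ps_q <- cut(dat$ps, quantile(dat$ps, 0:5 / 5), include.lowest = TRUE, labels = 1:5)
aggregate(cbind(male, age, bmi, comorbid.s) ~ statin + ps_q, data = dat, FUN = mean)

# (D) Overlap check: propensity score range in each treatment group
tapply(dat$ps, dat$statin, range)

# (F) Statin effect adjusted for propensity score stratum
fit_f <- glm(psych ~ statin + ps_q, data = dat, family = binomial)
exp(c(OR = coef(fit_f)[["statin"]], confint.default(fit_f)["statin", ]))
```

Comparing the statin odds ratio from (A) with the one from (F) then addresses the comparison asked for in (F) and (G).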

 

posted @ 2020-04-06 21:47  Life·Intelligence