Advanced statistical methods

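Notes from an elective statistics course: Advanced statistical methods

A tool-oriented course, light on theory and strongly practical; after finishing it you can analyze data in R right away.

Course outline: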

Introduction to R
Regression model in R
Applied regression I
Applied regression II
Applied regression III
Conditional logistic regression and propensity score method
Inverse probability weighting and meta analysis
Instrumental variable analysis

Course assessment criteria:

Appropriate analytic method
Accurate numerical results
Clear presentation of the results and choice of methods
Interpretation of the results relevant to the public health context

1 Introduction to R

Use R to perform basic algebraic operations
Work with variables, vectors and matrices in R
Produce clear and well formatted graphs in R
Install and load R packages for specific needs

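Basic data operations

Basic operators: + - * / ^

Basic math functions: sqrt, exp, log, abs, round

Help: ?, ??

Basic data functions: rep, seq, length, sum, mean, sd, median, min, max, var, sort, order, which, summary, sample, runif

Matrix operations: %*%, solve, t, colSums, colMeans, dim, cbind, rbind

Logical operators: ! & |

Tests: is.na, is.factor

Type conversion: as.factor

Data reshaping: aggregate, the plyr package, melt, table, prop.table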
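As a minimal sketch of how a few of the functions listed above fit together (the vectors and matrix are made-up illustrative values, not course data):

```r
# Vectors and summary statistics
x <- c(4, 8, 15, 16, 23, 42)
n <- length(x)                       # number of elements
total <- sum(x)
avg <- mean(x)
idx <- which(x > 15)                 # positions of the elements larger than 15
ord <- order(x, decreasing = TRUE)   # indices that sort x from largest to smallest

# Matrices: bind columns, solve a linear system, multiply
A <- cbind(c(2, 0), c(1, 3))         # 2 x 2 matrix with columns (2, 0) and (1, 3)
b <- c(5, 6)
sol <- solve(A, b)                   # solves A %*% sol = b
check <- A %*% sol                   # should reproduce b

# Missing values, factors and tables
g <- as.factor(c("a", "b", "a", NA))
n_missing <- sum(is.na(g))           # number of missing entries
counts <- table(g)                   # counts per level; NA is dropped by default
props <- prop.table(counts)          # proportions that sum to 1
```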

Reading files

Writing files
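The notes do not say which file functions the course used. A common base-R round trip, assuming CSV and RDS files and a temporary path chosen only for this illustration, might look like this:

```r
# A small made-up data frame to write out and read back
df <- data.frame(id = 1:3, bmi = c(20.9, 19.1, 23.9))

# CSV: portable text format
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)   # row.names = FALSE avoids an extra index column
df_csv <- read.csv(csv_path)

# RDS: stores a single R object with its types preserved
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
df_rds <- readRDS(rds_path)
```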

Plotting

Basic plotting

plot

pairs

hist

boxplot

points

lines

text

abline

polygon

legend

title

axis

par

windows

layout

pdf

dev.off
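Several of the base graphics functions above can be combined as in the sketch below. The data are simulated and the output file is a temporary path, both assumptions for illustration only.

```r
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)   # open a PDF device
par(mfrow = c(1, 2))              # two panels side by side

# Panel 1: scatter plot with a fitted line and a legend
plot(x, y, pch = 19, col = "grey40", xlab = "x", ylab = "y", main = "Scatter plot")
abline(lm(y ~ x), col = "red", lwd = 2)
legend("topleft", legend = "least-squares line", col = "red", lwd = 2, bty = "n")

# Panel 2: histogram of the response
hist(y, main = "Histogram of y", xlab = "y")

invisible(dev.off())              # close the device so the file is written
```

Without the pdf() and dev.off() calls the same plotting commands draw to the default on-screen device.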

Advanced plotting

ggplot2

cowplot

2 Regression model in R

Generating random data:

runif - uniform distribution

rbinom

rnorm

sample

set.seed

cut

factor

relevel
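A short sketch of the functions above on simulated values (sample size, distribution parameters and category labels are arbitrary choices for illustration):

```r
set.seed(123)                       # makes the random draws reproducible
n <- 200
u <- runif(n, min = 0, max = 1)     # uniform on [0, 1]
z <- rnorm(n, mean = 50, sd = 10)   # normal
b <- rbinom(n, size = 1, prob = 0.3)  # binary 0/1 indicator
s <- sample(c("low", "mid", "high"), size = n, replace = TRUE)

# cut() turns a continuous variable into ordered categories
age_grp <- cut(z, breaks = c(-Inf, 40, 60, Inf),
               labels = c("<=40", "(40,60]", ">60"))

# factor() fixes the level order; relevel() changes the reference level
f <- factor(s, levels = c("low", "mid", "high"))
f_ref_mid <- relevel(f, ref = "mid")

# Resetting the seed reproduces exactly the same draws
set.seed(123)
u_again <- runif(n, min = 0, max = 1)
```

The reference level matters later because regression coefficients for a factor are reported relative to it.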

Linear regression

simple linear regression

multiple linear regression

interactions

summary

Residuals - the difference between the actual observed response values and the values predicted by the model

Coefficients

[You must understand every quantity in the summary output and what it means]

confint
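The pieces listed above can be tried on simulated data as follows. This is only a sketch; the variable names x1, grp and y and the coefficients used to simulate them are assumptions, not course data.

```r
set.seed(42)
n <- 100
x1 <- rnorm(n)
grp <- factor(sample(c("A", "B"), n, replace = TRUE))
y <- 1 + 2 * x1 + 0.5 * (grp == "B") + rnorm(n)

fit_simple <- lm(y ~ x1)          # simple linear regression
fit_multi  <- lm(y ~ x1 + grp)    # multiple linear regression
fit_inter  <- lm(y ~ x1 * grp)    # main effects plus the x1:grp interaction

s <- summary(fit_multi)
coef_table <- s$coefficients      # Estimate, Std. Error, t value, Pr(>|t|)
r2 <- s$r.squared                 # proportion of variance explained
sigma_hat <- s$sigma              # residual standard error

res <- residuals(fit_multi)       # observed minus fitted values
ci <- confint(fit_multi, level = 0.95)   # 95% confidence intervals for the coefficients
```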

QUICK GUIDE: INTERPRETING SIMPLE LINEAR MODEL OUTPUT IN R

3 Applied regression I

Use a model appropriate to the data at hand

Apply poisson and negative binomial regression models to count data
Identify and apply suitable model to overdispersed data

count data

Nonnegative
positively skewed
Variance tends to increase with mean
Violates the homoscedasticity and normality assumptions

Generalized Linear Model (GLM)

maximum likelihood

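At first sight it looks odd to regress on 1: summary(glm(deaths ~ 1, data=horse, family=poisson)). This is an intercept-only model, so the fitted mean is simply the overall mean count.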

Dispersion parameter for poisson family taken to be 1

Interpreting the glm summary output
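The horse data set is not reproduced in these notes, so the sketch below uses simulated counts (the vector deaths_sim and its rate are assumptions) to show what an intercept-only Poisson model estimates:

```r
set.seed(1)
deaths_sim <- rpois(200, lambda = 0.6)   # simulated counts standing in for the real data

fit0 <- glm(deaths_sim ~ 1, family = poisson)

# The model uses a log link, so the intercept is the log of the fitted mean
log_rate <- coef(fit0)[["(Intercept)"]]
rate <- exp(log_rate)

# For an intercept-only model the fitted mean equals the sample mean
sample_mean <- mean(deaths_sim)
```

Comparing rate with sample_mean shows that regressing on 1 is just a likelihood-based way of estimating the overall mean.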

Model checking

compare the observed event counts to data that we might have expected, under a Poisson(0.61) model

Formal model goodness-of-fit

residual deviance/df should not be too much bigger than 1
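Both checks can be sketched on simulated counts. The sample size and rate below are assumptions chosen to resemble the Poisson(0.61) example, not the course data.

```r
set.seed(2)
y <- rpois(280, lambda = 0.61)
fit <- glm(y ~ 1, family = poisson)
lambda_hat <- exp(coef(fit)[[1]])

# Informal check: observed frequencies of 0, 1, 2, ... against Poisson expectations
k <- 0:max(y)
observed <- as.vector(table(factor(y, levels = k)))
expected <- length(y) * dpois(k, lambda_hat)
comparison <- cbind(count = k, observed = observed, expected = round(expected, 1))

# Formal check: residual deviance relative to its degrees of freedom
ratio <- deviance(fit) / df.residual(fit)

# Approximate chi-square goodness-of-fit p-value; the approximation can be
# poor when the counts are small, so treat it as a rough guide only
p_gof <- pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE)
```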

A Poisson model with covariates in R

summary(glm(deaths~corps, data=horse, family=poisson))

Incidence rate ratios (IRR) / relative risks

Poisson regression with offsets
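A sketch of a Poisson model with a covariate and an offset, reporting incidence rate ratios. The variables exposed, pyears and events are simulated and hypothetical; the course data are not used here.

```r
set.seed(3)
n <- 300
exposed <- rbinom(n, 1, 0.5)
pyears  <- runif(n, 50, 500)     # person-time at risk for each unit
events  <- rpois(n, lambda = pyears * 0.01 * exp(0.7 * exposed))

# offset(log(pyears)) makes the model describe rates (events per unit of person-time)
fit <- glm(events ~ exposed + offset(log(pyears)), family = poisson)

# Exponentiated coefficients: the intercept is the baseline rate,
# the exposed coefficient is the incidence rate ratio
irr <- exp(coef(fit))
ci  <- exp(confint.default(fit))   # Wald 95% intervals on the ratio scale
irr_table <- cbind(IRR = irr, ci)
```

Wald intervals are used here for simplicity; profile-likelihood intervals from confint() are an alternative.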

Overdispersion - Negative Binomial model

the variance (823.475) is much larger than the mean (28.41)

summary(glm.nb(y~1, data=epilepsy))

Comparing models

A lower AIC indicates a ‘better’ model
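The comparison can be sketched as follows, assuming the MASS package (which provides glm.nb) is available. The counts are simulated with a mean near 28 and heavy overdispersion; they are not the epilepsy data.

```r
library(MASS)   # provides glm.nb()

set.seed(4)
# Negative binomial counts: variance = mu + mu^2 / size, far above the mean
y <- rnbinom(200, mu = 28, size = 1)
moments <- c(mean = mean(y), variance = var(y))

fit_pois <- glm(y ~ 1, family = poisson)
fit_nb   <- glm.nb(y ~ 1)

# Lower AIC suggests the better trade-off between fit and complexity
aic_table <- AIC(fit_pois, fit_nb)
```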

4 Applied regression II

Apply Poisson and negative binomial regression models to count data
Identify and apply suitable model to overdispersed data
Identify influential observations (points whose removal changes the fit substantially)
Perform model diagnostics
Understand and deal with multicollinearity

hatvalues(mvc.r.lm)

sort(round(cooks.distance(mvc.r.lm),2), decreasing=T)
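The model mvc.r.lm is not defined in these notes, so the sketch below builds a small simulated regression with one deliberately planted outlier to show how leverage and Cook's distance flag it. The 2 * mean(h) cut-off is a common rule of thumb, not a course requirement.

```r
set.seed(5)
x <- rnorm(30)
y <- 1 + 2 * x + rnorm(30)
x[30] <- 6      # planted point: extreme x value (high leverage) ...
y[30] <- -5     # ... with a response far from the trend

fit <- lm(y ~ x)

h <- hatvalues(fit)          # leverage of each observation
d <- cooks.distance(fit)     # influence on the fitted coefficients

top_influence <- head(sort(round(d, 2), decreasing = TRUE))
high_leverage <- which(h > 2 * mean(h))   # rule of thumb: more than twice the average leverage
most_influential <- unname(which.max(d))
```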

Model diagnostics

Estimation method and statistical tests are based on model assumptions

potential violated assumptions
extent of violation
Acknowledge limitation
alternative statistical model

Assumptions of linear regression model

Linearity
Homoscedasticity
Normality of the errors
Independence

Residual plot against fitted values

Q-Q Plot

P-P Plot

ACF plot
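The four diagnostic plots can be drawn with base R as sketched below. The data are simulated and the plots go to a temporary PDF, both assumptions for illustration.

```r
set.seed(6)
x <- runif(100, 0, 10)
y <- 3 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)
r <- rstandard(fit)               # standardized residuals

out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 8)
par(mfrow = c(2, 2))

# Linearity and homoscedasticity: residuals should scatter evenly around zero
plot(fitted(fit), r, xlab = "Fitted values", ylab = "Standardized residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normality: points should follow the line
qqnorm(r, main = "Q-Q plot")
qqline(r)

# Normality on the probability scale: points should follow the diagonal
plot(ppoints(length(r)), pnorm(sort(r)),
     xlab = "Empirical cumulative probability",
     ylab = "Theoretical cumulative probability", main = "P-P plot")
abline(0, 1)

# Independence: autocorrelations beyond lag 0 should stay within the bands
acf(r, main = "ACF of residuals")

invisible(dev.off())
```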

Multicollinearity

VIF
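The variance inflation factor of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. The sketch below computes it by hand on simulated predictors (x1, x2 and x3 are hypothetical) instead of relying on an add-on package.

```r
set.seed(7)
n <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # strongly correlated with x1
x3 <- rnorm(n)                  # unrelated to the others
X <- data.frame(x1, x2, x3)

# Regress each predictor on the remaining ones and convert R^2 to a VIF
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
```

Here x1 and x2 should show large values and x3 a value near 1. Thresholds such as 5 or 10 are conventions, not strict rules.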

5 Applied regression III

Identify and handle multicollinearity
Account for confounding factors in regression model
Assess potential effect modifiers in regression model
Perform basic mediation analysis
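Two of these objectives, adjusting for a confounder and assessing an effect modifier, can be illustrated on simulated data. All variable names and effect sizes below are assumptions made for this sketch.

```r
set.seed(9)
n <- 500

# Confounding: z affects both the exposure x and the outcome y
z <- rnorm(n)
x <- z + rnorm(n)
y <- 1 * x + 2 * z + rnorm(n)           # simulated true effect of x is 1

crude    <- coef(lm(y ~ x))[["x"]]      # ignores z, so it absorbs part of z's effect
adjusted <- coef(lm(y ~ x + z))[["x"]]  # adjusts for the confounder

# Effect modification: the effect of x differs by level of m
m  <- rbinom(n, 1, 0.5)
y2 <- x + 1.5 * x * m + rnorm(n)
fit_main <- lm(y2 ~ x + m)
fit_int  <- lm(y2 ~ x * m)              # adds the x:m interaction term
p_interaction <- anova(fit_main, fit_int)[2, "Pr(>F)"]
```

A large gap between crude and adjusted suggests confounding by z; a small interaction p-value suggests the effect of x is modified by m.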

6 Conditional logistic regression and propensity score method

Fit conditional logistic regression model to data from case control study
Understand the assumptions of the propensity score method
Interpret results from propensity score method
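A sketch of conditional logistic regression for a 1:1 matched case-control study, assuming the survival package (which provides clogit and strata) is available. The matched sets and exposure probabilities are simulated and hypothetical.

```r
library(survival)   # provides clogit() and strata()

set.seed(8)
n_sets <- 200
d <- data.frame(
  set  = rep(seq_len(n_sets), each = 2),   # matched-set identifier
  case = rep(c(1, 0), times = n_sets)      # one case and one control per set
)
# Cases are exposed more often than controls in this simulation
d$exposed <- rbinom(nrow(d), 1, ifelse(d$case == 1, 0.6, 0.4))

# strata(set) conditions on the matched sets
fit <- clogit(case ~ exposed + strata(set), data = d)

or <- exp(coef(fit))[["exposed"]]          # matched odds ratio
ci <- exp(confint(fit))["exposed", ]       # 95% confidence interval
```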

7 Inverse probability weighting and meta analysis

Appreciate the use of inverse probability weighting
Apply inverse probability weighting for analysis of missing data
Perform meta analysis to obtain overall estimate of an intervention effect from multiple studies
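Both ideas rest on weighting. The sketch below uses simulated data for the missing-data case and three made-up study estimates for the meta-analysis; none of the numbers come from the course.

```r
# --- Inverse probability weighting for missing outcomes ---
set.seed(13)
n <- 2000
x <- rbinom(n, 1, 0.5)
y <- 10 + 5 * x + rnorm(n)
observed <- rbinom(n, 1, ifelse(x == 1, 0.3, 0.9))   # y is missing more often when x = 1
y_obs <- ifelse(observed == 1, y, NA)

# Model the probability of being observed, then weight by its inverse
p_obs <- fitted(glm(observed ~ x, family = binomial))
w <- 1 / p_obs
keep <- observed == 1

cc_mean   <- mean(y_obs[keep])                    # complete-case mean (biased here)
ipw_mean  <- weighted.mean(y_obs[keep], w[keep])  # weighted mean
full_mean <- mean(y)                              # known only because the data are simulated

# --- Fixed-effect (inverse-variance) meta-analysis ---
est <- c(0.20, 0.50, 0.30)    # hypothetical study estimates
se  <- c(0.10, 0.20, 0.15)    # hypothetical standard errors
wt  <- 1 / se^2               # more precise studies get more weight
pooled    <- sum(wt * est) / sum(wt)
pooled_se <- sqrt(1 / sum(wt))
pooled_ci <- pooled + c(-1, 1) * qnorm(0.975) * pooled_se
```

The weighting step assumes that missingness depends only on the observed covariates used in the model for p_obs.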

8 Instrumental variable analysis

Estimate treatment effect using instrumental variable analysis for noncontrolled experiment
Understand the assumptions of instrumental variable analysis
Interpret results from instrumental variable analysis
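A hand-rolled two-stage least squares sketch on simulated data, where u is an unmeasured confounder and z an instrument (all names and effect sizes are assumptions). It illustrates the idea and is not a substitute for a dedicated IV routine.

```r
set.seed(10)
n <- 5000
u <- rnorm(n)                    # unmeasured confounder
z <- rbinom(n, 1, 0.5)           # instrument: affects x, affects y only through x
x <- 0.8 * z + u + rnorm(n)      # treatment
y <- 1.0 * x + 2 * u + rnorm(n)  # simulated true treatment effect is 1

# Naive regression is biased because u is omitted
ols <- coef(lm(y ~ x))[["x"]]

# Stage 1: predict the treatment from the instrument
x_hat <- fitted(lm(x ~ z))
# Stage 2: regress the outcome on the predicted treatment
iv <- coef(lm(y ~ x_hat))[["x_hat"]]

# With a single instrument this equals the Wald ratio
wald <- cov(y, z) / cov(x, z)
```

The standard errors reported by the second-stage lm are not valid for the IV estimate, so only the point estimate is used here.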

Basic concepts:

OR and β (estimated coefficients): in logistic regression, OR = exp(β)
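The link between a logistic regression coefficient and the odds ratio can be checked on simulated data (x and y below are hypothetical): with a single binary predictor, exp(β) equals the cross-product ratio of the 2 x 2 table.

```r
set.seed(11)
x <- rbinom(500, 1, 0.5)
y <- rbinom(500, 1, plogis(-1 + 0.8 * x))

fit <- glm(y ~ x, family = binomial)
beta <- coef(fit)[["x"]]      # log odds ratio
or_model <- exp(beta)         # odds ratio from the model

# Odds ratio computed directly from the 2 x 2 table
tab <- table(x, y)
or_table <- (tab["1", "1"] * tab["0", "0"]) / (tab["1", "0"] * tab["0", "1"])
```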

Final exam

An investigator conducted a retrospective analysis on the association between statin therapy and psychological disorders, based on a database of medical records. The analysis adjusted for potential confounders such as age, sex, BMI and comorbidity.

Variables (name, description):

Id
Male
Age
Bmi
comorbid.s, Charlson comorbidity index
Statin, Statin users
Psych, Psychological disorder

  id male age  bmi comorbid.s statin psych
1  1    0  54 20.9          1      0     0
2  2    0  42 19.1          0      0     0
3  3    1  46 23.9          1      1     0
4  4    1  58 23.5          0      0     1
5  5    1  43 28.7          1      1     0
6  6    1  46 26.6          0      1     0

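Questions:

(A) Carry out a standard regression analysis to estimate the effect of statin therapy on psychological disorder, adjusting for sex, age, BMI and comorbidity. Present the odds ratios with 95% confidence intervals for the variables in a table. [10%] (Note: a standard regression model; since the outcome is binary and odds ratios are requested, this means logistic regression rather than a linear model.)

The investigator also decided to carry out a propensity score analysis. (For the propensity score analysis, refer to assignment 2.)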
(B) Fit a propensity score model to predict statin use. You may consider main effects only (even when not all patient characteristics can be satisfactorily balanced). Present and interpret the model results. [8%]
(C) Based on your propensity score model, how well were the patient characteristics balanced across statin users and non-users with similar propensity scores? [6%]
(D) State the key assumptions of propensity score analysis and assess if they are satisfied. [6%]
(E) Do you think it is appropriate to use propensity score analysis in this setting? Briefly explain why. [4%]
(F) Estimate the effect of statin therapy (and the corresponding 95% CI) on psychological disorder and compare with the results in (A). [8%]
(G) Based on the results in (A) - (F), summarize and interpret the main findings from the analyses. [8%]
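One possible outline for parts (A), (B), (C) and (F) is sketched below. The exam data are not available here, so a data frame with the same column names is simulated; its values and effect sizes are assumptions. Stratifying on propensity-score quintiles is only one of several ways to use the score.

```r
set.seed(12)
n <- 1000
d <- data.frame(
  male       = rbinom(n, 1, 0.5),
  age        = round(rnorm(n, 50, 8)),
  bmi        = round(rnorm(n, 24, 3), 1),
  comorbid.s = rpois(n, 0.8)
)
d$statin <- rbinom(n, 1, plogis(-4 + 0.05 * d$age + 0.05 * d$bmi + 0.3 * d$comorbid.s))
d$psych  <- rbinom(n, 1, plogis(-3 + 0.02 * d$age + 0.3 * d$comorbid.s + 0.2 * d$statin))

# (A) Standard logistic regression: odds ratios with Wald 95% intervals
fit_a <- glm(psych ~ statin + male + age + bmi + comorbid.s,
             family = binomial, data = d)
or_table <- round(exp(cbind(OR = coef(fit_a), confint.default(fit_a))), 2)

# (B) Propensity score model: probability of statin use given covariates
ps_fit <- glm(statin ~ male + age + bmi + comorbid.s, family = binomial, data = d)
d$ps <- fitted(ps_fit)

# (C) Balance check: covariate means by statin use within propensity-score quintiles
d$ps_q <- cut(d$ps, breaks = quantile(d$ps, 0:5 / 5),
              include.lowest = TRUE, labels = 1:5)
balance <- aggregate(cbind(male, age, bmi, comorbid.s) ~ statin + ps_q,
                     data = d, FUN = mean)

# (F) Effect of statin adjusted for propensity-score stratum
fit_f <- glm(psych ~ statin + ps_q, family = binomial, data = d)
or_ps <- exp(c(OR = coef(fit_f)[["statin"]], confint.default(fit_f)["statin", ]))
```

Comparing or_ps with the statin row of or_table addresses the comparison asked for in (F).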

Approach to the solution:

1. Candidate models: standard linear regression; GLMs (Poisson, negative binomial); clogit; and so on

posted @ 2020-04-06 21:47 Life·Intelligence