R function notes
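This is more or less the first post I wrote back when I had just gotten hooked on Markdown, so I feel a little nostalgic about it.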
Tidyverse
dplyr
glimpse(data)
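Shows each variable's type and its first few values
summarize(data, Variable = fun(variable, na.rm = TRUE))
Summarizes the data; fun can be any function that reduces a vector to a single value
gather(data, key = Key, value = Value)
From tidyr. The original variable names become the values of a new variable Key, and the original observations become the values of a new variable Value
spread(data, key = Key, value = Value)
From tidyr. The inverse of gather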
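A minimal sketch of the gather/spread round trip, assuming the tidyr package is installed (that is where both functions live). The toy data frame `wide` and its columns are made up for illustration.

```r
library(tidyr)

# Hypothetical toy data: one row per id, one column per measurement
wide <- data.frame(id = 1:2, a = c(10, 20), b = c(30, 40))

# Wide -> long: the column names a and b become values of Key
long <- gather(wide, key = "Key", value = "Value", a, b)

# Long -> wide: the values of Key become column names again
back <- spread(long, key = Key, value = Value)
```

`long` has one row per (id, Key) pair, and `back` has the same columns as `wide`.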
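filter(data, conditions of variables)
Keeps the rows that satisfy the conditions; separate multiple conditions with commas
group_by(data, categorical variables)
Groups the observations by categorical variables; separate multiple variables with commas
ungroup(data)
Kills off the grouping made by the function above
mutate(data, Variable = blabla)
Adds a new variable column to the data
rename(data, Variable = variable)
Renames a variable (new name = old name)
arrange(data, variable)
Sorts the observations by the variable in ascending order
arrange(data, desc(variable))
Sorts the observations by the variable in descending order
inner_join(data1, data2, by = c("variable1" = "variable2"))
Joins the data on key variables; only rows whose keys appear in both tables are kept. You can join on several variables
select(data, variables)
Selects variable columns; separate multiple variables with commas. You can also use variable1:variable2, everything(), starts_with("a"), ends_with("sth"), contains("sth")
select(data, -variable)
Drops a variable column
top_n(data, n = number, wt = variable)
Returns the n rows with the largest values of the variable
pull(data)
Extracts a single column (by default the last one) from a data frame as a vector; very handy right after a summarize() that leaves one variable with one observation
sample_n(data, replace = TRUE, size = number)
Draws a sample of size `size` with replacement. Same effect as rep_sample_n(size = number, replace = TRUE, reps = 1), except that the result has no replicate column
bind_rows(data1, data2, .id = "Variable")
Like rbind() for data frames, but columns are matched by name and missing columns are filled with NA; .id adds a column identifying which input each row came from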
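Here is a small sketch of how several of these verbs chain together, assuming dplyr is installed. It uses the built-in `mtcars` data; the 100 hp threshold and the names `by_cyl` and `best` are arbitrary choices for illustration.

```r
library(dplyr)

# Mean mpg per cylinder count, for cars with at least 100 hp
by_cyl <- mtcars %>%
  filter(hp >= 100) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(mean_mpg))

# pull() after a one-row, one-column summarize() gives a plain number
best <- by_cyl %>%
  summarize(best = max(mean_mpg)) %>%
  pull(best)
```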
ggplot2
ggplot(data, mapping = aes(x = variable1, y = variable2))
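Sets up the plotting area
geom_point(aes(color, fill, shape, size), alpha = number)
Scatter plot. alpha is transparency. Put an aesthetic inside aes() to map it to a variable; to use a constant value, set it outside aes()
geom_jitter(width = number, height = number)
Jittered scatter plot; width and height control the amount of jitter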
geom_smooth(method = lm/glm/.../c(...), se = T/F)
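The documentation says it helps the eye see patterns when there is overplotting. In my view it simply adds a fitted line. se controls whether the confidence band is shown
geom_hline(yintercept = number, color, size = number)
Horizontal line at y = yintercept
geom_line(data, aes(), size = 1)
geom_histogram(bins = number, binwidth = number, color = "white")
Histogram. The arguments are the number of bars, the bar width, and the bar border color
geom_boxplot(fill = "color")
scale_x_discrete(labels = c( ))
Labels on the x axis
geom_col(position = "dodge")
Bar chart. To split the bars by a categorical variable, add fill = variable inside aes() in ggplot(); "dodge" places the split bars side by side instead of stacking them
facet_wrap(~variable, ncol = number)
Used after geom_col() to draw one bar chart per level of the categorical variable; ncol sets the number of columns of panels
geom_line()
Line chart
labs(x = "xlab", y = "ylab", title = "your title")
Labels
theme(legend.position = "none"/"left"/"right"/"bottom"/"top")
Modifies the non-data parts of the plot; legend.position is the position of the legend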
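A short sketch of how the layers stack, assuming ggplot2 is installed. The data (`mtcars`), the variables, and the label text are just illustrative choices.

```r
library(ggplot2)

p <- ggplot(mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.6) +                # constant alpha goes outside aes()
  geom_smooth(method = "lm", se = TRUE) +  # fitted line with a confidence band
  facet_wrap(~cyl, ncol = 3) +             # one panel per cylinder count
  labs(x = "Weight", y = "Miles per gallon", title = "mpg vs. weight") +
  theme(legend.position = "none")

# print(p) draws the plot
```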
gganimate
Plot + transition_time(Time) +
labs(title = "Time:{frame_time}")
An animation that changes over time
knitr & kableExtra
kable(data, col.names = c("Name1", "Name2", ...), caption, booktabs = T/F, format = "latex")
kable_styling(font_size = number)
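Base packages
skim(data)
From skimr. Reports the number of rows, the number of columns, and the variable types
Continuous variables: missing values, mean, standard deviation, quantiles, histogram
Categorical variables: missing values, whether ordered, number of levels, counts per level
gsub(a, b, c)
Replaces every match of pattern a in the string c with b
cor(data)
Correlation matrix (use cov(data) for the covariance matrix)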
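A quick base-R check of what cor() returns compared with cov(), using three columns of the built-in `mtcars` data (an arbitrary choice for illustration).

```r
x <- mtcars[, c("mpg", "wt", "hp")]

r <- cor(x)  # correlation matrix: ones on the diagonal
s <- cov(x)  # covariance matrix: variances on the diagonal

# Rescaling the covariance matrix by the standard deviations gives cor(x)
r_from_s <- cov2cor(s)
```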
lm(Y ~ X1 + X2, data)
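glm(formula, data, family = binomial(link = "logit"))
coef(model)
Extracts the coefficient estimates from a model. coef(model) works directly on both lm and glm fits; apply it to summary(model) to get the full table with standard errors, test statistics, and p-values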
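A small sketch of the two ways to call coef() on a logistic regression, using the built-in `mtcars` data. The model `am ~ wt` is only an illustrative choice.

```r
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

est <- coef(fit)           # named vector of estimates only
tab <- coef(summary(fit))  # matrix: Estimate, Std. Error, z value, Pr(>|z|)
```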
levels(Variables)
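Shows the levels of a factor variable
predict(model, type)
Computes predictions from a model. For a glm the default (type = "link") is on the link scale, i.e. \(\log\left(\frac{p}{1-p}\right)\) for a logit model; type = "response" gives \(p\)
fitted(model)
For a glm this returns \(p\) directly
plogis(value)
plogis(\(\log\left(\frac{p}{1-p}\right)\)) = \(p\)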
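The relationship between predict(), fitted(), and plogis() for a logit model can be checked in a few lines of base R. This is a sketch on the built-in `mtcars` data with an illustrative model.

```r
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

log_odds <- predict(fit)                     # default type = "link"
p_hat    <- predict(fit, type = "response")  # probabilities

same_as_plogis <- all.equal(plogis(log_odds), p_hat)
same_as_fitted <- all.equal(fitted(fit), p_hat)
```

Both comparisons should come out TRUE, because plogis() is the inverse of the logit link.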
optim(par, fn, gr = NULL, ..., method = "Nelder-Mead", hessian = FALSE)
Optimization, with methods such as Nelder-Mead (the default) and BFGS
par is the vector of initial values for the optimization parameters.
fn is the objective function to minimize. Its first argument is always the vector of optimization parameters. Other arguments must be named, and will be passed to fn via the '...' argument to optim. It returns the value of the objective.
gr is as fn, but, if supplied, returns the gradient vector of the objective.
... is used to pass named arguments to fn and gr. See section 5.7.
method selects the optimization method. "BFGS" is another possibility.
hessian determines whether or not the Hessian of the objective should be returned.
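As a sketch of the arguments described above, here is optim() minimizing the Rosenbrock function with BFGS. The objective, its gradient, and the extra arguments `a` and `b` (passed through `...`) are illustrative choices, not part of optim itself.

```r
# Objective: first argument is the parameter vector; a and b arrive via ...
rosenbrock <- function(par, a, b) {
  (a - par[1])^2 + b * (par[2] - par[1]^2)^2
}

# Gradient with the same arguments as the objective
rosenbrock_gr <- function(par, a, b) {
  c(-2 * (a - par[1]) - 4 * b * par[1] * (par[2] - par[1]^2),
    2 * b * (par[2] - par[1]^2))
}

res <- optim(par = c(-1.2, 1), fn = rosenbrock, gr = rosenbrock_gr,
             a = 1, b = 100, method = "BFGS", hessian = TRUE)

res$par          # estimated minimizer, close to c(1, 1)
res$convergence  # 0 means the method reported convergence
```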
nlm(f, p, ..., hessian = FALSE)
Optimization with a Newton-type method
f is the objective function, exactly like fn for optim. In addition its return value may optionally have 'gradient' and 'hessian' attributes.
p is the vector of initial values for the optimization parameters.
... is used to pass named arguments to f. See section 5.7.
hessian determines whether or not the Hessian of the objective should be returned.
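A minimal nlm() sketch showing the optional 'gradient' attribute and an extra named argument passed through `...`. The quadratic objective and the argument name `target` are made up for illustration.

```r
# Squared distance to a target point, with its gradient attached
f <- function(p, target) {
  d <- p - target
  value <- sum(d^2)
  attr(value, "gradient") <- 2 * d
  value
}

res <- nlm(f, p = c(0, 0), target = c(3, -2), hessian = TRUE)

res$estimate  # should be close to c(3, -2)
res$minimum   # objective value at the estimate
```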
moderndive
get_regression_table(model)
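The output has the estimates, their standard errors, test statistics, p-values, and confidence intervals
get_regression_points(model)
The output has ID, \(Y\), \(X_1\), \(X_2\), ..., \(\hat{Y}\), and the residual \(\epsilon\)
get_correlation(data, formula = Y ~ X)
Correlation coefficient
model.matrix(model)
The design matrix of a linear model (model.matrix() itself comes from the stats package)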
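To see what a design matrix looks like, here is a small base-R sketch. The toy data frame `df` is invented for illustration; note how the factor `g` becomes a dummy column.

```r
# Hypothetical toy data with one numeric and one factor predictor
df <- data.frame(
  y = c(1.1, 1.9, 3.2, 3.8),
  x = c(0.5, 1.5, 2.5, 3.5),
  g = factor(c("a", "b", "a", "b"))
)

fit <- lm(y ~ x + g, data = df)
X <- model.matrix(fit)  # columns: (Intercept), x, gb
```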
infer
rep_sample_n(data, size = number, replace = TRUE, reps = number)
size is the size of each bootstrap sample and should equal the original sample size; reps is the number of bootstrap samples to draw
specify(data, Y ~ X1 + X2/NULL, success = "A")
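Declares the response and explanatory variables for the analysis; success is for proportions and names the level ("A") whose proportion is computed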
generate(data, reps = number, type = "bootstrap" / "permute" / "simulate")
reps is the number of resamples, i.e. it produces reps samples of the same size as the original sample; you can then call calculate directly without group_by
calculate(data, stat = c("mean", "median", "sum", "sd", "prop", "count", "diff in means", "diff in medians", "diff in props", "Chisq", "F","slope", "correlation", "t", "z"), order = c("A", "B"), ...)
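The infer version of summarize. order sets the order of the levels of the explanatory factor, used when inferring a difference or ratio between two groups or a t or z statistic; ... passes arguments such as na.rm on to functions like mean()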
visualize(data, bins = number, obs_stat = x_bar, endpoints = percentile_ci, direction = "between")
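A histogram. bins sets the number of bars; x_bar is the mean of the original sample (when the quantity being estimated is a mean), and you can also use summarize to compute the mean of the bootstrap distribution; endpoints and direction are used to draw the interval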
get_ci(data, level = 0.95, type = "percentile", point_estimate = NULL)
get_ci(type = "se", point_estimate = x_bar)
Computes confidence intervals
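Putting specify, generate, calculate, and get_ci together, here is a sketch of a percentile bootstrap interval for a mean, assuming infer is installed. The data (`mtcars$mpg`), the seed, and the 1000 replicates are arbitrary illustrative choices.

```r
library(infer)

set.seed(1)  # only so the resampling is reproducible

spec       <- specify(mtcars, response = mpg)
boot       <- generate(spec, reps = 1000, type = "bootstrap")
boot_means <- calculate(boot, stat = "mean")  # one mean per bootstrap sample

ci <- get_ci(boot_means, level = 0.95, type = "percentile")
```

`boot_means` has one row per replicate, and `ci` holds the lower and upper endpoints.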
janitor
tabyl(data, variable1, variable2, variable3, ...)
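Like table()
adorn_percentages(table, denominator = "row"/"col"/"all", na.rm = T/F)
Converts the counts in the table to percentages
adorn_pct_formatting(table, digits = number, rounding = "half to even"/"half up", affix_sign = T/F)
Makes the computed percentages readable: digits is the number of decimal places (default 1), rounding is the rounding method, and affix_sign controls whether a percent sign is appended
adorn_ns(table, position = "rear"/"front")
Adds the raw counts after or before the formatted percentages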
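The adorn_* functions are meant to be applied in sequence. Here is a sketch assuming janitor is installed, cross-tabulating two columns of the built-in `mtcars` data (an arbitrary choice).

```r
library(janitor)

tab <- tabyl(mtcars, cyl, gear)                     # counts: cyl by gear
pct <- adorn_percentages(tab, denominator = "row")  # row proportions
fmt <- adorn_pct_formatting(pct, digits = 1)        # formatted as percentages
out <- adorn_ns(fmt, position = "rear")             # append the raw counts
```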
sjPlot
plot_model(model, type, show.values = T/F, transform = NULL, title, show.p = F)
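show.values controls whether the log-odds/odds values are displayed,
show.p controls whether significant values are marked with asterisks, transform is a character string naming the function applied to the estimates (exponentiation by default; NULL leaves them on the log scale),
vline.color is the color of the vertical "zero effect" line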
GGally
pairs()
Scatterplot matrix from base graphics
ggpairs()
Scatterplot matrix built on ggplot2
MASS
stepAIC()
plotly
plot_ly(data, x = ~ A, y = ~ B, z = ~ C, type = "scatter3d", mode = "markers")
3D scatter plot
broom
glance(model)
The model's \(R^2\), adjusted \(R^2\), \(\sigma\), test statistic, p-value, log-likelihood, AIC, BIC, deviance, and df.residual
ellipse
ellipse()
Generates the outline of a confidence ellipse, for example a 95% confidence ellipse
Author: ZZN而已
Link to this post: https://www.cnblogs.com/zerozhao/p/r-function-notes.html
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY-NC-ND 4.0.
