R function notes

This is more or less the first post I wrote back when I had just gotten hooked on Markdown. Feeling a bit nostalgic.

Contents

Tidyverse
- dplyr
- ggplot2
gganimate
knitr & kableExtra
Base packages
moderndive
infer
janitor
sjPlot
GGally
MASS
plotly
broom
ellipse

Tidyverse

dplyr

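glimpse(data)
Shows each variable's type and its first few values.

summarize(data, Variable = function(variable, na.rm = TRUE))
Summarizes the data; works with any function that reduces a vector to a single value (mean, sd, ...).

gather(data, key = Key, value = Value)
(from tidyr, not dplyr) The original column names become the values of a new variable Key, and the original observations become the values of a new variable Value.

spread(data, key = Key, value = Value)
(from tidyr) The inverse of gather.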
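A minimal round trip through gather and spread, assuming tidyr is installed. The data frame and its column names are made up for illustration. Note that newer tidyr releases recommend pivot_longer() and pivot_wider() for the same job.

```r
library(tidyr)

# Hypothetical wide table: one row per student, one column per subject
wide <- data.frame(
  student = c("A", "B"),
  math    = c(90, 75),
  physics = c(85, 80)
)

# Wide -> long: column names go into `subject`, cell values into `score`
long <- gather(wide, key = "subject", value = "score", math, physics)

# Long -> wide: undo the reshaping
wide_again <- spread(long, key = subject, value = score)
```

long has one row per student-subject pair; wide_again has the same columns as wide.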

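filter(data, conditions on variables)
Keeps the rows that satisfy the conditions; separate multiple conditions with commas.

group_by(data, categorical variables)
Groups the observations by the categorical variables; separate multiple variables with commas.

ungroup(data)
Undoes the grouping from the function above.

mutate(data, Variable = blabla)
Adds a new variable column to the data.

rename(data, Variable = variable)
Renames a variable (new name on the left, old name on the right).

arrange(data, variable)
Sorts the observations by the variable in ascending order.

arrange(data, desc(variable))
Sorts the observations by the variable in descending order.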
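These verbs are usually chained. Here is a small sketch on the built-in mtcars data, assuming dplyr is installed; the names mpg_by_cyl, mean_mpg and n are my own picks.

```r
library(dplyr)

# Keep cars above 100 hp, then average mpg within each cylinder count
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg, na.rm = TRUE), n = n()) %>%
  ungroup() %>%
  arrange(desc(mean_mpg))

print(mpg_by_cyl)
```

The result has one row per cyl value, sorted from the highest mean mpg to the lowest.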

inner_join(data1, data2, by = c("variable1" = "variable2"))
Merges two data sets on a key variable; only the rows whose key appears in both survive. You can join on several variables.
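A tiny sketch of a join where the key columns have different names, assuming dplyr is installed. Both tables are invented for illustration.

```r
library(dplyr)

# Hypothetical tables whose key columns are named differently
orders    <- data.frame(customer_id = c(1, 2, 4), amount = c(10, 25, 7))
customers <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))

# Only keys present in both tables (1 and 2) survive the join
joined <- inner_join(orders, customers, by = c("customer_id" = "id"))
```

The key column keeps the name from the first table, so joined has the columns customer_id, amount and name.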

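select(data, variables)
Picks variable columns; separate multiple variables with commas. Helpers such as variable1:variable2, everything(), starts_with("a"), ends_with("sth") and contains("sth") also work.

select(data, -variable)
Drops a variable column.

top_n(data, n = number, wt = variable)
Lists the n rows with the highest values of a variable.

pull(data, variable)
Pulls one column out of a data frame as a plain vector. Very handy right after a summarize() that yields a single variable with a single value.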
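A quick sketch of select helpers and pull on mtcars, assuming dplyr is installed. The variable names on the left are my own.

```r
library(dplyr)

# Columns whose names start with "d" (disp and drat in mtcars)
d_cols <- select(mtcars, starts_with("d"))

# Everything except mpg
no_mpg <- select(mtcars, -mpg)

# summarize() returns a 1 x 1 data frame; pull() turns it into a bare number
mean_mpg <- mtcars %>%
  summarize(m = mean(mpg)) %>%
  pull(m)
```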

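sample_n(data, replace = TRUE, size = number)
Draws a sample of size rows with replacement. Same effect as rep_sample_n(size = number, replace = TRUE, reps = 1), except that the result has no replicate column.

bind_rows(data1, data2, .id = "Variable")
Like rbind() for data frames, but columns are matched by name and missing columns are filled with NA; .id adds a column recording which input each row came from.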

ggplot2

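ggplot(data, mapping = aes(x = variable1, y = variable2))
Sets up the plotting area.

geom_point(aes(color, fill, shape, size), alpha = number)
Scatter plot. alpha is the transparency. Put an aesthetic inside aes() only when mapping it to a variable; constants go outside.

geom_jitter(width = number, height = number)
Jittered scatter plot. width and height control the amount of jitter and are ordinary arguments, not aesthetics.

geom_smooth(method = "lm"/"glm"/..., se = TRUE/FALSE)
The docs say it helps the eye see patterns when there is overplotting. To me it simply adds a fitted line. se controls whether the confidence band is shown.

geom_hline(yintercept = number, color, size = number)
Horizontal line at y = yintercept.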

geom_line(data, aes(), size = 1)

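geom_histogram(bins = number, binwidth = number, color = "white")
Histogram. The arguments are the number of bars, the width of the bars, and the color of the bar borders. Give either bins or binwidth, not both.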

geom_boxplot(fill = "color")

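scale_x_discrete(labels = c( ))
Labels for the x axis.

geom_col(position = "dodge")
Bar chart. To split the bars by a categorical variable, add fill = variable inside aes() in ggplot(); "dodge" places the split bars side by side instead of stacking them.

facet_wrap(~variable, ncol = number)
Draws one panel per level of the categorical variable (for example after geom_col()); ncol sets the number of panel columns.

geom_line()
Line chart.

labs(x = "xlab", y = "ylab", title = "your title")
Axis labels and title.

theme(legend.position = "none"/"left"/"right"/"bottom"/"top")
Adjusts the non-data parts of the plot; legend.position is where the legend goes.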
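Putting a few of these layers together, a minimal sketch on mtcars, assuming ggplot2 is installed. The colors and labels are arbitrary choices of mine.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.6, color = "steelblue") +  # constants sit outside aes()
  geom_smooth(method = "lm", se = TRUE) +         # fitted line with confidence band
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", title = "mpg vs. weight") +
  theme(legend.position = "none")

# print(p) draws the plot
```

Each + adds a layer or a setting to the same plot object, so the pieces can be built up one at a time.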

gganimate

Plot + transition_time(Time) +
labs(title = "Time:{frame_time}")
An animation that changes over time.

knitr & kableExtra

kable(data, col.names = c("Name1", "Name2", ...), caption, booktabs = T/F, format = "latex")

kable_styling(font_size = number)

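Base packages

skim(data)
(actually from skimr, not base R) Number of rows, number of columns, variable types.
Continuous variables: missing values, mean, standard deviation, quantiles, histogram.
Categorical variables: missing values, whether ordered, number of levels, level counts.

gsub(a, b, c)
Replaces every match of the pattern a in the string c with b.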
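A short base R sketch of the replacement behaviour. The strings are made up; sub() is the sibling that replaces only the first match.

```r
x <- "2021-10-08"

all_replaced   <- gsub("-", "/", x)   # "2021/10/08": every "-" is replaced
first_replaced <- sub("-", "/", x)    # "2021/10-08": only the first one

# The pattern is a regular expression; fixed = TRUE treats it literally,
# which matters for characters like "."
dots <- gsub(".", "_", "a.b.c", fixed = TRUE)   # "a_b_c"
```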

cor(data)
Correlation matrix (use cov(data) for the covariance matrix).
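To make the difference between the two concrete, a base R sketch on three mtcars columns. The choice of columns is arbitrary.

```r
num <- mtcars[, c("mpg", "wt", "hp")]

r <- cor(num)   # correlation matrix: unit diagonal, entries in [-1, 1]
s <- cov(num)   # covariance matrix: variances on the diagonal

# cov2cor() rescales a covariance matrix into the matching correlation matrix
same <- isTRUE(all.equal(cov2cor(s), r))
```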

lm(Y ~ X1 + X2, data)

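glm(formula, data, family = binomial(link = "logit"))

coef(model)
Extracts the coefficients from a model as a named vector. Applied to summary(model) it returns the full coefficient table instead (estimates, standard errors, test statistics, p-values); this works for glm too.

levels(Variables)
Shows the levels of a factor variable.

predict(model, type)
Computes the model's predictions. For a glm the default type = "link" returns the linear predictor, which for a logit model is $\log\left(\frac{p}{1-p}\right)$; type = "response" returns $p$.

fitted(model)
For a glm this gives $p$ directly.

plogis(value)
plogis( $\log\left(\frac{p}{1-p}\right)$ ) = $p$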
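The relationship between these functions can be checked on a small logistic regression. This is a base R sketch; modelling am by wt in mtcars is just a convenient example of mine.

```r
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

est   <- coef(fit)            # named vector of estimates
table <- coef(summary(fit))   # matrix: Estimate, Std. Error, z value, Pr(>|z|)

log_odds <- predict(fit)                      # default type = "link"
p_hat    <- predict(fit, type = "response")   # probabilities

# plogis() maps log-odds back to probabilities, and fitted() returns them directly
link_to_p   <- isTRUE(all.equal(plogis(log_odds), p_hat))
fitted_is_p <- isTRUE(all.equal(fitted(fit), p_hat))
```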

optim(par, fn, gr = NULL, ..., method = "Nelder-Mead", hessian = FALSE)
Optimization: BFGS, Nelder-Mead (the default), and so on.

par is the vector of initial values for the optimization parameters.
fn is the objective function to minimize. Its first argument is always the vector of optimization parameters. Other arguments must be named, and will be passed to fn via the ‘...’ argument to optim. It returns the value of the objective.
gr is as fn, but, if supplied, returns the gradient vector of the objective.
... is used to pass named arguments to fn and gr. See section 5.7.
method selects the optimization method. "BFGS" is another possibility.
hessian determines whether or not the Hessian of the objective should be returned.
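A sketch of how the arguments above fit together, in base R. The Rosenbrock-style objective and the extra arguments a and b are my own illustration; they are passed to fn and gr through the ... argument.

```r
# Objective: first argument is the parameter vector, the rest are named extras
rosenbrock <- function(par, a, b) {
  (a - par[1])^2 + b * (par[2] - par[1]^2)^2
}

# Gradient with the same argument list as the objective
rosenbrock_gr <- function(par, a, b) {
  c(-2 * (a - par[1]) - 4 * b * par[1] * (par[2] - par[1]^2),
    2 * b * (par[2] - par[1]^2))
}

res <- optim(par = c(-1.2, 1), fn = rosenbrock, gr = rosenbrock_gr,
             a = 1, b = 100, method = "BFGS", hessian = TRUE)

res$par           # estimated minimizer
res$value         # objective value there
res$convergence   # 0 means the method reports success
```

With a = 1 and b = 100 the true minimum is at (1, 1), so res$par should land close to that.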

nlm(f, p, ..., hessian = FALSE)
Optimization with a Newton-type method.

f is the objective function, exactly like fn for optim. In addition its return value may optionally have ‘gradient’ and ‘hessian’ attributes.
p is the vector of initial values for the optimization parameters.
... is used to pass named arguments to f. See section 5.7.
hessian determines whether or not the Hessian of the objective should be returned.
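The same idea with nlm, as a base R sketch. The quadratic objective and its target argument are invented for illustration; target reaches f through the ... argument.

```r
# Squared distance to a target point; minimized exactly at p = target
f <- function(p, target) sum((p - target)^2)

res <- nlm(f, p = c(0, 0), target = c(3, -2), hessian = TRUE)

res$estimate   # estimated minimizer, close to c(3, -2)
res$minimum    # objective value at the estimate
res$hessian    # numerical Hessian, close to 2 * identity here
```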

moderndive

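get_regression_table(model)
Returns the estimates, their standard errors, the test statistics, p-values, and confidence intervals.

get_regression_points(model)
Returns ID, $Y$, $X_1$, $X_2$, ..., the fitted values $\hat{Y}$, and the residuals $\hat{\epsilon}$.

get_correlation(data, formula = Y ~ X)
Correlation coefficient.

model.matrix(model)
(from base R's stats package, not moderndive) The design matrix of a linear model.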
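To see what the design matrix contains, a base R sketch that rebuilds the least-squares coefficients from it. The model formula is an arbitrary example of mine.

```r
fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# One row per observation; columns: intercept, wt, and dummy columns for cyl
X <- model.matrix(fit)
dim(X)
colnames(X)

# Solving the normal equations X'X b = X'y reproduces the lm() coefficients
beta <- solve(t(X) %*% X, t(X) %*% mtcars$mpg)
```

The factor is expanded into dummy columns automatically, with its first level absorbed into the intercept.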

infer


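rep_sample_n(data, size = number, replace = TRUE, reps = number)
size is the size of each bootstrap sample and should equal the original sample size; reps is the number of bootstrap samples drawn.

specify(data, Y ~ X1 + X2 or Y ~ NULL, success = "A")
Declares the response and explanatory variables for the analysis (use Y ~ NULL when there is no explanatory variable). success is for proportions: it names the level, here "A", whose proportion is computed.

generate(data, reps = number, type = "bootstrap" / "permute" / "simulate")
reps is the number of resamples, i.e. it produces reps samples, each the same size as the original; you can then call calculate directly, with no group_by needed. Recent infer versions call the "simulate" type "draw".

calculate(data, stat = c("mean", "median", "sum", "sd", "prop", "count", "diff in means", "diff in medians", "diff in props", "Chisq", "F", "slope", "correlation", "t", "z"), order = c("A", "B"), ...)
infer's version of summarize. order sets the order of the explanatory factor's levels and is needed when inferring a difference or ratio between two groups, or a t or z statistic. ... passes arguments such as na.rm on to functions like mean().

visualize(data, bins = number, obs_stat = x_bar, endpoints = percentile_ci, direction = "between")
A histogram. bins sets the number of bars. x_bar is the mean of the original sample (when the mean is what is being estimated); you can also use summarize to compute the mean of the bootstrap distribution for comparison. endpoints and direction draw the interval. Newer infer versions move these last three arguments to shade_p_value() and shade_confidence_interval().

get_ci(data, level = 0.95, type = "percentile", point_estimate = NULL)
get_ci(type = "se", point_estimate = x_bar)
Computes a confidence interval; the "se" type needs a point estimate.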
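A minimal bootstrap pipeline built from the functions above, assuming infer and dplyr are installed. Bootstrapping the mean of mtcars$mpg is my own choice of example, and the column names of the interval differ between infer versions, so the sketch does not rely on them.

```r
library(dplyr)
library(infer)

set.seed(1)   # make the resampling reproducible

boot <- mtcars %>%
  specify(response = mpg) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

# Percentile interval: the middle 95% of the bootstrap means
ci <- get_ci(boot, level = 0.95, type = "percentile")
```

boot has one row per bootstrap replicate, and ci holds the lower and upper endpoints in that order.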

janitor

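tabyl(data, variable1, variable2, variable3, ...)
Like table(), but returns a data frame.

adorn_percentages(table, denominator = "row"/"col"/"all", na.rm = T/F)
Converts the table's counts to percentages of the row, column, or overall total.

adorn_pct_formatting(table, digits = number, rounding = "half to even"/"half up", affix_sign = T/F)
Makes the computed percentages readable: digits is the number of decimal places (default 1), rounding is the rounding method, and affix_sign controls whether a % sign is appended.

adorn_ns(table, position = "rear"/"front")
Adds the raw counts after or before the computed percentages.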
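The adorn_ functions are meant to be chained after tabyl. A sketch on mtcars, assuming janitor and dplyr are installed; cross-tabulating cyl against gear is an arbitrary choice.

```r
library(dplyr)
library(janitor)

tab <- mtcars %>%
  tabyl(cyl, gear) %>%                  # counts: one row per cyl, one column per gear
  adorn_percentages("row") %>%          # each row now sums to 1
  adorn_pct_formatting(digits = 1) %>%  # turn proportions into formatted text
  adorn_ns(position = "rear")           # append the raw counts

print(tab)
```

The order matters: adorn_ns() goes last because it reattaches the raw counts to the already formatted percentages.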

sjPlot

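plot_model(model, type, show.values = T/F, transform = NULL, title, show.p = F)
show.values: whether to display the log-odds/odds values.
show.p: whether to mark significant values with asterisks. transform: a character string naming the function applied to the estimates; the default exponentiates them (odds), while NULL leaves them on the log-odds scale.
vline.color: color of the vertical "zero effect" line.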

GGally

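pairs()
Scatterplot matrix (this one is from base graphics, not GGally).

ggpairs()
GGally's ggplot2-based version: a matrix of pairwise plots for the variables.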

MASS

stepAIC()

plotly

plot_ly(data, x = ~ A, y = ~ B, z = ~ C, type = "scatter3d", mode = "markers")
3D scatter plot.

broom

glance(model)
The model's $R^2$, adjusted $R^2$, $\sigma$, test statistic, p-value, log-likelihood, AIC, BIC, deviance, and df.residual.

ellipse

ellipse()
Generates the points outlining a confidence ellipse, at the 95% level by default.

Posted 2021-10-08 08:45 by ZZN而已