Select and deselect columns in data.table, and what does "with=" mean?

load package

library(data.table)

demo data

input <- if (file.exists("flights14.csv")) {
  "flights14.csv"
} else {
  "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
}
flights <- fread(input)
flights

select/subset columns

  • flights[ , 1:3]

  • flights[ , year:day]; in this case no extra argument is needed, because with=TRUE is the default in data.table.

  • flights[ , c('year', 'month', 'day')]

  • flights[ , list(year, month, day)]

  • flights[ , .(year, month, day)]

  • select_cols <- c('year', 'month', 'day'); flights[ , ..select_cols]
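
As a rough check that the forms above pick the same columns, here is a minimal sketch. It uses a small made-up table `dt` (an assumption, so that it runs without flights14.csv) and also shows the explicit `with = FALSE` spelling as an alternative to the `..select_cols` form.

```r
library(data.table)

# Hypothetical stand-in for flights, so the sketch runs without the CSV
dt <- data.table(
  year      = 2014L,
  month     = 1:3,
  day       = c(1L, 15L, 31L),
  dep_delay = c(14L, -3L, 2L)
)

select_cols <- c("year", "month", "day")

by_position <- dt[, 1:3]                         # column positions
by_range    <- dt[, year:day]                    # range of column names
by_name     <- dt[, c("year", "month", "day")]   # character vector
by_list     <- dt[, .(year, month, day)]         # .() is an alias for list()
by_variable <- dt[, ..select_cols]               # variable from the calling scope
by_with     <- dt[, select_cols, with = FALSE]   # same idea, spelled with with=

# Compare every result against the positional selection
others <- list(by_range, by_name, by_list, by_variable, by_with)
all_same <- all(vapply(
  others,
  function(res) isTRUE(all.equal(by_position, res)),
  logical(1)
))
all_same
```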

deselect columns

  • flights[ , -c('year', 'month', 'day')], or flights[ , !c('year', 'month', 'day')]
  • flights[ , -(year:day)], or flights[ , !(year:day)] Again, with=TRUE is the default here.
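
The deselect forms can be sketched the same way. The table `dt` and the variable `drop_cols` below are made up for illustration; the last line assumes that the column names live in a variable, in which case `with = FALSE` tells data.table to read `drop_cols` as names rather than as a column.

```r
library(data.table)

# Hypothetical stand-in for flights
dt <- data.table(
  year      = 2014L,
  month     = 1:3,
  day       = c(1L, 15L, 31L),
  dep_delay = c(14L, -3L, 2L)
)

drop_cols <- c("year", "month", "day")

by_name  <- dt[, -c("year", "month", "day")]   # drop by literal names
by_range <- dt[, !(year:day)]                  # drop a range of columns
by_var   <- dt[, !drop_cols, with = FALSE]     # drop names held in a variable

names(by_name)
# "dep_delay"
```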

more about with() in base R and with= in data.table

Suppose we have a data.frame DF and want to subset all rows where x > 1. In base R, we can do this:

DF <- data.frame(x = c(1,1,1,2,2,3,3,3), y = 1:8)
## (1) normal way
DF[DF$x > 1, ] # data.frame needs that ',' as well
#   x y
# 4 2 4
# 5 2 5
# 6 3 6
# 7 3 7
# 8 3 8

## (2) using with()
DF[with(DF, x > 1), ]
#   x y
# 4 2 4
# 5 2 5
# 6 3 6
# 7 3 7
# 8 3 8

In example (2) above, x is treated as a column name because the expression is evaluated inside with(). In data.table, the argument with= works in a similar way. It is TRUE by default, so the expression in j is evaluated with column names available as variables. Setting with=FALSE disables the ability to refer to columns as variables, thereby restoring the "data.frame mode".
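
To make the difference concrete, here is a small sketch that reuses the DF values as a data.table. The names `DT`, `as_column`, `as_name`, and `rows`, and the deliberately clashing variable `x`, are assumptions made for illustration.

```r
library(data.table)

DT <- data.table(x = c(1, 1, 1, 2, 2, 3, 3, 3), y = 1:8)

# A variable in the calling scope that shares its name with column x
x <- "y"

# with = TRUE (default): x in j means the column x, returned as a vector
as_column <- DT[, x]

# with = FALSE: x is the variable from the calling scope, i.e. the name "y",
# so this selects column y and returns a data.table
as_name <- DT[, x, with = FALSE]

# Column names are also visible in i, so no DT$x or with() is needed
rows <- DT[x > 1]
```

Here `as_column` holds the values of column x, `as_name` is a one-column data.table named y, and `rows` contains the same five rows as the base R examples.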

posted @ 2022-05-28 21:21  DaqianLU