Decision Tree in R (churn.csv)
Churn
Data Set
- Data Download Link: churn.csv
- The number of records: 1477
- Sixteen variables are available for decision tree generation
- 1 output variable:
  - LEAVER = 'T' if CHURNED = 'Vol'
  - LEAVER = 'F' if CHURNED = 'InVol' or 'Current'
- 13 input variables: LONGDIST, International, LOCAL, DROPPED, … to Car_Owner
- train (takes the values 1, 2 and 3 with equal probabilities):
  - 1 and 2 for traindata
  - 3 for testdata
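The churn.csv file already ships with the train column. If you need an equivalent 1/2/3 split indicator for another data set, a minimal base-R sketch could look like this. It assumes equal probabilities, as described above, and the toy data frame `toy` is hypothetical:

```r
# Hypothetical toy data frame standing in for a data set without a split column
set.seed(1234)
toy <- data.frame(id = 1:1477)

# Assign each record to group 1, 2 or 3 with equal probability (1/3 each)
toy$train <- sample(1:3, size = nrow(toy), replace = TRUE, prob = c(1, 1, 1) / 3)

# Groups 1 and 2 form the training set (about 2/3 of the rows), group 3 the test set
table(toy$train)
```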

Step 1: Read Data
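# Read the data: adjust the path to where churn.csv is saved, or use
# rc <- read.csv(file.choose(), header = TRUE, stringsAsFactors = TRUE) to pick the file interactively
# stringsAsFactors = TRUE keeps text columns as factors (the default became FALSE in R 4.0.0)
rc <- read.csv("D:\\churn.csv", header = TRUE, stringsAsFactors = TRUE)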
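Since R 4.0.0, `read.csv()` no longer converts text columns to factors by default. Functions such as `plot()` and `ctree()` treat factors and character vectors differently, so it is worth checking the column classes right after reading. The following sketch uses a hypothetical three-row inline CSV, not the real churn.csv:

```r
# Hypothetical inline CSV standing in for churn.csv
csv_text <- "CHURNED,SEX,AGE
Vol,F,34
Current,M,51
InVol,F,27"

# Default behaviour (R >= 4.0.0): text columns stay character
raw <- read.csv(text = csv_text, header = TRUE)
sapply(raw, class)

# Explicitly request factors for text columns
rc_demo <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)
sapply(rc_demo, class)
levels(rc_demo$CHURNED)
```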
Step 2: Data Exploration
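dim(rc)             # dimensions of the object: number of records and number of variables
str(rc)             # internal structure: the type of each variable
attributes(rc)      # attributes of the object: names, class and row.names
head(rc)            # first 6 rows of the data
tail(rc)            # last 6 rows of the data
rc[1:10, ]          # rows 1 to 10, all variables
rc[1:10, "SEX"]     # values of the variable SEX for rows 1 to 10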
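In `rc[1:10, ]` the index before the comma selects rows and the one after it selects columns. This shows the first 10 records, not the first 10 variables. Here is a quick check on a hypothetical toy data frame:

```r
# Hypothetical toy data frame with 20 records and 3 variables
toy <- data.frame(
  SEX = rep(c("F", "M"), times = 10),
  AGE = 21:40,
  LOCAL = seq(0.5, 10, by = 0.5)
)

first_rows  <- toy[1:10, ]       # rows 1 to 10, all columns
first_cols  <- toy[, 1:2]        # all rows, columns 1 and 2
sex_first10 <- toy[1:10, "SEX"]  # column SEX, rows 1 to 10 (returned as a vector)

dim(first_rows)   # 10 rows, 3 columns
dim(first_cols)   # 20 rows, 2 columns
```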
Step 3: Individual Variable Exploration
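# Summary statistics, frequency table, pie chart and bar plot
summary(rc)
table(rc$CHURNED)
pie(table(rc$CHURNED))
plot(factor(rc$CHURNED))   # bar plot; factor() makes this work even if CHURNED was read as character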
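`table()` gives counts. Dividing by the total with `prop.table()` shows the class balance of CHURNED, which is useful context when judging the prediction table later. This sketch uses a hypothetical vector whose counts are made up for illustration and are not taken from churn.csv:

```r
# Hypothetical CHURNED values (illustrative counts only)
churned <- factor(c(rep("Current", 6), rep("Vol", 3), rep("InVol", 1)))

counts <- table(churned)
shares <- prop.table(counts)   # each count divided by the total (10)
round(shares, 2)

# Collapsing to the binary LEAVER target: Vol versus everything else
table(churned == "Vol")
```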


Step 4: Train and Test Data Set Generation
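# Define the output variable, then split using the train column:
# records with train = 1 or 2 go to traindata, records with train = 3 go to testdata
# LEAVER is stored as a factor ("T"/"F") so that ctree() fits a classification tree
rc$LEAVER <- factor(ifelse(rc$CHURNED == "Vol", "T", "F"), levels = c("F", "T"))
set.seed(1234)   # not strictly needed: the split uses the precomputed train column
testdata  <- rc[rc$train == 3, ]
traindata <- rc[rc$train != 3, ]
nrow(traindata)
nrow(testdata)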
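The split depends entirely on the precomputed train column, so the two subsets should not overlap and together should cover every record. The following sketch checks this on hypothetical toy data; the `toy` data frame and its values are assumptions, not churn.csv:

```r
# Hypothetical toy data: 9 records with a precomputed train column
toy <- data.frame(
  id = 1:9,
  CHURNED = c("Vol", "Current", "InVol", "Vol", "Current",
              "Current", "Vol", "InVol", "Current"),
  train = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)

# Binary target as a factor with labels "T" / "F"
toy$LEAVER <- factor(ifelse(toy$CHURNED == "Vol", "T", "F"), levels = c("F", "T"))

testdata  <- toy[toy$train == 3, ]
traindata <- toy[toy$train != 3, ]

# Checks: no record in both subsets, no record lost
stopifnot(length(intersect(traindata$id, testdata$id)) == 0)
stopifnot(nrow(traindata) + nrow(testdata) == nrow(toy))
nrow(traindata)  # 6
nrow(testdata)   # 3
```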
Step 5: Decision Tree Generation
# Load the party package, build the model formula, fit a conditional inference tree (ctree) on the
# training set, predict on the test set, and cross-tabulate predicted vs. actual values (confusion matrix)
library(party)
myformula <- LEAVER ~ LONGDIST + International + LOCAL + DROPPED + PAY_MTHD +
  LocalBillType + LongDistanceBillType + AGE + SEX + STATUS + CHILDREN +
  Est_Income + Car_Owner
train.tree <- ctree(myformula, data = traindata)
test.pred <- predict(train.tree, newdata = testdata)
table(test.pred, testdata$LEAVER)
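The table produced by `table(test.pred, testdata$LEAVER)` is a confusion matrix. Rows are predicted classes and columns are actual classes, so correct predictions lie on the diagonal. Overall accuracy is the diagonal sum divided by the total. This base-R sketch uses hypothetical predicted and actual labels, not results from this data set:

```r
# Hypothetical predicted and actual labels for 8 test records
pred   <- factor(c("T", "F", "F", "T", "F", "T", "F", "F"), levels = c("F", "T"))
actual <- factor(c("T", "F", "T", "T", "F", "F", "F", "F"), levels = c("F", "T"))

cm <- table(pred, actual)            # rows = predicted, columns = actual
accuracy <- sum(diag(cm)) / sum(cm)  # share of correct predictions

# Recall for the "T" (leaver) class: correctly predicted leavers / actual leavers
recall_T <- cm["T", "T"] / sum(cm[, "T"])

cm
accuracy   # 0.75
recall_T
```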

Step 6: Printing the Rules and Plotting the Decision Tree
# Print the rules (splits and terminal nodes), then plot the tree; type = "simple" gives a more compact plot
print(train.tree)
plot(train.tree)
plot(train.tree,type="simple")

