Decision Tree in R (churn.csv)

Churn

Data Set

  • The number of records: 1477
  • Sixteen variables can be used for decision tree generation
  • 1 Output Variable: 
    • LEAVER = 'T' if CHURNED = 'Vol'
    •          'F' if CHURNED = 'InVol' or 'Current'
  • 13 Input Variables: LONGDIST International LOCAL DROPPED … to Car_Owner
  • train (1, 2, 3: three values with equal probabilities)
    • 1 & 2 for traindata
    • 3 for testdata

Step 1: Read Data

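# Read the data
# file.choose() opens a file picker and returns the chosen path; use it to locate churn.csv
file.choose()
# stringsAsFactors=TRUE keeps text columns as factors (since R 4.0 they are read as character by default)
rc <- read.csv("D:\\churn.csv",header=TRUE,stringsAsFactors=TRUE)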

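Step 2: Data Exploration

dim(rc) 
# Dimensions of the object: the number of records and the number of variables
str(rc) 
# Internal structure of the object: the type of each variable
attributes(rc)
# Attributes of the object: names, class, and row names
head(rc) 
# Show the first few rows
tail(rc) 
# Show the last few rows
rc[1:10,] 
# Show rows 1 to 10 (all columns)
rc[1:10,"SEX"] 
# Show the SEX variable for rows 1 to 10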
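
Two further checks are often worth running at this stage: the class of each column (character columns generally need to be factors before a tree is fitted) and the number of missing values per column. The following is only a minimal sketch on a made-up data frame; toy and its values are hypothetical and not taken from churn.csv. With the real data, rc would take the place of toy.

```r
# Hypothetical stand-in for rc; values are invented for illustration only
toy <- data.frame(
  SEX = c("F", "M", NA, "M"),
  AGE = c(34, NA, 51, 27),
  stringsAsFactors = TRUE
)

# Class of each column
col.class <- sapply(toy, class)
print(col.class)

# Number of missing values in each column
na.count <- colSums(is.na(toy))
print(na.count)
```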

Step 3: Individual Variable Exploration

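# Summary statistics, frequency table, pie chart, and bar plot
summary(rc)
table(rc$CHURNED)
pie(table(rc$CHURNED))
# factor() makes the bar plot work whether CHURNED was read as character or as factor
plot(factor(rc$CHURNED))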

Step 4: Train and Test Data Set Generation

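# Define the output variable and split the data by the train column: values 1 and 2 go to traindata, 3 goes to testdata
# LEAVER is stored as a factor ('T'/'F') so that ctree fits a classification tree
rc$LEAVER <- factor(ifelse(rc$CHURNED == "Vol", "T", "F"))
# No random sampling happens here; the seed only matters if random steps are added later
set.seed(1234)
testdata <- rc[rc$train == 3,]
traindata <- rc[rc$train != 3,]
nrow(traindata)
nrow(testdata)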
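
A quick sanity check on the split can be useful before fitting. The sketch below uses a small made-up data frame (toy and its values are hypothetical, not taken from churn.csv) to show how the share of each train value and the balance of the output variable in each subset could be inspected. With the real data, rc would take the place of toy.

```r
# Hypothetical stand-in for rc; values are invented for illustration only
toy <- data.frame(
  CHURNED = c("Vol", "Current", "InVol", "Vol", "Current", "Current"),
  train   = c(1, 2, 3, 3, 1, 2)
)
toy$LEAVER <- factor(ifelse(toy$CHURNED == "Vol", "T", "F"))

# Share of records carrying each train value
split.share <- prop.table(table(toy$train))
print(split.share)

# Same split rule as in this step: 3 is test, 1 and 2 are train
toy.test  <- toy[toy$train == 3, ]
toy.train <- toy[toy$train != 3, ]

# Balance of the output variable within each subset
print(table(toy.train$LEAVER))
print(table(toy.test$LEAVER))
```

If one subset contains very few 'T' records, the confusion table computed later will be hard to interpret.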

Step 5: Decision Tree Generation

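# Load the party package, define the formula, fit the decision tree (ctree) on the training set, predict on the test set, and cross-tabulate predicted against actual values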
library(party)
myformula <- LEAVER~LONGDIST+International+LOCAL+DROPPED+PAY_MTHD+LocalBillType+LongDistanceBillType+AGE+SEX+STATUS+CHILDREN+Est_Income+Car_Owner
train.tree<-ctree(myformula,data=traindata)
test.pred<-predict(train.tree,newdata=testdata)
table(test.pred,testdata$LEAVER)
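
The table produced by table(test.pred, testdata$LEAVER) cross-tabulates predictions (rows) against actual values (columns). One common way to summarise such a table is the overall accuracy, the share of records on the diagonal. The sketch below uses made-up label vectors (pred and actual are hypothetical) so that it runs without churn.csv. With the real data, test.pred and testdata$LEAVER would be used instead, assuming both are factors with the same levels.

```r
# Hypothetical predicted and actual labels, invented for illustration only
pred   <- factor(c("F", "F", "T", "T", "F", "T"), levels = c("F", "T"))
actual <- factor(c("F", "T", "T", "T", "F", "F"), levels = c("F", "T"))

# Confusion matrix: rows = predicted, columns = actual
cm <- table(pred, actual)
print(cm)

# Overall accuracy: correctly classified records divided by all records
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Accuracy alone can mislead when leavers are rare, so the off-diagonal cells are worth reading as well.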

 

Step 6: Printing the Rules and Plotting the Decision Tree

# Print the splitting rules and terminal-node results, then plot the tree
print(train.tree)
plot(train.tree)
plot(train.tree,type="simple")

posted @ 2013-06-08 21:54  jinyulogin