Kaggle竞赛:泰坦尼克号灾难数据分
Kaggle竞赛:泰坦尼克号灾难数据分
https://www.kaggle.com/c/titanic
- 目标确定 :根据已有数据预测未知旅客生死
- 数据准备 :
- 数据获取,载入训练集csv、测试集csv
- 数据清洗,补齐或抛弃缺失值,数据类型变换(字符串转数字)
- 数据重构,根据需要重新构造数据(重组数据,构建新特征)
- 数据分析 :
- 描述性分析,画图,直观分析
- 探索性分析,机器学习模型
- 成果输出 :csv文件上传得到正确率和排名
载入库
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
[/code]
# 数据获取
```code
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
[/code]
```code
train.head() # 显示头几行数据
[/code]
| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp |
Parch | Ticket | Fare | Cabin | Embarked
---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0
| A/5 21171 | 7.2500 | NaN | S
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th… |
female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0
| STON/O2. 3101282 | 7.9250 | NaN | S
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) |
female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0
| 373450 | 8.0500 | NaN | S
```code
test.head()# 显示头几行数据
[/code]
| PassengerId | Pclass | Name | Sex | Age | SibSp | Parch |
Ticket | Fare | Cabin | Embarked
---|---|---|---|---|---|---|---|---|---|---|---
0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911
| 7.8292 | NaN | Q
1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1
| 0 | 363272 | 7.0000 | NaN | S
2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 |
240276 | 9.6875 | NaN | Q
3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154
| 8.6625 | NaN | S
4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female |
22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S
## 数据概览
```code
train.shape, test.shape # 查看数据的行数,列数
[/code]
```code
((891, 12), (418, 11))
train.info() # 查看具体信息字段
[/code]
```code
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId 891 non-null int64
Survived 891 non-null int64
Pclass 891 non-null int64
Name 891 non-null object
Sex 891 non-null object
Age 714 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Ticket 891 non-null object
Fare 891 non-null float64
Cabin 204 non-null object
Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
test.info()
[/code]
```code
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 11 columns):
PassengerId 418 non-null int64
Pclass 418 non-null int64
Name 418 non-null object
Sex 418 non-null object
Age 332 non-null float64
SibSp 418 non-null int64
Parch 418 non-null int64
Ticket 418 non-null object
Fare 417 non-null float64
Cabin 91 non-null object
Embarked 418 non-null object
dtypes: float64(2), int64(4), object(5)
memory usage: 36.0+ KB
train.csv 具体数据格式
- PassengerId 乘客ID
- Survived 是否幸存。0遇难,1幸存
- Pclass 船舱等级,1Upper,2Middle,3Lower
- Name 姓名,object——————————
- Sex 性别,object—————————
- Age 年龄 缺失177——m————————
- SibSp 兄弟姐妹及配偶个数
- Parch 父母或子女个数
- Ticket 乘客的船票号,object————————
- Fare 乘客的船票价
- Cabin 乘客所在舱位,object,缺失687———————
- Embarked 乘客登船口岸,object,缺失3————————
train.head() # head()方法查看头部几行信息,如果打train则返回所有数据列表
[/code]
| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp |
Parch | Ticket | Fare | Cabin | Embarked
---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0
| A/5 21171 | 7.2500 | NaN | S
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th… |
female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0
| STON/O2. 3101282 | 7.9250 | NaN | S
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) |
female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0
| 373450 | 8.0500 | NaN | S
## 数据清洗
### 缺失过多或无关值抛弃
```code
# .loc 通过自定义索引获取数据 , 其中 .loc[:,:]中括号里面逗号前面的表示行,逗号后面的表示列
train2 = train.loc[:,['PassengerId','Survived','Pclass','Sex','Age','SibSp','Parch','Fare']]
test2 = test.loc[:, ['PassengerId','Pclass','Sex','Age','SibSp','Parch','Fare']]
[/code]
```code
train2.head()
[/code]
| PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch |
Fare
---|---|---|---|---|---|---|---|---
0 | 1 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500
1 | 2 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833
2 | 3 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250
3 | 4 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000
4 | 5 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500
```code
test2.head()
[/code]
| PassengerId | Pclass | Sex | Age | SibSp | Parch | Fare
---|---|---|---|---|---|---|---
0 | 892 | 3 | male | 34.5 | 0 | 0 | 7.8292
1 | 893 | 3 | female | 47.0 | 1 | 0 | 7.0000
2 | 894 | 2 | male | 62.0 | 0 | 0 | 9.6875
3 | 895 | 3 | male | 27.0 | 0 | 0 | 8.6625
4 | 896 | 3 | female | 22.0 | 1 | 1 | 12.2875
```code
train2.info(), test2.info()
[/code]
```code
test2.info()
[/code]
```code
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 7 columns):
PassengerId 418 non-null int64
Pclass 418 non-null int64
Sex 418 non-null object
Age 332 non-null float64
SibSp 418 non-null int64
Parch 418 non-null int64
Fare 417 non-null float64
dtypes: float64(2), int64(4), object(1)
memory usage: 22.9+ KB
填充年龄空值
age = train2['Age'].median() # 年龄中位数
age
[/code]
```code
28.0
train2['Age'].isnull() # 空值转bool值
[/code]
```code
0 False
1 False
2 False
3 False
4 False
5 True
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 True
18 False
19 True
20 False
21 False
22 False
23 False
24 False
25 False
26 True
27 False
28 True
29 True
...
861 False
862 False
863 True
864 False
865 False
866 False
867 False
868 True
869 False
870 False
871 False
872 False
873 False
874 False
875 False
876 False
877 False
878 True
879 False
880 False
881 False
882 False
883 False
884 False
885 False
886 False
887 False
888 True
889 False
890 False
Name: Age, Length: 891, dtype: bool
train2.loc[train2['Age'].isnull(), 'Age'] = age # 为train2年龄为空值的填充年龄中位数
[/code]
```code
train2.info()
[/code]
```code
test2.loc[test2['Age'].isnull(), 'Age'] = age # 为test2中年龄为空值的数据填充年龄中位数
[/code]
```code
test2.info()
[/code]
```code
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 7 columns):
PassengerId 418 non-null int64
Pclass 418 non-null int64
Sex 418 non-null object
Age 418 non-null float64
SibSp 418 non-null int64
Parch 418 non-null int64
Fare 417 non-null float64
dtypes: float64(2), int64(4), object(1)
memory usage: 22.9+ KB
填充船票价格空值
#取众数填充船票价格 Fare
Fare = test2['Fare'].mode()
Fare
test2.loc[test['Fare'].isnull(),'Fare'] = Fare[0]
train2.info(),test2.info()
[/code]
```code
train2.head()
[/code]
| PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch |
Fare
---|---|---|---|---|---|---|---|---
0 | 1 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500
1 | 2 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833
2 | 3 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250
3 | 4 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000
4 | 5 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500
## 数据类型转换
```code
train2.dtypes,test2.dtypes # 列数据类型
[/code]
```code
(PassengerId int64
Survived int64
Pclass int64
Sex object
Age float64
SibSp int64
Parch int64
Fare float64
dtype: object, PassengerId int64
Pclass int64
Sex object
Age float64
SibSp int64
Parch int64
Fare float64
dtype: object)
性别转换成整型数据
train2['Sex'] = train2['Sex'].map({'female':0, 'male':1}).astype(int)
test2['Sex'] = test2['Sex'].map({'female': 0, 'male': 1}).astype(int)
train2.head()
[/code]
| PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch |
Fare
---|---|---|---|---|---|---|---|---
0 | 1 | 0 | 3 | 1 | 22.0 | 1 | 0 | 7.2500
1 | 2 | 1 | 1 | 0 | 38.0 | 1 | 0 | 71.2833
2 | 3 | 1 | 3 | 0 | 26.0 | 0 | 0 | 7.9250
3 | 4 | 1 | 1 | 0 | 35.0 | 1 | 0 | 53.1000
4 | 5 | 0 | 3 | 1 | 35.0 | 0 | 0 | 8.0500
## 数据重构
将SibSp、Parch特征构建两个新特征
* 家庭人口总数 familysize
* 是否单身 isalone
```code
train2.loc[:,'SibSp'] #兄妹个数
train2.loc[:,'Parch'] #父母子女个数
train2['familysize'] = train2.loc[:,'SibSp'] + train2.loc[:,'Parch'] + 1
test2['familysize'] = test2.loc[:,'SibSp'] + test2.loc[:,'Parch'] + 1
[/code]
```code
train2.head()
[/code]
| PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch |
Fare | familysize
---|---|---|---|---|---|---|---|---|---
0 | 1 | 0 | 3 | 1 | 22.0 | 1 | 0 | 7.2500 | 2
1 | 2 | 1 | 1 | 0 | 38.0 | 1 | 0 | 71.2833 | 2
2 | 3 | 1 | 3 | 0 | 26.0 | 0 | 0 | 7.9250 | 1
3 | 4 | 1 | 1 | 0 | 35.0 | 1 | 0 | 53.1000 | 2
4 | 5 | 0 | 3 | 1 | 35.0 | 0 | 0 | 8.0500 | 1
```code
train2['isalone'] = 0
train2.loc[train2['familysize'] == 1,'isalone'] = 1
[/code]
```code
train2.head()
[/code]
| PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch |
Fare | familysize | isalone
---|---|---|---|---|---|---|---|---|---|---
0 | 1 | 0 | 3 | 1 | 22.0 | 1 | 0 | 7.2500 | 2 | 0
1 | 2 | 1 | 1 | 0 | 38.0 | 1 | 0 | 71.2833 | 2 | 0
2 | 3 | 1 | 3 | 0 | 26.0 | 0 | 0 | 7.9250 | 1 | 1
3 | 4 | 1 | 1 | 0 | 35.0 | 1 | 0 | 53.1000 | 2 | 0
4 | 5 | 0 | 3 | 1 | 35.0 | 0 | 0 | 8.0500 | 1 | 1
## 数据重构后的最终数据
```code
train3 = train2.loc[:,['PassengerId','Survived','Pclass','Sex','Age','Fare','familysize','isalone']]
train3.head()
test3 = test2.loc[:,['PassengerId','Pclass','Sex','Age','Fare','familysize','isalone']]
test3.head()
[/code]
| PassengerId | Pclass | Sex | Age | Fare | familysize | isalone
---|---|---|---|---|---|---|---
0 | 892 | 3 | 1 | 34.5 | 7.8292 | 1 | NaN
1 | 893 | 3 | 0 | 47.0 | 7.0000 | 2 | NaN
2 | 894 | 2 | 1 | 62.0 | 9.6875 | 1 | NaN
3 | 895 | 3 | 1 | 27.0 | 8.6625 | 1 | NaN
4 | 896 | 3 | 0 | 22.0 | 12.2875 | 3 | NaN
# 数据分析
## 描述性分析
```code
#单身存活率
d = train3[['isalone', 'Survived']].groupby(['isalone']).mean()
d
# d.loc[0,'Survived']
[/code]
| Survived
---|---
isalone |
0 | 0.505650
1 | 0.303538
```code
#单身与否死亡率
plt.bar(
[0,1],
[1-d.loc[0,'Survived'],1-d.loc[1,'Survived']],
0.5,
color='r',
alpha=0.5,
)
plt.xticks([0,1],['notalone','alone'])
plt.show()
[/code]

```code
#男性女性存活率
n = train3[['Sex', 'Survived']].groupby(['Sex']).mean()
n
[/code]
| Survived
---|---
Sex |
0 | 0.742038
1 | 0.188908
```code
# 不同性别死亡率条形图
plt.bar(
[0,1],
[1-n.loc[0,'Survived'],1-n.loc[1,'Survived']],
0.5,
color='g',
alpha=0.7
)
plt.xticks([0,1],['female','male'])
plt.show()
[/code]

```code
#仓位存活率
c = train3[['Pclass', 'Survived']].groupby(['Pclass']).mean()
c
[/code]
| Survived
---|---
Pclass |
1 | 0.629630
2 | 0.472826
3 | 0.242363
```code
#三等仓位死亡率条形图
plt.bar(
[0,1,2],
[1-c.loc[1,'Survived'],1-c.loc[2,'Survived'],1-c.loc[3,'Survived']],
0.5,
color='b',
alpha=0.7
)
plt.xticks([0,1,2],[1,2,3])
plt.show()
[/code]

```code
#年龄存活率
age = train3[['Age', 'Survived']].groupby(['Age']).mean()
age
[/code]
| Survived
---|---
Age |
0.42 | 1.000000
0.67 | 1.000000
0.75 | 1.000000
0.83 | 1.000000
0.92 | 1.000000
1.00 | 0.714286
2.00 | 0.300000
3.00 | 0.833333
4.00 | 0.700000
5.00 | 1.000000
6.00 | 0.666667
7.00 | 0.333333
8.00 | 0.500000
9.00 | 0.250000
10.00 | 0.000000
11.00 | 0.250000
12.00 | 1.000000
13.00 | 1.000000
14.00 | 0.500000
14.50 | 0.000000
15.00 | 0.800000
16.00 | 0.352941
17.00 | 0.461538
18.00 | 0.346154
19.00 | 0.360000
20.00 | 0.200000
20.50 | 0.000000
21.00 | 0.208333
22.00 | 0.407407
23.00 | 0.333333
… | …
44.00 | 0.333333
45.00 | 0.416667
45.50 | 0.000000
46.00 | 0.000000
47.00 | 0.111111
48.00 | 0.666667
49.00 | 0.666667
50.00 | 0.500000
51.00 | 0.285714
52.00 | 0.500000
53.00 | 1.000000
54.00 | 0.375000
55.00 | 0.500000
55.50 | 0.000000
56.00 | 0.500000
57.00 | 0.000000
58.00 | 0.600000
59.00 | 0.000000
60.00 | 0.500000
61.00 | 0.000000
62.00 | 0.500000
63.00 | 1.000000
64.00 | 0.000000
65.00 | 0.000000
66.00 | 0.000000
70.00 | 0.000000
70.50 | 0.000000
71.00 | 0.000000
74.00 | 0.000000
80.00 | 1.000000
88 rows × 1 columns
```code
#不同年龄存活率
plt.figure(2, figsize=(20,5))
plt.bar(
age.index,
age.values,
0.5,
color='r',
alpha=0.7
)
# plt.axis([0,80,0,20])
plt.xticks(age.index,rotation=90)
plt.show()

#票价存活率
fare = train3[['Fare', 'Survived']].groupby(['Fare']).mean()
fare
[/code]
| Survived
---|---
Fare |
0.0000 | 0.066667
4.0125 | 0.000000
5.0000 | 0.000000
6.2375 | 0.000000
6.4375 | 0.000000
6.4500 | 0.000000
6.4958 | 0.000000
6.7500 | 0.000000
6.8583 | 0.000000
6.9500 | 0.000000
6.9750 | 0.500000
7.0458 | 0.000000
7.0500 | 0.000000
7.0542 | 0.000000
7.1250 | 0.000000
7.1417 | 1.000000
7.2250 | 0.250000
7.2292 | 0.266667
7.2500 | 0.076923
7.3125 | 0.000000
7.4958 | 0.333333
7.5208 | 0.000000
7.5500 | 0.250000
7.6292 | 0.000000
7.6500 | 0.250000
7.7250 | 0.000000
7.7292 | 0.000000
7.7333 | 0.500000
7.7375 | 0.500000
7.7417 | 0.000000
… | …
80.0000 | 1.000000
81.8583 | 1.000000
82.1708 | 0.500000
83.1583 | 1.000000
83.4750 | 0.500000
86.5000 | 1.000000
89.1042 | 1.000000
90.0000 | 0.750000
91.0792 | 1.000000
93.5000 | 1.000000
106.4250 | 0.500000
108.9000 | 0.500000
110.8833 | 0.750000
113.2750 | 0.666667
120.0000 | 1.000000
133.6500 | 1.000000
134.5000 | 1.000000
135.6333 | 0.666667
146.5208 | 1.000000
151.5500 | 0.500000
153.4625 | 0.666667
164.8667 | 1.000000
211.3375 | 1.000000
211.5000 | 0.000000
221.7792 | 0.000000
227.5250 | 0.750000
247.5208 | 0.500000
262.3750 | 1.000000
263.0000 | 0.500000
512.3292 | 1.000000
248 rows × 1 columns
```code
plt.figure(2, figsize=(20,5))
plt.bar(
fare.index,
fare.values,
0.5,
color='r',
alpha=0.7
)
# plt.axis([0,80,0,20])
plt.xticks(fare.index,rotation=90)
plt.show()

得出结论
# 单身死亡率70%
jieguo = pd.DataFrame(np.arange(0,418),index=test3.loc[:,'PassengerId'])
jieguo.loc[:,0] = 1
[/code]
```code
jieguo.head()
[/code]
| 0
---|---
PassengerId |
892 | 1
893 | 1
894 | 1
895 | 1
896 | 1
```code
jieguo.loc[test3[test3.loc[:,'isalone'] == 1].loc[:,'PassengerId'].values] = 0 #单身死
[/code]
```code
jieguo.head()
[/code]
| 0
---|---
PassengerId |
892 | 1
893 | 1
894 | 1
895 | 1
896 | 1
## 输出结论
```code
jieguo.to_csv('isalone.csv')
[/code]
```code
#判断:男性全死,女性全活,三等仓全死
new3 = pd.DataFrame(np.arange(0,418),index=test3.loc[:,'PassengerId'].values)
new3[0] = 0 #默认全死
new3.head()
[/code]
| 0
---|---
892 | 0
893 | 0
894 | 0
895 | 0
896 | 0
```code
new3.loc[test3[test3.loc[:,'Sex'] == 0].loc[:,'PassengerId'].values] = 1 #女性活
new3.head()
[/code]
| 0
---|---
892 | 0
893 | 1
894 | 0
895 | 0
896 | 1
```code
new3.loc[test2[test2.loc[:,'Pclass'] == 3].loc[:,'PassengerId'].values] = 0 #三等仓死
new3.head()
[/code]
| 0
---|---
892 | 0
893 | 0
894 | 0
895 | 0
896 | 0
```code
#写入csv上传
new3.to_csv('cangwei-xingbie.csv')#判断:男性全死,女性全活,三等仓全死
[/code]
## 机器学习建模
```code
train3.head()
[/code]
| PassengerId | Survived | Pclass | Sex | Age | Fare | familysize
| isalone
---|---|---|---|---|---|---|---|---
0 | 1 | 0 | 3 | 1 | 22.0 | 7.2500 | 2 | 0
1 | 2 | 1 | 1 | 0 | 38.0 | 71.2833 | 2 | 0
2 | 3 | 1 | 3 | 0 | 26.0 | 7.9250 | 1 | 1
3 | 4 | 1 | 1 | 0 | 35.0 | 53.1000 | 2 | 0
4 | 5 | 0 | 3 | 1 | 35.0 | 8.0500 | 1 | 1
```code
from sklearn import neighbors,datasets
[/code]
```code
x = train3.loc[:,['Pclass','Sex','familysize']]
y = train3.loc[:,'Survived'] #生死
clf = neighbors.KNeighborsClassifier(n_neighbors = 20)
clf.fit(x,y) #knn训练
clf
[/code]
```code
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=20, p=2,
weights='uniform')
#knn预测
z = clf.predict(test3.loc[:,['Pclass','Sex','familysize']])
z
[/code]
```code
array([0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0,
0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0,
0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0,
1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1,
0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0,
1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1,
0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1,
0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1,
1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
1, 0, 0, 0], dtype=int64)
# 构造表
s = np.arange(892, 1310)
s
results = pd.DataFrame(z, index=s)
results.head()
[/code]
| 0
---|---
892 | 0
893 | 0
894 | 0
895 | 0
896 | 1
```code
# 写入csv上传
results.to_csv('Titanic_knn.csv')
[/code]


浙公网安备 33010602011771号