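Data Analysis and Mining Exercise

1.0 Background

This dataset contains drone delivery records from an Australian company (before August 2018). Its main columns are: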

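  • 'Id': record ID
  • 'Drone Type': drone category (type 1, 2, or 3)
  • 'Post Type': delivery type (0 = standard, 1 = express)
  • 'Package Weight': weight of the package
  • 'Origin Region': region code of the origin
  • 'Destination Region': region code of the destination
  • 'Origin Latitude': latitude of the origin
  • 'Origin Longitude': longitude of the origin
  • 'Destination Latitude': latitude of the destination
  • 'Destination Longitude': longitude of the destination
  • 'Journey Distance': delivery distance
  • 'Departure Date': departure date
  • 'Departure Time': departure time
  • 'Travel Time': flight time
  • 'Delivery Time': arrival time
  • 'Delivery Fare': delivery fee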

2.0 Loading Packages and Data

# Load libraries
import pandas as pd
import re
import matplotlib.pyplot as plt
# import seaborn as sns  # install first if needed: pip install seaborn
import scipy.stats as st
import numpy as np
import math
from math import *
from datetime import datetime, timedelta

# Show at most 10 rows when displaying a DataFrame (must run after importing pandas)
pd.options.display.max_rows = 10

Task 1: Load the data file named 'data.csv'

data = pd.read_csv('data.csv')

pandas has two core data structures, DataFrame and Series; read_csv returns a DataFrame:

type(data)

pandas.core.frame.DataFrame

3.0 Initial Data Exploration

Task 2: Find the number of rows and columns in the data

data.shape

(37903, 16)

Task 3: View summary statistics for the columns

Hint: describe()

data.describe()
Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Travel Time Delivery Fare
count 37893.000000 37883.000000 37903.000000 37893.000000 37893.000000 37903.000000 37903.000000 37903.000000 37903.000000 37903.000000 37863.000000 37874.000000
mean 1.699285 0.298709 25.669901 20.476684 20.452722 -37.728867 145.423058 -37.722054 145.434035 221.954150 208.794518 126.814976
std 0.779845 0.457698 12.107150 11.501110 11.509311 1.899183 6.923993 1.895621 6.909055 116.604355 107.612447 59.314445
min 1.000000 0.000000 5.001000 1.000000 1.000000 -39.006941 -148.337157 -39.006941 -147.691902 0.664000 7.420000 54.020000
25% 1.000000 0.000000 15.199000 11.000000 11.000000 -38.443034 143.965002 -38.431293 143.951543 131.044500 125.165000 97.440000
50% 2.000000 0.000000 25.446000 20.000000 20.000000 -37.707244 145.423386 -37.700695 145.450794 209.796000 196.370000 120.045000
75% 2.000000 1.000000 35.953500 30.000000 30.000000 -37.094433 147.170334 -37.080256 147.216886 302.052000 281.250000 145.800000
max 3.000000 1.000000 55.992000 40.000000 40.000000 38.986998 148.450576 38.989473 148.450576 556.637000 545.460000 1217.690000
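
The summary above hints at a few suspicious values: the minimum 'Origin Longitude' is negative and the maximum 'Origin Latitude' is positive, while the quartiles sit near -37 and 145. A minimal sketch of how such rows could be counted, using a small made-up frame in place of data.csv (the toy values and the sign_anomalies helper are illustrative assumptions, not part of the exercise):

```python
import pandas as pd

# Toy stand-in for the real data; the values are made up for illustration.
toy = pd.DataFrame({
    'Origin Latitude': [-37.09, 38.99, -38.48],
    'Origin Longitude': [144.43, 146.01, -148.34],
})

def sign_anomalies(df, lat_col, lon_col):
    """Return rows whose latitude is positive or whose longitude is negative.

    Assumes every legitimate point lies in south-eastern Australia
    (negative latitude, positive longitude).
    """
    mask = (df[lat_col] > 0) | (df[lon_col] < 0)
    return df[mask]

suspicious = sign_anomalies(toy, 'Origin Latitude', 'Origin Longitude')
print(len(suspicious))
```

On the real frame the same helper could be called once for the origin columns and once for the destination columns.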

Task 4: List the column names

data.columns

Index(['Id', 'Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
'Destination Region', 'Origin Latitude', 'Origin Longitude',
'Destination Latitude', 'Destination Longitude', 'Journey Distance',
'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
'Delivery Fare'],
dtype='object')

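Task 5: Find the data type of each column

Is each column object or float?

data.info()
# Note: in this run astype('str') turns NaN into the text 'nan', so 'Drone Type'
# shows 37903 non-null in the second info() call and 0 missing values in Task 7.
data['Drone Type'] = data['Drone Type'].astype('str')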

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37903 entries, 0 to 37902
Data columns (total 16 columns):
Id 37878 non-null object
Drone Type 37893 non-null float64
Post Type 37883 non-null float64
Package Weight 37903 non-null float64
Origin Region 37893 non-null float64
Destination Region 37893 non-null float64
Origin Latitude 37903 non-null float64
Origin Longitude 37903 non-null float64
Destination Latitude 37903 non-null float64
Destination Longitude 37903 non-null float64
Journey Distance 37903 non-null float64
Departure Date 37903 non-null object
Departure Time 37903 non-null object
Travel Time 37863 non-null float64
Delivery Time 37903 non-null object
Delivery Fare 37874 non-null float64
dtypes: float64(12), object(4)
memory usage: 4.6+ MB

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37903 entries, 0 to 37902
Data columns (total 16 columns):
Id 37878 non-null object
Drone Type 37903 non-null object
Post Type 37883 non-null float64
Package Weight 37903 non-null float64
Origin Region 37893 non-null float64
Destination Region 37893 non-null float64
Origin Latitude 37903 non-null float64
Origin Longitude 37903 non-null float64
Destination Latitude 37903 non-null float64
Destination Longitude 37903 non-null float64
Journey Distance 37903 non-null float64
Departure Date 37903 non-null object
Departure Time 37903 non-null object
Travel Time 37863 non-null float64
Delivery Time 37903 non-null object
Delivery Fare 37874 non-null float64
dtypes: float64(11), object(5)
memory usage: 4.6+ MB
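
The second info() output reports 37903 non-null values for 'Drone Type', although the first reported 37893. The string conversion evidently replaced the 10 NaN entries with the text 'nan'. A small sketch of the effect on a made-up Series, with a category conversion as one possible alternative that keeps missing values detectable (the alternative is a suggestion, not the exercise's method, and newer pandas releases may treat astype('str') differently):

```python
import numpy as np
import pandas as pd

drone_type = pd.Series([2.0, np.nan, 1.0], name='Drone Type')

as_str = drone_type.astype('str')        # NaN may become the text 'nan'
as_cat = drone_type.astype('category')   # NaN stays a missing value

print(as_str.isnull().sum())   # how many missing values are still detected
print(as_cat.isnull().sum())   # the missing value is still counted here
```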

Task 6: Show the first 5 and last 5 rows of the data

data.head()

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
0 ID1645282128 2.0 0.0 21.686 19.0 38.0 -37.089338 144.429529 -37.639134 142.891391 149.212 2018-01-16 09:38:17 140.19 11:58:28 99.25
1 ID1697620764 nan 0.0 39.075 15.0 15.0 -38.481935 146.009567 -38.585528 146.199827 20.185 2018-02-10 04:28:17 22.84 4:51:07 149.04
2 ID1543933503 2.0 0.0 7.243 33.0 28.0 -38.754167 144.509664 -38.242224 147.855342 296.975 2018-05-05 01:38:03 272.52 6:10:34 141.48
3 ID1756517608 2.0 0.0 13.383 10.0 38.0 -37.240526 147.568019 -37.687178 142.991188 407.396 2018-06-11 11:43:04 371.40 17:54:27 122.82
4 ID1832325834 2.0 0.0 8.123 1.0 8.0 -38.143985 143.798292 -38.548315 144.769228 95.974 2018-03-16 14:50:25 92.51 16:22:55 111.97
5 ID1802448576 2.0 0.0 32.859 2.0 28.0 -37.421211 148.044072 -38.159627 148.194048 83.250 2018-05-15 16:35:50 81.12 17:56:57 113.88
6 ID1940231408 1.0 0.0 20.616 29.0 36.0 -37.173949 143.140662 -37.021605 145.197043 183.363 2018-04-01 19:31:12 184.22 22:35:25 85.60
7 ID1299303958 2.0 0.0 44.577 36.0 31.0 -37.123190 145.236196 -37.667199 143.877650 134.543 2018-05-01 18:39:36 127.05 20:46:38 114.22
8 ID1752722028 1.0 0.0 15.363 20.0 30.0 -38.850561 148.317253 -38.024914 144.823938 318.132 2018-05-27 14:48:17 314.64 20:02:55 87.39
9 ID5995243590 1.0 1.0 36.190 18.0 28.0 -38.070189 142.950207 -37.996817 148.026520 445.106 2018-06-17 12:53:02 437.52 20:10:33 142.95
data.tail()

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
37898 NaN 3.0 1.0 27.153 39.0 16.0 -38.446310 148.292498 -36.739777 143.604529 454.968 2018-07-23 08:29:19 366.09 14:35:24 188.49
37899 ID5862552991 1.0 1.0 40.363 9.0 38.0 -38.983710 145.320518 -37.673908 142.879230 258.259 2018-06-26 15:55:37 256.70 20:12:18 122.98
37900 ID5339104082 1.0 1.0 35.955 13.0 32.0 -38.292301 147.562013 -36.605285 148.293183 198.597 2018-03-19 16:41:10 198.97 20:00:08 118.47
37901 ID5468787866 2.0 1.0 29.566 33.0 23.0 -38.853243 144.508346 -37.727691 145.662270 160.816 2018-02-26 04:22:30 150.58 6:53:04 161.96
37902 ID1448126768 3.0 0.0 44.070 36.0 34.0 -37.129313 145.266426 -38.428477 143.341632 222.687 2018-07-07 08:01:42 182.71 11:04:24 144.41

Task 7: Count the missing values in every column and sort from most to fewest

Bonus task: visualize the columns that have missing values

data.isnull().sum().sort_values(ascending=False)

Travel Time 40
Delivery Fare 29
Id 25
Post Type 20
Destination Region 10
Origin Region 10
Delivery Time 0
Departure Time 0
Departure Date 0
Journey Distance 0
Destination Longitude 0
Destination Latitude 0
Origin Longitude 0
Origin Latitude 0
Package Weight 0
Drone Type 0
dtype: int64

count = {}
for col in data.columns:
    count_null = data[col].isnull().sum()
    count[col] = count_null
for i,j in sorted(count.items(),key = lambda s: s[1], reverse=True):
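    print('Column: %s, missing values: %s' % (i, j))

Column: Travel Time, missing values: 40
Column: Delivery Fare, missing values: 29
Column: Id, missing values: 25
Column: Post Type, missing values: 20
Column: Drone Type, missing values: 10
Column: Origin Region, missing values: 10
Column: Destination Region, missing values: 10
Column: Package Weight, missing values: 0
Column: Origin Latitude, missing values: 0
Column: Origin Longitude, missing values: 0
Column: Destination Latitude, missing values: 0
Column: Destination Longitude, missing values: 0
Column: Journey Distance, missing values: 0
Column: Departure Date, missing values: 0
Column: Departure Time, missing values: 0
Column: Delivery Time, missing values: 0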
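
For the bonus task, one possible approach is a horizontal bar chart of the non-zero counts. The sketch below hard-codes the counts from the loop output above so that it runs without data.csv; on the real frame the Series would come from data.isnull().sum(). The chart style and labels are assumptions.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Missing-value counts copied from the loop output above.
missing = pd.Series({
    'Travel Time': 40, 'Delivery Fare': 29, 'Id': 25, 'Post Type': 20,
    'Drone Type': 10, 'Origin Region': 10, 'Destination Region': 10,
})

fig, ax = plt.subplots(figsize=(8, 4))
missing.sort_values().plot.barh(ax=ax)   # ascending sort puts the longest bar on top
ax.set_xlabel('Number of missing values')
ax.set_title('Missing values per column')
fig.tight_layout()
```

In a notebook, plt.show() (or simply ending the cell with the figure) displays the chart.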

Task 8: Find all rows that contain at least one missing value, and count them

data.isnull().any(axis=1)  # True for rows with at least one missing value

0 False
1 True
2 False
3 False
4 False
...
37898 True
37899 False
37900 False
37901 False
37902 False
Length: 37903, dtype: bool

data.drop(index=0)  # returns a copy without the row labelled 0; data itself is unchanged

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
1 ID1697620764 NaN 0.0 39.075 15.0 15.0 -38.481935 146.009567 -38.585528 146.199827 20.185 2018-02-10 04:28:17 22.84 4:51:07 149.04
2 ID1543933503 2.0 0.0 7.243 33.0 28.0 -38.754167 144.509664 -38.242224 147.855342 296.975 2018-05-05 01:38:03 272.52 6:10:34 141.48
3 ID1756517608 2.0 0.0 13.383 10.0 38.0 -37.240526 147.568019 -37.687178 142.991188 407.396 2018-06-11 11:43:04 371.40 17:54:27 122.82
4 ID1832325834 2.0 0.0 8.123 1.0 8.0 -38.143985 143.798292 -38.548315 144.769228 95.974 2018-03-16 14:50:25 92.51 16:22:55 111.97
5 ID1802448576 2.0 0.0 32.859 2.0 28.0 -37.421211 148.044072 -38.159627 148.194048 83.250 2018-05-15 16:35:50 81.12 17:56:57 113.88
6 ID1940231408 1.0 0.0 20.616 29.0 36.0 -37.173949 143.140662 -37.021605 145.197043 183.363 2018-04-01 19:31:12 184.22 22:35:25 85.60
7 ID1299303958 2.0 0.0 44.577 36.0 31.0 -37.123190 145.236196 -37.667199 143.877650 134.543 2018-05-01 18:39:36 127.05 20:46:38 114.22
8 ID1752722028 1.0 0.0 15.363 20.0 30.0 -38.850561 148.317253 -38.024914 144.823938 318.132 2018-05-27 14:48:17 314.64 20:02:55 87.39
9 ID5995243590 1.0 1.0 36.190 18.0 28.0 -38.070189 142.950207 -37.996817 148.026520 445.106 2018-06-17 12:53:02 437.52 20:10:33 142.95
10 ID1483358088 2.0 0.0 23.172 13.0 27.0 -38.225456 147.425515 -37.642798 147.124104 70.051 2018-03-19 09:59:10 69.30 11:08:27 96.95
11 ID1626798395 2.0 0.0 19.754 23.0 26.0 -37.625368 145.838281 -36.789955 147.133916 147.791 2018-02-28 17:40:59 138.92 19:59:54 117.48
12 ID5277549009 3.0 1.0 12.807 4.0 6.0 -36.855984 142.929596 -36.906838 145.696986 246.465 2018-03-26 07:55:48 201.49 11:17:17 173.32
13 ID1950928883 2.0 0.0 22.332 33.0 19.0 -38.894115 144.457143 -37.173740 144.152105 193.365 2018-06-28 16:05:55 179.73 19:05:38 117.67
14 ID5143738648 2.0 1.0 25.880 33.0 5.0 -38.872372 144.606034 -37.553304 145.120753 153.580 2018-05-09 08:33:29 144.10 10:57:34 132.98
15 ID5132897910 2.0 1.0 38.691 7.0 15.0 -38.844622 144.093195 -38.476630 145.849992 158.102 2018-01-05 15:55:00 148.15 18:23:09 147.73
16 ID1290889802 1.0 0.0 30.742 19.0 14.0 -37.178059 144.403991 -37.713867 146.382965 184.783 2018-06-05 13:03:43 185.60 16:09:18 84.82
17 ID5226355535 1.0 1.0 18.055 10.0 18.0 -37.141728 147.256091 -37.983127 143.290272 362.227 2018-05-04 07:42:51 357.32 13:40:10 116.35
18 ID1898978312 1.0 0.0 5.986 23.0 18.0 -37.706960 145.718119 -37.983127 143.182464 224.997 2018-05-23 12:53:12 224.51 16:37:42 82.47
19 ID5284908619 2.0 1.0 30.664 13.0 14.0 -38.250238 147.366610 -37.774617 146.503568 92.371 2018-07-11 12:52:57 89.29 14:22:14 143.96
20 ID1585556406 3.0 0.0 24.942 3.0 39.0 -38.322643 145.505910 -38.453191 148.300405 244.251 2018-03-14 22:16:33 199.74 1:36:17 167.45
21 ID1901962779 1.0 0.0 34.012 27.0 37.0 -37.516090 146.969053 -38.852488 147.816987 166.236 2018-07-17 16:26:55 167.65 19:14:34 88.98
22 ID5590279060 1.0 1.0 55.229 7.0 6.0 -38.876145 143.911302 -36.875012 145.759049 275.633 2018-05-01 12:42:19 273.52 17:15:50 688.24
23 ID1473718059 3.0 0.0 40.741 33.0 17.0 -38.756634 144.375564 -38.817727 147.071495 234.014 2018-06-21 01:04:15 191.66 4:15:54 165.37
24 ID5551646734 3.0 1.0 15.419 35.0 21.0 -36.922413 146.362349 -37.240137 143.768208 233.068 2018-01-05 07:19:48 190.91 10:30:42 174.52
25 ID1772122934 2.0 0.0 10.663 29.0 1.0 -37.019910 142.798295 -38.358124 143.943955 179.929 2018-02-27 04:42:03 167.70 7:29:44 124.95
26 ID1987608852 1.0 0.0 20.685 9.0 35.0 -38.970092 145.435801 -37.068624 146.317717 225.349 2018-05-15 15:35:10 224.85 19:20:00 80.66
27 ID1249352358 2.0 0.0 10.272 8.0 34.0 -38.485641 144.522135 -38.505484 143.311788 105.471 2018-02-05 06:13:29 101.02 7:54:30 102.09
28 ID1611614450 1.0 0.0 7.373 16.0 38.0 -36.600877 143.566811 -37.804113 142.793139 150.483 2018-06-18 14:01:42 152.40 16:34:06 82.11
29 ID1262379299 2.0 0.0 31.358 13.0 16.0 -38.195911 147.436921 -36.734698 143.764279 362.941 2018-04-24 22:40:54 331.59 4:12:29 147.21
30 ID5498216777 2.0 1.0 6.383 40.0 29.0 -37.692167 147.890721 -37.273398 142.951899 438.698 2018-05-09 10:31:10 399.43 17:10:35 152.50
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37873 NaN 1.0 0.0 39.038 7.0 22.0 -38.664403 143.702592 -36.670478 144.312065 228.361 2018-02-02 18:03:25 227.77 21:51:11 87.87
37874 NaN 1.0 0.0 7.171 14.0 1.0 -37.589382 146.559422 -38.229669 143.710836 260.125 2018-02-03 22:11:31 258.51 2:30:01 106.47
37875 NaN 1.0 0.0 39.141 5.0 1.0 -37.491746 145.168879 -38.168680 143.936761 131.959 2018-04-19 22:54:02 134.48 1:08:30 94.83
37876 NaN 2.0 0.0 42.330 25.0 5.0 -36.618297 147.643165 -37.563856 145.038804 254.069 2018-01-27 11:06:09 234.09 15:00:14 116.10
37877 NaN 1.0 1.0 43.965 25.0 38.0 -36.542082 147.863736 -37.549355 142.980000 448.104 2018-06-14 22:15:50 440.42 5:36:15 149.26
37878 ID1525565031 2.0 0.0 27.560 7.0 35.0 -38.689638 144.093996 -37.046945 146.460389 276.894 2018-02-22 23:43:48 254.53 3:58:19 135.68
37879 NaN 1.0 0.0 18.540 36.0 18.0 -36.974023 145.036046 -38.015640 142.955101 217.299 2018-05-28 07:14:50 217.06 10:51:53 76.31
37880 ID1272392458 1.0 0.0 44.226 40.0 3.0 -37.740012 147.768225 -38.276119 145.403536 215.813 2018-02-21 21:28:22 215.63 1:03:59 105.89
37881 ID1909131399 2.0 0.0 6.415 25.0 27.0 -36.585246 147.853062 -37.700285 147.027318 144.135 2018-01-16 08:26:33 135.64 10:42:11 100.45
37882 ID5420355345 3.0 1.0 30.471 3.0 15.0 -38.345519 145.542105 -38.712933 145.977537 55.772 2018-07-22 00:13:54 50.94 1:04:50 183.73
37883 ID5164753016 2.0 1.0 6.553 5.0 15.0 -37.625311 145.097611 -38.630132 146.040376 139.018 2018-01-14 16:51:56 131.06 19:02:59 145.71
37884 ID1122103211 1.0 0.0 37.850 38.0 10.0 -37.510578 142.826103 -37.337327 147.315144 397.278 2018-04-27 14:54:35 391.24 21:25:49 102.88
37885 NaN 1.0 0.0 13.544 34.0 38.0 -38.528632 143.386140 -37.561137 142.835312 118.028 2018-04-09 11:48:46 120.99 13:49:45 69.88
37886 ID1970402579 2.0 0.0 38.656 1.0 32.0 -38.171542 143.871582 -36.756484 148.322553 423.585 2018-03-18 01:01:33 385.90 7:27:26 142.19
37887 ID1388385049 1.0 0.0 23.699 10.0 10.0 -37.352794 147.476662 -37.104700 147.417027 28.118 2018-05-24 00:41:08 33.99 1:15:07 83.77
37888 NaN 1.0 1.0 34.923 36.0 28.0 -37.044972 144.920592 -38.218799 148.050871 305.307 2018-07-20 03:13:54 302.23 8:16:07 130.70
37889 ID1281653747 1.0 0.0 34.130 8.0 36.0 -38.434373 144.730220 -37.047801 145.309517 162.554 2018-02-06 06:27:55 164.08 9:11:59 69.44
37890 ID5349085772 1.0 1.0 18.286 22.0 14.0 -36.720129 144.588398 -37.695893 146.430793 196.154 2018-02-15 17:04:44 196.60 20:21:19 121.11
37891 ID5972337482 1.0 1.0 11.538 33.0 2.0 -38.836802 144.357057 -37.549931 148.306793 374.024 2018-05-26 02:07:50 368.73 8:16:33 141.06
37892 NaN 3.0 1.0 5.416 20.0 33.0 -38.959090 148.294700 -38.930545 144.661887 314.514 2018-04-27 20:53:02 255.21 1:08:14 185.96
37893 ID1539650034 2.0 0.0 34.355 8.0 39.0 -38.520278 144.408786 -38.447195 148.416066 349.251 2018-05-09 07:26:49 319.33 12:46:08 121.30
37894 NaN 2.0 0.0 41.232 38.0 39.0 -37.657406 142.777301 -38.622040 148.366529 500.901 2018-07-02 08:59:29 455.14 16:34:37 139.79
37895 ID1796943211 1.0 0.0 44.341 23.0 24.0 -37.777223 146.024184 -38.913981 142.913934 299.552 2018-02-20 05:08:12 296.66 10:04:51 113.70
37896 ID5429883749 2.0 1.0 17.798 11.0 40.0 -38.045551 146.736254 -37.633007 147.639273 91.711 2018-05-03 10:19:32 88.70 11:48:14 130.51
37897 NaN 1.0 0.0 8.865 9.0 2.0 -38.839254 145.226776 -37.695101 148.251214 293.394 2018-03-11 12:18:21 290.70 17:09:02 88.71
37898 NaN 3.0 1.0 27.153 39.0 16.0 -38.446310 148.292498 -36.739777 143.604529 454.968 2018-07-23 08:29:19 366.09 14:35:24 188.49
37899 ID5862552991 1.0 1.0 40.363 9.0 38.0 -38.983710 145.320518 -37.673908 142.879230 258.259 2018-06-26 15:55:37 256.70 20:12:18 122.98
37900 ID5339104082 1.0 1.0 35.955 13.0 32.0 -38.292301 147.562013 -36.605285 148.293183 198.597 2018-03-19 16:41:10 198.97 20:00:08 118.47
37901 ID5468787866 2.0 1.0 29.566 33.0 23.0 -38.853243 144.508346 -37.727691 145.662270 160.816 2018-02-26 04:22:30 150.58 6:53:04 161.96
37902 ID1448126768 3.0 0.0 44.070 36.0 34.0 -37.129313 145.266426 -38.428477 143.341632 222.687 2018-07-07 08:01:42 182.71 11:04:24 144.41

37902 rows × 16 columns

# axis=1 checks across the columns of each row; axis=0 checks down each column
data[data.isnull().any(axis=1)].shape  # first element = number of rows with a missing value
# Equivalent count: (data.isnull().sum(axis=1) >= 1).sum()
data.isnull().any(axis=1)

0 False
1 True
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
...
37873 True
37874 True
37875 True
37876 True
37877 True
37878 False
37879 True
37880 False
37881 False
37882 False
37883 False
37884 False
37885 True
37886 False
37887 False
37888 True
37889 False
37890 False
37891 False
37892 True
37893 False
37894 True
37895 False
37896 False
37897 True
37898 True
37899 False
37900 False
37901 False
37902 False
dtype: bool
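
The boolean Series above marks the rows but does not yet give the count Task 8 asks for. A minimal sketch on a made-up frame (the toy values are assumptions) showing both the selected rows and the count:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Id': ['ID1', None, 'ID3', 'ID4'],
    'Travel Time': [140.19, 22.84, np.nan, 371.40],
    'Delivery Fare': [99.25, 149.04, np.nan, 122.82],
})

has_missing = toy.isnull().any(axis=1)   # one boolean per row
rows_with_missing = toy[has_missing]     # the rows themselves
n_rows = int(has_missing.sum())          # each True counts as 1

print(n_rows)
```

Summing the mask avoids building the filtered frame when only the count is needed.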

4.0 Data Cleaning

Task 9: Fill the missing values in the 'Id' column

Task 9.1: Count the missing values in the 'Id' column

data['Id'].isnull().sum()

25

Task 9.2: Find all rows where 'Id' is missing

data[data['Id'].isnull()]

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
37844 NaN 1.0 0.0 22.498 30.0 15.0 -37.885792 144.875305 -38.680341 145.874064 124.252 2018-01-28 13:07:09 127.02 15:14:10 74.69
37845 NaN 3.0 0.0 32.300 16.0 36.0 -36.571169 143.741010 -36.993435 144.983048 120.297 2018-03-26 02:34:49 101.88 4:16:41 162.20
37846 NaN 3.0 0.0 18.601 38.0 29.0 -37.694132 142.851548 -37.058014 142.873698 70.838 2018-03-02 20:33:11 62.83 21:36:00 142.10
37850 NaN 2.0 1.0 28.203 18.0 21.0 -38.139260 143.345778 -37.279348 143.604189 98.391 2018-04-17 03:28:41 94.68 5:03:21 152.30
37851 NaN 2.0 1.0 45.696 28.0 1.0 -38.152835 147.793072 -38.162103 144.043047 328.220 2018-03-25 13:50:57 300.50 18:51:27 167.39
37852 NaN 1.0 0.0 27.143 38.0 32.0 -37.613135 142.854194 -36.713765 148.383062 500.500 2018-05-12 08:54:52 491.13 17:05:59 97.44
37854 NaN 1.0 0.0 13.002 27.0 3.0 -37.428594 147.056992 -38.383631 145.590528 167.005 2018-02-20 03:35:43 168.39 6:24:06 92.12
37857 NaN 1.0 0.0 19.468 5.0 31.0 -37.570514 145.253281 -37.824303 143.862472 125.716 2018-04-20 11:48:14 128.43 13:56:39 69.86
37860 NaN 1.0 0.0 10.647 21.0 5.0 -37.239046 143.524656 -37.546884 145.204154 152.434 2018-06-13 23:05:39 154.29 1:39:56 95.65
37861 NaN 2.0 0.0 40.775 22.0 19.0 -36.678761 144.345069 -37.167099 144.257043 54.922 2018-03-27 04:31:43 55.75 5:27:28 120.70
37863 NaN 1.0 0.0 35.410 9.0 16.0 -39.006941 145.406988 -36.525200 143.517906 322.399 2018-07-21 06:48:39 318.77 12:07:25 88.14
37866 NaN 1.0 1.0 40.151 14.0 34.0 -37.644061 146.625820 -38.518928 143.369714 301.447 2018-04-14 07:29:13 298.50 12:27:43 118.89
37869 NaN 3.0 1.0 44.559 33.0 10.0 -38.733722 144.474460 -37.220132 147.498884 314.323 2018-04-18 04:57:37 255.06 9:12:40 201.89
37873 NaN 1.0 0.0 39.038 7.0 22.0 -38.664403 143.702592 -36.670478 144.312065 228.361 2018-02-02 18:03:25 227.77 21:51:11 87.87
37874 NaN 1.0 0.0 7.171 14.0 1.0 -37.589382 146.559422 -38.229669 143.710836 260.125 2018-02-03 22:11:31 258.51 2:30:01 106.47
37875 NaN 1.0 0.0 39.141 5.0 1.0 -37.491746 145.168879 -38.168680 143.936761 131.959 2018-04-19 22:54:02 134.48 1:08:30 94.83
37876 NaN 2.0 0.0 42.330 25.0 5.0 -36.618297 147.643165 -37.563856 145.038804 254.069 2018-01-27 11:06:09 234.09 15:00:14 116.10
37877 NaN 1.0 1.0 43.965 25.0 38.0 -36.542082 147.863736 -37.549355 142.980000 448.104 2018-06-14 22:15:50 440.42 5:36:15 149.26
37879 NaN 1.0 0.0 18.540 36.0 18.0 -36.974023 145.036046 -38.015640 142.955101 217.299 2018-05-28 07:14:50 217.06 10:51:53 76.31
37885 NaN 1.0 0.0 13.544 34.0 38.0 -38.528632 143.386140 -37.561137 142.835312 118.028 2018-04-09 11:48:46 120.99 13:49:45 69.88
37888 NaN 1.0 1.0 34.923 36.0 28.0 -37.044972 144.920592 -38.218799 148.050871 305.307 2018-07-20 03:13:54 302.23 8:16:07 130.70
37892 NaN 3.0 1.0 5.416 20.0 33.0 -38.959090 148.294700 -38.930545 144.661887 314.514 2018-04-27 20:53:02 255.21 1:08:14 185.96
37894 NaN 2.0 0.0 41.232 38.0 39.0 -37.657406 142.777301 -38.622040 148.366529 500.901 2018-07-02 08:59:29 455.14 16:34:37 139.79
37897 NaN 1.0 0.0 8.865 9.0 2.0 -38.839254 145.226776 -37.695101 148.251214 293.394 2018-03-11 12:18:21 290.70 17:09:02 88.71
37898 NaN 3.0 1.0 27.153 39.0 16.0 -38.446310 148.292498 -36.739777 143.604529 454.968 2018-07-23 08:29:19 366.09 14:35:24 188.49

9.2.1 Drop rows that are duplicates in every column except 'Id'

data[
    ['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
       'Destination Region', 'Origin Latitude', 'Origin Longitude',
       'Destination Latitude', 'Destination Longitude', 'Journey Distance',
       'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
       'Delivery Fare']
]

Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
0 2.0 0.0 21.686 19.0 38.0 -37.089338 144.429529 -37.639134 142.891391 149.212 2018-01-16 09:38:17 140.19 11:58:28 99.25
1 NaN 0.0 39.075 15.0 15.0 -38.481935 146.009567 -38.585528 146.199827 20.185 2018-02-10 04:28:17 22.84 4:51:07 149.04
2 2.0 0.0 7.243 33.0 28.0 -38.754167 144.509664 -38.242224 147.855342 296.975 2018-05-05 01:38:03 272.52 6:10:34 141.48
3 2.0 0.0 13.383 10.0 38.0 -37.240526 147.568019 -37.687178 142.991188 407.396 2018-06-11 11:43:04 371.40 17:54:27 122.82
4 2.0 0.0 8.123 1.0 8.0 -38.143985 143.798292 -38.548315 144.769228 95.974 2018-03-16 14:50:25 92.51 16:22:55 111.97
5 2.0 0.0 32.859 2.0 28.0 -37.421211 148.044072 -38.159627 148.194048 83.250 2018-05-15 16:35:50 81.12 17:56:57 113.88
6 1.0 0.0 20.616 29.0 36.0 -37.173949 143.140662 -37.021605 145.197043 183.363 2018-04-01 19:31:12 184.22 22:35:25 85.60
7 2.0 0.0 44.577 36.0 31.0 -37.123190 145.236196 -37.667199 143.877650 134.543 2018-05-01 18:39:36 127.05 20:46:38 114.22
8 1.0 0.0 15.363 20.0 30.0 -38.850561 148.317253 -38.024914 144.823938 318.132 2018-05-27 14:48:17 314.64 20:02:55 87.39
9 1.0 1.0 36.190 18.0 28.0 -38.070189 142.950207 -37.996817 148.026520 445.106 2018-06-17 12:53:02 437.52 20:10:33 142.95
10 2.0 0.0 23.172 13.0 27.0 -38.225456 147.425515 -37.642798 147.124104 70.051 2018-03-19 09:59:10 69.30 11:08:27 96.95
11 2.0 0.0 19.754 23.0 26.0 -37.625368 145.838281 -36.789955 147.133916 147.791 2018-02-28 17:40:59 138.92 19:59:54 117.48
12 3.0 1.0 12.807 4.0 6.0 -36.855984 142.929596 -36.906838 145.696986 246.465 2018-03-26 07:55:48 201.49 11:17:17 173.32
13 2.0 0.0 22.332 33.0 19.0 -38.894115 144.457143 -37.173740 144.152105 193.365 2018-06-28 16:05:55 179.73 19:05:38 117.67
14 2.0 1.0 25.880 33.0 5.0 -38.872372 144.606034 -37.553304 145.120753 153.580 2018-05-09 08:33:29 144.10 10:57:34 132.98
15 2.0 1.0 38.691 7.0 15.0 -38.844622 144.093195 -38.476630 145.849992 158.102 2018-01-05 15:55:00 148.15 18:23:09 147.73
16 1.0 0.0 30.742 19.0 14.0 -37.178059 144.403991 -37.713867 146.382965 184.783 2018-06-05 13:03:43 185.60 16:09:18 84.82
17 1.0 1.0 18.055 10.0 18.0 -37.141728 147.256091 -37.983127 143.290272 362.227 2018-05-04 07:42:51 357.32 13:40:10 116.35
18 1.0 0.0 5.986 23.0 18.0 -37.706960 145.718119 -37.983127 143.182464 224.997 2018-05-23 12:53:12 224.51 16:37:42 82.47
19 2.0 1.0 30.664 13.0 14.0 -38.250238 147.366610 -37.774617 146.503568 92.371 2018-07-11 12:52:57 89.29 14:22:14 143.96
20 3.0 0.0 24.942 3.0 39.0 -38.322643 145.505910 -38.453191 148.300405 244.251 2018-03-14 22:16:33 199.74 1:36:17 167.45
21 1.0 0.0 34.012 27.0 37.0 -37.516090 146.969053 -38.852488 147.816987 166.236 2018-07-17 16:26:55 167.65 19:14:34 88.98
22 1.0 1.0 55.229 7.0 6.0 -38.876145 143.911302 -36.875012 145.759049 275.633 2018-05-01 12:42:19 273.52 17:15:50 688.24
23 3.0 0.0 40.741 33.0 17.0 -38.756634 144.375564 -38.817727 147.071495 234.014 2018-06-21 01:04:15 191.66 4:15:54 165.37
24 3.0 1.0 15.419 35.0 21.0 -36.922413 146.362349 -37.240137 143.768208 233.068 2018-01-05 07:19:48 190.91 10:30:42 174.52
25 2.0 0.0 10.663 29.0 1.0 -37.019910 142.798295 -38.358124 143.943955 179.929 2018-02-27 04:42:03 167.70 7:29:44 124.95
26 1.0 0.0 20.685 9.0 35.0 -38.970092 145.435801 -37.068624 146.317717 225.349 2018-05-15 15:35:10 224.85 19:20:00 80.66
27 2.0 0.0 10.272 8.0 34.0 -38.485641 144.522135 -38.505484 143.311788 105.471 2018-02-05 06:13:29 101.02 7:54:30 102.09
28 1.0 0.0 7.373 16.0 38.0 -36.600877 143.566811 -37.804113 142.793139 150.483 2018-06-18 14:01:42 152.40 16:34:06 82.11
29 2.0 0.0 31.358 13.0 16.0 -38.195911 147.436921 -36.734698 143.764279 362.941 2018-04-24 22:40:54 331.59 4:12:29 147.21
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37873 1.0 0.0 39.038 7.0 22.0 -38.664403 143.702592 -36.670478 144.312065 228.361 2018-02-02 18:03:25 227.77 21:51:11 87.87
37874 1.0 0.0 7.171 14.0 1.0 -37.589382 146.559422 -38.229669 143.710836 260.125 2018-02-03 22:11:31 258.51 2:30:01 106.47
37875 1.0 0.0 39.141 5.0 1.0 -37.491746 145.168879 -38.168680 143.936761 131.959 2018-04-19 22:54:02 134.48 1:08:30 94.83
37876 2.0 0.0 42.330 25.0 5.0 -36.618297 147.643165 -37.563856 145.038804 254.069 2018-01-27 11:06:09 234.09 15:00:14 116.10
37877 1.0 1.0 43.965 25.0 38.0 -36.542082 147.863736 -37.549355 142.980000 448.104 2018-06-14 22:15:50 440.42 5:36:15 149.26
37878 2.0 0.0 27.560 7.0 35.0 -38.689638 144.093996 -37.046945 146.460389 276.894 2018-02-22 23:43:48 254.53 3:58:19 135.68
37879 1.0 0.0 18.540 36.0 18.0 -36.974023 145.036046 -38.015640 142.955101 217.299 2018-05-28 07:14:50 217.06 10:51:53 76.31
37880 1.0 0.0 44.226 40.0 3.0 -37.740012 147.768225 -38.276119 145.403536 215.813 2018-02-21 21:28:22 215.63 1:03:59 105.89
37881 2.0 0.0 6.415 25.0 27.0 -36.585246 147.853062 -37.700285 147.027318 144.135 2018-01-16 08:26:33 135.64 10:42:11 100.45
37882 3.0 1.0 30.471 3.0 15.0 -38.345519 145.542105 -38.712933 145.977537 55.772 2018-07-22 00:13:54 50.94 1:04:50 183.73
37883 2.0 1.0 6.553 5.0 15.0 -37.625311 145.097611 -38.630132 146.040376 139.018 2018-01-14 16:51:56 131.06 19:02:59 145.71
37884 1.0 0.0 37.850 38.0 10.0 -37.510578 142.826103 -37.337327 147.315144 397.278 2018-04-27 14:54:35 391.24 21:25:49 102.88
37885 1.0 0.0 13.544 34.0 38.0 -38.528632 143.386140 -37.561137 142.835312 118.028 2018-04-09 11:48:46 120.99 13:49:45 69.88
37886 2.0 0.0 38.656 1.0 32.0 -38.171542 143.871582 -36.756484 148.322553 423.585 2018-03-18 01:01:33 385.90 7:27:26 142.19
37887 1.0 0.0 23.699 10.0 10.0 -37.352794 147.476662 -37.104700 147.417027 28.118 2018-05-24 00:41:08 33.99 1:15:07 83.77
37888 1.0 1.0 34.923 36.0 28.0 -37.044972 144.920592 -38.218799 148.050871 305.307 2018-07-20 03:13:54 302.23 8:16:07 130.70
37889 1.0 0.0 34.130 8.0 36.0 -38.434373 144.730220 -37.047801 145.309517 162.554 2018-02-06 06:27:55 164.08 9:11:59 69.44
37890 1.0 1.0 18.286 22.0 14.0 -36.720129 144.588398 -37.695893 146.430793 196.154 2018-02-15 17:04:44 196.60 20:21:19 121.11
37891 1.0 1.0 11.538 33.0 2.0 -38.836802 144.357057 -37.549931 148.306793 374.024 2018-05-26 02:07:50 368.73 8:16:33 141.06
37892 3.0 1.0 5.416 20.0 33.0 -38.959090 148.294700 -38.930545 144.661887 314.514 2018-04-27 20:53:02 255.21 1:08:14 185.96
37893 2.0 0.0 34.355 8.0 39.0 -38.520278 144.408786 -38.447195 148.416066 349.251 2018-05-09 07:26:49 319.33 12:46:08 121.30
37894 2.0 0.0 41.232 38.0 39.0 -37.657406 142.777301 -38.622040 148.366529 500.901 2018-07-02 08:59:29 455.14 16:34:37 139.79
37895 1.0 0.0 44.341 23.0 24.0 -37.777223 146.024184 -38.913981 142.913934 299.552 2018-02-20 05:08:12 296.66 10:04:51 113.70
37896 2.0 1.0 17.798 11.0 40.0 -38.045551 146.736254 -37.633007 147.639273 91.711 2018-05-03 10:19:32 88.70 11:48:14 130.51
37897 1.0 0.0 8.865 9.0 2.0 -38.839254 145.226776 -37.695101 148.251214 293.394 2018-03-11 12:18:21 290.70 17:09:02 88.71
37898 3.0 1.0 27.153 39.0 16.0 -38.446310 148.292498 -36.739777 143.604529 454.968 2018-07-23 08:29:19 366.09 14:35:24 188.49
37899 1.0 1.0 40.363 9.0 38.0 -38.983710 145.320518 -37.673908 142.879230 258.259 2018-06-26 15:55:37 256.70 20:12:18 122.98
37900 1.0 1.0 35.955 13.0 32.0 -38.292301 147.562013 -36.605285 148.293183 198.597 2018-03-19 16:41:10 198.97 20:00:08 118.47
37901 2.0 1.0 29.566 33.0 23.0 -38.853243 144.508346 -37.727691 145.662270 160.816 2018-02-26 04:22:30 150.58 6:53:04 161.96
37902 3.0 0.0 44.070 36.0 34.0 -37.129313 145.266426 -38.428477 143.341632 222.687 2018-07-07 08:01:42 182.71 11:04:24 144.41

37903 rows × 15 columns

# drop_duplicates returns a new DataFrame with the duplicate rows removed
# (the first occurrence is kept); it is duplicated() that returns a boolean mask.
# Pass the column names as the subset, not a slice of the DataFrame.
data_1 = data.drop_duplicates(subset=['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
    'Destination Region', 'Origin Latitude', 'Origin Longitude',
    'Destination Latitude', 'Destination Longitude', 'Journey Distance',
    'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
    'Delivery Fare'])
data_1

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
0 ID1645282128 2.0 0.0 21.686 19.0 38.0 -37.089338 144.429529 -37.639134 142.891391 149.212 2018-01-16 09:38:17 140.19 11:58:28 99.25
1 ID1697620764 NaN 0.0 39.075 15.0 15.0 -38.481935 146.009567 -38.585528 146.199827 20.185 2018-02-10 04:28:17 22.84 4:51:07 149.04
2 ID1543933503 2.0 0.0 7.243 33.0 28.0 -38.754167 144.509664 -38.242224 147.855342 296.975 2018-05-05 01:38:03 272.52 6:10:34 141.48
3 ID1756517608 2.0 0.0 13.383 10.0 38.0 -37.240526 147.568019 -37.687178 142.991188 407.396 2018-06-11 11:43:04 371.40 17:54:27 122.82
4 ID1832325834 2.0 0.0 8.123 1.0 8.0 -38.143985 143.798292 -38.548315 144.769228 95.974 2018-03-16 14:50:25 92.51 16:22:55 111.97
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37839 ID1879423081 2.0 0.0 17.081 20.0 14.0 -38.860358 148.174855 -37.562224 146.443991 209.278 2018-02-15 12:09:15 193.98 15:23:13 121.33
37840 ID5705840841 1.0 1.0 35.164 29.0 23.0 -37.250331 142.837244 -37.739285 145.963420 281.403 2018-01-01 00:07:35 279.10 4:46:41 136.58
37841 ID1276239209 3.0 0.0 36.704 12.0 11.0 -36.578568 145.273744 -38.305557 146.997297 245.269 2018-01-24 06:48:05 200.54 10:08:37 146.69
37842 ID1432868583 3.0 0.0 13.195 37.0 21.0 -38.755802 147.744770 -37.487866 143.585293 390.599 2018-04-07 18:57:52 315.28 0:13:08 170.52
37889 ID1281653747 1.0 0.0 34.130 8.0 36.0 -38.434373 144.730220 -37.047801 145.309517 162.554 2018-02-06 06:27:55 164.08 9:11:59 69.44

37844 rows × 16 columns
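
To make the effect of the subset argument easier to see, here is a small sketch on a made-up frame (the toy values and the non_id_cols name are assumptions). Rows 0 and 2 agree in every column except 'Id', so the later one is dropped:

```python
import pandas as pd

toy = pd.DataFrame({
    'Id': ['ID1', 'ID2', None, 'ID4'],
    'Package Weight': [21.686, 39.075, 21.686, 7.243],
    'Journey Distance': [149.212, 20.185, 149.212, 296.975],
})

non_id_cols = [c for c in toy.columns if c != 'Id']   # every column but 'Id'

dup_mask = toy.duplicated(subset=non_id_cols)         # True for later repeats
deduped = toy.drop_duplicates(subset=non_id_cols, keep='first')

print(dup_mask.tolist())
print(deduped.index.tolist())
```

Building the column list from toy.columns avoids retyping the 15 column names each time.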

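9.2.2 Set 'Id' to np.nan where the Id is duplicated but the rest of the row is not

# Returns a boolean Series marking rows whose Id repeats an earlier row's Id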
data_1['Id'].duplicated()

0 False
1 False
2 False
3 False
4 False
...
37839 False
37840 False
37841 False
37842 False
37889 True
Name: Id, Length: 37844, dtype: bool

(data_1['Id'].duplicated()) & ~(data_1[['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
       'Destination Region', 'Origin Latitude', 'Origin Longitude',
       'Destination Latitude', 'Destination Longitude', 'Journey Distance',
       'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
       'Delivery Fare']].duplicated())

0 False
1 False
2 False
3 False
4 False
...
37839 False
37840 False
37841 False
37842 False
37889 True
Length: 37844, dtype: bool

data_1.loc[(data_1['Id'].duplicated()) & ~(data_1[['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
       'Destination Region', 'Origin Latitude', 'Origin Longitude',
       'Destination Latitude', 'Destination Longitude', 'Journey Distance',
       'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
       'Delivery Fare']].duplicated()),'Id'] = np.nan

data_1['Id'].value_counts()

ID1858018960 1
ID1792956453 1
ID1574575344 1
ID1146475421 1
ID1897230811 1
..
ID5349642032 1
ID5451065609 1
ID5131869013 1
ID1550669799 1
ID5260585195 1
Name: Id, Length: 37843, dtype: int64
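
The mask used above can be tried on a small frame. The sketch below is a minimal illustration, assuming a hypothetical three-column frame `toy`; it builds the same two conditions with `drop(columns='Id')` instead of listing every column by name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'A2' appears twice with different payloads,
# 'A3' appears twice as an exact copy.
toy = pd.DataFrame({
    'Id': ['A1', 'A2', 'A2', 'A3', 'A3'],
    'Package Weight': [1.0, 2.0, 9.9, 3.0, 3.0],
    'Delivery Fare': [10.0, 20.0, 99.0, 30.0, 30.0],
})

id_repeated = toy['Id'].duplicated()                    # Id seen in an earlier row
payload_repeated = toy.drop(columns='Id').duplicated()  # all other columns seen earlier

# Same Id as an earlier row, but different content: the Id cannot be trusted
conflict = id_repeated & ~payload_repeated
toy.loc[conflict, 'Id'] = np.nan

print(toy['Id'].tolist())
```

Only the third row loses its Id here; the exact copy of 'A3' is left for the ordinary duplicate-row removal.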

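Task 9.3 Think of two ways to fill the null values, and apply one of them

# Method 1: a random number
import random
# 'Id'+str(random.randint(1000000000,2000000000)) in data_1['Id']
# Method 2: fill with a random number

def fill_id(Id):
    Id_out = Id
    # Build the lookup list once rather than on every loop iteration
    existing_ids = data_1['Id'].tolist()
    # Keep drawing until the candidate does not collide with an existing Id
    while Id_out in existing_ids:
        Id_out = 'ID'+str(random.randint(1000000000,2000000000))
    return Id_out
# Preview only: the result is not assigned back, so data_1['Id'] is left unchanged
data_1.loc[(data_1['Post Type']==0)&(data_1['Id'].isnull()),'Id'].apply(fill_id)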

37889 ID1117118084
Name: Id, dtype: object

data_1.loc[(data_1['Post Type']==0)&(data_1['Id'].isnull()),'Id']

37889 NaN
Name: Id, dtype: object

data_1['Id']

0 ID1645282128
1 ID1697620764
2 ID1543933503
3 ID1756517608
4 ID1832325834
...
37839 ID1879423081
37840 ID5705840841
37841 ID1276239209
37842 ID1432868583
37889 NaN
Name: Id, Length: 37844, dtype: object

Task 9.4 Check whether 'Id' still has null values

data_1['Id'].isnull().sum()

1
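
The count is still 1 because the `apply` call in task 9.3 only displayed the generated value and never wrote it back to `data_1`. Below is a minimal sketch of filling and assigning in one step, assuming a hypothetical helper `new_unique_id` and a toy frame. The 'ID1' plus nine digits pattern is inferred from the normal-delivery Ids shown in the outputs.

```python
import random

import numpy as np
import pandas as pd


def new_unique_id(taken, rng):
    """Draw 'ID' + a 10-digit number starting with 1 until it is not in `taken`."""
    while True:
        candidate = 'ID' + str(rng.randint(1000000000, 1999999999))
        if candidate not in taken:
            taken.add(candidate)  # reserve it so later draws cannot repeat it
            return candidate


# Hypothetical toy data standing in for data_1
toy = pd.DataFrame({'Id': ['ID1645282128', np.nan, 'ID1543933503', np.nan],
                    'Post Type': [0.0, 0.0, 0.0, 0.0]})

rng = random.Random(0)               # seeded only to make the sketch repeatable
taken = set(toy['Id'].dropna())      # set membership avoids rescanning a list
mask = (toy['Post Type'] == 0) & toy['Id'].isnull()

# Assign back with .loc so the frame itself is updated
toy.loc[mask, 'Id'] = [new_unique_id(taken, rng) for _ in range(mask.sum())]

print(toy['Id'].isnull().sum())
```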

Task 10: Find all duplicated ids

data[data.Id.duplicated()].Id

37843 ID1874340610
37845 NaN
37846 NaN
37847 ID5156350605
37848 ID1176413101
...
37898 NaN
37899 ID5862552991
37900 ID5339104082
37901 ID5468787866
37902 ID1448126768
Name: Id, Length: 59, dtype: object

data['Id'].value_counts() >= 2

ID5281864060 True
ID1796943211 True
ID1877344172 True
ID5122284320 True
ID1238297934 True
...
ID5672029782 False
ID1114364309 False
ID5495523518 False
ID1874532678 False
ID5260585195 False
Name: Id, Length: 37843, dtype: bool
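
`duplicated()` marks only the second and later occurrences, and the `value_counts() >= 2` result above is a boolean Series rather than a list of ids. A short sketch on hypothetical toy data shows how `keep=False` and boolean filtering return the duplicated ids themselves:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Id': ['A1', 'A2', 'A2', 'A3', np.nan, np.nan]})

# keep=False flags every member of a duplicated group, including the first one.
# Missing ids are dropped first, because NaN values also count as duplicates of each other.
ids = toy['Id'].dropna()
all_dupe_rows = ids[ids.duplicated(keep=False)]

# Filter the counts by themselves to keep only ids that occur at least twice
counts = toy['Id'].value_counts()
dupe_ids = counts[counts >= 2].index.tolist()

print(all_dupe_rows.tolist())
print(dupe_ids)
```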

Task 11: Delete duplicate rows

data.duplicated()

0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
...
37873 False
37874 False
37875 False
37876 False
37877 False
37878 True
37879 False
37880 True
37881 True
37882 True
37883 True
37884 True
37885 False
37886 True
37887 True
37888 False
37889 False
37890 True
37891 True
37892 False
37893 True
37894 False
37895 True
37896 True
37897 False
37898 False
37899 True
37900 True
37901 True
37902 True
dtype: bool

data.drop_duplicates()

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
0 ID1645282128 2.0 0.0 21.686 19.0 38.0 -37.089338 144.429529 -37.639134 142.891391 149.212 2018-01-16 09:38:17 140.19 11:58:28 99.25
1 ID1697620764 NaN 0.0 39.075 15.0 15.0 -38.481935 146.009567 -38.585528 146.199827 20.185 2018-02-10 04:28:17 22.84 4:51:07 149.04
2 ID1543933503 2.0 0.0 7.243 33.0 28.0 -38.754167 144.509664 -38.242224 147.855342 296.975 2018-05-05 01:38:03 272.52 6:10:34 141.48
3 ID1756517608 2.0 0.0 13.383 10.0 38.0 -37.240526 147.568019 -37.687178 142.991188 407.396 2018-06-11 11:43:04 371.40 17:54:27 122.82
4 ID1832325834 2.0 0.0 8.123 1.0 8.0 -38.143985 143.798292 -38.548315 144.769228 95.974 2018-03-16 14:50:25 92.51 16:22:55 111.97
5 ID1802448576 2.0 0.0 32.859 2.0 28.0 -37.421211 148.044072 -38.159627 148.194048 83.250 2018-05-15 16:35:50 81.12 17:56:57 113.88
6 ID1940231408 1.0 0.0 20.616 29.0 36.0 -37.173949 143.140662 -37.021605 145.197043 183.363 2018-04-01 19:31:12 184.22 22:35:25 85.60
7 ID1299303958 2.0 0.0 44.577 36.0 31.0 -37.123190 145.236196 -37.667199 143.877650 134.543 2018-05-01 18:39:36 127.05 20:46:38 114.22
8 ID1752722028 1.0 0.0 15.363 20.0 30.0 -38.850561 148.317253 -38.024914 144.823938 318.132 2018-05-27 14:48:17 314.64 20:02:55 87.39
9 ID5995243590 1.0 1.0 36.190 18.0 28.0 -38.070189 142.950207 -37.996817 148.026520 445.106 2018-06-17 12:53:02 437.52 20:10:33 142.95
10 ID1483358088 2.0 0.0 23.172 13.0 27.0 -38.225456 147.425515 -37.642798 147.124104 70.051 2018-03-19 09:59:10 69.30 11:08:27 96.95
11 ID1626798395 2.0 0.0 19.754 23.0 26.0 -37.625368 145.838281 -36.789955 147.133916 147.791 2018-02-28 17:40:59 138.92 19:59:54 117.48
12 ID5277549009 3.0 1.0 12.807 4.0 6.0 -36.855984 142.929596 -36.906838 145.696986 246.465 2018-03-26 07:55:48 201.49 11:17:17 173.32
13 ID1950928883 2.0 0.0 22.332 33.0 19.0 -38.894115 144.457143 -37.173740 144.152105 193.365 2018-06-28 16:05:55 179.73 19:05:38 117.67
14 ID5143738648 2.0 1.0 25.880 33.0 5.0 -38.872372 144.606034 -37.553304 145.120753 153.580 2018-05-09 08:33:29 144.10 10:57:34 132.98
15 ID5132897910 2.0 1.0 38.691 7.0 15.0 -38.844622 144.093195 -38.476630 145.849992 158.102 2018-01-05 15:55:00 148.15 18:23:09 147.73
16 ID1290889802 1.0 0.0 30.742 19.0 14.0 -37.178059 144.403991 -37.713867 146.382965 184.783 2018-06-05 13:03:43 185.60 16:09:18 84.82
17 ID5226355535 1.0 1.0 18.055 10.0 18.0 -37.141728 147.256091 -37.983127 143.290272 362.227 2018-05-04 07:42:51 357.32 13:40:10 116.35
18 ID1898978312 1.0 0.0 5.986 23.0 18.0 -37.706960 145.718119 -37.983127 143.182464 224.997 2018-05-23 12:53:12 224.51 16:37:42 82.47
19 ID5284908619 2.0 1.0 30.664 13.0 14.0 -38.250238 147.366610 -37.774617 146.503568 92.371 2018-07-11 12:52:57 89.29 14:22:14 143.96
20 ID1585556406 3.0 0.0 24.942 3.0 39.0 -38.322643 145.505910 -38.453191 148.300405 244.251 2018-03-14 22:16:33 199.74 1:36:17 167.45
21 ID1901962779 1.0 0.0 34.012 27.0 37.0 -37.516090 146.969053 -38.852488 147.816987 166.236 2018-07-17 16:26:55 167.65 19:14:34 88.98
22 ID5590279060 1.0 1.0 55.229 7.0 6.0 -38.876145 143.911302 -36.875012 145.759049 275.633 2018-05-01 12:42:19 273.52 17:15:50 688.24
23 ID1473718059 3.0 0.0 40.741 33.0 17.0 -38.756634 144.375564 -38.817727 147.071495 234.014 2018-06-21 01:04:15 191.66 4:15:54 165.37
24 ID5551646734 3.0 1.0 15.419 35.0 21.0 -36.922413 146.362349 -37.240137 143.768208 233.068 2018-01-05 07:19:48 190.91 10:30:42 174.52
25 ID1772122934 2.0 0.0 10.663 29.0 1.0 -37.019910 142.798295 -38.358124 143.943955 179.929 2018-02-27 04:42:03 167.70 7:29:44 124.95
26 ID1987608852 1.0 0.0 20.685 9.0 35.0 -38.970092 145.435801 -37.068624 146.317717 225.349 2018-05-15 15:35:10 224.85 19:20:00 80.66
27 ID1249352358 2.0 0.0 10.272 8.0 34.0 -38.485641 144.522135 -38.505484 143.311788 105.471 2018-02-05 06:13:29 101.02 7:54:30 102.09
28 ID1611614450 1.0 0.0 7.373 16.0 38.0 -36.600877 143.566811 -37.804113 142.793139 150.483 2018-06-18 14:01:42 152.40 16:34:06 82.11
29 ID1262379299 2.0 0.0 31.358 13.0 16.0 -38.195911 147.436921 -36.734698 143.764279 362.941 2018-04-24 22:40:54 331.59 4:12:29 147.21
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37839 ID1879423081 2.0 0.0 17.081 20.0 14.0 -38.860358 148.174855 -37.562224 146.443991 209.278 2018-02-15 12:09:15 193.98 15:23:13 121.33
37840 ID5705840841 1.0 1.0 35.164 29.0 23.0 -37.250331 142.837244 -37.739285 145.963420 281.403 2018-01-01 00:07:35 279.10 4:46:41 136.58
37841 ID1276239209 3.0 0.0 36.704 12.0 11.0 -36.578568 145.273744 -38.305557 146.997297 245.269 2018-01-24 06:48:05 200.54 10:08:37 146.69
37842 ID1432868583 3.0 0.0 13.195 37.0 21.0 -38.755802 147.744770 -37.487866 143.585293 390.599 2018-04-07 18:57:52 315.28 0:13:08 170.52
37844 NaN 1.0 0.0 22.498 30.0 15.0 -37.885792 144.875305 -38.680341 145.874064 124.252 2018-01-28 13:07:09 127.02 15:14:10 74.69
37845 NaN 3.0 0.0 32.300 16.0 36.0 -36.571169 143.741010 -36.993435 144.983048 120.297 2018-03-26 02:34:49 101.88 4:16:41 162.20
37846 NaN 3.0 0.0 18.601 38.0 29.0 -37.694132 142.851548 -37.058014 142.873698 70.838 2018-03-02 20:33:11 62.83 21:36:00 142.10
37850 NaN 2.0 1.0 28.203 18.0 21.0 -38.139260 143.345778 -37.279348 143.604189 98.391 2018-04-17 03:28:41 94.68 5:03:21 152.30
37851 NaN 2.0 1.0 45.696 28.0 1.0 -38.152835 147.793072 -38.162103 144.043047 328.220 2018-03-25 13:50:57 300.50 18:51:27 167.39
37852 NaN 1.0 0.0 27.143 38.0 32.0 -37.613135 142.854194 -36.713765 148.383062 500.500 2018-05-12 08:54:52 491.13 17:05:59 97.44
37854 NaN 1.0 0.0 13.002 27.0 3.0 -37.428594 147.056992 -38.383631 145.590528 167.005 2018-02-20 03:35:43 168.39 6:24:06 92.12
37857 NaN 1.0 0.0 19.468 5.0 31.0 -37.570514 145.253281 -37.824303 143.862472 125.716 2018-04-20 11:48:14 128.43 13:56:39 69.86
37860 NaN 1.0 0.0 10.647 21.0 5.0 -37.239046 143.524656 -37.546884 145.204154 152.434 2018-06-13 23:05:39 154.29 1:39:56 95.65
37861 NaN 2.0 0.0 40.775 22.0 19.0 -36.678761 144.345069 -37.167099 144.257043 54.922 2018-03-27 04:31:43 55.75 5:27:28 120.70
37863 NaN 1.0 0.0 35.410 9.0 16.0 -39.006941 145.406988 -36.525200 143.517906 322.399 2018-07-21 06:48:39 318.77 12:07:25 88.14
37866 NaN 1.0 1.0 40.151 14.0 34.0 -37.644061 146.625820 -38.518928 143.369714 301.447 2018-04-14 07:29:13 298.50 12:27:43 118.89
37869 NaN 3.0 1.0 44.559 33.0 10.0 -38.733722 144.474460 -37.220132 147.498884 314.323 2018-04-18 04:57:37 255.06 9:12:40 201.89
37873 NaN 1.0 0.0 39.038 7.0 22.0 -38.664403 143.702592 -36.670478 144.312065 228.361 2018-02-02 18:03:25 227.77 21:51:11 87.87
37874 NaN 1.0 0.0 7.171 14.0 1.0 -37.589382 146.559422 -38.229669 143.710836 260.125 2018-02-03 22:11:31 258.51 2:30:01 106.47
37875 NaN 1.0 0.0 39.141 5.0 1.0 -37.491746 145.168879 -38.168680 143.936761 131.959 2018-04-19 22:54:02 134.48 1:08:30 94.83
37876 NaN 2.0 0.0 42.330 25.0 5.0 -36.618297 147.643165 -37.563856 145.038804 254.069 2018-01-27 11:06:09 234.09 15:00:14 116.10
37877 NaN 1.0 1.0 43.965 25.0 38.0 -36.542082 147.863736 -37.549355 142.980000 448.104 2018-06-14 22:15:50 440.42 5:36:15 149.26
37879 NaN 1.0 0.0 18.540 36.0 18.0 -36.974023 145.036046 -38.015640 142.955101 217.299 2018-05-28 07:14:50 217.06 10:51:53 76.31
37885 NaN 1.0 0.0 13.544 34.0 38.0 -38.528632 143.386140 -37.561137 142.835312 118.028 2018-04-09 11:48:46 120.99 13:49:45 69.88
37888 NaN 1.0 1.0 34.923 36.0 28.0 -37.044972 144.920592 -38.218799 148.050871 305.307 2018-07-20 03:13:54 302.23 8:16:07 130.70
37889 ID1281653747 1.0 0.0 34.130 8.0 36.0 -38.434373 144.730220 -37.047801 145.309517 162.554 2018-02-06 06:27:55 164.08 9:11:59 69.44
37892 NaN 3.0 1.0 5.416 20.0 33.0 -38.959090 148.294700 -38.930545 144.661887 314.514 2018-04-27 20:53:02 255.21 1:08:14 185.96
37894 NaN 2.0 0.0 41.232 38.0 39.0 -37.657406 142.777301 -38.622040 148.366529 500.901 2018-07-02 08:59:29 455.14 16:34:37 139.79
37897 NaN 1.0 0.0 8.865 9.0 2.0 -38.839254 145.226776 -37.695101 148.251214 293.394 2018-03-11 12:18:21 290.70 17:09:02 88.71
37898 NaN 3.0 1.0 27.153 39.0 16.0 -38.446310 148.292498 -36.739777 143.604529 454.968 2018-07-23 08:29:19 366.09 14:35:24 188.49

37869 rows × 16 columns
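
Note that `drop_duplicates()` returns a new DataFrame and leaves `data` untouched unless the result is assigned. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

toy = pd.DataFrame({'Id': ['A1', 'A2', 'A2'], 'Delivery Fare': [10.0, 20.0, 20.0]})

toy.drop_duplicates()             # result discarded: toy still has 3 rows
deduped = toy.drop_duplicates()   # keeps the first copy of each fully identical row

print(len(toy), len(deduped))
print(deduped.index.tolist())     # the original index labels are preserved
```

The preserved labels explain the gaps in the index of the output above.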

Task 12: Fill the null values in 'Post Type'

Hint: they may be related to the id

data_1[data_1['Post Type'].isnull()]

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
2179 ID5742860733 1.0 NaN 27.126 22.0 34.0 -36.679647 144.463364 -38.469625 143.202877 228.181 2018-02-28 14:01:50 227.59 17:49:25 114.27
4212 ID1710452907 2.0 NaN 37.466 22.0 21.0 -36.695380 144.474875 -37.357030 143.508873 113.113 2018-07-28 11:30:33 107.86 13:18:24 103.59
4219 ID1787036788 1.0 NaN 30.877 3.0 6.0 -38.311620 145.478355 -36.832092 145.600747 165.050 2018-07-18 11:58:01 166.50 14:44:31 75.01
4253 ID1680028038 2.0 NaN 13.860 22.0 19.0 -36.837363 144.515369 -37.089526 144.531402 28.106 2018-06-17 04:43:00 31.74 5:14:44 120.43
6299 ID1377767619 1.0 NaN 45.416 23.0 13.0 -37.720326 146.008038 -38.222304 147.498125 142.197 2018-01-13 09:36:02 144.38 12:00:24 69.85
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
26778 ID1571528676 1.0 NaN 15.062 33.0 9.0 -38.965718 144.650812 -38.962812 145.292314 55.525 2018-04-28 21:13:25 60.51 22:13:55 89.97
28691 ID5883461461 1.0 NaN 40.325 4.0 29.0 -36.542055 143.024959 -37.232887 142.980514 77.003 2018-05-01 04:18:09 81.29 5:39:26 124.92
30861 ID1971183564 3.0 NaN 42.700 19.0 4.0 -37.120747 144.287009 -36.762751 143.078632 114.656 2018-02-25 16:19:23 97.43 17:56:48 144.33
32921 ID5291598933 2.0 NaN 27.189 2.0 7.0 -37.564903 148.097962 -38.797131 143.873724 394.211 2018-01-09 12:39:15 359.59 18:38:50 165.54
36991 ID1484986030 1.0 NaN 27.517 37.0 36.0 -38.938504 147.755657 -37.060130 144.993660 320.003 2018-01-07 20:24:23 316.45 1:40:49 99.58

20 rows × 16 columns

# Try to tell the post types apart by delivery speed
def speed(df):
    speed = df['Journey Distance']/df['Travel Time']
    return round(speed,2)

data_1.loc[data_1['Post Type']==0].apply(speed,axis=1)

0 1.06
1 0.88
2 1.09
3 1.10
4 1.04
...
37837 0.99
37839 1.08
37841 1.22
37842 1.24
37889 0.99
Length: 26529, dtype: float64

# Try to tell the post types apart by delivery price
def price(df):
    price = df['Delivery Fare']/df['Package Weight']
    return round(price,2)
data_1.loc[data_1['Post Type']==0,['Delivery Fare','Package Weight']].apply(price,axis=1)

0 4.58
1 3.81
2 19.53
3 9.18
4 13.78
...
37837 13.20
37839 7.10
37841 4.00
37842 12.92
37889 2.03
Length: 26529, dtype: float64
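
Both ratios can also be computed without `apply` and compared across the two known post types in one step. The following is a sketch on hypothetical toy values: the column names follow the dataset, but the numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Post Type': [0.0, 0.0, 1.0, 1.0, np.nan],
    'Journey Distance': [100.0, 200.0, 100.0, 300.0, 150.0],
    'Travel Time': [100.0, 160.0, 80.0, 250.0, 120.0],
    'Delivery Fare': [80.0, 90.0, 120.0, 150.0, 100.0],
    'Package Weight': [10.0, 30.0, 10.0, 30.0, 20.0],
})

# Column arithmetic is vectorized, so no row-wise apply is needed
ratios = pd.DataFrame({
    'Post Type': toy['Post Type'],
    'speed': (toy['Journey Distance'] / toy['Travel Time']).round(2),
    'price': (toy['Delivery Fare'] / toy['Package Weight']).round(2),
})

# groupby skips rows whose Post Type is missing
summary = ratios.groupby('Post Type')[['speed', 'price']].mean()
print(summary)
```

If the two groups overlap heavily on a ratio, that ratio is probably not a reliable way to infer the missing post type.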

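# Tell them apart by whether the Id starts with 5 or 1: in the rows above, ID5... is express (1) and ID1... is normal (0)
data_1.loc[data_1['Post Type'].isnull(),'Post Type'] = data_1.loc[data_1['Post Type'].isnull(),'Id'].apply(lambda s:int(s[2]=='5'))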
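
In the sample rows shown in this post, ids beginning with `ID5` carry Post Type 1 and ids beginning with `ID1` carry Post Type 0. Below is a sketch, on a hypothetical toy frame that reuses a few ids from the outputs, of checking that relationship with `pd.crosstab` and then filling the gaps with a vectorized string comparison:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Id': ['ID1645282128', 'ID5995243590', 'ID5742860733', 'ID1710452907'],
    'Post Type': [0.0, 1.0, np.nan, np.nan],
})

prefix = toy['Id'].str[2]   # first digit after the 'ID' prefix

# Cross-tabulate prefix against the known post types to confirm the pattern
check = pd.crosstab(prefix, toy['Post Type'])
print(check)

# Fill only the missing entries: 1.0 when the digit is '5', else 0.0
missing = toy['Post Type'].isnull()
toy.loc[missing, 'Post Type'] = prefix[missing].eq('5').astype(float)

print(toy['Post Type'].tolist())
```

On the real data, any non-zero off-diagonal count in the cross-tabulation would indicate that the prefix rule does not hold everywhere.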

Task 13: Fix the erroneous values in the 'Origin Longitude' and 'Origin Latitude' columns

data_1['Origin Longitude'].describe()
# contains negative values

count 37844.000000
mean 145.423081
std 6.929107
min -148.337157
25% 143.964265
50% 145.424189
75% 147.171954
max 148.450576
Name: Origin Longitude, dtype: float64

def fix_Longitude_Latitude(flt):
    if flt<0:
        return -flt
    else:
        return flt
data_1['Origin Longitude'].apply(fix_Longitude_Latitude).describe()

count 37844.000000
mean 145.577375
std 1.764044
min 142.769991
25% 143.966143
50% 145.426161
75% 147.172965
max 148.450576
Name: Origin Longitude, dtype: float64

data_1['Origin Longitude'] = data_1['Origin Longitude'].apply(fix_Longitude_Latitude)

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""Entry point for launching an IPython kernel.

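This warning usually means the frame being modified was created as a slice of another DataFrame, so pandas cannot tell whether the assignment should affect the parent. A common remedy, sketched here on hypothetical toy data, is to take an explicit `.copy()` when the subset is created:

```python
import pandas as pd

data = pd.DataFrame({'Origin Longitude': [144.4, -148.3, 146.0],
                     'Post Type': [0.0, 0.0, 1.0]})

# An explicit copy makes the subset an independent DataFrame,
# so later column assignments are unambiguous
subset = data[data['Post Type'] == 0].copy()
subset['Origin Longitude'] = subset['Origin Longitude'].abs()

print(subset['Origin Longitude'].tolist())
print(data['Origin Longitude'].tolist())   # the parent frame is unchanged
```
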
data_1['Origin Latitude'].describe()

count 37844.000000
mean -37.728738
std 1.900393
min -39.006941
25% -38.442905
50% -37.707244
75% -37.094433
max 38.986998
Name: Origin Latitude, dtype: float64
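
The `describe()` output above shows that the minimum and all three quartiles of the origin latitude are negative, as expected for the southern hemisphere, and that the outlier is the positive maximum 38.986998. Under that reading, a sign repair for latitude would force values negative rather than positive. A minimal sketch on hypothetical toy values:

```python
import pandas as pd

toy = pd.DataFrame({
    'Origin Latitude': [-37.089338, 38.986998, -38.754167],
    'Origin Longitude': [144.429529, -148.337157, 144.509664],
})

# Southern-hemisphere latitudes are negative; eastern longitudes are positive
toy['Origin Latitude'] = -toy['Origin Latitude'].abs()
toy['Origin Longitude'] = toy['Origin Longitude'].abs()

print(toy)
```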

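# Note: valid origin latitudes in this dataset are negative (southern hemisphere); this helper makes every value positive
data_1['Origin Latitude'] = data_1['Origin Latitude'].apply(fix_Longitude_Latitude)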

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""Entry point for launching an IPython kernel.

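Task 14: Fix the erroneous values in the 'Destination Latitude' and 'Destination Longitude' columns

# Note: valid destination latitudes in this dataset are negative (southern hemisphere); this helper makes every value positive
data_1['Destination Latitude'] = data_1['Destination Latitude'].apply(fix_Longitude_Latitude)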

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""Entry point for launching an IPython kernel.

data_1['Destination Longitude'] = data_1['Destination Longitude'].apply(fix_Longitude_Latitude)

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""Entry point for launching an IPython kernel.

Task 15: Fill the null values in 'Origin Region' and 'Destination Region'

data_1[data_1['Origin Region'].isnull()]

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
2080 ID1634835235 1.0 0 31.871 NaN 22.0 37.300712 143.610855 36.794684 144.280083 81.904 2018-06-09 03:21:49 86.04 4:47:51 95.70
4442 ID1321659107 2.0 0 12.991 NaN 10.0 38.180248 146.775985 37.396662 147.313624 99.224 2018-04-12 11:25:54 95.42 13:01:19 98.48
8220 ID1522212019 1.0 0 25.282 NaN 23.0 36.756457 148.437792 37.665363 145.818608 253.275 2018-03-27 17:56:31 251.88 22:08:23 94.56
8228 ID1946204979 2.0 0 12.779 NaN 25.0 37.391203 148.116452 36.520786 147.557258 108.914 2018-07-08 10:57:12 104.10 12:41:17 102.49
15023 ID1891262263 1.0 0 36.466 NaN 7.0 37.537619 145.202427 38.723869 143.685908 187.265 2018-03-13 07:21:08 188.00 10:29:08 72.78
20513 ID1490968406 1.0 0 42.648 NaN 9.0 36.654443 143.691689 38.849982 145.479151 290.643 2018-02-16 18:46:02 288.04 23:34:04 99.81
20515 ID5148310393 1.0 1 44.505 NaN 7.0 38.973121 142.989422 38.823172 143.990199 88.293 2018-04-18 23:35:27 92.22 1:07:40 119.24
28710 ID5941350307 1.0 1 17.232 NaN 32.0 37.616289 146.052853 36.696768 148.280560 222.563 2018-02-22 02:17:46 222.16 5:59:55 125.46
34853 ID5234294750 2.0 1 29.063 NaN 7.0 37.048542 142.843945 38.852320 143.963431 223.542 2018-07-14 13:22:02 206.75 16:48:47 151.47
36904 ID1724943602 1.0 0 31.774 NaN 31.0 36.736858 143.688416 37.548149 143.885770 91.993 2018-02-12 14:31:24 95.80 16:07:11 81.57

data_1.loc[data_1['Origin Region'].notnull()]

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
0 ID1645282128 2.0 0 21.686 19.0 38.0 37.089338 144.429529 37.639134 142.891391 149.212 2018-01-16 09:38:17 140.19 11:58:28 99.25
1 ID1697620764 NaN 0 39.075 15.0 15.0 38.481935 146.009567 38.585528 146.199827 20.185 2018-02-10 04:28:17 22.84 4:51:07 149.04
2 ID1543933503 2.0 0 7.243 33.0 28.0 38.754167 144.509664 38.242224 147.855342 296.975 2018-05-05 01:38:03 272.52 6:10:34 141.48
3 ID1756517608 2.0 0 13.383 10.0 38.0 37.240526 147.568019 37.687178 142.991188 407.396 2018-06-11 11:43:04 371.40 17:54:27 122.82
4 ID1832325834 2.0 0 8.123 1.0 8.0 38.143985 143.798292 38.548315 144.769228 95.974 2018-03-16 14:50:25 92.51 16:22:55 111.97
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37839 ID1879423081 2.0 0 17.081 20.0 14.0 38.860358 148.174855 37.562224 146.443991 209.278 2018-02-15 12:09:15 193.98 15:23:13 121.33
37840 ID5705840841 1.0 1 35.164 29.0 23.0 37.250331 142.837244 37.739285 145.963420 281.403 2018-01-01 00:07:35 279.10 4:46:41 136.58
37841 ID1276239209 3.0 0 36.704 12.0 11.0 36.578568 145.273744 38.305557 146.997297 245.269 2018-01-24 06:48:05 200.54 10:08:37 146.69
37842 ID1432868583 3.0 0 13.195 37.0 21.0 38.755802 147.744770 37.487866 143.585293 390.599 2018-04-07 18:57:52 315.28 0:13:08 170.52
37889 NaN 1.0 0 34.130 8.0 36.0 38.434373 144.730220 37.047801 145.309517 162.554 2018-02-06 06:27:55 164.08 9:11:59 69.44

37834 rows × 16 columns

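# Reference rows: every record whose 'Origin Region' is known
data_2 = data_1.loc[data_1['Origin Region'].notnull()]
def fill_region(df):
    # Return the region of the known record whose origin coordinates are closest to this row
    longitude,latitude = df['Origin Longitude'],df['Origin Latitude']
    distance_0 = 100  # smallest squared distance found so far
    region_out = ''

    for index in data_2.index:
        longitude_ = data_2.loc[index,'Origin Longitude']
        latitude_ = data_2.loc[index,'Origin Latitude']
        region = data_2.loc[index,'Origin Region']

        # Squared distance in degrees (longitude and latitude are treated as equally scaled)
        distance = (longitude-longitude_)**2+(latitude_-latitude)**2
        if distance<distance_0 :
            region_out = region
            distance_0 = distance
    return region_out

# Preview only: the result is not assigned back, so the nulls are still present afterwards
data_1.loc[data_1['Origin Region'].isnull()].apply(fill_region,axis=1)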

2080 21.0
4442 11.0
8220 32.0
8228 2.0
15023 5.0
20513 16.0
20515 24.0
28710 23.0
34853 29.0
36904 16.0
dtype: float64
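
The row-by-row search above scans every known record for each missing one. The same nearest-neighbour idea can be sketched with NumPy broadcasting; this version also writes the result back, which the `apply` call above does not do. The toy frame and the helper name `nearest_region` are hypothetical.

```python
import numpy as np
import pandas as pd


def nearest_region(missing, known, lat_col, lon_col, region_col):
    """For each row of `missing`, return the region of the closest row in `known`."""
    # Shape (n_missing, 1) minus shape (1, n_known) broadcasts to a full distance grid
    d_lat = missing[lat_col].to_numpy()[:, None] - known[lat_col].to_numpy()[None, :]
    d_lon = missing[lon_col].to_numpy()[:, None] - known[lon_col].to_numpy()[None, :]
    sq_dist = d_lat ** 2 + d_lon ** 2          # squared distance in degrees
    closest = sq_dist.argmin(axis=1)           # position of the nearest known row
    return known[region_col].to_numpy()[closest]


toy = pd.DataFrame({
    'Origin Region': [19.0, 15.0, np.nan, np.nan],
    'Origin Latitude': [-37.0, -38.5, -37.1, -38.4],
    'Origin Longitude': [144.4, 146.0, 144.5, 145.9],
})

is_missing = toy['Origin Region'].isnull()
toy.loc[is_missing, 'Origin Region'] = nearest_region(
    toy[is_missing], toy[~is_missing],
    'Origin Latitude', 'Origin Longitude', 'Origin Region')

print(toy['Origin Region'].tolist())
```

Passing the destination column names instead would apply the same idea to 'Destination Region', which the task also asks for.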

data_1.loc[data_1['Origin Region'].isnull()]

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
2080 ID1634835235 1.0 0 31.871 NaN 22.0 37.300712 143.610855 36.794684 144.280083 81.904 2018-06-09 03:21:49 86.04 4:47:51 95.70
4442 ID1321659107 2.0 0 12.991 NaN 10.0 38.180248 146.775985 37.396662 147.313624 99.224 2018-04-12 11:25:54 95.42 13:01:19 98.48
8220 ID1522212019 1.0 0 25.282 NaN 23.0 36.756457 148.437792 37.665363 145.818608 253.275 2018-03-27 17:56:31 251.88 22:08:23 94.56
8228 ID1946204979 2.0 0 12.779 NaN 25.0 37.391203 148.116452 36.520786 147.557258 108.914 2018-07-08 10:57:12 104.10 12:41:17 102.49
15023 ID1891262263 1.0 0 36.466 NaN 7.0 37.537619 145.202427 38.723869 143.685908 187.265 2018-03-13 07:21:08 188.00 10:29:08 72.78
20513 ID1490968406 1.0 0 42.648 NaN 9.0 36.654443 143.691689 38.849982 145.479151 290.643 2018-02-16 18:46:02 288.04 23:34:04 99.81
20515 ID5148310393 1.0 1 44.505 NaN 7.0 38.973121 142.989422 38.823172 143.990199 88.293 2018-04-18 23:35:27 92.22 1:07:40 119.24
28710 ID5941350307 1.0 1 17.232 NaN 32.0 37.616289 146.052853 36.696768 148.280560 222.563 2018-02-22 02:17:46 222.16 5:59:55 125.46
34853 ID5234294750 2.0 1 29.063 NaN 7.0 37.048542 142.843945 38.852320 143.963431 223.542 2018-07-14 13:22:02 206.75 16:48:47 151.47
36904 ID1724943602 1.0 0 31.774 NaN 31.0 36.736858 143.688416 37.548149 143.885770 91.993 2018-02-12 14:31:24 95.80 16:07:11 81.57

Task 16: Find the erroneous values in 'Departure Date'

data_1['Departure Date'].sort_values(ascending=False)

31975 2018-30-06
4990 2018-28-06
17911 2018-28-06
28350 2018-28-05
35740 2018-28-05
...
12702 2018-01-01
36755 2018-01-01
12684 2018-01-01
36782 2018-01-01
27230 2018-01-01
Name: Departure Date, Length: 37844, dtype: object
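
Sorting puts the suspicious values at the top, but the truncated display shows only a few of them. A more direct check, sketched on hypothetical toy values, is to parse with a strict format and list whatever fails:

```python
import pandas as pd

dates = pd.Series(['2018-01-16', '2018-30-06', '2018-06-28', '2018-28-05'])

# errors='coerce' turns unparseable strings into NaT instead of raising
parsed = pd.to_datetime(dates, format='%Y-%m-%d', errors='coerce')
bad = dates[parsed.isna()]

print(bad.tolist())
```

This check also flags impossible calendar days, which a comparison on the month field alone would miss.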

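def fix_date(dt):
    # Dates are expected as YYYY-MM-DD. The records end before August 2018,
    # so a "month" field above '08' means day and month were swapped
    split_ = dt.split('-')
    year,month,day = split_[0],split_[1],split_[2]
    if month > '08':
        return year+'-'+day+'-'+month
    else:
        return dt
data_1['Departure Date'] = data_1['Departure Date'].apply(fix_date)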

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:8: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

data_1['Departure Date'].sort_values(ascending=False)

17849 2018-07-28
17570 2018-07-28
988 2018-07-28
592 2018-07-28
12772 2018-07-28
...
29092 2018-01-01
5232 2018-01-01
12584 2018-01-01
29041 2018-01-01
5519 2018-01-01
Name: Departure Date, Length: 37844, dtype: object
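
An alternative to swapping substrings, sketched here on hypothetical toy values, is to parse each string as year-month-day first and fall back to year-day-month only where that fails. One difference from `fix_date`: a swapped date whose day is 12 or less is still a valid year-month-day string, so this approach cannot detect it.

```python
import pandas as pd

dates = pd.Series(['2018-01-16', '2018-30-06', '2018-28-05'])

as_ymd = pd.to_datetime(dates, format='%Y-%m-%d', errors='coerce')
as_ydm = pd.to_datetime(dates, format='%Y-%d-%m', errors='coerce')

# Prefer the normal reading; use the swapped reading only where the first one failed
fixed = as_ymd.fillna(as_ydm).dt.strftime('%Y-%m-%d')

print(fixed.tolist())
```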

Task 17: Write the dataset to 'solution.csv' in the current directory

data_1.to_csv('solution.csv')
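
By default `to_csv` also writes the row index as an unnamed first column. If that column is not wanted, `index=False` omits it. A sketch on hypothetical toy data, writing to a temporary directory:

```python
import os
import tempfile

import pandas as pd

toy = pd.DataFrame({'Id': ['ID1645282128', 'ID5995243590'],
                    'Delivery Fare': [99.25, 142.95]}, index=[0, 9])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'solution.csv')
    toy.to_csv(path, index=False)      # omit the index column
    round_trip = pd.read_csv(path)

print(round_trip.columns.tolist())
```
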
posted @ 2019-12-06 21:00  Sean_Yang