【Heartbeat Signal Classification】Datawhale Check-in - Task 2: Data Analysis


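The data analysis in the tutorial is of little use here. The core column is heartbeat_signals. That is where the real information lies, and the analysis has to account for the fact that this column is a time series.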

Things to try:

  • Split heartbeat_signals into one column per value
  • Plot the ECG curves of the first 10 samples
  • Plot 10 separate curves for samples with label 1

import pandas as pd

# Local folder containing the competition files (adjust to your own path)
win_file_path = 'E:\\competition-data\\016_heartbeat_signals\\'
train = pd.read_csv(win_file_path + 'train.csv')
test = pd.read_csv(win_file_path + 'testA.csv')

Print the number of rows and columns

print('train.shape', train.shape)
print('test.shape', test.shape)
train.shape (100000, 3)
test.shape (20000, 2)
train.head(1)


id heartbeat_signals label
0 0 0.9912297987616655,0.9435330436439665,0.764677... 0.0

Check for missing and abnormal values (none found)

data.isnull().sum() counts the NaN values in each column.

train.isnull().sum()
id                   0
heartbeat_signals    0
label                0
dtype: int64
test.isnull().sum()
id                   0
heartbeat_signals    0
dtype: int64
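`isnull()` only catches NaN. Each signal is stored as a single comma-separated string, so a malformed entry would never show up as NaN. The sketch below is an extra check that is not part of the tutorial. It verifies three things: every row splits into the same number of values (205, per the split further down), every value parses as a float, and the value range looks plausible. `check_signals` is a hypothetical helper name.

```python
import pandas as pd


def check_signals(series: pd.Series, expected_len: int = 205) -> dict:
    """Hypothetical sanity check for comma-separated signal strings."""
    # Split every string into its individual values (one list per row)
    parts = series.astype(str).str.split(",")
    lengths = parts.str.len()
    # Flatten all tokens; errors="coerce" turns unparsable tokens into NaN
    flat = pd.to_numeric(parts.explode(), errors="coerce")
    return {
        "bad_length_rows": int((lengths != expected_len).sum()),
        "unparsable_values": int(flat.isna().sum()),
        "min": float(flat.min()),
        "max": float(flat.max()),
    }


# Tiny synthetic example (3-point signals instead of 205)
demo = pd.Series(["0.1,0.5,1.0", "0.0,0.2,0.3", "0.4,abc,0.9"])
print(check_signals(demo, expected_len=3))
```

On the real data it would be called as `check_signals(train['heartbeat_signals'])`.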

Look at the distribution of the target (label)

train['label'].describe()
count    100000.000000
mean          0.856960
std           1.217084
min           0.000000
25%           0.000000
50%           0.000000
75%           2.000000
max           3.000000
Name: label, dtype: float64
train['label'].value_counts()
0.0    64327
3.0    17912
2.0    14199
1.0     3562
Name: label, dtype: int64
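label is a class id, so the mean and std from describe() say little. Class shares and the imbalance between the largest and smallest class are more informative. On the DataFrame this is simply `train['label'].value_counts(normalize=True)`. The sketch below reproduces it from the counts printed above, so it runs without the CSV.

```python
import pandas as pd

# Class counts copied from the value_counts() output above
counts = pd.Series({0.0: 64327, 3.0: 17912, 2.0: 14199, 1.0: 3562}, name="label")

# Share of each class (same as value_counts(normalize=True) on train['label'])
proportions = counts / counts.sum()
print(proportions.round(4))

# Ratio between the largest and the smallest class
imbalance_ratio = counts.max() / counts.min()
print(round(imbalance_ratio, 2))
```

Class 0 makes up about 64% of the training set and class 1 only about 3.6%, roughly an 18:1 ratio. This is worth keeping in mind when choosing a validation split or loss weighting.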

2.3.7 Generate a data report with pandas_profiling

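import pandas_profiling

# The tutorial calls the training DataFrame data_train; in this post it is train.
# Note: the package has since been renamed; newer releases are imported as ydata_profiling
pfr = pandas_profiling.ProfileReport(train)
pfr.to_file("./example.html")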

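pandas_profiling does not help at all here, at least for now. The whole signal sits in a single comma-separated string column, so the report can only treat it as text, not as a numeric time series.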
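A lightweight stand-in for the profiling report is to parse the signals into a numeric matrix and summarize each time step (column). Useful statistics are the mean, the standard deviation, and the share of exact zeros. This is a sketch of an alternative, not the tutorial's method, and `per_step_summary` is a hypothetical helper.

```python
import pandas as pd


def per_step_summary(signals: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: summary statistics for each time step of the signals."""
    # One column per time step, converted from strings to floats
    matrix = signals.str.split(",", expand=True).astype(float)
    return pd.DataFrame({
        "mean": matrix.mean(),
        "std": matrix.std(),
        # Fraction of samples whose value at this time step is exactly 0
        "zero_share": (matrix == 0).mean(),
    })


# Tiny synthetic example with 3 time steps
demo = pd.Series(["1.0,0.5,0.0", "0.8,0.3,0.0"])
print(per_step_summary(demo))
```

On the real data, `per_step_summary(train['heartbeat_signals'])` gives one row per time step (205 rows). The zero_share column shows at which positions the trailing 0.0 values seen below start to dominate.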

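So I tried my own analysis instead:

  • Split heartbeat_signals into one column per value
  • Plot the ECG curves of the first 5 samples
  • For each label 0-3, overlay the curves of 5 samples in one plot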

Split heartbeat_signals into one column per value

# Cast to the pandas string dtype, then split each signal on "," into one column per time step
train['heartbeat_signals'] = train['heartbeat_signals'].astype('string')
x = train['heartbeat_signals'].str.split(",", expand=True)
x
0 1 2 3 4 5 6 7 8 9 ... 195 196 197 198 199 200 201 202 203 204
0 0.9912297987616655 0.9435330436439665 0.7646772997256593 0.6185708990212999 0.3796321642826237 0.19082233510621885 0.040237131594430715 0.02599520771717858 0.03170886048677242 0.06552357497104398 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.9714822034884503 0.9289687459588268 0.5729328050711678 0.1784566262750076 0.1229615224365985 0.13236021729815928 0.09439236984499814 0.08957535516351411 0.030480606866741047 0.04049936195430977 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 1.0 0.9591487564065292 0.7013782792997189 0.23177753487886463 0.0 0.08069805776387916 0.12837603937503544 0.18744837555079963 0.28082571505275855 0.3282610568488903 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.9757952826275774 0.9340884687738161 0.6596366611990001 0.2499208267606008 0.23711575621286213 0.28144491730834825 0.2499208267606008 0.2499208267606008 0.24139674778512604 0.2306703464848836 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.055816398940721094 0.26129357194994196 0.35984696254197834 0.43314263962884686 0.45369772898632504 0.49900406742109477 0.5427959768500487 0.6169044962835193 0.6766958323316207 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
99995 1.0 0.677705342021188 0.22239242747868546 0.2571578307224994 0.20469042415279454 0.05466497618736314 0.026152286890497062 0.11818142707296006 0.24483757081121627 0.3289485158861968 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
99996 0.9268571578157265 0.9063471198026871 0.6369932212888393 0.41503751002775946 0.37474480119929776 0.3825812845814957 0.35894293360916163 0.34135861850914284 0.3365254578264915 0.3170292884548231 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
99997 0.9258351628306013 0.5873839035878395 0.6332261741951388 0.6323533645350808 0.6392827243034813 0.6142923239940205 0.5991551019747257 0.5176324324889339 0.4038033525475481 0.2531748788594435 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
99998 1.0 0.9947621698382489 0.8297017704865509 0.45819277171637834 0.26416169623741237 0.24022845026183584 0.21376575735540573 0.18929103849637752 0.20381573166587716 0.21086610220048516 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
99999 0.9259994004527861 0.916476635326053 0.4042900774399834 0.0 0.2630344094167657 0.3854310437765884 0.3610665021846972 0.33270794046870034 0.33985000288462475 0.3504972538285509 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

100000 rows × 205 columns

type(x)

pandas.core.frame.DataFrame
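`x` still holds strings, which is why the plotting code below converts values to float before plotting. The sketch below converts once into a float NumPy matrix. It also estimates each heartbeat's effective length, under the assumption (not confirmed by the data description) that the trailing 0.0 values in the last columns are zero-padding rather than real samples. `to_matrix` and `effective_length` are hypothetical names.

```python
import numpy as np
import pandas as pd


def to_matrix(signals: pd.Series) -> np.ndarray:
    """Hypothetical helper: comma-separated strings -> 2-D float array."""
    # Every row must have the same number of values for np.vstack to work
    return np.vstack([np.array(s.split(","), dtype=float) for s in signals])


def effective_length(matrix: np.ndarray) -> np.ndarray:
    """Position of the last non-zero value + 1, per row (assumes trailing zeros are padding)."""
    nonzero = matrix != 0
    # Reverse each row so argmax finds the LAST non-zero position
    last_from_end = np.argmax(nonzero[:, ::-1], axis=1)
    lengths = matrix.shape[1] - last_from_end
    # Rows that are entirely zero get length 0
    lengths[~nonzero.any(axis=1)] = 0
    return lengths


demo = pd.Series(["0.9,0.5,0.2,0.0,0.0", "1.0,0.0,0.3,0.4,0.0", "0.0,0.0,0.0,0.0,0.0"])
m = to_matrix(demo)
print(m.shape)
print(effective_length(m))
```

On the full training set this would be `effective_length(to_matrix(train['heartbeat_signals']))`. The result could then be grouped by `train['label']` to see whether signal length differs between classes.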

Plot the ECG curves of the first 5 samples (train)

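from matplotlib import pyplot as plt
import numpy as np

# One figure with 5 stacked subplots, one per heartbeat
plt.figure(figsize=(12, 10))
for i in range(0, 5):
    # x holds strings, so convert the i-th row to floats before plotting
    val = x.iloc[i, :].astype(float).to_numpy()
    plt.subplot(5, 1, i + 1)
    # The signal values lie in [0, 1], so only tick that range
    plt.yticks(np.arange(0.0, 1.1, 0.2))
    plt.plot(val)

plt.show()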
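The same five-panel plot can also be written with Matplotlib's object-oriented `plt.subplots` API. It creates all axes up front and lets them share the x-axis. This sketch uses random placeholder data instead of the real signals. The Agg backend is selected only so the snippet runs without a display, and `plot_signals` is a hypothetical helper.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
from matplotlib import pyplot as plt
import numpy as np


def plot_signals(signals: np.ndarray, n: int = 5):
    """Plot the first n rows of a (samples x time steps) array, one panel each."""
    # squeeze=False keeps axes 2-D even when n == 1
    fig, axes = plt.subplots(n, 1, figsize=(12, 2 * n), sharex=True, squeeze=False)
    for i, ax in enumerate(axes[:, 0]):
        ax.plot(signals[i])
        ax.set_ylim(0, 1)  # the values shown above lie in [0, 1]
        ax.set_ylabel(f"#{i}")
    axes[-1, 0].set_xlabel("time step")
    fig.tight_layout()
    return fig


# Placeholder data: 5 random "signals" of 205 points each
rng = np.random.default_rng(0)
fig = plot_signals(rng.random((5, 205)))
```

With the real data this would be `plot_signals(x.iloc[:5].astype(float).to_numpy())`, followed by `plt.show()` in an interactive session.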

For each label 0-3, overlay 5 sampled curves (train)

for _label in range(0, 4):
    for random_state in [2020]:
        # Draw 5 random samples with this label (random_state makes it reproducible)
        spl = train[train['label'] == _label].sample(n=5, random_state=random_state)
        # Create the figure BEFORE plotting: calling plt.figure() after the plot
        # would open a second, empty figure and the size would not apply
        plt.figure(figsize=(32, 8))
        for idx in spl.index:
            # x shares train's row index; convert the split strings to floats
            float_val = x.loc[idx].astype(float).to_numpy()
            plt.plot(float_val)

        plt.title("_label=" + str(_label) + ",random_state=" + str(random_state))
        plt.yticks(np.arange(0, 1.1, 0.1))
        plt.xticks(np.arange(0, 225, 25))
        plt.show()
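Five random samples per label give a feel for the shapes. A complementary view is the average curve per label with a band of plus or minus one standard deviation. This is an extra step, not from the tutorial. The sketch runs on synthetic data, and `mean_curve_by_label` is a hypothetical helper.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd


def mean_curve_by_label(matrix: pd.DataFrame, labels: pd.Series):
    """Per-label mean and standard deviation at every time step."""
    grouped = matrix.groupby(labels)  # rows are aligned by index
    return grouped.mean(), grouped.std()


# Synthetic stand-in: 8 signals of 205 points with labels 0-3
rng = np.random.default_rng(2020)
matrix = pd.DataFrame(rng.random((8, 205)))
labels = pd.Series([0, 1, 2, 3, 0, 1, 2, 3], name="label")
means, stds = mean_curve_by_label(matrix, labels)

fig, ax = plt.subplots(figsize=(16, 5))
steps = np.arange(matrix.shape[1])
for label in means.index:
    ax.plot(steps, means.loc[label], label=f"label={label}")
    # Shade mean +/- 1 std to show the spread within each class
    ax.fill_between(steps, means.loc[label] - stds.loc[label],
                    means.loc[label] + stds.loc[label], alpha=0.2)
ax.set_xlabel("time step")
ax.legend()
```

On the real data the call would be `mean_curve_by_label(x.astype(float), train['label'])`, since `x` was produced from `train` and keeps the same row index.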

posted @ 2021-03-17 22:11  山枫叶纷飞