Analyzing SMS Data with Python

A sample of the raw data:

来电,2017/1/5 上午11:55,95599,【中国农业银行】您尾号9672的农行账户于01051154分完成一笔支付宝交易,金额为-18.00,余额3905.35。,
来电,2017/1/5 下午12:10,95599,【中国农业银行】您尾号9672的农行账户于01051210分完成一笔现支交易,金额为-200.00,余额3705.35。,
来电,2017/1/5 下午12:35,95599,【中国农业银行】您尾号9672的农行账户于01051235分完成一笔支付宝交易,金额为-50.00,余额3650.35。,
来电,2017/1/5 下午1:47,95599,【中国农业银行】您尾号9672的农行账户于01051347分完成一笔支付宝浙交易,金额为-199.00,余额3451.35。,
来电,2017/1/5 下午2:45,95599,【中国农业银行】您尾号9672的农行账户于01051445分完成一笔消费交易,金额为-199.00,余额3252.35。,
来电,2017/1/5 下午4:21,95599,【中国农业银行】您尾号9672的农行账户于01051621分完成一笔支付宝浙交易,金额为-329.00,余额2923.35。,
来电,2017/1/5 下午5:56,95599,【中国农业银行】您尾号9672的农行账户于01051756分完成一笔支付宝交易,金额为-20.00,余额2903.35。,
来电,2017/1/9 上午10:33,106906615500,【京东】还剩最后两天!PLUS会员新年特权,开通立送2000京豆,独享全品类神券,确定要错过? dc.jd.com/auVjQQ 回TD退订,
来电,2017/1/10 下午1:10,106980005618000055,【京东】我是京东配送员:韩富韩,您的订单正在配送途中,请准备收货,联系电话:15005125027。,
来电,2017/1/10 下午3:13,106906615500,【京东】等着放假,忘了您的PLUS账户中还有超过2000待返京豆?现在开通PLUS正式用户即可到账,还可享受高于普通用户10倍的购物回馈,随时京豆拿到手软。另有全年360元运费补贴、专享商品、专属客服等权益。戳 dc.jd.com/XhuKQQ 开通。回TD退订,

(Data source: SMS messages exported from a phone in CSV format)

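Goals

Stage one: parse the SMS alerts sent by Agricultural Bank of China and plot the account balance over time.
Stage two: find a way to show what each expense was for and how much it cost.
Stage three: make the text handling more flexible, so that instead of simple string slicing it relies on natural language processing to understand what each message means.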
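
As a rough idea of what stage two could look like, the sketch below pulls the transaction type, the amount and the balance out of one message with a regular expression instead of fixed slice offsets. This is not the author's code: the names `ABC_PATTERN` and `parse_abc_message` are made up for illustration, and the pattern assumes the message layout shown in the sample data (it accepts both ASCII and full-width commas, since the export may contain either).

```python
import re

# Assumed layout: "...完成一笔<type>交易,金额为<amount>,余额<balance>。"
# Messages in any other layout simply do not match and yield None.
ABC_PATTERN = re.compile(
    r"完成一笔(?P<kind>.+?)交易[,,]"            # transaction type, e.g. 支付宝
    r"金额为(?P<amount>-?\d+(?:\.\d+)?)[,,]"    # signed amount
    r"余额(?P<balance>-?\d+(?:\.\d+)?)"         # balance after the transaction
)


def parse_abc_message(text):
    """Return (kind, amount, balance), or None if the text does not match."""
    match = ABC_PATTERN.search(text)
    if match is None:
        return None
    return (
        match.group("kind"),
        float(match.group("amount")),
        float(match.group("balance")),
    )


sample = "【中国农业银行】您尾号9672的农行账户于01051154分完成一笔支付宝交易,金额为-18.00,余额3905.35。"
print(parse_abc_message(sample))  # ('支付宝', -18.0, 3905.35)
```

Grouping the results by `kind` would give a first, crude view of where the money went; it still says nothing about the merchant, which the messages in the sample do not include.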

The code for stage one follows. Questions and suggestions are welcome!

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Jul 22 22:13:20 2018

@author: mrzhang
"""

import csv
import os
import matplotlib.pyplot as plt


class DealMessage:

    def __init__(self):
        self.home_path = os.getcwd()  # current working directory
        self.filename = os.path.join(self.home_path, "message.csv")

    def get_cvs_list(self):
        ''' read every row of the CSV file into a list '''
        with open(self.filename, newline='') as f:  # newline='' is what the csv module recommends
            reader = csv.reader(f)
            list_read = list(reader)
        return list_read

    def get_yinghang_message_list(self):
        ''' keep only the messages from 95599 and drop the other fields (call type, phone number, ...) '''
        total_list = self.get_cvs_list()
        money_list = []
        for each_line in total_list:
            if each_line[2] == '95599':
                del each_line[0] # remove useless data
                del each_line[1]
                del each_line[2]
                each_line_list = each_line[1][37:].split(',')
                each_line_list.insert(0, each_line[0])
                money_list.append(each_line_list) # add to a new List
        return money_list

    def get_type_by_parameter(self, num):
        ''' the messages come in 2 layouts; use the number of parts to tell them apart '''
        money_list = self.get_yinghang_message_list()
        first_list = []
        for each in money_list:
            if len(each) == num:
                first_list.append(each)
        return first_list

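    def deal_time_form(self, messages):
        ''' transform time form like 1995/02/07/14/23 (24-hour clock) '''
        for each in messages:
            correct_time = each[0].split()
            date = correct_time[0]
            time = correct_time[1]
            period = time[0:2]  # "上午" (AM) or "下午" (PM); read it before it is stripped
            time = time[2:]
            shi, feng = time.split(":")
            shi = int(shi)
            if period == "下午" and shi != 12:  # 下午12:xx is already noon
                shi = shi + 12
            elif period == "上午" and shi == 12:  # assuming 上午12:xx means just after midnight
                shi = 0
            final_time = date + "/" + str(shi) + "/" + feng
            each.insert(0, final_time)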

    def choose_message_by_time(self, is_before_0223):
        ''' reduce the difference between the two message layouts; deal with time and money at the same time. '''
        if is_before_0223:
            num = 4
            remove_num = 2
        else:
            num = 3
            remove_num = 5
        messages = self.get_type_by_parameter(num)
        for each in messages:
            # deal with time, transform time form like 1995/12/17/14/23
            correct_time = each[0].split()
            date = correct_time[0]
            time = correct_time[1]
            period = time[0:2]  # "上午" (AM) or "下午" (PM); read it before it is stripped
            time = time[2:]
            shi, feng = time.split(":")
            shi = int(shi)
            if period == "下午" and shi != 12:  # transform time-form into 24h-form
                shi = shi + 12
            elif period == "上午" and shi == 12:  # assuming 上午12:xx means just after midnight
                shi = 0
            final_time = date + "/" + str(shi) + "/" + feng
            each.insert(0, final_time)
            # deal with money: strip the leading label and the last character
            money = each[-1][remove_num:][0:-1]
            each.insert(1, money)
        return messages

    def get_x_y(self):
        ''' get money and time  '''
        messages = self.choose_message_by_time(True)+self.choose_message_by_time(False)
        time_list = []
        money_list = []
        for each in messages:
            time_list.append(each[0])
            money_list.append(float(each[1]))
        return time_list[35::3], money_list

    def draw_picture(self):
        ''' draw a picture about money change '''
        x, y = self.get_x_y()
        plt.figure(figsize=(16, 4))  # create the figure
        plt.plot(y, 'r')  # only y is passed, so the x-axis is the message index
        plt.xlabel("Time")
        plt.ylabel("Money")
        plt.title("money")
        plt.grid(True)

        plt.savefig("line.jpg")  # save before show(); once the window is closed the figure may be empty
        plt.show()  # show picture

m = DealMessage()  # create an instance
m.draw_picture()  # draw picture

Running the program:
(result plot)
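
The script plots the balance against the message index rather than against real time. One possible way to get a true time axis is to turn each exported timestamp into a `datetime` first. The helper below is only a sketch: the name `parse_sms_time` is hypothetical, and it assumes every timestamp looks like the ones in the sample (`2017/1/5 下午12:10`, a 12-hour clock prefixed with 上午 or 下午). Treating 上午12:xx as just after midnight is also an assumption, since the sample contains no such row.

```python
from datetime import datetime


def parse_sms_time(stamp):
    """Convert e.g. '2017/1/5 下午1:47' into a datetime on the 24-hour clock."""
    date_part, time_part = stamp.split()
    period, clock = time_part[:2], time_part[2:]  # e.g. '下午' and '1:47'
    hour, minute = (int(p) for p in clock.split(":"))
    if period == "下午" and hour != 12:
        hour += 12  # 下午1:47 -> 13:47, while 下午12:10 stays 12:10
    elif period == "上午" and hour == 12:
        hour = 0  # assumption: 上午12:05 -> 00:05
    year, month, day = (int(p) for p in date_part.split("/"))
    return datetime(year, month, day, hour, minute)


print(parse_sms_time("2017/1/5 下午1:47"))  # 2017-01-05 13:47:00
```

A list of such `datetime` values can be passed as the x argument of `plt.plot(x, y)`, provided it has the same length as the list of balances.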

Feel free to repost. Comments are welcome!

posted @ 2018-07-23 22:39  2020张念磊要加油