Airline Customer Value Clustering Analysis

  • Feature engineering
  • K-means clustering
  • RFM model
  • DBSCAN algorithm

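Description

The arrival of the information age has shifted the focus of corporate marketing from the product to the customer. Concretely, this means managing customers by category: designing tailored service plans and marketing strategies for each type of customer, and concentrating limited marketing resources on high-value customers to maximize profit.

  1. Use the airline's data to segment its customers
  2. Profile each segment and compare the value of the different segments
  3. Offer personalized services to segments of different value and set the corresponding marketing strategy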

Approach

image.png

Data

Meaning of the dataset fields

image.png
image.png

Data preprocessing
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
import sklearn.preprocessing
import sklearn.cluster
air_data_path = "./dataset/air_data.csv"
air_data = pd.read_csv(air_data_path)
air_data.shape
(62988, 44)
air_data.head()
MEMBER_NO FFP_DATE FIRST_FLIGHT_DATE GENDER FFP_TIER WORK_CITY WORK_PROVINCE WORK_COUNTRY AGE LOAD_TIME ... ADD_Point_SUM Eli_Add_Point_Sum L1Y_ELi_Add_Points Points_Sum L1Y_Points_Sum Ration_L1Y_Flight_Count Ration_P1Y_Flight_Count Ration_P1Y_BPS Ration_L1Y_BPS Point_NotFlight
0 54993 2006/11/02 2008/12/24 6 . 北京 CN 31.0 2014/03/31 ... 39992 114452 111100 619760 370211 0.509524 0.490476 0.487221 0.512777 50
1 28065 2007/02/19 2007/08/03 6 NaN 北京 CN 42.0 2014/03/31 ... 12000 53288 53288 415768 238410 0.514286 0.485714 0.489289 0.510708 33
2 55106 2007/02/01 2007/08/30 6 . 北京 CN 40.0 2014/03/31 ... 15491 55202 51711 406361 233798 0.518519 0.481481 0.481467 0.518530 26
3 21189 2008/08/22 2008/08/23 5 Los Angeles CA US 64.0 2014/03/31 ... 0 34890 34890 372204 186100 0.434783 0.565217 0.551722 0.448275 12
4 39546 2009/04/10 2009/04/15 6 贵阳 贵州 CN 48.0 2014/03/31 ... 22704 64969 64969 338813 210365 0.532895 0.467105 0.469054 0.530943 39

5 rows × 44 columns

air_data.dtypes
MEMBER_NO                    int64
FFP_DATE                    object
FIRST_FLIGHT_DATE           object
GENDER                      object
FFP_TIER                     int64
WORK_CITY                   object
WORK_PROVINCE               object
WORK_COUNTRY                object
AGE                        float64
LOAD_TIME                   object
FLIGHT_COUNT                 int64
BP_SUM                       int64
EP_SUM_YR_1                  int64
EP_SUM_YR_2                  int64
SUM_YR_1                   float64
SUM_YR_2                   float64
SEG_KM_SUM                   int64
WEIGHTED_SEG_KM            float64
LAST_FLIGHT_DATE            object
AVG_FLIGHT_COUNT           float64
AVG_BP_SUM                 float64
BEGIN_TO_FIRST               int64
LAST_TO_END                  int64
AVG_INTERVAL               float64
MAX_INTERVAL                 int64
ADD_POINTS_SUM_YR_1          int64
ADD_POINTS_SUM_YR_2          int64
EXCHANGE_COUNT               int64
avg_discount               float64
P1Y_Flight_Count             int64
L1Y_Flight_Count             int64
P1Y_BP_SUM                   int64
L1Y_BP_SUM                   int64
EP_SUM                       int64
ADD_Point_SUM                int64
Eli_Add_Point_Sum            int64
L1Y_ELi_Add_Points           int64
Points_Sum                   int64
L1Y_Points_Sum               int64
Ration_L1Y_Flight_Count    float64
Ration_P1Y_Flight_Count    float64
Ration_P1Y_BPS             float64
Ration_L1Y_BPS             float64
Point_NotFlight              int64
dtype: object
air_data.describe().T
count mean std min 25% 50% 75% max
MEMBER_NO 62988.0 31494.500000 18183.213715 1.00 15747.750000 31494.500000 47241.250000 62988.000000
FFP_TIER 62988.0 4.102162 0.373856 4.00 4.000000 4.000000 4.000000 6.000000
AGE 62568.0 42.476346 9.885915 6.00 35.000000 41.000000 48.000000 110.000000
FLIGHT_COUNT 62988.0 11.839414 14.049471 2.00 3.000000 7.000000 15.000000 213.000000
BP_SUM 62988.0 10925.081254 16339.486151 0.00 2518.000000 5700.000000 12831.000000 505308.000000
EP_SUM_YR_1 62988.0 0.000000 0.000000 0.00 0.000000 0.000000 0.000000 0.000000
EP_SUM_YR_2 62988.0 265.689623 1645.702854 0.00 0.000000 0.000000 0.000000 74460.000000
SUM_YR_1 62437.0 5355.376064 8109.450147 0.00 1003.000000 2800.000000 6574.000000 239560.000000
SUM_YR_2 62850.0 5604.026014 8703.364247 0.00 780.000000 2773.000000 6845.750000 234188.000000
SEG_KM_SUM 62988.0 17123.878691 20960.844623 368.00 4747.000000 9994.000000 21271.250000 580717.000000
WEIGHTED_SEG_KM 62988.0 12777.152439 17578.586695 0.00 3219.045000 6978.255000 15299.632500 558440.140000
AVG_FLIGHT_COUNT 62988.0 1.542154 1.786996 0.25 0.428571 0.875000 1.875000 26.625000
AVG_BP_SUM 62988.0 1421.440249 2083.121324 0.00 336.000000 752.375000 1690.270833 63163.500000
BEGIN_TO_FIRST 62988.0 120.145488 159.572867 0.00 9.000000 50.000000 166.000000 729.000000
LAST_TO_END 62988.0 176.120102 183.822223 1.00 29.000000 108.000000 268.000000 731.000000
AVG_INTERVAL 62988.0 67.749788 77.517866 0.00 23.370370 44.666667 82.000000 728.000000
MAX_INTERVAL 62988.0 166.033895 123.397180 0.00 79.000000 143.000000 228.000000 728.000000
ADD_POINTS_SUM_YR_1 62988.0 540.316965 3956.083455 0.00 0.000000 0.000000 0.000000 600000.000000
ADD_POINTS_SUM_YR_2 62988.0 814.689258 5121.796929 0.00 0.000000 0.000000 0.000000 728282.000000
EXCHANGE_COUNT 62988.0 0.319775 1.136004 0.00 0.000000 0.000000 0.000000 46.000000
avg_discount 62988.0 0.721558 0.185427 0.00 0.611997 0.711856 0.809476 1.500000
P1Y_Flight_Count 62988.0 5.766257 7.210922 0.00 2.000000 3.000000 7.000000 118.000000
L1Y_Flight_Count 62988.0 6.073157 8.175127 0.00 1.000000 3.000000 8.000000 111.000000
P1Y_BP_SUM 62988.0 5366.720550 8537.773021 0.00 946.000000 2692.000000 6485.250000 246197.000000
L1Y_BP_SUM 62988.0 5558.360704 9351.956952 0.00 545.000000 2547.000000 6619.250000 259111.000000
EP_SUM 62988.0 265.689623 1645.702854 0.00 0.000000 0.000000 0.000000 74460.000000
ADD_Point_SUM 62988.0 1355.006223 7868.477000 0.00 0.000000 0.000000 0.000000 984938.000000
Eli_Add_Point_Sum 62988.0 1620.695847 8294.398955 0.00 0.000000 0.000000 345.000000 984938.000000
L1Y_ELi_Add_Points 62988.0 1080.378882 5639.857254 0.00 0.000000 0.000000 0.000000 728282.000000
Points_Sum 62988.0 12545.777100 20507.816700 0.00 2775.000000 6328.500000 14302.500000 985572.000000
L1Y_Points_Sum 62988.0 6638.739585 12601.819863 0.00 700.000000 2860.500000 7500.000000 728282.000000
Ration_L1Y_Flight_Count 62988.0 0.486419 0.319105 0.00 0.250000 0.500000 0.711111 1.000000
Ration_P1Y_Flight_Count 62988.0 0.513581 0.319105 0.00 0.288889 0.500000 0.750000 1.000000
Ration_P1Y_BPS 62988.0 0.522293 0.339632 0.00 0.258150 0.514252 0.815091 0.999989
Ration_L1Y_BPS 62988.0 0.468422 0.338956 0.00 0.167954 0.476747 0.728375 0.999993
Point_NotFlight 62988.0 2.728155 7.364164 0.00 0.000000 0.000000 1.000000 140.000000
##### Check for duplicates: are any member IDs repeated?
air_data['MEMBER_NO'].duplicated()
0        False
1        False
2        False
3        False
4        False
         ...  
62983    False
62984    False
62985    False
62986    False
62987    False
Name: MEMBER_NO, Length: 62988, dtype: bool
air_data[air_data['MEMBER_NO'].duplicated()]
MEMBER_NO FFP_DATE FIRST_FLIGHT_DATE GENDER FFP_TIER WORK_CITY WORK_PROVINCE WORK_COUNTRY AGE LOAD_TIME ... ADD_Point_SUM Eli_Add_Point_Sum L1Y_ELi_Add_Points Points_Sum L1Y_Points_Sum Ration_L1Y_Flight_Count Ration_P1Y_Flight_Count Ration_P1Y_BPS Ration_L1Y_BPS Point_NotFlight

0 rows × 44 columns

air_data.isna().any()
MEMBER_NO                  False
FFP_DATE                   False
FIRST_FLIGHT_DATE          False
GENDER                      True
FFP_TIER                   False
WORK_CITY                   True
WORK_PROVINCE               True
WORK_COUNTRY                True
AGE                         True
LOAD_TIME                  False
FLIGHT_COUNT               False
BP_SUM                     False
EP_SUM_YR_1                False
EP_SUM_YR_2                False
SUM_YR_1                    True
SUM_YR_2                    True
SEG_KM_SUM                 False
WEIGHTED_SEG_KM            False
LAST_FLIGHT_DATE           False
AVG_FLIGHT_COUNT           False
AVG_BP_SUM                 False
BEGIN_TO_FIRST             False
LAST_TO_END                False
AVG_INTERVAL               False
MAX_INTERVAL               False
ADD_POINTS_SUM_YR_1        False
ADD_POINTS_SUM_YR_2        False
EXCHANGE_COUNT             False
avg_discount               False
P1Y_Flight_Count           False
L1Y_Flight_Count           False
P1Y_BP_SUM                 False
L1Y_BP_SUM                 False
EP_SUM                     False
ADD_Point_SUM              False
Eli_Add_Point_Sum          False
L1Y_ELi_Add_Points         False
Points_Sum                 False
L1Y_Points_Sum             False
Ration_L1Y_Flight_Count    False
Ration_P1Y_Flight_Count    False
Ration_P1Y_BPS             False
Ration_L1Y_BPS             False
Point_NotFlight            False
dtype: bool
air_data.isnull().any()
MEMBER_NO                  False
FFP_DATE                   False
FIRST_FLIGHT_DATE          False
GENDER                      True
FFP_TIER                   False
WORK_CITY                   True
WORK_PROVINCE               True
WORK_COUNTRY                True
AGE                         True
LOAD_TIME                  False
FLIGHT_COUNT               False
BP_SUM                     False
EP_SUM_YR_1                False
EP_SUM_YR_2                False
SUM_YR_1                    True
SUM_YR_2                    True
SEG_KM_SUM                 False
WEIGHTED_SEG_KM            False
LAST_FLIGHT_DATE           False
AVG_FLIGHT_COUNT           False
AVG_BP_SUM                 False
BEGIN_TO_FIRST             False
LAST_TO_END                False
AVG_INTERVAL               False
MAX_INTERVAL               False
ADD_POINTS_SUM_YR_1        False
ADD_POINTS_SUM_YR_2        False
EXCHANGE_COUNT             False
avg_discount               False
P1Y_Flight_Count           False
L1Y_Flight_Count           False
P1Y_BP_SUM                 False
L1Y_BP_SUM                 False
EP_SUM                     False
ADD_Point_SUM              False
Eli_Add_Point_Sum          False
L1Y_ELi_Add_Points         False
Points_Sum                 False
L1Y_Points_Sum             False
Ration_L1Y_Flight_Count    False
Ration_P1Y_Flight_Count    False
Ration_P1Y_BPS             False
Ration_L1Y_BPS             False
Point_NotFlight            False
dtype: bool
boolean_filter = air_data['SUM_YR_1'].notnull() & air_data['SUM_YR_2'].notnull()
boolean_filter
0         True
1         True
2         True
3         True
4         True
         ...  
62983     True
62984     True
62985     True
62986     True
62987    False
Length: 62988, dtype: bool
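# Drop rows where SUM_YR_1 or SUM_YR_2 is missing
air_data = air_data[boolean_filter]
# Keep rows where at least one of SUM_YR_1 / SUM_YR_2 is non-zero
filter1 = air_data['SUM_YR_1'] != 0
filter2 = air_data['SUM_YR_2'] != 0
air_data = air_data[filter1 | filter2]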
air_data.shape
(62044, 44)

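Feature engineering

RFM model

A classic model for customer value analysis is the RFM model.

  • Recency: time elapsed since the most recent purchase.
  • Frequency: how often the customer purchases.
  • Monetary Value: the customer's total spend.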
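As a minimal sketch of how the three RFM quantities could be derived from raw transactions: the table below and its column names (`customer`, `date`, `amount`) are hypothetical and are not part of the airline dataset, which already ships pre-aggregated per member.

```python
import pandas as pd

# Hypothetical transaction log; column names are illustrative only.
tx = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b", "c"],
    "date": pd.to_datetime(["2014-01-10", "2014-03-01", "2013-11-05",
                            "2013-12-20", "2014-03-25", "2013-06-30"]),
    "amount": [500.0, 700.0, 300.0, 200.0, 900.0, 150.0],
})
snapshot = pd.Timestamp("2014-03-31")  # observation date (assumed)

rfm = tx.groupby("customer").agg(
    R=("date", lambda d: (snapshot - d.max()).days),  # days since last purchase
    F=("date", "count"),                               # number of purchases
    M=("amount", "sum"),                               # total spend
)
print(rfm)
```

A small R together with large F and M marks an active, frequent, high-spending customer.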
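Variant: the LRFMC model
  • Length of Relationship: how long the customer has been a member, reflecting the possible span of activity.
  • Recency: time elapsed since the most recent purchase, reflecting the current level of activity.
  • Frequency: how often the customer purchases, reflecting loyalty.
  • Mileage: total distance flown, reflecting how much the customer relies on air travel.
  • Coefficient of Discount: the average discount coefficient the customer received, an indirect indicator of customer value.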
load_time = datetime.datetime.strptime('2014/03/31','%Y/%m/%d')
load_time
datetime.datetime(2014, 3, 31, 0, 0)
ffp_dates = [datetime.datetime.strptime(ffp_date,'%Y/%m/%d') for ffp_date in air_data['FFP_DATE']]
length_of_relationship  = [(load_time-ffp_date).days for ffp_date in ffp_dates]
air_data['LEN_REL'] = length_of_relationship
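The list comprehensions above can also be written in vectorized form with `pd.to_datetime`. A minimal sketch on a toy frame (the three dates are copied from the `head()` output; the frame name `toy` is made up for illustration):

```python
import pandas as pd

toy = pd.DataFrame({"FFP_DATE": ["2006/11/02", "2007/02/19", "2009/04/10"]})
load_time = pd.Timestamp("2014-03-31")

# Parse all dates at once, then take the day difference to the snapshot date.
ffp = pd.to_datetime(toy["FFP_DATE"], format="%Y/%m/%d")
toy = toy.assign(LEN_REL=(load_time - ffp).dt.days)
print(toy)
```

Using `assign` returns a new frame, which sidesteps the `SettingWithCopyWarning` pandas may raise when a column is added to a filtered slice.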
Drop the non-essential columns and keep only the attributes the LRFMC model needs.
features = ['LEN_REL','FLIGHT_COUNT','avg_discount','SEG_KM_SUM','LAST_TO_END']
data = air_data[features]

features = ['L','F','C','M','R']
data.columns = features
data.shape
(62044, 5)
data.head()
L F C M R
0 2706 210 0.961639 580717 1
1 2597 140 1.252314 293678 7
2 2615 135 1.254676 283712 11
3 2047 23 1.090870 281336 97
4 1816 152 0.970658 309928 5
data.describe().T
count mean std min 25% 50% 75% max
L 62044.0 1488.691090 847.880920 365.000000 735.000000 1278.000000 2182.000000 3437.0
F 62044.0 11.971359 14.110619 2.000000 3.000000 7.000000 15.000000 213.0
C 62044.0 0.722180 0.184833 0.136017 0.613085 0.712162 0.809293 1.5
M 62044.0 17321.694749 21052.728111 368.000000 4874.000000 10200.000000 21522.500000 580717.0
R 62044.0 172.532703 181.526164 1.000000 29.000000 105.000000 260.000000 731.0

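Standardization

Standardization puts attributes with different value ranges on a common scale. Common methods include min-max scaling and standard deviation (z-score) scaling.

  • Standardize each feature so that it has mean 0 and variance 1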
((data -data.mean(axis=0)) /data.std(axis=0)).describe().T
count mean std min 25% 50% 75% max
L 62044.0 1.117739e-16 1.0 -1.325294 -0.888911 -0.248491 0.817696 2.297857
F 62044.0 3.664717e-17 1.0 -0.706656 -0.635788 -0.352313 0.214636 14.246621
C 62044.0 4.251071e-16 1.0 -3.171310 -0.590233 -0.054199 0.471304 4.208225
M 62044.0 -5.863547e-17 1.0 -0.805297 -0.591263 -0.338279 0.199537 26.761154
R 62044.0 1.465887e-16 1.0 -0.944948 -0.790700 -0.372027 0.481844 3.076511
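The two methods named above, min-max scaling and z-score scaling, can be compared side by side. In this sketch the five sample values are the `F` column from `data.head()`; using the population standard deviation (`ddof=0`, the NumPy default) is a choice made here to mirror `StandardScaler`.

```python
import numpy as np

f = np.array([210.0, 140.0, 135.0, 23.0, 152.0])  # F values from data.head()

# Min-max scaling maps the column onto [0, 1].
f_minmax = (f - f.min()) / (f.max() - f.min())

# Z-score scaling centres on 0 with unit standard deviation (ddof=0).
f_z = (f - f.mean()) / f.std()

print(f_minmax.round(3))
print(f_z.round(3))
```

Min-max scaling is sensitive to a single extreme value, since that value alone defines the range; z-scores keep outliers visible as large magnitudes, which matches the long right tails of `F` and `M` in the table above.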
ss = sklearn.preprocessing.StandardScaler(with_mean=True,with_std=True)
data = ss.fit_transform(data)
data
array([[ 1.43571897, 14.03412875,  1.29555058, 26.76136996, -0.94495516],       [ 1.30716214,  9.07328567,  2.86819902, 13.1269701 , -0.9119018 ],       [ 1.32839171,  8.71893974,  2.88097321, 12.65358345, -0.88986623],       ...,       [-0.14942206, -0.70666211, -2.68990622, -0.77233818, -0.73561725],       [-1.20618274, -0.70666211, -2.55464809, -0.77984321,  1.6056619 ],       [-0.47965977, -0.70666211, -2.39233833, -0.78668323,  0.60304353]])
data = pd.DataFrame(data,columns=features)
data.head()
L F C M R
0 1.435719 14.034129 1.295551 26.761370 -0.944955
1 1.307162 9.073286 2.868199 13.126970 -0.911902
2 1.328392 8.718940 2.880973 12.653583 -0.889866
3 0.658481 0.781591 1.994730 12.540723 -0.416102
4 0.386035 9.923716 1.344346 13.898848 -0.922920
data_db = data.copy()
data_db.describe().T
count mean std min 25% 50% 75% max
L 62044.0 1.246004e-16 1.000008 -1.325304 -0.888919 -0.248493 0.817703 2.297875
F 62044.0 5.863547e-17 1.000008 -0.706662 -0.635793 -0.352316 0.214637 14.246736
C 62044.0 3.957894e-16 1.000008 -3.171335 -0.590238 -0.054200 0.471308 4.208258
M 62044.0 -1.026121e-16 1.000008 -0.805303 -0.591268 -0.338282 0.199539 26.761370
R 62044.0 4.397660e-17 1.000008 -0.944955 -0.790706 -0.372030 0.481848 3.076536
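The standard deviation shows as 1.000008 here, whereas the manual pandas computation earlier gave exactly 1.0. This is consistent with `StandardScaler` dividing by the population standard deviation (`ddof=0`) while pandas `std()` and `describe()` default to the sample standard deviation (`ddof=1`); the two differ by a factor of sqrt(n / (n - 1)). A small check, with n taken from the row count above and a synthetic column used purely for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

n = 62044
print(round(np.sqrt(n / (n - 1)), 6))  # → 1.000008

# Same effect on a small synthetic column.
rng = np.random.default_rng(0)
col = pd.DataFrame({"x": rng.normal(size=50)})
scaled = pd.DataFrame(StandardScaler().fit_transform(col), columns=["x"])

print(scaled["x"].std(ddof=0))  # population std of the scaled column
print(scaled["x"].std())        # pandas default (ddof=1), slightly larger
```

For clustering the difference is immaterial, since every feature is rescaled by the same factor.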

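Model training and prediction

Customers are segmented into five classes: key customers to retain, key customers to develop, key customers to win back, general customers, and low-value customers.

K-means clustering algorithm

  • The goal is to partition \(n\) observations into \(k\) clusters, each with a center (mean).
  • Each sample belongs to exactly one cluster: the one whose center is nearest to it.
  • Notation: \(S_{i}\) is a cluster, \(m_{i}\) is the center of the samples in \(S_{i}\), and \(x_{i}\) is a sample point.
  • Assignment step (expectation step): assign each sample to the cluster with the nearest center.
  • Update step (maximization step): recompute each cluster's center from the samples currently assigned to it.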
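The two alternating steps can be sketched in plain NumPy as below. This is an illustrative Lloyd-style loop with random initialization (the function name `kmeans_sketch` and its parameters are made up for this sketch), not the implementation behind `sklearn.cluster.KMeans`.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations on an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct samples chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centre for every sample.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centre becomes the mean of its assigned samples
        # (an empty cluster keeps its previous centre).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated synthetic blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
centers, labels = kmeans_sketch(X, k=2)
print(centers.round(2))
```

The loop stops once the centers no longer move, which is the point where the assignment and update steps agree with each other.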
num_clusters = 5  # number of clusters
# n_jobs was deprecated in scikit-learn 0.23 and removed in 1.0, so it is omitted here
km = sklearn.cluster.KMeans(n_clusters=num_clusters)  # build the model
km.fit(data)  # train the model
KMeans(n_clusters=5)
# Inspect the 5 cluster centers learned by the model and the number of samples in each of the 5 clusters
r1 = pd.Series(km.labels_).value_counts()
r2 = pd.DataFrame(km.cluster_centers_)
r = pd.concat([r2,r1],axis=1)
r.columns = list(data.columns) + ['counts']
r
L F C M R counts
0 0.482004 2.478716 0.298630 2.420403 -0.798959 5338
1 1.155203 -0.091881 -0.150515 -0.099938 -0.373781 15858
2 0.110721 -0.189617 2.353276 -0.185116 -0.015167 3684
3 -0.700396 -0.164828 -0.234397 -0.165888 -0.410842 24970
4 -0.315083 -0.574115 -0.162570 -0.537185 1.684579 12194
# Inspect the cluster label the model assigned to each sample
km.labels_
array([0, 0, 0, ..., 3, 4, 4], dtype=int32)
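Cluster centers expressed in standardized units are hard to read in business terms. One option, sketched here on synthetic data rather than the airline set, is to map them back to the original units with `StandardScaler.inverse_transform`. The names `raw`, `scaler` and `km_demo` are hypothetical, and `n_init` and `random_state` are set only to make the sketch reproducible; they are not part of the original run.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two features (flight count F and mileage M).
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "F": np.r_[rng.normal(5, 1, 100), rng.normal(50, 5, 100)],
    "M": np.r_[rng.normal(5_000, 500, 100), rng.normal(60_000, 5_000, 100)],
})

scaler = StandardScaler()
scaled = scaler.fit_transform(raw)
km_demo = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)

# Centres in original units, plus the size of each cluster.
centers = pd.DataFrame(scaler.inverse_transform(km_demo.cluster_centers_),
                       columns=raw.columns)
centers["counts"] = pd.Series(km_demo.labels_).value_counts().sort_index()
print(centers.round(1))
```

The same scaler object that produced the training matrix has to be used for the inverse mapping, so it should not be refit on other data in between.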

Trying the RFM model

data_rfm = data[['R','F','M']]
data_rfm.head()
R F M
0 -0.944955 14.034129 26.761370
1 -0.911902 9.073286 13.126970
2 -0.889866 8.718940 12.653583
3 -0.416102 0.781591 12.540723
4 -0.922920 9.923716 13.898848
km.fit(data_rfm)  # refit the model on the R, F, M columns only
km.labels_
array([3, 3, 3, ..., 2, 1, 2], dtype=int32)
r1 = pd.Series(km.labels_).value_counts()
r2 = pd.DataFrame(km.cluster_centers_)

rr = pd.concat([r2,r1],axis=1)
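# Standardize the table of centers column by column; note that this also rescales the counts column and refits ss
rr = pd.DataFrame(ss.fit_transform(rr))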
rr.columns = list(data_rfm.columns) + ['counts']
rr
R F M counts
0 -0.475915 -0.389200 -0.395668 0.146242
1 1.958565 -0.918959 -0.893438 0.118661
2 -0.129480 -0.846644 -0.841995 1.712033
3 -0.727717 1.772255 1.795436 -1.187639
4 -0.625453 0.382548 0.335664 -0.789296

Analysis and decisions

Use a radar chart to visualize the features of the 5 clusters learned by the model.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Circle,RegularPolygon
from matplotlib.path import Path
from matplotlib.projections.polar import PolarAxes
from matplotlib.projections import register_projection
from matplotlib.spines import Spine
from matplotlib.transforms import Affine2D
def radar_factory(num_vars,frame='circle'):
    # Compute evenly-spaced axis angles
    theta = np.linspace(0,2*np.pi, num_vars, endpoint=False)
    
    class RadarAxes(PolarAxes):
        name= 'radar'
        # Use 1 line segment to connect specified points
        RESOLUTION = 1
        
        def __init__(self,*args,**kwargs):
            super().__init__(*args,**kwargs)
            # Rotate the plot so that the first axis is at the top
            self.set_theta_zero_location('N')
            
        def fill(self, *args, closed=True, **kwargs):
            """Override fill so that the line is closed by default."""
            return super().fill(closed=closed, *args, **kwargs)

        def plot(self, *args, **kwargs):
            """Override plot so that the line is closed by default."""
            lines = super().plot(*args, **kwargs)
            for line in lines:
                self._close_line(line)
                
        def _close_line(self, line):
            x, y = line.get_data()
            # FIXME: markers at x[0], y[0] get doubled up
            if x[0] != x[-1]:
                x = np.concatenate((x, [x[0]]))
                y = np.concatenate((y, [y[0]]))
                line.set_data(x, y)

        def set_varlabels(self, labels):
            self.set_thetagrids(np.degrees(theta), labels)

        def _gen_axes_patch(self):
            # The Axes patch must be centered at (0.5, 0.5) and of radius 0.5
            # in axes coordinates.
            if frame == 'circle':
                return Circle((0.5, 0.5), 0.5)
            elif frame == 'polygon':
                return RegularPolygon((0.5, 0.5), num_vars,
                                      radius=.5, edgecolor="k")
            else:
                raise ValueError("unknown value for 'frame': %s" % frame)
        
        def _gen_axes_spines(self):
            if frame == 'circle':
                return super()._gen_axes_spines()
            elif frame == 'polygon':
                # spine_type must be 'left'/'right'/'top'/'bottom'/'circle'.
                spine = Spine(axes=self,
                              spine_type='circle',
                              path=Path.unit_regular_polygon(num_vars))
                # unit_regular_polygon gives a polygon of radius 1 centered at
                # (0, 0), but we want a polygon of radius 0.5 centered at (0.5,
                #   0.5) in axes coordinates.
                spine.set_transform(Affine2D().scale(.5).translate(.5, .5)
                                    + self.transAxes)
                return {'polar': spine}
            else:
                raise ValueError("unknown value for 'frame': %s" % frame)
    register_projection(RadarAxes)
    return theta

Plotting the LRFMC model

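# The number of radar axes is the number of features (L, F, C, M, R),
# which only happens to equal num_clusters here
N = len(features)
theta = radar_factory(N, frame='polygon')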

data = r.to_numpy()
fig,ax = plt.subplots(figsize=(5,5), nrows = 1, ncols=1, subplot_kw=dict(projection='radar'))

fig.subplots_adjust(wspace=0.25,hspace=0.20,top=0.85,bottom=0.05)

# Drop the last column (counts)
case_data = data[:,:-1]
# Hide the radial axis
ax.get_yaxis().set_visible(False)

# Figure title
title = "Radar Chart for Different Means"
ax.set_title(title, weight='bold', size='medium', position=(0.5, 1.1),
             horizontalalignment='center', verticalalignment='center')
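for d in case_data:
    # Draw the outline
    ax.plot(theta, d)
    # Fill the area; '_nolegend_' keeps the fills out of the legend
    ax.fill(theta, d, alpha=0.05, label='_nolegend_')
# Label each axis with its feature name
ax.set_varlabels(features)

# Add the legend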
labels = ["CustomerCluster_" + str(i) for i in range(1,6)]
legend = ax.legend(labels, loc=(0.9, .75), labelspacing=0.1)

plt.show()


[Figure: radar chart of the five LRFMC cluster centers]

Plotting the RFM model

theta = radar_factory(3, frame='polygon')

data = rr.to_numpy()

fig, ax = plt.subplots(figsize=(5, 5), nrows=1, ncols=1,
                         subplot_kw=dict(projection='radar'))
fig.subplots_adjust(wspace=0.25, hspace=0.20, top=0.85, bottom=0.05)

# Drop the last column (counts)
case_data = data[:, :-1]
# Hide the radial axis
ax.get_yaxis().set_visible(False)
# Figure title
title = "Radar Chart for Different Means"
ax.set_title(title, weight='bold', size='medium', position=(0.5, 1.1),
             horizontalalignment='center', verticalalignment='center')
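for d in case_data:
    # Draw the outline
    ax.plot(theta, d)
    # Fill the area; '_nolegend_' keeps the fills out of the legend
    ax.fill(theta, d, alpha=0.05, label='_nolegend_')
# Label each axis with its feature name
ax.set_varlabels(['R','F','M'])

# Add the legend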
labels = ["CustomerCluster_" + str(i) for i in range(1,6)]
legend = ax.legend(labels, loc=(0.9, .75), labelspacing=0.1)

plt.show()

[Figure: radar chart of the five RFM cluster centers]

Clustering the LRFMC features with DBSCAN

from sklearn.cluster import DBSCAN

# Fit on a random sample of 10,000 rows
db = DBSCAN(eps=10,min_samples=2).fit(data_db.sample(10000))

DBSCAN_labels = db.labels_
DBSCAN_labels
array([0, 0, 0, ..., 0, 0, 0])
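The displayed labels are all 0, which suggests that `eps=10` on standardized features is large enough to merge (almost) every sample into a single cluster. A common heuristic for choosing a smaller `eps` is to look at sorted k-nearest-neighbor distances. The sketch below uses synthetic blobs; the values `eps=0.5`, `eps=10` and `min_samples=5` are illustrative assumptions, not tuned settings for the airline data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Synthetic data: two compact blobs, far apart relative to their spread.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (200, 2)), rng.normal(4, 0.2, (200, 2))])

# Distance of every point to its 5th nearest neighbour
# (column 0 is the point itself, at distance 0).
dist, _ = NearestNeighbors(n_neighbors=6).fit(X).kneighbors(X)
k_dist = np.sort(dist[:, -1])
print("90th percentile of 5-NN distance:", k_dist[int(0.9 * len(k_dist))].round(3))

# An eps of the same order as k_dist separates the blobs; a huge eps merges them.
small = DBSCAN(eps=0.5, min_samples=5).fit(X).labels_
large = DBSCAN(eps=10, min_samples=5).fit(X).labels_
print(len(set(small) - {-1}), "clusters with eps=0.5")
print(len(set(large) - {-1}), "cluster(s) with eps=10")
```

Plotting `k_dist` and picking a value near the bend of the curve is the usual way to apply this heuristic to real data.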

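Analysis based on the LRFMC results

To fit business practice, the clustering results are discretized into scores from 1 to 5, where a larger attribute value gets a higher score: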
image.png
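Since the scoring table itself is only available as an image, the following is one possible way to derive such 1-5 scores, assuming the score is simply the rank of each cluster center within its column. The center values are copied from the K-means result table in the training section; the row names `cluster_1` to `cluster_5` are chosen here for readability.

```python
import pandas as pd

# Cluster centres copied from the K-means result table (standardized units).
centers = pd.DataFrame(
    [[ 0.482004,  2.478716,  0.298630,  2.420403, -0.798959],
     [ 1.155203, -0.091881, -0.150515, -0.099938, -0.373781],
     [ 0.110721, -0.189617,  2.353276, -0.185116, -0.015167],
     [-0.700396, -0.164828, -0.234397, -0.165888, -0.410842],
     [-0.315083, -0.574115, -0.162570, -0.537185,  1.684579]],
    columns=["L", "F", "C", "M", "R"],
    index=[f"cluster_{i}" for i in range(1, 6)],
)

# Rank within each column: smallest value -> 1, largest value -> 5.
scores = centers.rank(axis=0, method="first").astype(int)
print(scores)
```

Note that for R a high score means a long time since the last flight, so it reads in the opposite direction from the other four attributes.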

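  1. Key customers to retain

High average discount coefficient (C↑), recent flights (R↓), high flight count (F↑) or high mileage (M↑):
These customers pay high fares, are insensitive to discounts, and fly often, which makes them the ideal customer type.
The company should direct resources to them first and maintain their loyalty.

  2. Key customers to develop

High average discount coefficient (C↑), recent flights (R↓), low flight count (F↓) or low mileage (M↓):
These customers pay high fares, are insensitive to discounts, and have flown recently, but their total mileage is low, so they have considerable growth potential.
The company should raise their satisfaction so that they gradually become loyal customers.

  3. Key customers to win back

High average discount coefficient (C↑), high flight count (F↑) or high mileage (M↑), no recent flights (R↑):
These customers have high total mileage but have not flown for a long time and may be churning.
The company should increase contact with them, win them back, and extend the customer life cycle.

  4. General customers

Low average discount coefficient (C↓), no recent flights (R↑), low flight count (F↓) or low mileage (M↓), short membership (L↓):
These customers pay low fares, often buy discounted tickets, and have not flown recently; they probably bought because of a discount and show no brand loyalty.
The company should strengthen contact with them as far as resources allow.

  5. Low-value customers

Low average discount coefficient (C↓), no recent flights (R↑), low flight count (F↓) or low mileage (M↓), short membership (L↓):
These customers resemble general customers: they pay low fares, often buy discounted tickets, and have not flown recently; they probably bought because of a discount and show no brand loyalty.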

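Results

Reading the LRFMC cluster-center table from the K-means run, with clusters numbered 1-5 for rows 0-4 (K-means label numbers are arbitrary and can change between runs):

  • Cluster 1 has the largest F and M and the smallest R

  • Cluster 2 has the largest L

  • Cluster 3 has the largest C

  • Cluster 4 has the smallest L and C

  • Cluster 5 has the largest R and the smallest F and M

  • The business meaning of each indicator:

    • L: length of membership. Larger means a longer-standing member
    • R: time since the most recent flight. Larger means a longer time without flying
    • F: number of flights. Larger means more flights
    • M: total mileage flown. Larger means more total mileage
    • C: average discount coefficient. Larger means a smaller discount; 0 corresponds to a free ticket and 1 to an undiscounted ticket (avg_discount ranges from 0 to 1.5 in this dataset)

Key customers to retain: cluster 1

Key customers to develop: cluster 3

Key customers to win back: cluster 2

General customers: cluster 4

Low-value customers: cluster 5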

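Decisions

  • The three classes of key customers (to develop, to retain, to win back) correspond to the growth, stable, and decline phases of the customer life cycle.
  • From the life-cycle perspective, resources should also be focused on winning back customers in the decline phase.
  • In general, the end goal of data analysis is to propose and run operations and marketing strategies based on the results, in order to help the business grow. In this example the operating strategy has three directions:
    • Raise activity: increase the activity of general and low-value customers and convert them into high-quality customers
    • Raise retention: engage with the key customers to win back and improve their retention
    • Raise the paying rate: maintain the loyalty of key customers to retain and to develop, keeping revenue healthy
    • Each direction maps to different tactics, such as membership upgrades, points redemption, cross-selling, and discount coupons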
posted @ 2022-07-10 12:43  OCEANEYES.GZY