网站用户行为分析

网站用户行为分析

1. 本地数据集上传到数据仓库Hive

  1. 数据集下载与查看
  2. 数据集预处理
  3. 把数据集导入HDFS中
  4. 在Hive上创建数据库

2. Hive数据分析

  1. 给出数据分析需求
  2. 用select语句实现数据分析
  3. 数据分析结果查看与保存 

3. Hive、MySQL、HBase数据互导

 1. 本地数据集上传到数据仓库Hive

第一步,数据下载预处理

 

第二步,将数据上传至hdfs

 

第三步,在hive上创建数据库和表

 

第四步,查看创建表的数据类型与信息

2.2 Hive数据分析

(1)用户行为分析需求:2014-12-11~12号有多少条购买商品的记录

    分析步骤

  • 语句:select count(*) from bigdata_user where visit_date >'2014-12-10' and visit_date <'2014-12-13' and behavior_type='4' limit 10;
  • 结果截图:运行或存为表格后的查询显示

(2)用户行为分析需求:分析每月1-31号购买情况

  • 语句:select day(visit_date) from bigdata_user limit 10;
  • 行为日期
  •  

  • 购买行为的记录数、不同用户数
  • 语句:select count(distinct uid) from bigdata_user where behavior_type='4’;

     

  • 按日期统计记录数、用户数
  • 语句:select count(distinct uid),day(visit_date) from bigdata_user where behavior_type='4' group by day(visit_date) limit 10;

     

  • 语句:select count(*),day(visit_date) from bigdata_user where behavior_type='4' group by day(visit_date) limit 10;

     

  • 保存为表格
  • 语句:create table day_count as select count(*),day(visit_date) from bigdata_user where behavior_type='4' group by day(visit_date);

     

    语句:create table day_uid as select count(distinct uid),day(visit_date) from bigdata_user where behavior_type='4' group by day(visit_date);

  • 12号+购买行为
  • 语句:select * from bigdata_user where behavior_type='4'and visit_date='2014-12-12' limit 10;
  • 按用户编号分组
  • 语句:select uid from bigdata_user where behavior_type='4'and visit_date='2014-12-12' group by uid limit 10;
  • 按用户分组统计
  • 语句:select uid,count(*) from bigdata_user where behavior_type='4'and visit_date='2014-12-12' group by uid limit 10;
  • 12号,购买,4项以上
  • 语句:select uid,count(*) from bigdata_user where behavior_type='4'and visit_date='2014-12-12' group by uid having count(*)>4 limit 10;
  • 语句:select uid,count(*) from bigdata_user where behavior_type='4'and visit_date='2014-12-12' group by uid having count(behavior_type='4')>4 limit 10;
  •  

     

  • 2014-12-12号当天广东购买商品数
  • 语句:select count(*)from bigdata_user where visit_date='2014-12-12' and province='广东';
  • 按省份统计购买数量

  • 语句:select count(*)from bigdata_user group by province;
  •  

     

  • 2014-12-12号当天的商品购买与浏览比例
  • 语句:select count(*)from bigdata_user where visit_date='2014-12-12' and behavior_type='4' limit 10;
  •  

     

  • 用户10001082在2014-12-12号当天活跃度:该用户点击行为占该天所有点击行为的比例

  • 语句:select count(*)from bigdata_user where visit_date='2014-12-12' and uid='10001082';
  •  

     

  • 2014-12-12号当天购买4件商品以上的用户
  • 语句:select uid,count(*)from bigdata_user where behavior_type='4' and visit_date='2014-12-12' group by uid having count(*)>4;
  •  

(3)自定义需求:

  12月10号买了超过四种商品的用户id

  语句:select uid from bigdata_user where behavior_type='4' and visit_date='2014-12-10' group by uid having count(behavior_type='4')>5;

 

10号,购买,3项以上

 

 语句:select uid,count(*) from bigdata_user where behavior_type='4'and visit_date='2014-12-10' group by uid having count(*)>3 limit 10;

 

通国际当天购买商品钟类为3的天数,并保存到表中

语句:create table day_count_3 as select count(*),day(visit_date) from bigdata_user where behavior_type='3' group by day(visit_date);

 

 3. Hive、MySQL、HBase数据互导

创建临时表user_action

create table dblab.user_action(id STRING,uid STRING, item_id STRING, behavior_type STRING, item_category STRING, visit_date DATE, province STRING) COMMENT 'Welcome to XMU dblab! ' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

 

 

 

 

dblab.bigdata_user数据插入到dblab.user_action表中

INSERT OVERWRITE TABLE dblab.user_action select * from dblab.bigdata_user;

 

 

 

select * from user_action limit 10;

 

 

 

使用sqoop将数据从hive导入mysql

 

 

 

 

 

show variables like "char%";

 

 

 

 

 

 

 

MySQL的数据库dblab中创建一个新表user_action,并设置其编码为utf-8

CREATE TABLE `dblab`.`user_action` (`id` varchar(50),`uid` varchar(50),`item_id` varchar(50),`behavior_type` varchar(10),`item_category` varchar(50), `visit_date` DATE,`province` varchar(20)) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 

导入数据

sqoop export --connect jdbc:mysql://localhost:3306/dblab --username root --password hadoop --table user_action --export-dir '/user/hive/warehouse/dblab.db/user_action' --fields-terminated-by '\t';

 

 

 

 

select * from user_action limit 10;

 

 

 

 

创建表user_action

 

 

 

 

导入数据

sqoop import --connect jdbc:mysql://localhost:3306/dblab --username root --password hadoop --table user_action --hbase-table user_action --column-family f1 --hbase-row-key id --hbase-create-table -m 1

 

posted @ 2020-12-29 23:00  Fk020306  阅读(376)  评论(0编辑  收藏  举报