Hive数据分析
描述:
样表(sales_sample_20170310)字段说明:
day_id 日期编号;
sale_nbr 卖出方代码;
buy_nbr 买入方代码;
cnt 数量
round 金额
卖出方和买入方又分为 3 种类型:
以'C'开头的表示类型为 C,代表“航空公司”,只可以卖出,不可以买入;
以'O'开头的表示类型为 O,代表“代理人”,既可以卖出,也可以买入,并且允许自
己卖给自己(简单来讲:每个“代理人”代码可能对应多个售票点,售票点之前有交换票的
情况,所以体现为自己卖给了自己);
'PAX'表示类型为 PAX,代表“旅客”,只可以买入,不可以卖出。
举例
day_id,sale_nbr,buy_nbr,cnt,round
1,C1,O1001,1,360
卖出方为 C1,类型为 C;买入方为 O1001,类型为 O
day_id,sale_nbr,buy_nbr,cnt,round
1,O100,O100,4,2000
卖出方为 O100,类型为 O;买入方为 O100,类型为 O(即自己卖给自己是允许的)
day_id,sale_nbr,buy_nbr,cnt,round
1,O100,PAX,4,2000
卖出方为 O100,类型为 O;买入方为 PAX,类型为 PAX
问题:
1、数据导入:
要求将样表文件中的(sales_sample_20170310)数据导入 HIVE 数据仓库中。
2、数据清洗:
要求将 day_id 一列中的数值清洗为真实的日期格式,可用字符串表示。
数据 1 对应日期 2021-09-01,依次类推,15 对应日期 2021-09-15
3、数据分析处理:
(1)统计每天各个机场的销售数量和销售金额。
要求的输出字段
day_id,sale_nbr,,cnt,round
日期编号,卖出方代码,数量,金额
(2)统计每天各个代理商的销售数量和销售金额。
要求的输出字段
day_id,sale_nbr,,cnt,round
日期编号,卖出方代码,数量,金额
(3)统计每天各个代理商的销售活跃度。
要求的输出字段
day_id,sale_nbr, sale_number
日期编号,卖出方代码,交易次数(买入或者卖出均算交易次数)
(4)汇总统计 9 月 1 日到 9 月 15 日之间各个代理商的销售利润。
编号,卖出方代码,买入数量,买入金额,卖出数量,卖出金额,销售利润(卖出
金额-买入金额)
(5)设计分析代理商的市场地位根据市场交易次数、交易对象个数、销售机票数量、
销售利润等。(选做题)
4、处理结果入库:
将上述统计分析的结果数据保存到 mySQL 数据库中。
5、数据可视化展示:
利用 Echarts 将上述统计结果以图形化展示的方式展现出来:饼图、柱状图、地图、折
线图等。
解决:
部分sql语句
导入到hive
bin/sqoop import \
--connect jdbc:mysql://Hadoop102:3306/company \
--username root \
--password 1229 \
--table staff \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
导出
bin/sqoop export \
--connect jdbc:mysql://Hadoop102:3306/company \
--username root \
--password 1229 \
--table test \
--num-mappers 1 \
--export-dir /user/hive/warehouse/data \
--input-fields-terminated-by "\t"
bin/sqoop export \
--connect jdbc:mysql://Hadoop102:3306/company \
--username root \
--password 1229 \
--table test \
--num-mappers 1 \
--export-dir /user/hive/warehouse/try \
--input-fields-terminated-by ","
bin/sqoop export \
--connect jdbc:mysql://Hadoop102:3306/company \
--username root \
--password 1229 \
--table sale \
--num-mappers 1 \
--export-dir /user/hive/warehouse/sales2 \
--input-fields-terminated-by ","
bin/sqoop export \
--connect jdbc:mysql://Hadoop102:3306/company \
--username root \
--password 1229 \
--table jieguo01 \
--num-mappers 1 \
--export-dir /user/hive/warehouse/jieguo01 \
--input-fields-terminated-by ","
create table sales(day_id String,
sale_nbr String,
buy_nbr String,
cnt String,
round String)
ROW format delimited fields terminated by '/t' STORED AS TEXTFILE;
create table sales2(day_id String,
sale_nbr String,
buy_nbr String,
cnt String,
round String)
ROW format delimited fields terminated by ',' STORED AS TEXTFILE;
create table sales3(day_id String,
sale_nbr String,
buy_nbr String,
cnt String,
round String)
ROW format delimited fields terminated by ',' STORED AS TEXTFILE;
load data local inpath '/opt/software/sample.txt' into table sales;
select * from sales limit 10;
修改时间:
insert overwrite table sales3 select date_add('2021-09-00',cast(day_id as int)) as day_id,sale_nbr as sale_nbr,buy_nbr as buy_nbr,cnt as cnt,round as round from sales2;
create table jieguo1(day_id string,
sale_nbr string,
cnt string,
round string)
ROW format delimited fields terminated by ',' STORED AS TEXTFILE;
create table jieguo01(day_id string,
sale_nbr string,
cnt string,
round string)
ROW format delimited fields terminated by ',' STORED AS TEXTFILE;
2.1
insert overwrite table jieguo1 select day_id as day_id ,sale_nbr as sale_nbr,sum(cnt) as cnt ,sum(round) as round from sales2 where sale_nbr like 'C%' group by day_id,sale_nbr;
2.2
create table jieguo02(day_id string,
sale_nbr string,
cnt string,
round string)
ROW format delimited fields terminated by ',' STORED AS TEXTFILE;
建表:
create table jieguo2(day_id string,
sale_nbr string,
cnt string,
round string)
ROW format delimited fields terminated by ',' STORED AS TEXTFILE;
查询:
insert overwrite table jieguo2 select day_id as day_id ,sale_nbr as sale_nbr,sum(cnt) as cnt ,sum(round) as round from sales3 where sale_nbr like 'O%' group by day_id,sale_nbr;
bin/sqoop export \
--connect jdbc:mysql://Hadoop102:3306/company \
--username root \
--password 1229 \
--table jieguo02 \
--num-mappers 1 \
--export-dir /user/hive/warehouse/jieguo02 \
--input-fields-terminated-by ","
2.3
bin/sqoop export \
--connect jdbc:mysql://Hadoop102:3306/company \
--username root \
--password 1229 \
--table jieguo03 \
--num-mappers 1 \
--export-dir /user/hive/warehouse/jieguo03 \
--input-fields-terminated-by ","
create table jieguo03(day_id String,sale_nbr String,sale_number String) ROW format delimited fields terminated by ',';
insert overwrite table jieguo03 select day_id,sale_nbr,COUNT(*) as sale_number from sales3 group by day_id,sale_nbr having sale_nbr like 'O%';
2.4
bin/sqoop export \
--connect jdbc:mysql://Hadoop102:3306/company \
--username root \
--password 1229 \
--table jieguo04 \
--num-mappers 1 \
--export-dir /user/hive/warehouse/daili \
--input-fields-terminated-by ","
create table daiout as select day_id as day_id,sale_nbr as sale_nbr,sum(cnt)as cnt,sum(round)as round from sales3 where sale_nbr like 'O%' group by day_id,sale_nbr;
create table daiin as select day_id as day_id,buy_nbr as buy_nbr,sum(cnt)as cnt,sum(round)as round from sales3 where buy_nbr like 'O%' group by day_id,buy_nbr;
create table daili as select daiin.day_id as day_id,daiout.sale_nbr as sale_nbr,daiin.cnt as incnt,daiin.round as inround,daiout.cnt as outcnt,daiout.round as outround,daiout.round-daiin.round as li from daiin join daiout on (daiin.buy_nbr=daiout.sale_nbr);
echarts实现:
部分代码:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
<link rel="stylesheet" href="./layui/css/layui.css">
</head>
<body>
<script src="echarts.js"></script>
<script src="js/jquery.min.js"></script>
</body>
<button class="layui-btn" onclick="getFirstData()">第一个表</button>
<button class="layui-btn" onclick="getSecoundData()">第二个表</button>
<button class="layui-btn" onclick="getThirdData()">第三个表</button>
<div id="main" style="height:600%;" ></div>
<script>
var hzb=new Array();
var zzb=new Array();
function getFirstData(){
var url = "servlet?method=getFirstData&randnum=" + Math.random();
$.ajax({
type: "get",
url: url,
data: [],
dataType: "json",
success: function(result){
fenli(result)
setChatr(1)
},
error: function(){
alert("错误");
}
});
}
function getSecoundData(){
var url = "servlet?method=getSecoundData&randnum=" + Math.random();
$.ajax({
type: "get",
url: url,
data: [],
dataType: "json",
success: function(result){
fenli(result)
setChatr(2)
},
error: function(){
alert("错误");
}
});
}
function getThirdData(){
var url = "servlet?method=getThirdData&randnum=" + Math.random();
$.ajax({
type: "get",
url: url,
data: [],
dataType: "json",
success: function(result){
fenli(result)
setChatr(3)
},
error: function(){
alert("错误");
}
});
}
function fenli(data){
for(i in data){
hzb.push(data[i].hzb)
zzb.push(data[i].zzb)
}
}
function setChatr(num){
var chartDom = document.getElementById('main');
var myChart = echarts.init(chartDom);
var Title;
if(num==1)Title="金额";
if(num==2)Title="销售金额";
if(num==3)Title="销售活跃度";
var option;
option = {
title: { text: Title },
xAxis: {
type: 'category',
data: hzb,
axisLabel: {
interval: 0,
formatter:function(value)
{
return value.split("").join("\n");
}
},
},
yAxis: {
type: 'value'
},
series: [
{
data: zzb,
type: 'line',
barWidth : 20,
showBackground: true,
backgroundStyle: {
color: 'rgba(180, 180, 180, 0.2)'
}
}
],
dataZoom : [
{
type: 'slider',
show: true,
start: 94,
end: 100,
handleSize: 8
},
{
type: 'inside',
start: 94,
end: 100
},
{
type: 'slider',
show: true,
yAxisIndex: 0,
filterMode: 'empty',
width: 12,
height: '70%',
handleSize: 8,
showDataShadow: false,
left: '93%'
}
]
};
option && myChart.setOption(option);
}
</script>
</html>
servlet:
package classes; import javax.servlet.*; import javax.servlet.http.*; import javax.servlet.annotation.*; import java.io.IOException; import java.io.PrintWriter; import java.sql.SQLException; import java.util.List; import net.sf.json.JSONArray; import net.sf.json.JSONObject; @WebServlet(name = "servlet", value = "/servlet") public class servlet extends HttpServlet { Dao dao=new Dao(); public void setdata(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { request.setCharacterEncoding("utf-8"); response.setContentType("text/html;charset=UTF-8"); System.out.println("调用了方法"); JSONArray json=new JSONArray(); for(int i=0;i<5;i++){ JSONObject ob=new JSONObject(); ob.put("name","水果"); ob.put("num",i*10); json.add(ob); } System.out.println(json.toString()); PrintWriter out = response.getWriter(); out.write(json.toString()); } public void getAlldata(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { request.setCharacterEncoding("utf-8"); response.setContentType("text/html;charset=UTF-8"); List<FourBean> list=dao.FourAllData(); JSONArray json=new JSONArray(); for(int i=0;i<list.size();i++){ JSONObject ob=new JSONObject(); ob.put("day_id",list.get(i).getDay_id()); ob.put("sale_nbr",list.get(i).getSale_nbr()); ob.put("incnt",list.get(i).getIncnt()); ob.put("inround",list.get(i).getInround()); ob.put("outcnt",list.get(i).getOutcnt()); ob.put("outround",list.get(i).getOutround()); ob.put("li",list.get(i).getLi()); json.add(ob); } PrintWriter out = response.getWriter(); out.write(json.toString()); System.out.println("发送完成"); } public void getFirstData(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { request.setCharacterEncoding("utf-8"); response.setContentType("text/html;charset=UTF-8"); List<FirstBean> list=dao.getFirstData(); JSONArray json=new JSONArray(); for(int i=0;i<list.size();i++){ JSONObject ob=new JSONObject(); ob.put("hzb",list.get(i).getSale_nbr()); ob.put("zzb",list.get(i).getRound()); json.add(ob); } PrintWriter out = response.getWriter(); out.write(json.toString()); } public void getSecoundData(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { request.setCharacterEncoding("utf-8"); response.setContentType("text/html;charset=UTF-8"); List<SecoundBean> list=dao.getSecoundData(); JSONArray json=new JSONArray(); for(int i=0;i<list.size();i++){ JSONObject ob=new JSONObject(); ob.put("hzb",list.get(i).getSale_nbr()); ob.put("zzb",list.get(i).getRound()); json.add(ob); } PrintWriter out = response.getWriter(); out.write(json.toString()); } public void getThirdData(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { request.setCharacterEncoding("utf-8"); response.setContentType("text/html;charset=UTF-8"); List<ThirdBean> list=dao.getThirdData(); JSONArray json=new JSONArray(); for(int i=0;i<list.size();i++){ JSONObject ob=new JSONObject(); ob.put("hzb",list.get(i).getSale_nbr()); ob.put("zzb",list.get(i).getSale_number()); json.add(ob); } PrintWriter out = response.getWriter(); out.write(json.toString()); } @Override protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { request.setCharacterEncoding("utf-8"); String method=request.getParameter("method"); if(method.equals("setdata")){ try { setdata(request,response); } catch (ServletException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } else if(method.equals("getAlldata")){ try { getAlldata(request,response); } catch (ServletException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } else if(method.equals("getFirstData")){ try { getFirstData(request,response); } catch (ServletException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } else if(method.equals("getSecoundData")){ try { getSecoundData(request,response); } catch (ServletException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } else if(method.equals("getThirdData")){ try { getThirdData(request,response); } catch (ServletException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } } @Override protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { doGet(request,response); } }
总结:还不够好,需要继续努力

浙公网安备 33010602011771号