Hive数据分析
描述:
样表(sales_sample_20170310)字段说明:
day_id 日期编号;
sale_nbr 卖出方代码;
buy_nbr 买入方代码;
cnt 数量
round 金额
卖出方和买入方又分为 3 种类型:
以'C'开头的表示类型为 C,代表“航空公司”,只可以卖出,不可以买入;
以'O'开头的表示类型为 O,代表“代理人”,既可以卖出,也可以买入,并且允许自
己卖给自己(简单来讲:每个“代理人”代码可能对应多个售票点,售票点之前有交换票的
情况,所以体现为自己卖给了自己);
'PAX'表示类型为 PAX,代表“旅客”,只可以买入,不可以卖出。
举例
day_id,sale_nbr,buy_nbr,cnt,round
1,C1,O1001,1,360
卖出方为 C1,类型为 C;买入方为 O1001,类型为 O
day_id,sale_nbr,buy_nbr,cnt,round
1,O100,O100,4,2000
卖出方为 O100,类型为 O;买入方为 O100,类型为 O(即自己卖给自己是允许的)
day_id,sale_nbr,buy_nbr,cnt,round
1,O100,PAX,4,2000
卖出方为 O100,类型为 O;买入方为 PAX,类型为 PAX
问题:
1、数据导入:
要求将样表文件中的(sales_sample_20170310)数据导入 HIVE 数据仓库中。
2、数据清洗:
要求将 day_id 一列中的数值清洗为真实的日期格式,可用字符串表示。
数据 1 对应日期 2021-09-01,依次类推,15 对应日期 2021-09-15
3、数据分析处理:
(1)统计每天各个机场的销售数量和销售金额。
要求的输出字段
day_id,sale_nbr,,cnt,round
日期编号,卖出方代码,数量,金额
(2)统计每天各个代理商的销售数量和销售金额。
要求的输出字段
day_id,sale_nbr,,cnt,round
日期编号,卖出方代码,数量,金额
(3)统计每天各个代理商的销售活跃度。
要求的输出字段
day_id,sale_nbr, sale_number
日期编号,卖出方代码,交易次数(买入或者卖出均算交易次数)
(4)汇总统计 9 月 1 日到 9 月 15 日之间各个代理商的销售利润。
编号,卖出方代码,买入数量,买入金额,卖出数量,卖出金额,销售利润(卖出
金额-买入金额)
(5)设计分析代理商的市场地位根据市场交易次数、交易对象个数、销售机票数量、
销售利润等。(选做题)
4、处理结果入库:
将上述统计分析的结果数据保存到 mySQL 数据库中。
5、数据可视化展示:
利用 Echarts 将上述统计结果以图形化展示的方式展现出来:饼图、柱状图、地图、折
线图等。
解决:
部分sql语句
导入到hive bin/sqoop import \ --connect jdbc:mysql://Hadoop102:3306/company \ --username root \ --password 1229 \ --table staff \ --target-dir /user/company \ --delete-target-dir \ --num-mappers 1 \ --fields-terminated-by "\t" 导出 bin/sqoop export \ --connect jdbc:mysql://Hadoop102:3306/company \ --username root \ --password 1229 \ --table test \ --num-mappers 1 \ --export-dir /user/hive/warehouse/data \ --input-fields-terminated-by "\t" bin/sqoop export \ --connect jdbc:mysql://Hadoop102:3306/company \ --username root \ --password 1229 \ --table test \ --num-mappers 1 \ --export-dir /user/hive/warehouse/try \ --input-fields-terminated-by "," bin/sqoop export \ --connect jdbc:mysql://Hadoop102:3306/company \ --username root \ --password 1229 \ --table sale \ --num-mappers 1 \ --export-dir /user/hive/warehouse/sales2 \ --input-fields-terminated-by "," bin/sqoop export \ --connect jdbc:mysql://Hadoop102:3306/company \ --username root \ --password 1229 \ --table jieguo01 \ --num-mappers 1 \ --export-dir /user/hive/warehouse/jieguo01 \ --input-fields-terminated-by "," create table sales(day_id String, sale_nbr String, buy_nbr String, cnt String, round String) ROW format delimited fields terminated by '/t' STORED AS TEXTFILE; create table sales2(day_id String, sale_nbr String, buy_nbr String, cnt String, round String) ROW format delimited fields terminated by ',' STORED AS TEXTFILE; create table sales3(day_id String, sale_nbr String, buy_nbr String, cnt String, round String) ROW format delimited fields terminated by ',' STORED AS TEXTFILE; load data local inpath '/opt/software/sample.txt' into table sales; select * from sales limit 10; 修改时间: insert overwrite table sales3 select date_add('2021-09-00',cast(day_id as int)) as day_id,sale_nbr as sale_nbr,buy_nbr as buy_nbr,cnt as cnt,round as round from sales2; create table jieguo1(day_id string, sale_nbr string, cnt string, round string) ROW format delimited fields terminated by ',' STORED AS TEXTFILE; create table jieguo01(day_id string, sale_nbr string, cnt string, round string) ROW format delimited fields terminated by ',' STORED AS TEXTFILE; 2.1 insert overwrite table jieguo1 select day_id as day_id ,sale_nbr as sale_nbr,sum(cnt) as cnt ,sum(round) as round from sales2 where sale_nbr like 'C%' group by day_id,sale_nbr; 2.2 create table jieguo02(day_id string, sale_nbr string, cnt string, round string) ROW format delimited fields terminated by ',' STORED AS TEXTFILE; 建表: create table jieguo2(day_id string, sale_nbr string, cnt string, round string) ROW format delimited fields terminated by ',' STORED AS TEXTFILE; 查询: insert overwrite table jieguo2 select day_id as day_id ,sale_nbr as sale_nbr,sum(cnt) as cnt ,sum(round) as round from sales3 where sale_nbr like 'O%' group by day_id,sale_nbr; bin/sqoop export \ --connect jdbc:mysql://Hadoop102:3306/company \ --username root \ --password 1229 \ --table jieguo02 \ --num-mappers 1 \ --export-dir /user/hive/warehouse/jieguo02 \ --input-fields-terminated-by "," 2.3 bin/sqoop export \ --connect jdbc:mysql://Hadoop102:3306/company \ --username root \ --password 1229 \ --table jieguo03 \ --num-mappers 1 \ --export-dir /user/hive/warehouse/jieguo03 \ --input-fields-terminated-by "," create table jieguo03(day_id String,sale_nbr String,sale_number String) ROW format delimited fields terminated by ','; insert overwrite table jieguo03 select day_id,sale_nbr,COUNT(*) as sale_number from sales3 group by day_id,sale_nbr having sale_nbr like 'O%'; 2.4 bin/sqoop export \ --connect jdbc:mysql://Hadoop102:3306/company \ --username root \ --password 1229 \ --table jieguo04 \ --num-mappers 1 \ --export-dir /user/hive/warehouse/daili \ --input-fields-terminated-by "," create table daiout as select day_id as day_id,sale_nbr as sale_nbr,sum(cnt)as cnt,sum(round)as round from sales3 where sale_nbr like 'O%' group by day_id,sale_nbr; create table daiin as select day_id as day_id,buy_nbr as buy_nbr,sum(cnt)as cnt,sum(round)as round from sales3 where buy_nbr like 'O%' group by day_id,buy_nbr; create table daili as select daiin.day_id as day_id,daiout.sale_nbr as sale_nbr,daiin.cnt as incnt,daiin.round as inround,daiout.cnt as outcnt,daiout.round as outround,daiout.round-daiin.round as li from daiin join daiout on (daiin.buy_nbr=daiout.sale_nbr);
echarts实现:
部分代码:
<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>Title</title> <link rel="stylesheet" href="./layui/css/layui.css"> </head> <body> <script src="echarts.js"></script> <script src="js/jquery.min.js"></script> </body> <button class="layui-btn" onclick="getFirstData()">第一个表</button> <button class="layui-btn" onclick="getSecoundData()">第二个表</button> <button class="layui-btn" onclick="getThirdData()">第三个表</button> <div id="main" style="height:600%;" ></div> <script> var hzb=new Array(); var zzb=new Array(); function getFirstData(){ var url = "servlet?method=getFirstData&randnum=" + Math.random(); $.ajax({ type: "get", url: url, data: [], dataType: "json", success: function(result){ fenli(result) setChatr(1) }, error: function(){ alert("错误"); } }); } function getSecoundData(){ var url = "servlet?method=getSecoundData&randnum=" + Math.random(); $.ajax({ type: "get", url: url, data: [], dataType: "json", success: function(result){ fenli(result) setChatr(2) }, error: function(){ alert("错误"); } }); } function getThirdData(){ var url = "servlet?method=getThirdData&randnum=" + Math.random(); $.ajax({ type: "get", url: url, data: [], dataType: "json", success: function(result){ fenli(result) setChatr(3) }, error: function(){ alert("错误"); } }); } function fenli(data){ for(i in data){ hzb.push(data[i].hzb) zzb.push(data[i].zzb) } } function setChatr(num){ var chartDom = document.getElementById('main'); var myChart = echarts.init(chartDom); var Title; if(num==1)Title="金额"; if(num==2)Title="销售金额"; if(num==3)Title="销售活跃度"; var option; option = { title: { text: Title }, xAxis: { type: 'category', data: hzb, axisLabel: { interval: 0, formatter:function(value) { return value.split("").join("\n"); } }, }, yAxis: { type: 'value' }, series: [ { data: zzb, type: 'line', barWidth : 20, showBackground: true, backgroundStyle: { color: 'rgba(180, 180, 180, 0.2)' } } ], dataZoom : [ { type: 'slider', show: true, start: 94, end: 100, handleSize: 8 }, { type: 'inside', start: 94, end: 100 }, { type: 'slider', show: true, yAxisIndex: 0, filterMode: 'empty', width: 12, height: '70%', handleSize: 8, showDataShadow: false, left: '93%' } ] }; option && myChart.setOption(option); } </script> </html>
servlet:
package classes; import javax.servlet.*; import javax.servlet.http.*; import javax.servlet.annotation.*; import java.io.IOException; import java.io.PrintWriter; import java.sql.SQLException; import java.util.List; import net.sf.json.JSONArray; import net.sf.json.JSONObject; @WebServlet(name = "servlet", value = "/servlet") public class servlet extends HttpServlet { Dao dao=new Dao(); public void setdata(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { request.setCharacterEncoding("utf-8"); response.setContentType("text/html;charset=UTF-8"); System.out.println("调用了方法"); JSONArray json=new JSONArray(); for(int i=0;i<5;i++){ JSONObject ob=new JSONObject(); ob.put("name","水果"); ob.put("num",i*10); json.add(ob); } System.out.println(json.toString()); PrintWriter out = response.getWriter(); out.write(json.toString()); } public void getAlldata(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { request.setCharacterEncoding("utf-8"); response.setContentType("text/html;charset=UTF-8"); List<FourBean> list=dao.FourAllData(); JSONArray json=new JSONArray(); for(int i=0;i<list.size();i++){ JSONObject ob=new JSONObject(); ob.put("day_id",list.get(i).getDay_id()); ob.put("sale_nbr",list.get(i).getSale_nbr()); ob.put("incnt",list.get(i).getIncnt()); ob.put("inround",list.get(i).getInround()); ob.put("outcnt",list.get(i).getOutcnt()); ob.put("outround",list.get(i).getOutround()); ob.put("li",list.get(i).getLi()); json.add(ob); } PrintWriter out = response.getWriter(); out.write(json.toString()); System.out.println("发送完成"); } public void getFirstData(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { request.setCharacterEncoding("utf-8"); response.setContentType("text/html;charset=UTF-8"); List<FirstBean> list=dao.getFirstData(); JSONArray json=new JSONArray(); for(int i=0;i<list.size();i++){ JSONObject ob=new JSONObject(); ob.put("hzb",list.get(i).getSale_nbr()); ob.put("zzb",list.get(i).getRound()); json.add(ob); } PrintWriter out = response.getWriter(); out.write(json.toString()); } public void getSecoundData(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { request.setCharacterEncoding("utf-8"); response.setContentType("text/html;charset=UTF-8"); List<SecoundBean> list=dao.getSecoundData(); JSONArray json=new JSONArray(); for(int i=0;i<list.size();i++){ JSONObject ob=new JSONObject(); ob.put("hzb",list.get(i).getSale_nbr()); ob.put("zzb",list.get(i).getRound()); json.add(ob); } PrintWriter out = response.getWriter(); out.write(json.toString()); } public void getThirdData(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { request.setCharacterEncoding("utf-8"); response.setContentType("text/html;charset=UTF-8"); List<ThirdBean> list=dao.getThirdData(); JSONArray json=new JSONArray(); for(int i=0;i<list.size();i++){ JSONObject ob=new JSONObject(); ob.put("hzb",list.get(i).getSale_nbr()); ob.put("zzb",list.get(i).getSale_number()); json.add(ob); } PrintWriter out = response.getWriter(); out.write(json.toString()); } @Override protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { request.setCharacterEncoding("utf-8"); String method=request.getParameter("method"); if(method.equals("setdata")){ try { setdata(request,response); } catch (ServletException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } else if(method.equals("getAlldata")){ try { getAlldata(request,response); } catch (ServletException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } else if(method.equals("getFirstData")){ try { getFirstData(request,response); } catch (ServletException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } else if(method.equals("getSecoundData")){ try { getSecoundData(request,response); } catch (ServletException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } else if(method.equals("getThirdData")){ try { getThirdData(request,response); } catch (ServletException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } } @Override protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { doGet(request,response); } }
总结:还不够好,需要继续努力