Spark 2.x Administration and Development - Spark MLlib - [Linear Regression] + Cosine Similarity

Posted on 2020-08-13 17:35 by MissRong

I. Linear Regression

1. Running the official linear regression example

 

The data/mllib directory of the Spark distribution holds the sample data shipped with Spark:

 

A note on the data format:
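The sample file is in libsvm text format: each line is one example, written as a label followed by space-separated index:value pairs, and only non-zero features are listed. Indices in the file start at 1, while Spark's loaded vectors use indices starting at 0, which is why the features column in the output prints as (10,[0,1,2,...]). A minimal sketch of parsing one such line follows; LibSvmLine and the sample line are hypothetical illustrations, not part of Spark or of the sample file.

```scala
// Hypothetical helper for illustration only; Spark's own libsvm reader is
// what spark.read.format("libsvm") uses in practice.
object LibSvmLine {
  /** Parses "label idx:value idx:value ..." into (label, zero-based index/value pairs). */
  def parse(line: String): (Double, List[(Int, Double)]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val features = tokens.tail.toList.map { tok =>
      val parts = tok.split(":")
      // File indices are 1-based; shift them to 0-based vector positions.
      (parts(0).toInt - 1, parts(1).toDouble)
    }
    (label, features)
  }
}
```

For example, LibSvmLine.parse("1.5 1:0.25 3:-2.0") yields the label 1.5 with features at positions 0 and 2.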

 

Run the official linear regression example:

[root@bigdata111 spark-2.1.0-bin-hadoop2.7]# ./bin/run-example mllib.LinearRegression data/mllib/sample_linear_regression_data.txt

 

 

 

2. Writing LinearRegression with Spark

Spark code:

package sparkMllib

import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.regression.LinearRegression

/**
 * Spark MLlib
 * Linear regression implementation
 */
object XianXingHuiGuiShiXian {
  def main(args: Array[String]): Unit = {
    // 1. Create the Spark environment
    val spark = SparkSession.builder().appName("XianXingHuiGuiShiXian").master("local").getOrCreate()
    // 2. Read the training data from a file in libsvm format
    val training = spark.read.format("libsvm").load("D:\\tmp_files\\sample_linear_regression_data2.txt")
    // 3. Declare the model. setMaxIter(10000) is an upper bound on iterations: more iterations
    //    only help until the optimizer converges, and on this data the error stays large regardless.
    val lr = new LinearRegression().setMaxIter(10000)
    // 4. Fit the model on the training data
    val lrModel = lr.fit(training)
    // 5. Read the training results
    val trainingSummary = lrModel.summary
    trainingSummary.predictions.show() // predictions on the training data (no separate test set is used here)
    println(s"RMSE:${trainingSummary.rootMeanSquaredError}") // s: string interpolation; prints the RMSE (error)
    // 6. Stop Spark
    spark.stop()
  }
}

Result:

+-------------------+--------------------+--------------------+
|              label|            features|          prediction|
+-------------------+--------------------+--------------------+
| -9.490009878824548|(10,[0,1,2,3,4,5,...|  1.5211201432720063|
| 0.2577820163584905|(10,[0,1,2,3,4,5,...| -0.6658770747591632|
| -4.438869807456516|(10,[0,1,2,3,4,5,...|  0.1568703823211514|
|-19.782762789614537|(10,[0,1,2,3,4,5,...|  0.6374146679690593|
| -7.966593841555266|(10,[0,1,2,3,4,5,...|   2.372566473232916|
| -7.896274316726144|(10,[0,1,2,3,4,5,...| -1.9410651727650883|
| -8.464803554195287|(10,[0,1,2,3,4,5,...|  2.2621027950886363|
| 2.1214592666251364|(10,[0,1,2,3,4,5,...|-0.00134792656609...|
| 1.0720117616524107|(10,[0,1,2,3,4,5,...| -3.0051104606414007|
|-13.772441561702871|(10,[0,1,2,3,4,5,...|  3.5437265095387804|
| -5.082010756207233|(10,[0,1,2,3,4,5,...| -0.4889664122481736|
|  7.887786536531237|(10,[0,1,2,3,4,5,...|  1.5073098457843013|
| 14.323146365332388|(10,[0,1,2,3,4,5,...|   3.002580330272542|
|-20.057482615789212|(10,[0,1,2,3,4,5,...|  0.6644891587448811|
|-0.8995693247765151|(10,[0,1,2,3,4,5,...|   1.837123449000886|
| -19.16829262296376|(10,[0,1,2,3,4,5,...|  -2.499423280435292|
|  5.601801561245534|(10,[0,1,2,3,4,5,...|  -2.640384817630781|
|-3.2256352187273354|(10,[0,1,2,3,4,5,...|  -1.853286585458312|
| 1.5299675726687754|(10,[0,1,2,3,4,5,...|   2.236000785795242|
| -0.250102447941961|(10,[0,1,2,3,4,5,...|  0.9090111490574454|
+-------------------+--------------------+--------------------+
only showing top 20 rows

 

RMSE:10.16309157133015
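To read this number: RMSE is the square root of the mean squared difference between label and prediction, so it is in the same units as the label. In the 20 rows shown, labels range from about -20 to 14 while predictions stay within roughly ±3.5, which is consistent with an RMSE near 10 and indicates a poor fit. A minimal sketch of the calculation in plain Scala, with the hypothetical names Rmse and rmse (this is not Spark's implementation):

```scala
// Hypothetical helper for illustration only; Spark reports the same metric
// through the rootMeanSquaredError field of the training summary.
object Rmse {
  /** Root mean squared error over (label, prediction) pairs. */
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one (label, prediction) pair")
    val sumOfSquares = pairs.map { case (label, prediction) =>
      val error = label - prediction
      error * error
    }.sum
    math.sqrt(sumOfSquares / pairs.size)
  }
}
```

Passing in a handful of (label, prediction) pairs copied from the table above gives a quick sanity check of the reported value's order of magnitude.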

Now switch to a different data set:

 

Result:

 

II. Cosine Similarity

Reference: https://blog.csdn.net/u012160689/article/details/15341303
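For orientation, the cosine similarity of two vectors a and b is their dot product divided by the product of their lengths: cos(θ) = (a · b) / (|a| |b|). It is 1 when the vectors point in the same direction, 0 when they are orthogonal, and -1 when they point in opposite directions. A minimal sketch in plain Scala follows; the names CosineSimilarity and cosine are hypothetical and not taken from the referenced article or from Spark.

```scala
// Hypothetical helper for illustration only.
object CosineSimilarity {
  /** Cosine of the angle between two equal-length, non-zero vectors. */
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    // The measure is undefined when either vector has zero length.
    require(normA > 0 && normB > 0, "cosine similarity is undefined for a zero vector")
    dot / (normA * normB)
  }
}
```

Because the result depends only on direction, scaling either vector by a positive constant leaves the similarity unchanged.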
