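Spark 2.x Administration and Development - Spark MLlib - [Linear Regression] + Cosine Similarity
I. Linear Regression
1. Running the official linear regression example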

This location holds the official sample data that ships with Spark:

A note on the data format:
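
The training code in this post loads its input with format("libsvm"), so each line is expected to follow the LIBSVM text layout: a label, then space-separated index:value pairs whose indices are 1-based in the file. Below is a minimal plain-Scala sketch of parsing one such line. The object LibSvmLine, the helper parseLine, and the sample line are made up for illustration; they are not part of Spark or of the sample file.

```scala
// Hypothetical helper, not part of Spark: parses one LIBSVM-style line.
object LibSvmLine {
  // Returns the label plus (zero-based index, value) pairs.
  def parseLine(line: String): (Double, List[(Int, Double)]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val features = tokens.tail.toList.map { token =>
      val Array(index, value) = token.split(":")
      // Indices are 1-based in the file; shift them to 0-based.
      (index.toInt - 1, value.toDouble)
    }
    (label, features)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative line only, not taken from the Spark sample data.
    val (label, features) = parseLine("-1.5 1:0.25 3:-0.5 10:1.0")
    println(s"label=$label features=$features")
  }
}
```

The shift to 0-based indices mirrors what the features column shows after loading, where the index list starts at 0.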

Run the official linear regression example:
[root@bigdata111 spark-2.1.0-bin-hadoop2.7]# ./bin/run-example mllib.LinearRegression data/mllib/sample_linear_regression_data.txt


2. Now write LinearRegression with Spark
Spark code:
package sparkMllib

import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

/**
 * Spark MLlib
 * Linear regression implementation
 */
object XianXingHuiGuiShiXian {
  def main(args: Array[String]): Unit = {
    // 1. Create the Spark environment
    val spark = SparkSession.builder().appName("XianXingHuiGuiShiXian").master("local").getOrCreate()
    // 2. Read the training data from a file in libsvm format
    val training = spark.read.format("libsvm").load("D:\\tmp_files\\sample_linear_regression_data2.txt")
    // 3. Declare the estimator. setMaxIter(10000) allows at most 10000 iterations.
    //    In principle more iterations can reduce the error until the solver converges,
    //    but on this data the error stays large.
    val lr = new LinearRegression().setMaxIter(10000)
    // 4. Fit the model to the training data
    val lrModel = lr.fit(training)
    // 5. Read the training summary
    val trainingSummary = lrModel.summary
    trainingSummary.predictions.show() // predictions on the training data (first 20 rows)
    println(s"RMSE:${trainingSummary.rootMeanSquaredError}") // s"..." is string interpolation; prints the RMSE (error)
    // 6. Stop Spark
    spark.stop()
  }
}
Result:
+-------------------+--------------------+--------------------+
| label| features| prediction|
+-------------------+--------------------+--------------------+
| -9.490009878824548|(10,[0,1,2,3,4,5,...| 1.5211201432720063|
| 0.2577820163584905|(10,[0,1,2,3,4,5,...| -0.6658770747591632|
| -4.438869807456516|(10,[0,1,2,3,4,5,...| 0.1568703823211514|
|-19.782762789614537|(10,[0,1,2,3,4,5,...| 0.6374146679690593|
| -7.966593841555266|(10,[0,1,2,3,4,5,...| 2.372566473232916|
| -7.896274316726144|(10,[0,1,2,3,4,5,...| -1.9410651727650883|
| -8.464803554195287|(10,[0,1,2,3,4,5,...| 2.2621027950886363|
| 2.1214592666251364|(10,[0,1,2,3,4,5,...|-0.00134792656609...|
| 1.0720117616524107|(10,[0,1,2,3,4,5,...| -3.0051104606414007|
|-13.772441561702871|(10,[0,1,2,3,4,5,...| 3.5437265095387804|
| -5.082010756207233|(10,[0,1,2,3,4,5,...| -0.4889664122481736|
| 7.887786536531237|(10,[0,1,2,3,4,5,...| 1.5073098457843013|
| 14.323146365332388|(10,[0,1,2,3,4,5,...| 3.002580330272542|
|-20.057482615789212|(10,[0,1,2,3,4,5,...| 0.6644891587448811|
|-0.8995693247765151|(10,[0,1,2,3,4,5,...| 1.837123449000886|
| -19.16829262296376|(10,[0,1,2,3,4,5,...| -2.499423280435292|
| 5.601801561245534|(10,[0,1,2,3,4,5,...| -2.640384817630781|
|-3.2256352187273354|(10,[0,1,2,3,4,5,...| -1.853286585458312|
| 1.5299675726687754|(10,[0,1,2,3,4,5,...| 2.236000785795242|
| -0.250102447941961|(10,[0,1,2,3,4,5,...| 0.9090111490574454|
+-------------------+--------------------+--------------------+
only showing top 20 rows
RMSE:10.16309157133015
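
The RMSE printed above is the square root of the mean squared difference between label and prediction. Here is a minimal plain-Scala sketch of that calculation. The object RmseSketch and the helper rmse are hypothetical names, and the three pairs are the first three rows of the table with values rounded to two decimals, so the result covers only those rows and will not match the full-data RMSE.

```scala
// Hypothetical helper, not part of Spark: RMSE over (label, prediction) pairs.
object RmseSketch {
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one (label, prediction) pair")
    val squaredErrors = pairs.map { case (label, prediction) =>
      val diff = label - prediction
      diff * diff
    }
    // Mean of the squared errors, then the square root.
    math.sqrt(squaredErrors.sum / pairs.size)
  }

  def main(args: Array[String]): Unit = {
    // First three rows of the table, rounded; illustrative only.
    val firstRows = Seq((-9.49, 1.52), (0.26, -0.67), (-4.44, 0.16))
    println(s"RMSE over three rows: ${rmse(firstRows)}")
  }
}
```

Because the differences are squared before averaging, a few rows with very large errors dominate the final value.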
Now swap in a different data set:

Result:

II. Cosine Similarity
Reference: https://blog.csdn.net/u012160689/article/details/15341303
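
As a minimal sketch of the idea behind this section: the cosine similarity of two vectors is their dot product divided by the product of their Euclidean norms. The code below is plain Scala with a hypothetical helper name (cosineSimilarity) and made-up vectors. It is not taken from the referenced article or from Spark MLlib.

```scala
// Hypothetical helper, not part of Spark: cosine similarity of two dense vectors.
object CosineSketch {
  def cosineSimilarity(a: Seq[Double], b: Seq[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    require(normA > 0 && normB > 0, "cosine similarity is undefined for a zero vector")
    dot / (normA * normB)
  }

  def main(args: Array[String]): Unit = {
    // Orthogonal vectors: the dot product is zero.
    println(cosineSimilarity(Seq(1.0, 0.0), Seq(0.0, 1.0)))
    // Vectors pointing in the same direction: similarity close to 1.
    println(cosineSimilarity(Seq(1.0, 2.0), Seq(2.0, 4.0)))
  }
}
```

The value depends only on the angle between the vectors, not on their lengths, which is why the second pair scores close to 1 even though one vector is twice as long.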