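First, here is my program. It reads a text file, keeps only the words that contain "in", counts how often each one occurs, and writes the result to a file (on Windows 7):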
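import org.apache.spark.{SparkConf, SparkContext}

object Example {
  def main(args: Array[String]): Unit = {
    // On Windows, Hadoop looks for bin\winutils.exe under this directory
    System.setProperty("hadoop.home.dir", "C:\\Program Files\\hadoop")
    val conf = new SparkConf().setMaster("local").setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val sFile = "\\opt\\eric\\spark-demo-source\\test.txt"
    val textFile = sc.textFile(sFile)
    // Split lines into words, keep words containing "in", count each word
    val counts = textFile.flatMap(line => line.split(" "))
      .filter(_.contains("in"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    // Throws an error if the output directory already exists
    counts.saveAsTextFile("\\opt\\eric\\spark-demo")
    println("test******************" + counts.getNumPartitions)
    sc.stop()
  }
}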
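To check the filtering and counting logic without starting Spark, the same flatMap/filter/map/reduceByKey chain can be mimicked on plain Scala collections. This is only a local sketch, not the Spark job itself: it assumes Scala 2.13+ (where groupMapReduce plays the role of reduceByKey), and the object name WordFilterCount and the sample lines are made up for illustration.

```scala
// Spark-free sketch of the word-count pipeline above (assumes Scala 2.13+).
object WordFilterCount {
  // Split on single spaces, keep words containing "in", then count each word.
  // Empty strings produced by consecutive spaces never contain "in", so they drop out.
  def countInWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))
      .filter(_.contains("in"))
      .groupMapReduce(identity)(_ => 1)(_ + _)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input
    val sample = Seq("spring is coming", "in the morning it is raining in spring")
    countInWords(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

With the sample input, "spring" and "in" are each counted twice, while "coming", "morning" and "raining" are counted once.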
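However, this program can only run once. On the second run, the output directory \\opt\\eric\\spark-demo already exists locally and the program fails with an error, so that directory has to be deleted before saveAsTextFile is called.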
That led me to http://rapture.io/. The official site describes it like this:
Rapture is a family of Scala libraries providing beautiful idiomatic and typesafe Scala APIs for common programming tasks, like working with I/O, cryptography and JSON & XML processing.
Add the rapture dependency:
libraryDependencies += "com.propensive" %% "rapture" % "2.0.0-M3" exclude("com.propensive","rapture-json-lift_2.11")
Import the relevant packages in the program:
import rapture.uri._
import rapture.io._
import rapture.fs._
Delete the file:
val file = uri"file:///opt/eric/spark-demo"
file.delete()
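Note that if the path is written in the wrong format (here, only two slashes after "file:"), you get an error like this on Windows: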
Error:(17, 16) value file is not a member of object rapture.uri.UriContext
val file = uri"file://opt/eric/spark-demo"
For how to write file paths as URIs on Windows, see:
https://blogs.msdn.microsoft.com/ie/2006/12/06/file-uris-in-windows/
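To see why the two-slash form is a problem in general, independent of how rapture's own uri interpolator parses it, a small sketch with the JDK's java.net.URI helps: in file://opt/eric/spark-demo the segment right after "//" is treated as the authority (host), so "opt" drops out of the path. The three-slash form has an empty host, and a Windows drive letter goes after the third slash. The object FileUriDemo and its describe helper are hypothetical names used only for this illustration.

```scala
import java.net.URI

object FileUriDemo {
  // Report how java.net.URI splits a file URI into host and path.
  def describe(s: String): String = {
    val u = new URI(s)
    val host = Option(u.getHost).getOrElse("<none>") // getHost returns null when there is no host
    s"host=$host, path=${u.getPath}"
  }

  def main(args: Array[String]): Unit = {
    println(describe("file:///opt/eric/spark-demo"))    // host=<none>, path=/opt/eric/spark-demo
    println(describe("file://opt/eric/spark-demo"))     // host=opt, path=/eric/spark-demo
    println(describe("file:///C:/opt/eric/spark-demo")) // host=<none>, path=/C:/opt/eric/spark-demo
  }
}
```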
In my tests, this successfully deletes the local file.
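Rapture is not the only option. If the output is on the local file system, the JDK alone can clear it before saveAsTextFile. Here is a minimal sketch that assumes Scala 2.13+ (for scala.util.Using and scala.jdk.CollectionConverters) and a local output path; OutputCleaner and deleteIfExists are hypothetical names. Because Spark writes the output as a directory of part files, the delete has to be recursive. For HDFS paths you would go through Hadoop's FileSystem API instead.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.jdk.CollectionConverters._
import scala.util.Using

object OutputCleaner {
  // Recursively delete `dir` if it exists; returns true if anything was deleted.
  def deleteIfExists(dir: Path): Boolean =
    if (!Files.exists(dir)) false
    else {
      // Files.walk lists a directory before its contents, so reversing the
      // order deletes files (part-00000, _SUCCESS, ...) before their parents.
      // Using.resource closes the stream returned by Files.walk.
      Using.resource(Files.walk(dir)) { stream =>
        stream.iterator().asScala.toList.reverse.foreach(p => Files.delete(p))
      }
      true
    }

  def main(args: Array[String]): Unit = {
    // Same output path as in the Spark program above
    val deleted = deleteIfExists(Paths.get("\\opt\\eric\\spark-demo"))
    println(s"deleted existing output: $deleted")
  }
}
```

Calling OutputCleaner.deleteIfExists on the output path right before counts.saveAsTextFile(...) should let the job be rerun without hitting the "already exists" error.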