1.20
Today I studied Spark SQL, the Spark module for structured data processing. Its DataFrame API makes working with structured data more convenient.
Code example (Python):
from pyspark.sql import SparkSession
# Create a SparkSession
spark = SparkSession.builder.appName("SparkSQL").getOrCreate()
# Create a DataFrame from a list of (Name, Age) tuples
data = [("Alice", 25), ("Bob", 30), ("Cathy", 28)]
df = spark.createDataFrame(data, ["Name", "Age"])
# Display the DataFrame
df.show()
# Register the DataFrame as a temporary view and run a SQL query against it
df.createOrReplaceTempView("people")
result = spark.sql("SELECT Name FROM people WHERE Age > 27")
result.show()
# Stop the session to release its resources
spark.stop()
Output:
+-----+---+
| Name|Age|
+-----+---+
|Alice| 25|
|  Bob| 30|
|Cathy| 28|
+-----+---+
+-----+
| Name|
+-----+
|  Bob|
|Cathy|
+-----+