1.20

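Today I studied Spark SQL, the Spark module for structured data processing. Its DataFrame API makes structured data easier to work with, and the same data can also be queried with plain SQL.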

Code example:

python
from pyspark.sql import SparkSession

# Create the SparkSession (the entry point for Spark SQL)
spark = SparkSession.builder.appName("SparkSQL").getOrCreate()

# Create a DataFrame from a list of tuples, with column names
data = [("Alice", 25), ("Bob", 30), ("Cathy", 28)]
df = spark.createDataFrame(data, ["Name", "Age"])

# Display the DataFrame
df.show()

# Register a temporary view, then run a SQL query against it
df.createOrReplaceTempView("people")
result = spark.sql("SELECT Name FROM people WHERE Age > 27")
result.show()

spark.stop()

Output:

+-----+---+
| Name|Age|
+-----+---+
|Alice| 25|
|  Bob| 30|
|Cathy| 28|
+-----+---+

+-----+
| Name|
+-----+
|  Bob|
|Cathy|
+-----+

posted @ 2025-01-20 23:19  混沌武士丞