What will spark.sql(hsql).collect() return?

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('spark with python') \
    .master('local[*]') \
    .enableHiveSupport() \
    .getOrCreate()

hsql = 'select distinct id from aaa'
result = spark.sql(hsql).collect()

 

Step 1. spark.sql(hsql)

  • Runs the SQL query select distinct id from aaa against the table aaa.

  • This returns a Spark DataFrame with a single column (id) containing the distinct values from the table (a runnable setup is sketched after the example below).

Example DataFrame (before .collect()):

+---+
| id|
+---+
|  1|
|  2|
|  5|
+---+
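
The query above assumes a table named aaa already exists. For a self-contained run, one option is to register a small temporary view under the same name and run the same query against it (the sample values here are made up):

# Hypothetical sample data standing in for the table `aaa`
df = spark.createDataFrame([(1,), (2,), (2,), (5,)], ['id'])
df.createOrReplaceTempView('aaa')

spark.sql('select distinct id from aaa').show()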

 


Step 2. .collect()

  • Collects the entire DataFrame from Spark executors back to the driver program.

  • The result is a Python list of Row objects (pyspark.sql.Row); a sketch after the example below shows how to turn them into plain Python values.

Example return value:

[Row(id=1), Row(id=2), Row(id=5)]
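
If you need plain Python values rather than Row objects, you can pull the field out of each Row afterwards. A small sketch based on the result above:

ids = [row.id for row in result]           # [1, 2, 5]
dicts = [row.asDict() for row in result]   # [{'id': 1}, {'id': 2}, {'id': 5}]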

 


Important notes

  • If the table aaa is large, collect() can be very expensive because it loads all rows into the driver’s memory (a streaming alternative is sketched below).

  • If you just want to see the results without pulling them all to the driver, it is better to use:

    spark.sql(hsql).show()

     

    or limit results:

    spark.sql(hsql).limit(10).collect()
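
  • If you really do need every row on the driver but want to avoid holding the whole result in memory at once, DataFrame.toLocalIterator() streams the rows back partition by partition. A minimal sketch:

    for row in spark.sql(hsql).toLocalIterator():
        print(row.id)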

     


So in short:
spark.sql(hsql).collect() returns a list of Row objects, where each Row contains one distinct id from table aaa.

 

Is the pyspark.sql.Row object a list or tuple?
 

The pyspark.sql.Row object is not a Python list, and it is not a plain tuple either.

It is a special class provided by PySpark, implemented as a subclass of tuple, that behaves like a named tuple.


Key properties of Row

  1. Looks like a named tuple

    from pyspark.sql import Row
    
    r = Row(id=1, name="Alice")
    
    print(r)          # Row(id=1, name='Alice')
    print(r.id)       # 1   (attribute access)
    print(r["name"])  # Alice (dict-style access)

     

  2. Indexable like a tuple

    print(r[0])  # 1
    print(r[1])  # Alice

     

  3. Iterable like a tuple

    for v in r:
        print(v)
    # 1
    # Alice

     

  4. A tuple subclass, but not a plain tuple

    isinstance(r, tuple)  # True  (Row is implemented as a subclass of tuple)
    isinstance(r, list)   # False
    type(r)               # <class 'pyspark.sql.types.Row'>

     

So, it’s its own type (Row), built on top of tuple and designed to combine the benefits of:

  • tuple (indexable, iterable)

  • dict / namedtuple (field names, attribute access)


In short:
A Row is not a list; it is a tuple subclass with field names attached, so it behaves like a namedtuple.
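
Because a Row carries both tuple behavior and field names, you can unpack it like a tuple or convert it to a plain dict with its asDict() method. A quick sketch using the r from above:

id_, name = r          # tuple-style unpacking -> 1, 'Alice'
print(r.asDict())      # {'id': 1, 'name': 'Alice'}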

 

posted on 2025-09-05 09:40 by ZhangZhihuiAAA