ZhangZhihui's Blog  

 

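from pyspark.ml.feature import PolynomialExpansion
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

# Reuses the active session in a shell or notebook; creates one otherwise.
spark = SparkSession.builder.getOrCreate()

# Each row holds an id and a dense feature vector [x1, x2].
data = [
    (0, Vectors.dense([1.0, 2.0])),
    (1, Vectors.dense([2.0, 3.0])),
    (2, Vectors.dense([3.0, 4.0]))
]

columns = ['id', 'features']

df = spark.createDataFrame(data, columns)

# Expand every feature vector into its polynomial terms up to degree 2.
poly_expansion = PolynomialExpansion(inputCol='features', outputCol='expanded_features', degree=2)
expanded_df = poly_expansion.transform(df)
expanded_df.show(truncate=False)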

 

+---+---------+-----------------------+
|id |features |expanded_features      |
+---+---------+-----------------------+
|0  |[1.0,2.0]|[1.0,1.0,2.0,2.0,4.0]  |
|1  |[2.0,3.0]|[2.0,4.0,3.0,6.0,9.0]  |
|2  |[3.0,4.0]|[3.0,9.0,4.0,12.0,16.0]|
+---+---------+-----------------------+

 

✅ What PolynomialExpansion Does (degree = 2)

Given an input vector with n features, e.g.

features = [x1, x2]

and degree = 2, the transformer generates all polynomial terms of degree:

  • 1 → original terms

  • 2 → squared terms and cross terms

so the output contains:

[x1, x1², x2, x1·x2, x2²]

Spark always emits the terms in this order, and it omits the constant (degree 0) term.
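As a minimal sketch of the ordering above, the degree-2 expansion of two features can be written by hand in plain Python, with no Spark session needed. The function name expand_two_features_degree2 is hypothetical and not part of PySpark; it only mirrors the term order listed above.

```python
def expand_two_features_degree2(x1, x2):
    """Hand-written degree-2 expansion of [x1, x2] (hypothetical helper).

    Returns the terms in the order listed above:
    [x1, x1^2, x2, x1*x2, x2^2]
    """
    return [x1, x1 ** 2, x2, x1 * x2, x2 ** 2]


print(expand_two_features_degree2(2.0, 3.0))  # [2.0, 4.0, 3.0, 6.0, 9.0]
```

A hand-written version like this is only useful for checking small cases; for real data the transformer does the same work on a whole DataFrame column.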


📌 Your case: 2 input features, degree = 2

Input examples:

Row 1

features = [1.0, 2.0]

Output:

expanded_features = [1.0, 1.0, 2.0, 2.0, 4.0]

Let's match each value:

Index  Polynomial term  Calculation  Value
0      x1               1.0          1.0
1      x1²              1.0²         1.0
2      x2               2.0          2.0
3      x1·x2            1.0 × 2.0    2.0
4      x2²              2.0²         4.0

✔ Matches Spark output exactly.


Row 2

features = [2.0, 3.0]

Output:

[2.0, 4.0, 3.0, 6.0, 9.0]

Term           Value
x1 = 2         2
x1² = 2²       4
x2 = 3         3
x1·x2 = 2×3    6
x2² = 3²       9

Row 3

features = [3.0, 4.0]

Output:

[3.0, 9.0, 4.0, 12.0, 16.0]

Term           Value
x1 = 3         3
x1² = 3²       9
x2 = 4         4
x1·x2 = 3×4    12
x2² = 4² 	       16
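The three rows above can also be checked with a short recursive sketch in plain Python. This is an illustration, not Spark's code: the name poly_expand and its structure are assumptions chosen to reproduce the order observed in this post. The ordering it produces for more than two features or for higher degrees is extrapolated from that pattern and has not been checked against Spark here.

```python
def poly_expand(values, degree):
    """Expand `values` into all monomials of total degree 1..degree.

    Hypothetical helper: reproduces the term order seen in the rows above.
    """

    def expand(last_idx, remaining_degree, multiplier):
        # Base case: no degree left to spend, or no features left to multiply in.
        if remaining_degree == 0 or last_idx < 0:
            return [multiplier]
        terms = []
        alpha = multiplier
        # Raise values[last_idx] to powers 0..remaining_degree and recurse on
        # the earlier features with whatever degree is left over.
        for power in range(remaining_degree + 1):
            terms.extend(expand(last_idx - 1, remaining_degree - power, alpha))
            alpha *= values[last_idx]
        return terms

    # The first term generated is the constant 1.0; drop it.
    return expand(len(values) - 1, degree, 1.0)[1:]


for row in ([1.0, 2.0], [2.0, 3.0], [3.0, 4.0]):
    print(row, "->", poly_expand(row, 2))
```

The loop prints one expanded vector per input row, which can be compared line by line with the expanded_features column shown earlier.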

🎉 Summary: Output term order for degree = 2 and 2 input variables

Output vector has 5 terms:

[ x1, x1², x2, x1·x2, x2² ]
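The count of 5 can be derived rather than memorized: the number of monomials of total degree 0 through d in n variables is the binomial coefficient C(n + d, d), and removing the constant term leaves C(n + d, d) - 1. A small sketch, where expanded_size is a hypothetical helper name and not a PySpark function:

```python
from math import comb  # available in Python 3.8+


def expanded_size(num_features, degree):
    """Number of polynomial terms of degree 1..degree, constant term excluded."""
    return comb(num_features + degree, degree) - 1


print(expanded_size(2, 2))  # 5, the length of each expanded vector above
```

The size grows quickly: three features at degree 2 already give 9 terms, so it is worth computing this before expanding wide feature vectors.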

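In general, for degree = 2 the output contains the following terms (grouped here by degree, which is not the order Spark emits them in):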

  • All degree 1 terms

  • All degree 2 terms

    • squared terms

    • interaction (cross) terms

PolynomialExpansion is commonly used in:

  • polynomial regression

  • interaction feature modeling

  • SVM / ML models needing non-linear boundaries

 

posted on 2025-11-23 22:13  ZhangZhihuiAAA