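from pyspark.sql import SparkSession
from pyspark.ml.feature import PolynomialExpansion
from pyspark.ml.linalg import Vectors

# Reuse the active session if one exists (pyspark shell, notebook), otherwise start one.
spark = SparkSession.builder.getOrCreate()

# Three rows, each holding a 2-feature dense vector.
data = [
    (0, Vectors.dense([1.0, 2.0])),
    (1, Vectors.dense([2.0, 3.0])),
    (2, Vectors.dense([3.0, 4.0]))
]

columns = ['id', 'features']

df = spark.createDataFrame(data, columns)

# Expand each vector into its polynomial terms up to degree 2 (no constant term).
poly_expansion = PolynomialExpansion(inputCol='features', outputCol='expanded_features', degree=2)
expanded_df = poly_expansion.transform(df)
expanded_df.show(truncate=False)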

+---+---------+-----------------------+
|id |features |expanded_features      |
+---+---------+-----------------------+
|0  |[1.0,2.0]|[1.0,1.0,2.0,2.0,4.0]  |
|1  |[2.0,3.0]|[2.0,4.0,3.0,6.0,9.0]  |
|2  |[3.0,4.0]|[3.0,9.0,4.0,12.0,16.0]|
+---+---------+-----------------------+

✅ What PolynomialExpansion Does (degree = 2)

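Given an input vector with n features, e.g. (x1, x2), PolynomialExpansion generates every polynomial term of degree 1 up to the requested degree. The constant (degree 0) term is not included. With degree = 2:

degree 1 → original terms (x1, x2)
degree 2 → squared terms (x1², x2²) and cross terms (x1·x2)

so the output contains x1, x1², x2, x1·x2 and x2².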

📌 Your case: 2 input features, degree = 2

Input examples: [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]

Row 1 (x1 = 1.0, x2 = 2.0)

Index	Polynomial term	Calculation	Value
0	x1	1.0	1.0
1	x1²	1.0²	1.0
2	x2	2.0	2.0
3	x1·x2	1.0 × 2.0	2.0
4	x2²	2.0²	4.0

✔ Matches Spark output exactly.

Row 2 (x1 = 2.0, x2 = 3.0)

Output: [2.0, 4.0, 3.0, 6.0, 9.0]

Term	Value
x1 = 2	2
x1² = 2²	4
x2 = 3	3
x1·x2 = 2×3	6
x2² = 3²	9

Row 3 (x1 = 3.0, x2 = 4.0)

Output: [3.0, 9.0, 4.0, 12.0, 16.0]

Term	Value
x1 = 3	3
x1² = 3²	9
x2 = 4	4
x1·x2 = 3×4	12
x2² = 4²	16
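
The three row-by-row walkthroughs above can be reproduced without Spark. The following is a minimal plain-Python sketch, not Spark's implementation: `expand_degree2` is a hypothetical helper that hard-codes the term order observed in the output table (x1, x1², x2, x1·x2, x2²) and only covers the case of 2 features with degree = 2.

```python
def expand_degree2(x1, x2):
    """Degree-2 expansion of a 2-feature vector.

    The term order is copied from the Spark output shown earlier;
    this helper is illustrative and does not generalize to other sizes.
    """
    return [x1, x1 ** 2, x2, x1 * x2, x2 ** 2]


# Same input vectors as the DataFrame in the example.
rows = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]

for x1, x2 in rows:
    print([x1, x2], "->", expand_degree2(x1, x2))
```

Comparing the printed lists with the expanded_features column is a quick way to confirm which position holds which term.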

🎉 Summary: Output term order for degree = 2 and 2 input variables

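Output vector has 5 terms, in this order:

[x1, x1², x2, x1·x2, x2²]

General formula: for n input features and degree d, the output has C(n + d, d) − 1 terms (the −1 accounts for the omitted constant term). For n = 2 and d = 2 this gives C(4, 2) − 1 = 5.

The terms fall into two groups:

All degree 1 terms (x1, x2)
All degree 2 terms
- squared terms (x1², x2²)
- interaction (cross) terms (x1·x2)

Spark interleaves the two groups in the output vector rather than listing all degree 1 terms first.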
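
To see how quickly the output grows, the number of expanded terms can be computed directly. This sketch assumes the standard count of monomials of degree 1 through d in n variables, C(n + d, d) − 1 (the constant term is excluded); `expanded_size` is a hypothetical helper name, not part of PySpark.

```python
from math import comb


def expanded_size(n_features, degree):
    """Number of polynomial terms of degree 1..degree in n_features variables.

    comb(n + d, d) counts all monomials of degree 0..d; subtracting 1
    drops the constant term, which the expansion does not emit.
    """
    return comb(n_features + degree, degree) - 1


print(expanded_size(2, 2))   # 5, the case worked through in this post
print(expanded_size(10, 3))  # 285
```

The count rises steeply with both n and d, so it is worth checking before expanding wide feature vectors.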

Polynomial expansion is commonly used in:

polynomial regression
interaction feature modeling
SVM / ML models needing non-linear boundaries

posted on 2025-11-23 22:13 ZhangZhihuiAAA
