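    from pyspark.ml.feature import PolynomialExpansion
    from pyspark.ml.linalg import Vectors

    # `spark` is assumed to be an existing SparkSession (e.g. the pyspark shell or a notebook).
    data = [
        (0, Vectors.dense([1.0, 2.0])),
        (1, Vectors.dense([2.0, 3.0])),
        (2, Vectors.dense([3.0, 4.0]))
    ]
    columns = ['id', 'features']
    df = spark.createDataFrame(data, columns)

    # Expand each feature vector into all polynomial terms up to degree 2.
    poly_expansion = PolynomialExpansion(inputCol='features', outputCol='expanded_features', degree=2)
    expanded_df = poly_expansion.transform(df)
    expanded_df.show(truncate=False)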
Output:

    +---+---------+-----------------------+
    |id |features |expanded_features      |
    +---+---------+-----------------------+
    |0  |[1.0,2.0]|[1.0,1.0,2.0,2.0,4.0]  |
    |1  |[2.0,3.0]|[2.0,4.0,3.0,6.0,9.0]  |
    |2  |[3.0,4.0]|[3.0,9.0,4.0,12.0,16.0]|
    +---+---------+-----------------------+

✅ What PolynomialExpansion Does (degree = 2)
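Given an input vector with n features, e.g. [x1, x2], PolynomialExpansion generates every polynomial term of the inputs up to the chosen degree:
- Degree 1 → original terms (x1, x2)
- Degree 2 → squared terms (x1², x2²) and cross terms (x1·x2)

so the output contains, in Spark's order: [x1, x1², x2, x1·x2, x2²]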
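The term order can be reproduced in plain Python, without Spark. This is a minimal sketch for exactly two features and degree 2; `expand_degree2` is a hypothetical helper written for illustration, not part of PySpark.

```python
def expand_degree2(x1, x2):
    """Degree-2 polynomial expansion of two features, in the order Spark emits them."""
    return [
        x1,        # degree 1: original x1
        x1 * x1,   # degree 2: x1 squared
        x2,        # degree 1: original x2
        x1 * x2,   # degree 2: cross term
        x2 * x2,   # degree 2: x2 squared
    ]


print(expand_degree2(1.0, 2.0))  # [1.0, 1.0, 2.0, 2.0, 4.0]
```

The result for the input [1.0, 2.0] is the same vector Spark shows for id 0.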
📌 Your case: 2 input features, degree = 2
Input examples: [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]
Row 1
Input: [1.0, 2.0]
| Index | Polynomial term | Calculation | Value |
|---|---|---|---|
| 0 | x1 | 1.0 | 1.0 |
| 1 | x1² | 1.0² | 1.0 |
| 2 | x2 | 2.0 | 2.0 |
| 3 | x1·x2 | 1.0 × 2.0 | 2.0 |
| 4 | x2² | 2.0² | 4.0 |
✔ Matches Spark output exactly.
Row 2
Input: [2.0, 3.0]
Output: [2.0, 4.0, 3.0, 6.0, 9.0]
| Term | Value |
|---|---|
| x1 = 2 | 2 |
| x1² = 2² | 4 |
| x2 = 3 | 3 |
| x1·x2 = 2×3 | 6 |
| x2² = 3² | 9 |
Row 3
Input: [3.0, 4.0]
Output: [3.0, 9.0, 4.0, 12.0, 16.0]
| Term | Value |
|---|---|
| x1 = 3 | 3 |
| x1² = 3² | 9 |
| x2 = 4 | 4 |
| x1·x2 = 3×4 | 12 |
| x2² = 4² | 16 |
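The per-row arithmetic can also be checked mechanically. The sketch below recomputes the five terms for each input and compares them, term by term, with the vectors copied from the `show()` output; `check_rows` and `term_names` are illustrative names, not Spark APIs.

```python
# Input vectors mapped to the expanded vectors copied from the Spark output.
spark_output = {
    (1.0, 2.0): [1.0, 1.0, 2.0, 2.0, 4.0],
    (2.0, 3.0): [2.0, 4.0, 3.0, 6.0, 9.0],
    (3.0, 4.0): [3.0, 9.0, 4.0, 12.0, 16.0],
}

# Term labels in the order the values appear in the output vector.
term_names = ["x1", "x1^2", "x2", "x1*x2", "x2^2"]


def check_rows(rows):
    """Return a list of (input, term, computed, expected) for every mismatch."""
    mismatches = []
    for (x1, x2), expected in rows.items():
        computed = [x1, x1 ** 2, x2, x1 * x2, x2 ** 2]
        for name, got, want in zip(term_names, computed, expected):
            if got != want:
                mismatches.append(((x1, x2), name, got, want))
    return mismatches


print(check_rows(spark_output))  # [] means all three rows agree
```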
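🎉 Summary: Output term order for degree = 2 and 2 input variables
Output vector has 5 terms, in this order: [x1, x1², x2, x1·x2, x2²]
General formula (what a degree 2 expansion contains):
- All degree 1 terms
- All degree 2 terms
  - squared terms
  - interaction (cross) terms

Note that Spark interleaves these terms rather than listing every degree 1 term first, as the x1, x1², x2 ordering shows.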
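The size of the output can be worked out in advance. Assuming the expansion keeps every monomial of total degree 1 through d in n variables and drops the constant term, the count is C(n + d, d) − 1, which gives 5 for n = 2, d = 2. The sketch below computes that count and cross-checks it by enumerating the terms; both function names are hypothetical, and the enumeration groups terms by degree, so it is not Spark's output order.

```python
from itertools import combinations_with_replacement
from math import comb


def num_expanded_terms(n_features, degree):
    """Count monomials of total degree 1..degree in n_features variables."""
    return comb(n_features + degree, degree) - 1


def enumerate_terms(names, degree):
    """List every monomial of total degree 1..degree as a tuple of variable names.

    Terms are grouped by degree here, which differs from Spark's interleaved order.
    """
    terms = []
    for d in range(1, degree + 1):
        terms.extend(combinations_with_replacement(names, d))
    return terms


terms = enumerate_terms(["x1", "x2"], 2)
print(num_expanded_terms(2, 2), len(terms))  # 5 5
print(terms)
```

For three features at degree 2 the same formula gives 9 terms: three originals, three squares, and three cross terms.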
This is commonly used in:
- polynomial regression
- interaction feature modeling
- SVM / ML models needing non-linear boundaries
