ZhangZhihui's Blog  

2025-11-23

Summary: from pyspark.ml.feature import PolynomialExpansion data = [ (0, Vectors.dense([1.0, 2.0])), (1, Vectors.dense([2.0, 3.0])), (2, Vectors.dense([3.0, 4. Read more
posted @ 2025-11-23 22:13 ZhangZhihuiAAA Views(6) Comments(0) Recommended(0)
 
Summary: from pyspark.ml.feature import PCA pca = PCA(k=2, inputCol='features', outputCol='pca_features') pca_model = pca.fit(df) pca_df = pca_model.transform( Read more
posted @ 2025-11-23 21:47 ZhangZhihuiAAA Views(5) Comments(0) Recommended(0)
 
Summary: from pyspark.ml.feature import Normalizer normalizer = Normalizer(inputCol='features', outputCol='normalized_features', p=2.0) normalized_df = normali Read more
posted @ 2025-11-23 21:19 ZhangZhihuiAAA Views(4) Comments(0) Recommended(0)
 
Summary: from pyspark.ml.feature import MinMaxScaler scaler = MinMaxScaler(inputCol='features', outputCol='scaled_features') scaler_model = scaler.fit(df) scal Read more
posted @ 2025-11-23 21:05 ZhangZhihuiAAA Views(4) Comments(0) Recommended(0)
 
Summary: from pyspark.ml.feature import OneHotEncoder data = [(0.0, 1.0), (1.0, 0.0), (2.0, 1.0)] columns = ["input1", "input2"] df = spark.createDataFrame(dat Read more
posted @ 2025-11-23 17:37 ZhangZhihuiAAA Views(7) Comments(0) Recommended(0)
 
Summary: In Spark, I ran the code snippet below: # The vocabSize parameter was set to 7 (0 to 6 - a total of 7 words), # meaning that the vocabulary size (unique Read more
posted @ 2025-11-23 10:47 ZhangZhihuiAAA Views(5) Comments(0) Recommended(0)
 
Summary: To sort a pandas DataFrame by a column, use sort_values(). Here are the common usages: 1. Sort by one column df_sorted = df.sort_values(by="column_nam Read more
posted @ 2025-11-23 08:01 ZhangZhihuiAAA Views(6) Comments(0) Recommended(0)