ZhangZhihui's Blog  

November 27, 2025

Abstract: The Silhouette Score is a metric used to evaluate the quality of a clustering result. It measures how well each data point fits within its assigned cluster… Read more
posted @ 2025-11-27 09:29 ZhangZhihuiAAA Views(4) Comments(0) Likes(0)
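The truncated summary above describes the Silhouette Score; as a minimal pure-Python sketch of the metric (the toy points and labels here are made up for illustration, not taken from the post):

```python
import math

def silhouette_score(points, labels):
    # Group point indices by cluster label.
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    scores = []
    for i, lab in enumerate(labels):
        same = [j for j in clusters[lab] if j != i]
        if not same:                 # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other points in the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in same) / len(same)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in idxs) / len(idxs)
            for other, idxs in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated toy clusters should score close to 1.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
score = silhouette_score(pts, [0, 0, 1, 1])
```

Per point, a is the mean distance to its own cluster and b the mean distance to the nearest other cluster; scores near 1 indicate tight, well-separated clusters, scores near 0 overlapping clusters, and negative scores likely misassignment.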

November 26, 2025

Abstract: newsgroups = newsgroups.drop('tf_features') # dropping a non-existent column doesn't cause an error… Read more
posted @ 2025-11-26 15:15 ZhangZhihuiAAA Views(5) Comments(0) Likes(0)
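For contrast with the snippet above: PySpark's DataFrame.drop silently ignores columns that don't exist. A loose pure-Python analogue of that drop-if-present behavior, using a hypothetical dict standing in for a DataFrame row:

```python
# Hypothetical row as a plain dict standing in for a Spark DataFrame row.
row = {"id": 1, "text": "spark"}

# Remove a key that may not exist, without raising. This mirrors the
# forgiving semantics of PySpark's DataFrame.drop for missing columns
# (pandas, by contrast, raises KeyError unless errors="ignore").
row.pop("tf_features", None)
```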
 
Abstract: from pyspark.sql.functions import expr df_filtered = df_filtered.withColumn('filtered_array', expr('filter(filtered_doc, x -> len(x) >= 4)')) Please h… Read more
posted @ 2025-11-26 11:03 ZhangZhihuiAAA Views(2) Comments(0) Likes(0)
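The snippet above filters a Spark array column with a SQL higher-order function. A pure-Python sketch of the same predicate (the token list is hypothetical); note that Spark SQL's string-length function is normally `length`, so whether `len(x)` resolves inside `expr` may depend on the Spark version:

```python
# Hypothetical token array standing in for the filtered_doc column.
filtered_doc = ["spark", "ml", "sql", "feature", "expr"]

# Keep only tokens at least 4 characters long, the same predicate as
# the SQL lambda x -> length(x) >= 4 in the Spark snippet.
filtered_array = [tok for tok in filtered_doc if len(tok) >= 4]
```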
 
Abstract: In PostgreSQL, you can convert a timestamp to a date (i.e., drop hours/minutes/seconds) in several common ways: ✅ 1. Cast to date, the fastest and simplest… Read more
posted @ 2025-11-26 08:29 ZhangZhihuiAAA Views(14) Comments(0) Likes(0)
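As a quick Python analogue of the PostgreSQL ts::date cast described above (the timestamp value here is made up):

```python
from datetime import datetime

# Hypothetical timestamp; ts.date() is the Python analogue of
# PostgreSQL's ts::date (CAST(ts AS date)): keep the calendar date,
# discard hours/minutes/seconds.
ts = datetime(2025, 11, 26, 8, 29, 15)
d = ts.date()
```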

November 23, 2025

Abstract: from pyspark.ml.feature import PolynomialExpansion data = [ (0, Vectors.dense([1.0, 2.0])), (1, Vectors.dense([2.0, 3.0])), (2, Vectors.dense([3.0, 4.… Read more
posted @ 2025-11-23 22:13 ZhangZhihuiAAA Views(6) Comments(0) Likes(0)
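A pure-Python sketch of what a degree-2 polynomial expansion produces for a 2-feature vector; the output ordering shown (no bias term) mirrors my understanding of Spark's PolynomialExpansion, so treat it as illustrative rather than authoritative:

```python
def poly_expand_degree2(x, y):
    # Degree-2 polynomial expansion of a 2-feature vector [x, y].
    # Assumed ordering, no bias term: [x, x^2, y, x*y, y^2].
    return [x, x * x, y, x * y, y * y]

# Expanding the first row of the post's toy data, [1.0, 2.0].
expanded = poly_expand_degree2(1.0, 2.0)
```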
 
Abstract: from pyspark.ml.feature import PCA pca = PCA(k=2, inputCol='features', outputCol='pca_features') pca_model = pca.fit(df) pca_df = pca_model.transform(… Read more
posted @ 2025-11-23 21:47 ZhangZhihuiAAA Views(5) Comments(0) Likes(0)
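PCA projects centered data onto the directions of maximal variance. A self-contained pure-Python sketch for 2-D points (toy data; the 2x2 eigenproblem is solved in closed form, which is why no Spark or NumPy is needed):

```python
import math

# Toy 2-D dataset, not taken from the post.
points = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0), (4.0, 6.0)]

# Center the data on its mean.
n = len(points)
mx = sum(p[0] for p in points) / n
my = sum(p[1] for p in points) / n
centered = [(x - mx, y - my) for x, y in points]

# Sample covariance matrix entries [[sxx, sxy], [sxy, syy]].
sxx = sum(x * x for x, _ in centered) / (n - 1)
syy = sum(y * y for _, y in centered) / (n - 1)
sxy = sum(x * y for x, y in centered) / (n - 1)

# Largest eigenvalue of the 2x2 symmetric matrix, in closed form.
tr, det = sxx + syy, sxx * syy - sxy * sxy
lam = tr / 2 + math.sqrt(tr * tr / 4 - det)

# Corresponding (unit) eigenvector = first principal component.
vx, vy = lam - syy, sxy
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

# Project each centered point onto the first principal component.
pc1 = [x * vx + y * vy for x, y in centered]
```

Spark's PCA estimator does the same conceptually: fit learns the principal axes from the feature column, transform projects each row onto the top k of them.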
 
Abstract: from pyspark.ml.feature import Normalizer normalizer = Normalizer(inputCol='features', outputCol='normalized_features', p=2.0) normalized_df = normali… Read more
posted @ 2025-11-23 21:19 ZhangZhihuiAAA Views(4) Comments(0) Likes(0)
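Normalizer with p=2.0 rescales each row vector to unit L2 norm. A one-function pure-Python sketch of that per-row operation:

```python
import math

def l2_normalize(vec):
    # Scale a vector to unit L2 norm, what Normalizer(p=2.0) does per row.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

# The classic 3-4-5 example: result has length 1.
normalized = l2_normalize([3.0, 4.0])
```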
 
Abstract: from pyspark.ml.feature import MinMaxScaler scaler = MinMaxScaler(inputCol='features', outputCol='scaled_features') scaler_model = scaler.fit(df) scal… Read more
posted @ 2025-11-23 21:05 ZhangZhihuiAAA Views(4) Comments(0) Likes(0)
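MinMaxScaler learns per-column min/max in fit and rescales to [0, 1] in transform. A pure-Python sketch of that two-step pattern for a single column (toy values):

```python
def fit_min_max(column):
    # "fit": learn the column's min and max.
    return min(column), max(column)

def transform_min_max(column, lo, hi):
    # "transform": rescale each value to [0, 1] via (x - min) / (max - min).
    return [(x - lo) / (hi - lo) for x in column]

col = [1.0, 2.0, 3.0]
lo, hi = fit_min_max(col)
scaled = transform_min_max(col, lo, hi)
```

Splitting fit from transform matters in practice: the min/max learned on training data should be reused on new data rather than recomputed.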
 
Abstract: from pyspark.ml.feature import OneHotEncoder data = [(0.0, 1.0), (1.0, 0.0), (2.0, 1.0)] columns = ["input1", "input2"] df = spark.createDataFrame(dat… Read more
posted @ 2025-11-23 17:37 ZhangZhihuiAAA Views(7) Comments(0) Likes(0)
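Spark's OneHotEncoder maps a category index to a vector with a single 1, and by default (dropLast=True) encodes the last category as all zeros. A dense pure-Python sketch of that behavior:

```python
def one_hot(index, num_categories, drop_last=True):
    # Dense version of the encoder's output: a single 1.0 at `index`.
    # With drop_last (Spark's dropLast=True default) the last category
    # is encoded as the all-zeros vector.
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec

# Encode the three categories 0, 1, 2 of a column like input1.
codes = [one_hot(i, 3) for i in (0, 1, 2)]
```

Dropping the last category avoids perfectly collinear features in linear models; Spark's actual output is a sparse vector, but the values are the same.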
 
Abstract: In Spark, I ran the code snippet below: # The vocabSize parameter was set to 7 (0 to 6, a total of 7 words), # meaning that the vocabulary size (unique… Read more
posted @ 2025-11-23 10:47 ZhangZhihuiAAA Views(5) Comments(0) Likes(0)
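The vocabSize parameter caps CountVectorizer's vocabulary at the most frequent terms. A pure-Python sketch of that fit/transform behavior (toy documents; term order for ties is illustrative only):

```python
from collections import Counter

def fit_vocab(docs, vocab_size):
    # Keep only the vocab_size most frequent terms, mirroring the
    # vocabSize cap discussed in the post.
    counts = Counter(tok for doc in docs for tok in doc)
    return [term for term, _ in counts.most_common(vocab_size)]

def to_counts(doc, vocab):
    # Term-count vector for one document over the learned vocabulary;
    # terms outside the vocabulary are simply dropped.
    c = Counter(doc)
    return [float(c[term]) for term in vocab]

docs = [["a", "b", "a"], ["b", "c"]]
vocab = fit_vocab(docs, vocab_size=2)   # "c" falls outside the cap
vec = to_counts(["a", "a", "c"], vocab)
```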