Installing PySpark on Colab
1. Install pyspark and Java
!pip install pyspark                       # install PySpark from PyPI
!pip install -U -q PyDrive                 # optional: Google Drive access from Colab
!apt install openjdk-8-jdk-headless -qq    # install the Java 8 runtime Spark needs
import os
# point PySpark at the Java 8 runtime installed above
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
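Before moving on, it can help to confirm the toolchain is in place. The following quick check is my own addition, not part of the original post; it assumes the installs above completed cleanly:

# sanity check: the Java runtime and the pyspark package should both be visible
!java -version
import pyspark
print(pyspark.__version__)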
2. Initialize SparkSession
from pyspark.sql import SparkSession
from pyspark import SparkContext, SparkConf
# build a Spark config (pin the web UI to port 4050)
conf = SparkConf().set("spark.ui.port", "4050")
# create the context from that config
sc = SparkContext(conf=conf)
# create the session on top of the existing context
spark = SparkSession.builder.getOrCreate()
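As a quick smoke test (my own sketch, not from the original post), you can confirm the session is live by printing its version and running a trivial job:

# the session should report a version and execute a small job
print(spark.version)
spark.range(5).show()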
3. Use SparkSession and sc
# build an RDD of (name, age, score) tuples
rdd = sc.parallelize([("LiLei",15,88),("HanMeiMei",16,90),("DaChui",17,60)])
# convert the RDD to a DataFrame with named columns
df = rdd.toDF(["name","age","score"])
df.show()
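The same DataFrame can also be built without going through an RDD. The sketch below is my addition, using only the standard PySpark API: it constructs the DataFrame directly from the session and runs a simple filter.

# equivalent construction straight from the session, plus a simple query
df2 = spark.createDataFrame(
    [("LiLei", 15, 88), ("HanMeiMei", 16, 90), ("DaChui", 17, 60)],
    ["name", "age", "score"],
)
df2.filter(df2.score >= 80).show()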
This article is from cnblogs. Author: guangqiang.lu. Please credit the original link when reposting: https://www.cnblogs.com/guangqianglu/