
Spark: converting an RDD to a DataFrame and displaying it

Author: 高景洋 Date: 2020-11-13 15:14:40 Views: 1619

How do you convert an RDD to a DataFrame in Spark?


***Note***: every record in the RDD must have the same structure. Otherwise the following error is raised:

ValueError: Length of object (3) does not match with length of fields (4)
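To locate the offending records before converting, one option is to compare each record's length with the number of schema fields. The following is a minimal sketch in plain Python; the helper name `find_mismatched` and the sample data are assumptions for illustration, not part of the original example.

```python
def find_mismatched(records, n_fields):
    """Return (index, record) pairs whose length differs from n_fields."""
    return [(i, r) for i, r in enumerate(records) if len(r) != n_fields]


# Hypothetical sample data: the second record is missing one value.
data = [
    ("Alex", "male", 3, 10),
    ("Nancy", "female", 6),       # only 3 values, the schema expects 4
    ("Jack", "male", 9, None),    # None is fine: the position is still present
]

bad = find_mismatched(data, 4)
print(bad)
```

On an RDD, the same check could be written as rdd.filter(lambda r: len(r) != 4).take(5) to sample a few bad records.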

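Scenario to watch for: our data in HBase does not have a uniform structure. For example, some rows have a JobHistory column and others do not. So when we read the data out of HBase and call toDF on it, the conversion fails with the error above.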
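One way to handle this is to pad every record to the same set of columns before the conversion, filling missing columns with None. The sketch below assumes each HBase row has already been turned into a Python dict. The column list `FIELDS` and the helper `normalize` are hypothetical; only JobHistory is named in this article.

```python
# Hypothetical column list; only JobHistory comes from the article.
FIELDS = ["name", "gender", "num", "JobHistory"]


def normalize(record, fields=FIELDS):
    """Return a tuple with one value per field, using None for missing columns."""
    return tuple(record.get(f) for f in fields)


# Hypothetical rows as they might look after reading from HBase.
rows = [
    {"name": "Alex", "gender": "male", "num": "3", "JobHistory": "2 jobs"},
    {"name": "Jack", "gender": "male", "num": "9"},  # no JobHistory column
]

normalized = [normalize(r) for r in rows]
# Every tuple now has len(FIELDS) values, so the lengths match the schema.
```

With an RDD of such dicts, rdd.map(normalize) would be applied before the DataFrame is created, and the schema fields would need to be nullable so that None is accepted.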


Below is a working example of converting an RDD to a DataFrame:


from pyspark import SparkContext, SparkConf
from pyspark.sql.session import SparkSession
from pyspark.sql.types import StructField, StructType, StringType

if __name__ == '__main__':
    conf = SparkConf()
    sc = SparkContext(conf=conf)
    # Every tuple has exactly 4 values; a missing value is written as None
    data = [('Alex', 'male', 3, 10), ('Nancy', 'female', 6, 10), ('Jack', 'male', 9, None)]
    rdd = sc.parallelize(data)
    schema = StructType([
        # The third argument (True) means the field is nullable, i.e. it may be None
        StructField("name", StringType(), True),
        StructField("gender", StringType(), True),
        # num and price are declared as strings here; IntegerType would be
        # the natural match for the integer values in data
        StructField("num", StringType(), True),
        StructField("price", StringType(), True)
    ])
    # Add .enableHiveSupport() before .getOrCreate() if Hive access is needed
    spark = SparkSession.builder.master("local").appName("SparkOnHive").getOrCreate()
    df = spark.createDataFrame(rdd, schema=schema)
    df.show()
    spark.stop()
    sc.stop()

The execution result is shown in the figure below:


Permanent link to this article:
<a href="http://r4.com.cn/art161.aspx">Spark: converting an RDD to a DataFrame and displaying it</a>