Database data:
Result:
2. Programmatically convert an RDD to a DataFrame
The official documentation describes two methods; only one is implemented here (using the programmatic interface: construct a schema and apply it to an existing RDD). A sketch of the other, reflection-based method follows the source code below.
Source code:
import org.apache.spark.sql.types._
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession

object RDDtoDF {
  def main(args: Array[String]) {
    val spark = SparkSession.builder().appName("RddToDFrame").master("local").getOrCreate()
    import spark.implicits._

    // Read the employee records as a plain text RDD
    val employeeRDD = spark.sparkContext.textFile("file:///usr/local/spark/employee.txt")

    // Build the schema programmatically from a string of field names
    val schemaString = "id name age"
    val fields = schemaString.split(" ")
      .map(fieldName => StructField(fieldName, StringType, nullable = true))
    val schema = StructType(fields)

    // Convert each comma-separated line into a Row matching the schema
    val rowRDD = employeeRDD.map(_.split(","))
      .map(attributes => Row(attributes(0).trim, attributes(1), attributes(2).trim))

    // Apply the schema to the RDD of Rows to obtain a DataFrame
    val employeeDF = spark.createDataFrame(rowRDD, schema)

    // Register a temporary view and query it with SQL
    employeeDF.createOrReplaceTempView("employee")
    val results = spark.sql("SELECT id,name,age FROM employee")
    results.map(t => "id:" + t(0) + "," + "name:" + t(1) + "," + "age:" + t(2)).show()
  }
}
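For comparison, the other method from the official documentation infers the schema by reflection from a Scala case class instead of building a StructType by hand. The sketch below assumes the same employee.txt format (id,name,age) and file path as above; the case class Employee, its field types, and the application name are illustrative assumptions, not part of the original exercise.

import org.apache.spark.sql.SparkSession

// Hypothetical case class describing one record; field names and String types are assumptions.
case class Employee(id: String, name: String, age: String)

object RDDtoDFReflection {
  def main(args: Array[String]) {
    val spark = SparkSession.builder().appName("RddToDFReflection").master("local").getOrCreate()
    import spark.implicits._  // enables rdd.toDF() via reflection on the case class

    // Same input file as the programmatic example above
    val employeeDF = spark.sparkContext
      .textFile("file:///usr/local/spark/employee.txt")
      .map(_.split(","))
      .map(attrs => Employee(attrs(0).trim, attrs(1), attrs(2).trim))
      .toDF()  // column names (id, name, age) are derived from the case class fields

    employeeDF.createOrReplaceTempView("employee")
    spark.sql("SELECT id,name,age FROM employee").show()
  }
}

With this approach no StructType is constructed manually; Spark derives the column names and types from the case class, which is convenient when the schema is known at compile time.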
Result:
Week 5, Tuesday exercise: Experiment 5, Spark SQL Programming Elementary Practice