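2023, 6

1 Hadoop ...

1.1 Hadoop ...

Hadoop ... HDFS, MapReduce, and Yarn. The Hadoop ... is shown in Figure 1.

Figure 1. Hadoop ...

a) HDFS. HDFS is the distributed file system of Hadoop. Spark ... 200032 ... HDFS ... Spark ... Yarn ... Spark ... Hadoop ... Spark ...
b) Yarn. Yarn handles cluster resource management and job scheduling.
c) Zookeeper. Zookeeper is a distributed coordination service.
d) Spark ... API. Spark SQL ...
e) Shark. Shark is the Spark-based counterpart of Hive.
f) Sqoop. Sqoop transfers data between HDFS and relational databases such as MySQL and Oracle: it imports data into Hadoop HDFS and exports data from HDFS.
g) Ambari. Ambari is a tool for provisioning, managing, and monitoring Apache Hadoop clusters. It supports HDFS, MapReduce, Hive, Spark, Pig, HBase, Zookeeper, Sqoop, and other Hadoop components.

1.2 Spark ...

Spark ... Hadoop ... Hadoop MapReduce. The Spark ... is shown in Figure 2.

Figure 2. Spark ...

a) Spark Core. Spark Core provides the basic functionality of Spark, including the RDD API. RDD ...
b) Spark SQL. Structured data processing with SQL.
c) Spark Streaming. Stream processing.
d) MLlib. The machine learning library.
e) GraphX. Graph computation.

2 Spark on Hadoop ...

2.1 ...

a) ... 0 ...
b) ...
c) ...

... MySQL, Oracle ... 20 ... 60 ...

2.2 Hadoop ... Spark ...

Spark ...

a) ...
b) ... Java, Python, Scala, Shell, SQL.
c) ... Spark ...
d) ... Spark ... Hadoop. Spark ... Hadoop Yarn, Apache Mesos, ... Hadoop HDFS, HBase ... Hadoop.

Yarn ... Spark ... Spark ... Hadoop ... HDFS + Yarn + Spark. HDFS ... Hadoop ... The Spark on Yarn ... is shown in Figure 3.

Figure 3. Spark on Yarn ...

a) Spark submits the application to the ResourceManager.
b) The ResourceManager selects a NodeManager and allocates a container on it for the App Master.
c) The NodeManager starts the Spark App Master.
d) The Spark App Master requests resources from the ResourceManager.
e) The Spark App Master uses RPC to have the NodeManagers start Spark Executors.
f) The Spark Executors register with the Spark App Master.
g) App Master ... Spark Client.

2.3 ...

(1) ... 1 ... 4 ... 8 GB ..., CPU 3.3 GHz, 64 ...

(2) ... CentOS; Java 1.8.0_111; NTP; SSH; Hadoop hadoop-2.7.3; Hive hive-1.2.1; mysql-connector; Spark spark-2.1.1.

(3) ...
a) ...
b) ... jdk.
c) ...
d) In /etc/ssh/sshd_config, set PubkeyAuthentication yes.
e) ... NTP.
f) ... Hadoop ... Hadoop ...

Table 1. ... (times in s)

No.   ...    .../s    .../s    ...
1     300    2.034    2.69     ID ...; 48 h ...; Map ...
2     70     2.631    1.411    ...; 12 h ...; Map ...; 30 min ...
3     9      4.672    3.588    ...; 15 min ...; Map (Map, 0, ...)
4     5000   10.925   10.895   ... (2) ...
5     30     1.366    1.503    ...
6     5000   12.579   14.819   ... (2) ...
7     5000   15.641   17.273   ... (2) ...; Map ...

... Hive, Spark, Yarn ...

Parameter                            Value
dfs.replication                      3
dfs.datanode.du.reserved             1073741824
dfs.block.size                       134217728
dfs.namenode.replication.interval    2

2.4 ...

... Table 1. ... 4 ... 20 s.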
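The HDFS parameters listed above (dfs.replication, dfs.datanode.du.reserved, dfs.block.size, dfs.namenode.replication.interval) can be written out as Hadoop-style property entries. The following is a minimal sketch, not the authors' actual configuration file: it assumes the values go into a conventional hdfs-site.xml, and the helper names build_hdfs_site and to_mib are hypothetical, introduced only for illustration. It also converts the two byte-valued settings to MiB so they are easier to read.

```python
import xml.etree.ElementTree as ET

# Values exactly as listed in the text. Placing them in hdfs-site.xml
# is an assumption; the text does not name the file.
HDFS_SETTINGS = {
    "dfs.replication": "3",
    "dfs.datanode.du.reserved": "1073741824",
    "dfs.block.size": "134217728",
    "dfs.namenode.replication.interval": "2",
}


def build_hdfs_site(settings):
    """Build a <configuration> element with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


def to_mib(num_bytes):
    """Convert a byte count (given as str or int) to whole mebibytes."""
    return int(num_bytes) // (1024 * 1024)


root = build_hdfs_site(HDFS_SETTINGS)
if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
    ET.indent(root)
print(ET.tostring(root, encoding="unicode"))

# Human-readable view of the byte-valued settings.
print("block size (MiB):", to_mib(HDFS_SETTINGS["dfs.block.size"]))
print("reserved per volume (MiB):", to_mib(HDFS_SETTINGS["dfs.datanode.du.reserved"]))
```

Read this way, the block size is 128 MiB and the space reserved on each DataNode volume for non-HDFS use is 1024 MiB (1 GiB), with three replicas per block.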