(6.1.1)--Chapter6-1SparkMLlib.pdf

Uploaded by: 职教中国 | Document ID: 13774446 | Upload date: 2022-10-21 | Format: PDF | Pages: 18 | Size: 1.30MB

Spark MLlib

Haiying Che
Institute of Data Science and Knowledge Engineering
School of Computer Science
Beijing Institute of Technology

[Figure: big data technology stack. Labels: Big Data Application; Computing Algorithm; Computing Model; Data processing system; Computing Platform & Engine; Data Storing System; Spark MLlib; Data Visualization; Data Products and Data Services; Data Application system; TensorFlow; Recommendation System; Social Networking]

Why Spark MLlib?
MLlib is Apache Spark's scalable machine learning library.
- Ease of use: usable in Java, Scala, Python, and R.
- Performance: high-quality algorithms, 100x faster than MapReduce.
- Runs everywhere: Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, against diverse data sources.
To support Python, the Apache Spark community released PySpark, the Python API for Spark. With PySpark, one can work with RDDs in the Python programming language.

1 Spark MLlib Algorithms
ML algorithms include:
- Classification: logistic regression, naive Bayes, ...
- Regression: generalized linear regression, survival regression, ...
- Decision trees, random forests, and gradient-boosted trees
- Recommendation: alternating least squares (ALS)
- Clustering: K-means, Gaussian mixtures (GMMs), ...
- Topic modeling: latent Dirichlet allocation (LDA)
- Frequent itemsets, association rules, and sequential pattern mining

2 Spark MLlib workflow utilities
ML workflow utilities include:
- Feature transformations: standardization, normalization, hashing, ...
- ML Pipeline construction
- Model evaluation and hyper-parameter tuning
- ML persistence: saving and loading models and Pipelines
Other utilities include:
- Distributed linear algebra: SVD, PCA, ...
- Statistics: summary statistics, hypothesis testing, ...

3 Machine Learning Pipeline

3.1 Transformer
- An abstraction that includes feature transformers and learned models
- Transforms data into a consumable format
- Takes an input column and transforms it into an output column
- Examples:
  - Normalizing the data
  - Tokenization: splitting sentences into words
  - Converting categorical values into numbers

3.2 Estimator
- A learning algorithm that is trained (fit) on data
- Returns a model, which is a type of Transformer
- Example: LogisticRegression.fit() returns a LogisticRegressionModel

3.3 Evaluator
- Evaluates model performance based on a chosen metric, such as area under the ROC curve or RMSE
- Helps automate the model tuning process:
  - Comparing model performance
  - Selecting the best model for generating predictions
- Example: BinaryClassificationEvaluator. Tuning tools such as CrossValidator use an Evaluator to compare candidate models.

3.4 Pipeline
- Represents an ML workflow
- Consists of an ordered sequence of stages
- Leverages the uniform API of Transformers and Estimators
- Is itself a type of Estimator: fit() returns a PipelineModel
- Can be persisted
https://spark.apache.org/docs/latest/ml-pipeline.html

3.5 Parameters
MLlib Estimators and Transformers use a uniform API for specifying parameters.
A Param is a named parameter with self-contained documentation. A ParamMap is a set of (parameter, value) pairs.
Parameters belong to specific instances of Estimators and Transformers. For example, if we have two LogisticRegression instances lr1 and lr2, then we can build a ParamMap with both maxIter parameters specified: ParamMap(lr1.maxIter -> 10, lr2.maxIter -> 20). This is useful if there are two algorithms with the maxIter parameter in a Pipeline.
There are two main ways to pass parameters to an algorithm:
- Set parameters for an instance. E.g., if lr is an instance of LogisticRegression, one could call lr.setMaxIter(10) to make lr.fit() use at most 10 iterations. This API resembles the API used in the spark.mllib package.
- Pass a ParamMap to fit() or transform(). Any parameters in the ParamMap will override parameters previously specified via setter methods.

3.6 Automating model tuning
- ParamGridBuilder
- CrossValidator (k-fold)

3.7 Model persistence

Hands-on
- Algorithms: classification, regression, clustering, dimensionality reduction
- High-level tools: parameter tuning, Pipeline

Questions?
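The workflow utilities slide lists feature transformations such as standardization. Below is a minimal Spark-free sketch of what standardization computes. It assumes a single numeric column held in a Python list and shifts each value by the column mean, then divides by the sample standard deviation. The function name `standardize` is hypothetical. Spark's own tool is `pyspark.ml.feature.StandardScaler`, which by default scales to unit standard deviation without centering (`withMean=False`).

```python
import statistics


def standardize(values):
    """Hypothetical helper: z-score a numeric column (zero mean, unit sample std)."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a standard deviation")
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample (n - 1) standard deviation
    if std == 0:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


scaled = standardize([1.0, 2.0, 3.0])
print(scaled)  # [-1.0, 0.0, 1.0]
```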
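The Transformer slide (3.1) describes a stage that reads an input column and writes an output column. Here is a minimal Spark-free sketch of that contract, assuming rows are plain Python dicts instead of a Spark DataFrame. `ToyTokenizer` is a hypothetical class. Spark's real counterpart is `pyspark.ml.feature.Tokenizer(inputCol=..., outputCol=...)`, which also lowercases text and splits it on whitespace.

```python
class ToyTokenizer:
    """Hypothetical toy Transformer: reads input_col, writes a new output_col."""

    def __init__(self, input_col, output_col):
        self.input_col = input_col
        self.output_col = output_col

    def transform(self, rows):
        # Build new rows so the caller's input rows are left unchanged.
        out = []
        for row in rows:
            new_row = dict(row)
            new_row[self.output_col] = row[self.input_col].lower().split()
            out.append(new_row)
        return out


rows = [{"id": 0, "sentence": "Spark MLlib is scalable"}]
tokenized = ToyTokenizer("sentence", "words").transform(rows)
print(tokenized[0]["words"])  # ['spark', 'mllib', 'is', 'scalable']
```

Returning new rows instead of mutating the input mirrors the idea that a Transformer appends a column and leaves the existing data intact.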
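The Estimator slide (3.2) gives the example of LogisticRegression.fit() returning a LogisticRegressionModel, which is itself a Transformer. The sketch below is a Spark-free illustration of that fit-then-transform relationship, assuming dict rows with a `features` list and a 0/1 `label`. `ToyLogisticRegression` and `ToyLogisticRegressionModel` are hypothetical names. The toy trains with plain batch gradient descent, whereas Spark's `pyspark.ml.classification.LogisticRegression` uses a different optimizer.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ToyLogisticRegressionModel:
    """Fitted model: a Transformer that appends probability and prediction columns."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def transform(self, rows):
        out = []
        for row in rows:
            z = self.bias + sum(w * x for w, x in zip(self.weights, row["features"]))
            p = _sigmoid(z)
            out.append({**row, "probability": p, "prediction": 1.0 if p >= 0.5 else 0.0})
        return out


class ToyLogisticRegression:
    """Estimator: fit() learns weights by batch gradient descent and returns a model."""

    def __init__(self, max_iter=100, step_size=0.5):
        self.max_iter = max_iter
        self.step_size = step_size

    def fit(self, rows):
        dim = len(rows[0]["features"])
        n = len(rows)
        weights, bias = [0.0] * dim, 0.0
        for _ in range(self.max_iter):
            grad_w, grad_b = [0.0] * dim, 0.0
            for row in rows:
                x, y = row["features"], row["label"]
                # Per-row gradient of the log-loss: (sigmoid(w.x + b) - y) * x
                err = _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
                for j in range(dim):
                    grad_w[j] += err * x[j] / n
                grad_b += err / n
            weights = [w - self.step_size * g for w, g in zip(weights, grad_w)]
            bias -= self.step_size * grad_b
        return ToyLogisticRegressionModel(weights, bias)


train = [
    {"features": [-2.0], "label": 0.0},
    {"features": [-1.0], "label": 0.0},
    {"features": [1.0], "label": 1.0},
    {"features": [2.0], "label": 1.0},
]
model = ToyLogisticRegression(max_iter=100).fit(train)  # Estimator.fit() -> model
predictions = model.transform(train)                     # the model is a Transformer
print([r["prediction"] for r in predictions])            # [0.0, 0.0, 1.0, 1.0]
```

Some "transformations" must also learn from data first. In Spark, `StringIndexer`, which converts categorical values into numbers, is an Estimator whose fitted `StringIndexerModel` is the Transformer.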
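The Evaluator slide (3.3) names area under the ROC curve and RMSE as typical metrics. The sketch below computes both in plain Python so the numbers behind an evaluator are visible. The function names `area_under_roc` and `rmse` are hypothetical. In Spark, `BinaryClassificationEvaluator` defaults to the `areaUnderROC` metric and `RegressionEvaluator` defaults to `rmse`. The AUC here uses the pairwise-ranking definition, which is fine for small inputs but quadratic in size.

```python
import math


def area_under_roc(scores, labels):
    """AUC as the chance that a random positive scores above a random negative.

    Ties count as half a win. O(P * N) pairs, fine for a small illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(predictions, targets):
    """Root mean squared error; lower is better."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


auc = area_under_roc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc)  # 0.75
err = rmse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0])
```

Model selection then amounts to computing such a metric for each candidate and keeping the best one. For AUC, higher is better; for RMSE, lower is better.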
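The Pipeline slide (3.4) describes a Pipeline as an ordered sequence of stages that is itself an Estimator. The Spark-free sketch below shows how such a fit() can work. Estimator stages are fitted on the data that reaches them, every stage's output feeds the next stage, and the fitted stages form a model that only transforms. All `Toy*` classes are hypothetical. For simplicity, the toy also transforms after the last stage, which Spark's `pyspark.ml.Pipeline` does not need to do.

```python
class ToyTokenizer:
    """Toy Transformer: lowercases text and splits it on whitespace."""

    def __init__(self, input_col, output_col):
        self.input_col, self.output_col = input_col, output_col

    def transform(self, rows):
        return [{**r, self.output_col: r[self.input_col].lower().split()} for r in rows]


class ToyWordIndexerModel:
    """Toy fitted model (a Transformer): words -> learned ids, unseen words -> -1."""

    def __init__(self, vocab, input_col, output_col):
        self.vocab, self.input_col, self.output_col = vocab, input_col, output_col

    def transform(self, rows):
        return [
            {**r, self.output_col: [self.vocab.get(w, -1) for w in r[self.input_col]]}
            for r in rows
        ]


class ToyWordIndexer:
    """Toy Estimator: learns an alphabetically sorted vocabulary from training rows."""

    def __init__(self, input_col, output_col):
        self.input_col, self.output_col = input_col, output_col

    def fit(self, rows):
        words = sorted({w for r in rows for w in r[self.input_col]})
        vocab = {w: i for i, w in enumerate(words)}
        return ToyWordIndexerModel(vocab, self.input_col, self.output_col)


class ToyPipelineModel:
    """Result of ToyPipeline.fit(): applies the fitted stages in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class ToyPipeline:
    """Toy Pipeline: an Estimator built from an ordered list of stages."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):      # Estimator stage: learn a model first
                stage = stage.fit(rows)
            rows = stage.transform(rows)   # feed transformed data to the next stage
            fitted.append(stage)
        return ToyPipelineModel(fitted)


train = [{"text": "Spark runs everywhere"}, {"text": "Spark is fast"}]
pipeline = ToyPipeline([ToyTokenizer("text", "words"), ToyWordIndexer("words", "ids")])
model = pipeline.fit(train)
result = model.transform([{"text": "Spark is scalable"}])
print(result[0]["ids"])  # [4, 2, -1]
```

The vocabulary is learned only from the training rows, so "scalable" in the new row maps to -1. This is why the fitted pipeline model, not the unfitted pipeline, is what gets applied to new data.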
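The Parameters slide (3.5) makes two points. Parameters belong to specific instances, and a ParamMap passed to fit() overrides values set through setters. The Spark-free sketch below models a ParamMap as a dict keyed by (instance, parameter name). That keying is an assumption chosen so that lr1's maxIter and lr2's maxIter never clash. `ToyEstimator` and `set_max_iter` are hypothetical. In PySpark itself, the equivalents are `lr.setMaxIter(10)` and passing a dict such as `{lr.maxIter: 20}` as the second argument of `lr.fit(...)`.

```python
class ToyEstimator:
    """Hypothetical Estimator that reports which parameter values fit() would use."""

    def __init__(self, max_iter=100):
        self._params = {"maxIter": max_iter}  # instance-level values

    def set_max_iter(self, value):
        # Setter style (like lr.setMaxIter(10)): updates this instance, returns it.
        self._params["maxIter"] = value
        return self

    def fit(self, rows, param_map=None):
        # Start from the values stored on the instance...
        effective = dict(self._params)
        # ...then apply only the ParamMap entries owned by *this* instance,
        # so they override anything set earlier through setters.
        for (owner, param), value in (param_map or {}).items():
            if owner is self:
                effective[param] = value
        return effective  # a real Estimator would train here and return a model


lr1 = ToyEstimator().set_max_iter(5)
lr2 = ToyEstimator()

# One map can hold maxIter for two different instances without clashing.
param_map = {(lr1, "maxIter"): 10, (lr2, "maxIter"): 20}

print(lr1.fit([])["maxIter"])             # 5, from the setter
print(lr1.fit([], param_map)["maxIter"])  # 10, the ParamMap overrides the setter
print(lr2.fit([], param_map)["maxIter"])  # 20
```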
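The model tuning slide pairs a parameter grid (ParamGridBuilder) with k-fold cross-validation (CrossValidator). The Spark-free sketch below shows the loop behind that pairing:
- Build the Cartesian product of candidate values.
- For each combination, train on k-1 folds and score on the held-out fold.
- Average the scores and keep the best combination.
- Refit the winner on all data.

`param_grid`, `ToyRidge`, and `cross_validate` are hypothetical names, and the toy assigns folds by index for determinism. Spark's `pyspark.ml.tuning.ParamGridBuilder` (`addGrid`, `build`) and `CrossValidator` (`estimator`, `estimatorParamMaps`, `evaluator`, `numFolds`) split the data randomly and use a pluggable Evaluator.

```python
import itertools
import math


def param_grid(**candidates):
    """Cartesian product of candidate values (the idea behind a param grid builder)."""
    names = sorted(candidates)
    combos = itertools.product(*(candidates[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


class ToyRidge:
    """Toy 1-D ridge regression Estimator; fit() returns a prediction function."""

    def __init__(self, reg_param=0.0, fit_intercept=True):
        self.reg_param = reg_param
        self.fit_intercept = fit_intercept

    def fit(self, xs, ys):
        # Centering the data leaves the intercept unpenalized.
        mx = sum(xs) / len(xs) if self.fit_intercept else 0.0
        my = sum(ys) / len(ys) if self.fit_intercept else 0.0
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        w = sxy / (sxx + self.reg_param)
        b = my - w * mx
        return lambda x: w * x + b


def rmse(preds, targets):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds))


def cross_validate(xs, ys, grid, num_folds=3):
    """k-fold CV over a grid; lowest mean RMSE wins; the winner is refit on all data."""
    n = len(xs)
    mean_scores = []
    for params in grid:
        fold_scores = []
        for k in range(num_folds):
            # Deterministic fold assignment by index (a real tool would shuffle).
            test = [i for i in range(n) if i % num_folds == k]
            train = [i for i in range(n) if i % num_folds != k]
            model = ToyRidge(**params).fit([xs[i] for i in train], [ys[i] for i in train])
            fold_scores.append(rmse([model(xs[i]) for i in test], [ys[i] for i in test]))
        mean_scores.append(sum(fold_scores) / num_folds)
    best = min(range(len(grid)), key=lambda i: mean_scores[i])
    best_model = ToyRidge(**grid[best]).fit(xs, ys)
    return grid[best], best_model, mean_scores


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free line y = 2x + 1
grid = param_grid(reg_param=[0.0, 1.0, 10.0], fit_intercept=[True, False])
best_params, best_model, scores = cross_validate(xs, ys, grid, num_folds=3)
print(len(grid), best_params)  # 6 {'fit_intercept': True, 'reg_param': 0.0}
```

Because the toy data lie exactly on a line with an intercept, the unregularized model with an intercept wins. With noisy data, a nonzero regularization value can come out ahead instead.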
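The Model persistence slide refers to saving a fitted model and loading it back later. The Spark-free sketch below shows the round trip, assuming a model whose state is just a few numbers that can be written as JSON into a directory. `ToyLinearModel` and the `metadata.json` layout are hypothetical, and Spark's on-disk format differs. In PySpark, fitted models and pipelines provide `save(path)` (or `write().overwrite().save(path)` to replace an existing path) and a class method `load(path)`, for example `PipelineModel.load(path)`.

```python
import json
import os
import tempfile


class ToyLinearModel:
    """Hypothetical fitted model that can be saved to and loaded from a directory."""

    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias

    def transform(self, xs):
        return [self.weight * x + self.bias for x in xs]

    def save(self, path):
        # Persist only the learned parameters (not the training data) as JSON.
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "metadata.json"), "w", encoding="utf-8") as f:
            json.dump({"class": type(self).__name__, "weight": self.weight, "bias": self.bias}, f)

    @classmethod
    def load(cls, path):
        with open(os.path.join(path, "metadata.json"), encoding="utf-8") as f:
            meta = json.load(f)
        # Refuse to load files written by a different model class.
        if meta["class"] != cls.__name__:
            raise ValueError(f"expected {cls.__name__}, found {meta['class']}")
        return cls(meta["weight"], meta["bias"])


with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "toy_model")
    ToyLinearModel(2.0, 1.0).save(model_dir)
    restored = ToyLinearModel.load(model_dir)
    print(restored.transform([0.0, 1.0]))  # [1.0, 3.0]
```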
