The Spark ML Pipelines API standardizes machine learning workflows around a few key concepts, mostly inspired by the scikit-learn project:
DataFrame: This ML API uses DataFrame from Spark SQL as an ML dataset, which can hold a variety of data types. E.g., a DataFrame could have different columns storing text, feature vectors, true labels, and predictions.
Transformer: A Transformer is an algorithm which can transform one DataFrame into another DataFrame. E.g., an ML model is a Transformer which transforms a DataFrame with features into a DataFrame with predictions.
Estimator: An Estimator is an algorithm which can be fit on a DataFrame to produce a Transformer. E.g., a learning algorithm is an Estimator which trains on a DataFrame and produces a model. Technically, an Estimator implements a method fit(), which accepts a DataFrame and produces a Model, which is a Transformer (see the sketch after these definitions).
Pipeline: A Pipeline chains multiple Transformers and Estimators together to specify an ML workflow.
Parameter: All Transformers and Estimators now share a common API for specifying parameters.
Reference: http://spark.apache.org/docs/latest/ml-pipeline.html
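A minimal sketch of these concepts in Scala, assuming a running SparkSession named `spark` and the standard spark.ml package: LogisticRegression is an Estimator whose Params are set through the shared setter API, fit() returns a Model (which is a Transformer), and transform() appends a prediction column to a DataFrame of features.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors

// DataFrame: true labels plus feature vectors (columns "label", "features").
val training = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (0.0, Vectors.dense(2.0, 1.3, 1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")

// Estimator: a learning algorithm whose Parameters are set via the common Param API.
val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.01)

// fit() trains on the DataFrame and produces a Model, which is a Transformer.
val model = lr.fit(training)

// transform() turns a DataFrame with features into a DataFrame with predictions.
model.transform(training).select("features", "label", "prediction").show()
```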
In one sentence: a Pipeline is the workflow of using data (DataFrame) to train a learning algorithm (Estimator), tuning its parameters (Parameter) to obtain a best-fitted model (Transformer), and then using that model to transform data (DataFrame).
When a Pipeline's fit() method is called, its stages are run in order. For Transformer stages, the transform() method is called on the DataFrame. For Estimator stages, the fit() method is called to produce a Transformer (which becomes part of the PipelineModel, or fitted Pipeline), and that Transformer's transform() method is called on the DataFrame.
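As a sketch of that flow (again assuming a SparkSession named `spark`), the classic text-classification pipeline chains two Transformers (Tokenizer, HashingTF) with one Estimator (LogisticRegression); fit() yields a PipelineModel whose transform() runs every fitted stage in order on new data.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Training DataFrame: id, raw text, and true label.
val training = spark.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

// Transformer stages: split text into words, then hash words into feature vectors.
val tokenizer = new Tokenizer()
  .setInputCol("text")
  .setOutputCol("words")
val hashingTF = new HashingTF()
  .setInputCol(tokenizer.getOutputCol)
  .setOutputCol("features")

// Estimator stage.
val lr = new LogisticRegression()
  .setMaxIter(10)

// Pipeline.fit() calls transform() on the Transformer stages and fit() on the
// Estimator stage, producing a fitted PipelineModel.
val pipeline = new Pipeline()
  .setStages(Array(tokenizer, hashingTF, lr))
val model = pipeline.fit(training)

// The PipelineModel is itself a Transformer: it applies each fitted stage's
// transform() to new data.
val test = spark.createDataFrame(Seq(
  (4L, "spark i j k"),
  (5L, "l m n"),
  (6L, "spark hadoop spark")
)).toDF("id", "text")

model.transform(test).select("id", "text", "prediction").show()
```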