Pipeline
A Pipeline combines multiple preprocessors and one estimator (a classifier, regressor, or clustering algorithm) into a single estimator. It is particularly useful together with cross-validation, for example with GridSearchCV.
Creation
A Pipeline is created by providing all of the necessary preprocessors and an estimator.
- Pipeline{T<:Estimator}(prepros::Vector{Tuple{ASCIIString, Preprocessor}}, est::Tuple{ASCIIString, T})
  Create a new pipeline object. The preprocessors are provided as a list of tuples; each tuple contains the name of the preprocessor and the preprocessor object itself. The estimator is given the same way, as a single tuple of name and estimator object.
pipe = Pipeline([("mms", MinMaxScaler()), ("ss", StandardScaler())], ("svc", SVC()))
Functions
Because a Pipeline is itself an Estimator, the common estimator functions all work on a Pipeline as well.
- fit!(pipe::Pipeline, X::Matrix{Float64}, y::Vector)
  Fit the pipeline for regression or classification.
- fit!{T<:Cluster}(pipe::Pipeline{T}, X::Matrix{Float64})
  Fit the pipeline with an unsupervised estimator, such as Kmeans.
- predict(pipe::Pipeline, X::Matrix{Float64})
  Preprocess the data and predict values using the estimator's predict method.
- score{T<:AbstractFloat, S<:Union{Classifier, Regressor}}(pipe::Pipeline{S}, X::Matrix{T}, y_true::Vector; scoring::Union{Function, Void}=nothing)
  Scoring for classification and regression.
- score{T<:AbstractFloat, S<:Cluster}(pipe::Pipeline{S}, X::Matrix{T}; scoring::Union{Function, Void}=nothing)
  Scoring for unsupervised learning.
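The optional scoring keyword in the score signatures can be illustrated with a small, self-contained sketch. It is written in current Julia syntax (where Void is spelled Nothing) and does not call this library. The names toy_score, toy_accuracy, and threshold_predict are hypothetical, and both the fallback metric and the (y_true, y_pred) argument order of a custom scoring function are assumptions, not documented behaviour.

```julia
# Hypothetical default metric: fraction of correct predictions.
toy_accuracy(y_true::Vector, y_pred::Vector) = count(y_true .== y_pred) / length(y_true)

# Sketch of a score function with an optional `scoring` keyword.
# ASSUMPTION: a custom scoring function is called as scoring(y_true, y_pred).
function toy_score(predict_fn::Function, X::Matrix{Float64}, y_true::Vector;
                   scoring::Union{Function, Nothing}=nothing)
    y_pred = predict_fn(X)
    # Fall back to the default metric when no scoring function is given.
    metric = scoring === nothing ? toy_accuracy : scoring
    return metric(y_true, y_pred)
end

# Hypothetical stand-in for a fitted estimator: threshold the first column.
threshold_predict(X) = [x > 0.5 ? 1 : 0 for x in X[:, 1]]

X_demo = reshape([0.1, 0.9, 0.4, 0.7], 4, 1)
y_demo = [0, 1, 1, 1]

default_score = toy_score(threshold_predict, X_demo, y_demo)
error_count = toy_score(threshold_predict, X_demo, y_demo;
                        scoring=(yt, yp) -> count(yt .!= yp))
```

Passing nothing as the default keeps the call site short for the common case while still letting a caller swap in any two-argument metric.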
Example
Create a pipeline with a MinMaxScaler, followed by a StandardScaler, and finally an SVC.
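using Base.Test  # provides @test and @test_approx_eq on the Julia versions this syntax targets

# X (Matrix{Float64}) and y (label vector) are assumed to be loaded already.
pipe = Pipeline([("mms", MinMaxScaler()), ("ss", StandardScaler())], ("svc", SVC()))
fit!(pipe, X, y)
y_pred = predict(pipe, X)
# The tests expect a perfect per-class F1 score on the training data.
@test_approx_eq f1_score(y, y_pred)["1"] 1.0
@test f1_score(y, y_pred)["0"] == 1.0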
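To make the mechanics of the example concrete, here is a minimal, self-contained sketch of what a pipeline of this shape might do internally: fit each preprocessor in order on the output of the previous one, fit the estimator on the final result, and replay the same transformations before predicting. It is written in current Julia syntax and does not use this library. Every Toy* type and toy_* function is a hypothetical stand-in, and a nearest-mean classifier replaces the SVC purely to keep the sketch short.

```julia
using Statistics  # mean and std

# All Toy* names are hypothetical stand-ins, not the library's types.
abstract type ToyPreprocessor end

mutable struct ToyMinMaxScaler <: ToyPreprocessor
    lo::Vector{Float64}
    span::Vector{Float64}
    ToyMinMaxScaler() = new(Float64[], Float64[])
end

mutable struct ToyStandardScaler <: ToyPreprocessor
    mu::Vector{Float64}
    sigma::Vector{Float64}
    ToyStandardScaler() = new(Float64[], Float64[])
end

# Learn per-column statistics; constant columns get a divisor of 1.
function toy_fit!(s::ToyMinMaxScaler, X::Matrix{Float64})
    s.lo = vec(minimum(X, dims=1))
    span = vec(maximum(X, dims=1)) .- s.lo
    s.span = [v == 0 ? 1.0 : v for v in span]
    return s
end

function toy_fit!(s::ToyStandardScaler, X::Matrix{Float64})
    s.mu = vec(mean(X, dims=1))
    s.sigma = [v == 0 ? 1.0 : v for v in vec(std(X, dims=1))]
    return s
end

toy_transform(s::ToyMinMaxScaler, X::Matrix{Float64}) = (X .- s.lo') ./ s.span'
toy_transform(s::ToyStandardScaler, X::Matrix{Float64}) = (X .- s.mu') ./ s.sigma'

# Toy estimator: assign each row to the class with the nearest mean.
mutable struct ToyNearestMean
    classes::Vector{Int}
    centroids::Matrix{Float64}  # one row per class
    ToyNearestMean() = new(Int[], zeros(0, 0))
end

function toy_fit!(m::ToyNearestMean, X::Matrix{Float64}, y::Vector{Int})
    m.classes = sort(unique(y))
    m.centroids = reduce(vcat, [mean(X[y .== c, :], dims=1) for c in m.classes])
    return m
end

function toy_predict(m::ToyNearestMean, X::Matrix{Float64})
    preds = Vector{Int}(undef, size(X, 1))
    for i in 1:size(X, 1)
        dists = [sum(abs2, X[i, :] .- m.centroids[k, :]) for k in 1:length(m.classes)]
        preds[i] = m.classes[argmin(dists)]
    end
    return preds
end

struct ToyPipeline{E}
    prepros::Vector{Tuple{String, ToyPreprocessor}}
    est::Tuple{String, E}
end

# Fit each preprocessor on the output of the previous one, then the estimator.
function toy_fit!(pipe::ToyPipeline, X::Matrix{Float64}, y::Vector{Int})
    for (_, prepro) in pipe.prepros
        toy_fit!(prepro, X)
        X = toy_transform(prepro, X)
    end
    toy_fit!(pipe.est[2], X, y)
    return pipe
end

# Replay the fitted transformations in order, then delegate to the estimator.
function toy_predict(pipe::ToyPipeline, X::Matrix{Float64})
    for (_, prepro) in pipe.prepros
        X = toy_transform(prepro, X)
    end
    return toy_predict(pipe.est[2], X)
end

X_toy = [1.0 10.0; 2.0 12.0; 1.5 11.0; 8.0 30.0; 9.0 32.0; 8.5 31.0]
y_toy = [0, 0, 0, 1, 1, 1]

# Typed literal so the element type matches the struct field exactly.
steps = Tuple{String, ToyPreprocessor}[("mms", ToyMinMaxScaler()), ("ss", ToyStandardScaler())]
toy_pipe = ToyPipeline(steps, ("nm", ToyNearestMean()))
toy_fit!(toy_pipe, X_toy, y_toy)
y_toy_pred = toy_predict(toy_pipe, X_toy)
```

On this well-separated toy data the predictions reproduce the training labels, which mirrors the perfect F1 scores the example's tests expect.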