Pipeline
========

``Pipeline`` combines several preprocessors and one final estimator, such as a classifier, regressor, or clustering algorithm, into a single estimator. It is especially handy in combination with cross-validation, for example with ``GridSearchCV``: since the whole pipeline is refit on each training fold, the preprocessors never see the validation data.
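
Why fitting preprocessors per fold matters can be shown with a small, library-independent Julia 1.x sketch. It does not use this package's ``GridSearchCV``, and the helpers ``folds`` and ``foldwise_minmax`` are hypothetical. A min-max scaler is fitted on the training rows of each fold only, and the held-out rows are scaled with those bounds, as refitting a pipeline on every fold would do.

.. code-block:: julia

    # Hypothetical helper: split 1:n into k contiguous folds (assumes k divides n).
    folds(n, k) = [((i - 1) * div(n, k) + 1):(i * div(n, k)) for i in 1:k]

    # Hypothetical helper: per fold, learn min-max bounds from the training rows
    # only and scale the held-out rows with them (single feature for brevity).
    function foldwise_minmax(x::Vector{Float64}, k::Int)
        n = length(x)
        bounds = Tuple{Float64, Float64}[]
        scaled_val = Vector{Float64}[]
        for val in folds(n, k)
            train = setdiff(1:n, val)
            lo, hi = minimum(x[train]), maximum(x[train])
            push!(bounds, (lo, hi))
            # Held-out rows reuse the training bounds, so they may leave [0, 1].
            push!(scaled_val, (x[val] .- lo) ./ (hi - lo))
        end
        return bounds, scaled_val
    end

    bounds, scaled_val = foldwise_minmax(collect(1.0:6.0), 3)

Here the held-out values of the first and last folds fall outside [0, 1] (for example -2/3 and 5/3), which is the honest picture of unseen data; a scaler fitted once on all six rows would have mapped every value into [0, 1].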

Creation
---------

A ``Pipeline`` is created from a list of named preprocessors and one named estimator.

.. function:: Pipeline{T<:Estimator}(prepros::Vector{Tuple{ASCIIString, Preprocessor}}, est::Tuple{ASCIIString, T})

    Create a new pipeline object. The preprocessors are given as a vector of tuples, each holding the name of a preprocessor and the preprocessor object itself, and are applied in the order listed. The final estimator is passed as a single ``(name, estimator)`` tuple.

    .. code-block:: julia

        pipe = Pipeline([("mms", MinMaxScaler()), ("ss", StandardScaler())], ("svc", SVC()))


Functions
---------

Because a ``Pipeline`` is itself an ``Estimator``, the common estimator functions work on it as well.

.. function:: fit!(pipe::Pipeline, X::Matrix{Float64}, y::Vector)

    Fit the pipeline's preprocessors and its classifier or regressor on the data ``X`` with targets ``y``.

.. function:: fit!{T<:Cluster}(pipe::Pipeline{T}, X::Matrix{Float64})

    Fit a pipeline whose estimator is a clustering algorithm (``T<:Cluster``), such as ``Kmeans``. No targets are needed.
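
The sequential behavior of ``fit!`` can be pictured with a minimal, self-contained Julia 1.x sketch. Every type and function in it (``ToyMinMaxScaler``, ``ToyCenterer``, ``ToyMeanRegressor``, ``ToyPipeline``, ``toy_fit!``, ``toy_transform``) is a hypothetical stand-in, not this package's implementation. In the sketch, each preprocessor is fitted on the output of the previous one and then transforms the data, and only the final estimator receives the targets.

.. code-block:: julia

    # Hypothetical stand-ins, not this package's types.
    mutable struct ToyMinMaxScaler
        lo::Vector{Float64}
        hi::Vector{Float64}
        ToyMinMaxScaler() = new(Float64[], Float64[])
    end

    mutable struct ToyCenterer
        mu::Vector{Float64}
        ToyCenterer() = new(Float64[])
    end

    mutable struct ToyMeanRegressor
        mu::Float64
        ToyMeanRegressor() = new(0.0)
    end

    struct ToyPipeline
        prepros::Vector   # (name, preprocessor) tuples, applied in order
        est::Tuple        # (name, estimator)
    end

    # Learn per-column bounds or means from the data each step receives.
    function toy_fit!(s::ToyMinMaxScaler, X::Matrix{Float64})
        s.lo = vec(minimum(X, dims=1))
        s.hi = vec(maximum(X, dims=1))
        return s
    end
    function toy_fit!(c::ToyCenterer, X::Matrix{Float64})
        c.mu = vec(sum(X, dims=1)) ./ size(X, 1)
        return c
    end
    # Toy regressor: remembers the mean training target.
    function toy_fit!(r::ToyMeanRegressor, X::Matrix{Float64}, y::Vector{Float64})
        r.mu = sum(y) / length(y)
        return r
    end

    # Assumes no constant columns (hi > lo).
    toy_transform(s::ToyMinMaxScaler, X::Matrix{Float64}) = (X .- s.lo') ./ (s.hi .- s.lo)'
    toy_transform(c::ToyCenterer, X::Matrix{Float64}) = X .- c.mu'

    function toy_fit!(pipe::ToyPipeline, X::Matrix{Float64}, y::Vector{Float64})
        Xt = X
        for (_, step) in pipe.prepros
            toy_fit!(step, Xt)             # fit on the previous step's output
            Xt = toy_transform(step, Xt)   # and hand the transformed data on
        end
        toy_fit!(pipe.est[2], Xt, y)       # the estimator only sees preprocessed data
        return pipe
    end

    pipe = ToyPipeline([("mms", ToyMinMaxScaler()), ("c", ToyCenterer())], ("reg", ToyMeanRegressor()))
    toy_fit!(pipe, [0.0 10.0; 2.0 20.0; 4.0 30.0], [1.0, 2.0, 3.0])

The centerer learns its means from the already scaled data, not from the raw input. For a clustering estimator the last call inside ``toy_fit!`` would simply omit ``y``.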

.. function:: predict(pipe::Pipeline, X::Matrix{Float64})

    Transform ``X`` with the fitted preprocessors and predict values using the estimator's ``predict`` method.
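
A minimal Julia 1.x sketch of the prediction path, assuming, as is usual for pipelines, that ``predict`` only applies the transformations learned during fitting and does not refit the preprocessors. ``FittedMinMax``, ``ThresholdClassifier``, ``toy_transform`` and ``toy_predict`` are hypothetical stand-ins whose parameters are set directly instead of being learned.

.. code-block:: julia

    # Hypothetical fitted min-max scaler with bounds taken from training data.
    struct FittedMinMax
        lo::Vector{Float64}
        hi::Vector{Float64}
    end

    # Hypothetical classifier: label 1 when the first feature exceeds t.
    struct ThresholdClassifier
        t::Float64
    end

    toy_transform(s::FittedMinMax, X::Matrix{Float64}) = (X .- s.lo') ./ (s.hi .- s.lo)'
    toy_predict(c::ThresholdClassifier, X::Matrix{Float64}) = Int.(X[:, 1] .> c.t)

    # Apply every fitted preprocessor in order, then the estimator's predict.
    function toy_predict(prepros::Vector, est, X::Matrix{Float64})
        Xt = X
        for step in prepros
            Xt = toy_transform(step, Xt)   # reuse parameters learned during fit
        end
        return toy_predict(est, Xt)
    end

    # Bounds as if learned on training data spanning [0, 10] in one feature.
    prepros = [FittedMinMax([0.0], [10.0])]
    X_new = reshape([2.0, 8.0, 12.0], 3, 1)
    y_pred = toy_predict(prepros, ThresholdClassifier(0.5), X_new)

The third row scales to 1.2, outside [0, 1], because the training bounds are reused for the new data.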

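.. function:: score{T<:AbstractFloat, S<:Union{Classifier, Regressor}}(pipe::Pipeline{S}, X::Matrix{T}, y_true::Vector; scoring::Union{Function, Void}=nothing)

    Score a pipeline whose estimator is a classifier or regressor against the true targets ``y_true``. A custom scoring function can be supplied with the ``scoring`` keyword.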

.. function:: score{T<:AbstractFloat, S<:Cluster}(pipe::Pipeline{S}, X::Matrix{T}; scoring::Union{Function, Void}=nothing)

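    Score a pipeline whose estimator is a clustering algorithm (unsupervised learning). A custom scoring function can be supplied with the ``scoring`` keyword.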
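
The optional ``scoring`` keyword can be illustrated with a small Julia 1.x sketch. It assumes a hypothetical default metric (accuracy, ``toy_accuracy``) and a ``(y_true, y_pred)`` argument order for custom metrics; neither is confirmed to be this package's behavior. ``toy_score`` and ``predict_fn`` are hypothetical, with ``predict_fn`` standing in for prediction with an already fitted pipeline. Note that ``Void`` in the signatures above was renamed ``Nothing`` in Julia 0.7.

.. code-block:: julia

    # Hypothetical default metric for this sketch: fraction of exact matches.
    toy_accuracy(y_true, y_pred) = count(y_true .== y_pred) / length(y_true)

    # Predict, then score with the custom metric if given, else the default.
    function toy_score(predict_fn::Function, X::Matrix{Float64}, y_true::Vector;
                       scoring::Union{Function, Nothing}=nothing)
        y_pred = predict_fn(X)
        metric = scoring === nothing ? toy_accuracy : scoring
        return metric(y_true, y_pred)
    end

    predict_fn = X -> Int.(X[:, 1] .> 0.5)   # stand-in for a fitted pipeline
    X = reshape([0.2, 0.8, 0.4], 3, 1)
    y_true = [0, 1, 1]

    default_score = toy_score(predict_fn, X, y_true)
    errors = toy_score(predict_fn, X, y_true; scoring=(yt, yp) -> count(yt .!= yp))

For a clustering pipeline the same pattern applies without ``y_true``, with the metric computed from ``X`` and the predicted cluster assignments.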

Example
-------

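Create a pipeline with a ``MinMaxScaler``, followed by a ``StandardScaler``, and finally an ``SVC``, then fit it, predict on the training data, and check the per-class F1 scores.

.. code-block:: julia

    using Base.Test  # on Julia 0.7 and later: using Test

    # X::Matrix{Float64} holds the features and y the class labels; both are assumed to be defined.
    pipe = Pipeline([("mms", MinMaxScaler()), ("ss", StandardScaler())], ("svc", SVC()))
    fit!(pipe, X, y)
    y_pred = predict(pipe, X)
    @test f1_score(y, y_pred)["1"] ≈ 1.0
    @test f1_score(y, y_pred)["0"] ≈ 1.0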