Classification

Learn.jl provides the following classifiers: SVC, logistic regression, random forest, decision tree, Gaussian naive Bayes, and linear discriminant analysis. The functions described below work the same way for all classifiers. Constructors and examples for specific classifiers are given in the sections that follow.

Functions

fit!(clf::Classifier, X::Matrix{Float64}, y::Vector)

Fit the classifier clf to the inputs X and the targets y.

Parameters:
  • clf – The classifier object encapsulating the parameters for the estimator. fit! modifies this object; after fitting, clf contains all the information required to make predictions.
  • X – The input data, with observations in rows and features in columns.
  • y – The targets, which can be of any type.
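The expected layout of X and y can be sketched as follows. This is a minimal illustration in plain Julia that does not call Learn.jl; the variable names and the sample values are made up for the example.

```julia
# Four observations (rows) with two features each (columns), stored as Float64.
X = [5.1 3.5;
     4.9 3.0;
     6.2 2.9;
     5.9 3.0]

# One target per row of X. Any element type works; strings are used here.
y = ["setosa", "setosa", "versicolor", "versicolor"]

# Data stored with features in rows has to be transposed first.
# permutedims returns a plain Matrix rather than a lazy Adjoint wrapper.
X_by_feature = [5.1 4.9 6.2 5.9;
                3.5 3.0 2.9 3.0]
X_from_features = permutedims(X_by_feature)

# Integer data has to be converted to Float64 to match the signature of fit!.
X_int = [1 2; 3 4]
X_float = Float64.(X_int)

# The number of rows in X must equal the number of targets.
@assert size(X, 1) == length(y)
```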

predict{T<:AbstractFloat}(clf::Classifier, X::Matrix{T})

Predict values for X using the fitted estimator clf.

Parameters:
  • clf – The classifier after fitting with fit!.
  • X – Input data with observations in rows and features in columns.
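The contract between fit! and predict (fit! mutates the classifier, predict returns one value per row of X) can be illustrated with a toy stand-in. ToyCentroidClassifier below is hypothetical and not part of Learn.jl; it is a nearest-centroid sketch written in current Julia syntax, only meant to show the shape of the interface.

```julia
using Statistics: mean

# Hypothetical stand-in for a classifier; NOT a Learn.jl type.
mutable struct ToyCentroidClassifier
    classes::Vector{Any}
    centroids::Matrix{Float64}   # one row per class
    ToyCentroidClassifier() = new(Any[], zeros(0, 0))
end

# Mutates clf: stores the class labels and the mean feature vector of each class.
function fit!(clf::ToyCentroidClassifier, X::Matrix{Float64}, y::Vector)
    clf.classes = unique(y)
    clf.centroids = reduce(vcat, [mean(X[y .== c, :], dims=1) for c in clf.classes])
    return clf
end

# Returns one label per row of X: the class whose centroid is nearest.
function predict(clf::ToyCentroidClassifier, X::Matrix{T}) where {T<:AbstractFloat}
    map(1:size(X, 1)) do i
        dists = [sum(abs2, X[i, :] .- clf.centroids[k, :]) for k in 1:length(clf.classes)]
        clf.classes[argmin(dists)]
    end
end

X = [1.0 1.0; 1.2 0.8; 5.0 5.0; 5.2 4.8]
y = ["a", "a", "b", "b"]

clf = ToyCentroidClassifier()
fit!(clf, X, y)            # clf now holds everything needed for prediction
y_pred = predict(clf, X)   # one predicted label per observation
```

The Learn.jl classifiers are used in the same order: construct, fit!, then predict.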

score{T<:AbstractFloat}(clf::Classifier, X::Matrix{T}, y_true::Vector; scoring::Union{Function, Void}=f1_score)

Return the score for the input observations X and the true targets y_true, computed with the scoring function scoring.

Parameters:
  • clf – The classifier trained by fit!.
  • X – The input values with observations as rows and features as columns.
  • y_true – The true targets to compare predictions with.
  • scoring – Optional scoring function. By default the F1 score is used for classification.
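For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for the binary case in plain Julia. The function binary_f1 and its positive keyword are hypothetical names for this illustration; they are not Learn.jl's f1_score, whose handling of multiple classes is not described here.

```julia
# Binary F1 score for the class `positive`:
# F1 = 2 * TP / (2 * TP + FP + FN)
function binary_f1(y_true::Vector, y_pred::Vector; positive=1)
    tp = count((y_true .== positive) .& (y_pred .== positive))  # true positives
    fp = count((y_true .!= positive) .& (y_pred .== positive))  # false positives
    fn = count((y_true .== positive) .& (y_pred .!= positive))  # false negatives
    denom = 2tp + fp + fn
    # Return 0.0 when the positive class appears in neither vector.
    return denom == 0 ? 0.0 : 2tp / denom
end

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

# Two true positives, one false positive, one false negative.
f1 = binary_f1(y_true, y_pred)
```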

Multiclass Strategies

Classifiers that discriminate between only two classes can still be used for multiclass classification. Typically one uses either the one-vs-one or the one-vs-all strategy. Both have their downsides: for k classes, one-vs-one trains k(k-1)/2 binary classifiers, while one-vs-all trains only k, but each on an imbalanced split of one class against all the others. Learn.jl implements both. All classifiers accept a strategy parameter that specifies the multiclass strategy; one-vs-one is the default. To choose a strategy, pass either OneVsOneStrategy() or OneVsAllStrategy() as a keyword argument to the classifier’s constructor, like this:

clf = LogisticRegression(strategy=OneVsAllStrategy())
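How the two strategies split a multiclass problem into binary ones can be sketched in plain Julia. The helper functions one_vs_all_targets and one_vs_one_problems are hypothetical and only illustrate the decomposition; they do not show how Learn.jl implements its strategies internally.

```julia
# One-vs-all: one binary problem per class (class c against everything else).
# Returns, for each class, a Boolean target vector over all observations.
function one_vs_all_targets(y::Vector)
    Dict(c => (y .== c) for c in unique(y))
end

# One-vs-one: one binary problem per unordered pair of classes.
# Returns, for each pair, the indices of the observations in either class.
function one_vs_one_problems(y::Vector)
    classes = unique(y)
    problems = Dict{Tuple{Any,Any}, Vector{Int}}()
    for i in 1:length(classes), j in (i + 1):length(classes)
        a, b = classes[i], classes[j]
        problems[(a, b)] = findall(t -> t == a || t == b, y)
    end
    return problems
end

y = ["cat", "dog", "bird", "dog", "cat", "fish"]

ova = one_vs_all_targets(y)    # 4 classes -> 4 binary problems
ovo = one_vs_one_problems(y)   # 4 classes -> 4 * 3 / 2 = 6 binary problems
```

With one-vs-all every binary problem sees all observations, whereas each one-vs-one problem sees only the observations of its two classes.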

Logistic regression

This code uses GLM.jl.

LogisticRegression(; strategy::MulticlassStrategy=OneVsOneStrategy())

clf = LogisticRegression()
fit!(clf, X, y)
y_pred = predict(clf, X)
score(clf, X, y)

Random forest classification

Implementation of the random forest algorithm for classification. This code uses DecisionTree.jl.

RandomForestClassifier(;nsubfeatures::Integer=2, ntrees::Integer=5, strategy::MulticlassStrategy=OneVsOneStrategy())

For detailed information about the parameters refer to the documentation of DecisionTree.jl.

clf = RandomForestClassifier()
fit!(clf, X, y)
y_pred = predict(clf, X)
score(clf, X, y)

Decision tree classification

Implementation of the decision tree algorithm for classification. This code uses DecisionTree.jl.

DecisionTreeClassifier(;strategy::MulticlassStrategy=OneVsOneStrategy())

clf = DecisionTreeClassifier()
fit!(clf, X, y)
y_pred = predict(clf, X)
score(clf, X, y)

Gaussian naive Bayes

Implementation of the Gaussian naive Bayes algorithm. This code uses NaiveBayes.jl.

GaussianNB(;strategy::MulticlassStrategy=OneVsOneStrategy())

clf = GaussianNB()
fit!(clf, X, y)
y_pred = predict(clf, X)
score(clf, X, y)

Linear discriminant analysis

Implementation of the linear discriminant analysis algorithm. This code uses MultivariateStats.jl.

LinearDiscriminantAnalysis(;strategy::MulticlassStrategy=OneVsOneStrategy())

clf = LinearDiscriminantAnalysis()
fit!(clf, X, y)
y_pred = predict(clf, X)
score(clf, X, y)