Classification And Regression Tree

Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item’s target value.

Decision trees come in two main types:

  • Classification tree, where the predicted outcome is the class to which the data belongs.
  • Regression tree, where the predicted outcome can be considered a real number.

The Classification And Regression Tree (CART) algorithm is an umbrella method that can generate both classification and regression trees.
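
As background, a CART classification tree greedily chooses splits that minimize an impurity measure, commonly the Gini index. The following is a minimal NumPy sketch of that criterion (an illustration of the idea, not Shogun's actual implementation):

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(left, right):
    """Weighted impurity of a binary split; CART picks the split that minimizes this."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# A pure node has impurity 0; a 50/50 two-class node has impurity 0.5.
print(gini([0, 0, 0]))     # 0.0
print(gini([0, 0, 1, 1]))  # 0.5
```

A split that separates the classes perfectly, e.g. split_impurity([0, 0], [1, 1]), scores 0 and would be preferred over one that leaves both children mixed.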

In this example, we show how to apply the CART algorithm to a multi-class dataset and predict labels with a classification tree.

Example

Imagine we have files with training and test data. We create CDenseFeatures (here 64-bit floats, aka RealFeatures) and CMulticlassLabels as

features_train = RealFeatures(f_feats_train)
features_test = RealFeatures(f_feats_test)
labels_train = MulticlassLabels(f_labels_train)
labels_test = MulticlassLabels(f_labels_test)

We set the type of each predictive attribute (true for nominal, false for ordinal/continuous):

ft = np.zeros(2, dtype=bool)
ft[0] = False
ft[1] = False

We create an instance of the CCARTree classifier by passing it the attribute types and the tree type. We can also set the number of subsets used in cross-validation and whether to use cross-validation pruning.

classifier = CARTree(ft, PT_MULTICLASS, 5, True)
classifier.set_labels(labels_train)

Then we train and apply it to test data, which here gives CMulticlassLabels.

classifier.train(features_train)
labels_predict = classifier.apply_multiclass(features_test)

We can evaluate test performance via e.g. CMulticlassAccuracy.

eval = MulticlassAccuracy()
accuracy = eval.evaluate(labels_predict, labels_test)