CHAID tree

The CHAID (Chi-squared Automatic Interaction Detector) algorithm is a decision tree technique that relies on the chi-squared test to determine the best split at each step.

CHAID accepts nominal or ordinal categorical predictors only. If predictors are continuous, they have to be transformed into ordinal predictors by binning before the tree is grown, and an ANOVA F-test is used for node splitting.
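
As an illustrative sketch (not Shogun's implementation), the chi-squared split criterion can be seen as scoring each candidate split by the Pearson chi-squared statistic of the contingency table between the split's branches and the class labels; the example tables below are made up for illustration:

```python
import numpy as np

# Illustrative sketch (not Shogun's implementation): CHAID scores a
# candidate split via the chi-squared statistic of the contingency table
# between the split's branches (rows) and the class labels (columns);
# more significant statistics indicate better splits.
def chi_squared(table):
    """Pearson chi-squared statistic of a contingency table."""
    table = np.asarray(table, dtype=float)
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row * col / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

# predictor categories (rows) vs. class labels (columns)
informative = [[40, 5], [38, 7], [6, 44]]      # category 2 is mostly class 1
uninformative = [[22, 23], [25, 20], [24, 26]]  # labels roughly independent

print(chi_squared(informative) > chi_squared(uninformative))  # True
```

A split whose branches separate the classes well yields a much larger statistic than one whose branches look like the overall label distribution.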

Example

Imagine we have files with training and test data. We create CDenseFeatures (here 64-bit floats, aka RealFeatures) and CMulticlassLabels as

features_train = RealFeatures(f_feats_train)
features_test = RealFeatures(f_feats_test)
labels_train = MulticlassLabels(f_labels_train)
labels_test = MulticlassLabels(f_labels_test)

We set the feature types to continuous. The types can be set to \(0\) for nominal, \(1\) for ordinal and \(2\) for continuous.

import numpy as np

ft = np.zeros(2, dtype='int32')
ft[0] = 2
ft[1] = 2

We create an instance of the CCHAIDTree classifier by passing it the label type and the feature types of the training data. For continuous predictors, the user also has to provide the number of bins for the continuous-to-ordinal conversion.

classifier = CHAIDTree(0, ft, 10)
classifier.set_labels(labels_train)
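
For intuition, the continuous-to-ordinal conversion mentioned above can be sketched as simple equal-width binning; Shogun's internal binning scheme may differ:

```python
import numpy as np

# Hedged sketch: one way continuous values can be reduced to ordinal bin
# indices before CHAID grows the tree (equal-width binning).
def to_ordinal(values, num_bins=10):
    """Map continuous values to ordinal bin indices 0..num_bins-1."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), num_bins + 1)
    # interior edges only, so the maximum falls into the last bin
    return np.clip(np.digitize(values, edges[1:-1]), 0, num_bins - 1)

x = np.array([0.1, 0.5, 0.9, 0.95, 0.2])
print(to_ordinal(x, num_bins=5))
```

After this step each predictor takes a small number of ordered integer values, which is what the chi-squared/F-test split search operates on.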

Then we train the classifier on the training data and apply it to the test data, which here gives CMulticlassLabels.

classifier.train(features_train)
labels_predict = classifier.apply_multiclass(features_test)

We can evaluate test performance via e.g. CMulticlassAccuracy.

eval = MulticlassAccuracy()
accuracy = eval.evaluate(labels_predict, labels_test)
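
Multiclass accuracy is simply the fraction of predicted labels that match the ground truth; a minimal NumPy equivalent of what this measure computes:

```python
import numpy as np

# fraction of predictions that exactly match the ground-truth labels
def multiclass_accuracy(predicted, truth):
    predicted = np.asarray(predicted)
    truth = np.asarray(truth)
    return float(np.mean(predicted == truth))

print(multiclass_accuracy([0, 1, 2, 2], [0, 1, 1, 2]))  # 3 of 4 correct -> 0.75
```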