Model Comparison

This section describes a class for comparing two models. It lets you:

  • Compare metrics

  • Compare predictions visually

  • Perform a statistical test

class tubesml.model_comparison.CompareModels(data, true_label, pred_1, pred_2, metric_func, regression=True, probabilities=True, kfold=None)

This class provides methods to compare the results of two models. The user must provide the data used for the predictions, the true label, and the predictions of both models on that data. The comparison is based on a user-provided metric function, statistical significance tests, and a visual comparison of the predictions.

Parameters:
  • data – pandas DataFrame with the data used to create the predictions of both models

  • true_label – pandas Series with the true label

  • pred_1 – pandas Series with the predictions of the first model

  • pred_2 – pandas Series with the predictions of the second model

  • metric_func – function that takes y_true and y_pred (or y_score) as input and calculates the model's metric

  • regression – boolean, to flag if the problem is a regression problem. This determines the type of statistical test and plots

  • probabilities – boolean, to flag if the predictions are in the form of probabilities. Relevant only if regression is False

  • kfold – kfold object used to produce the predictions, if any. If the regression predictions were made with kfold, we strongly recommend providing it so that the statistical test is meaningful.
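As an illustration of the metric_func parameter, the following is a minimal sketch of a callable with the expected (y_true, y_pred) signature. The rmse function and the sample values are hypothetical and not part of tubesml; any function with the same signature, such as the ones in sklearn.metrics, should presumably work the same way.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of y_pred against y_true (lower is better)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical true labels and predictions of two models on the same data.
y_true = [3.0, 5.0, 2.0, 7.0]
pred_1 = [2.5, 5.0, 2.0, 8.0]
pred_2 = [3.0, 4.0, 3.0, 9.0]

# The same metric is evaluated for both models so the scores are comparable.
score_1 = rmse(y_true, pred_1)
score_2 = rmse(y_true, pred_2)
print(f"model 1: {score_1:.4f}, model 2: {score_2:.4f}")
```

Because both scores come from the same function and the same true labels, they can be compared directly.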

compare_metrics()

Calculates the provided metric for both models and produces a plot to compare them visually.

compare_predictions(error_margin=0.05)

Visual comparison of the model predictions. For a regression problem, the comparison is done via a scatter plot of the two sets of predictions and via a confusion matrix. The confusion matrix shows the combinations of predictions that are ‘correct’. A correct prediction is defined as one with a percentage error below the provided error_margin.

For a classification problem, if the predictions are probabilities, the result is the same as for a regression problem. Otherwise, only a confusion matrix of the two models' correct classifications is shown.

Parameters:

error_margin – float, the margin of error to consider a prediction correct.
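The idea of a ‘correct’ regression prediction and the resulting 2x2 matrix can be sketched as follows. This is only an illustration, not the library's code: the helper names are hypothetical, and it assumes that the percentage error is measured relative to the true value, which must therefore be nonzero.

```python
def is_correct(y_true, y_pred, error_margin=0.05):
    """Flag predictions whose relative error is below error_margin.

    Assumption: error is relative to the true value (true values nonzero).
    """
    return [abs(p - t) / abs(t) < error_margin for t, p in zip(y_true, y_pred)]


def agreement_table(correct_1, correct_2):
    """Count the combinations of correct / incorrect predictions.

    Rows: model 1 correct, model 1 wrong.
    Columns: model 2 correct, model 2 wrong.
    """
    table = [[0, 0], [0, 0]]
    for c1, c2 in zip(correct_1, correct_2):
        table[0 if c1 else 1][0 if c2 else 1] += 1
    return table


# Hypothetical true values and predictions of two regression models.
y_true = [100.0, 200.0, 50.0, 80.0, 10.0]
pred_1 = [102.0, 230.0, 51.0, 80.0, 12.0]
pred_2 = [110.0, 205.0, 50.5, 95.0, 13.0]

correct_1 = is_correct(y_true, pred_1, error_margin=0.05)
correct_2 = is_correct(y_true, pred_2, error_margin=0.05)
table = agreement_table(correct_1, correct_2)
print(table)
```

The off-diagonal cells are the interesting ones: they count the observations on which only one of the two models is within the margin.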

statistical_significance(error_margin=0.49)

Performs a statistical test to see if the differences between the two models are statistically significant. For regression problems, the test is a paired t-test on the losses of each model. If the predictions were produced via a k-fold process, we must perform the test on each fold in order to compare samples that are fairly independent. Subsequently, we can compare the mean losses of each fold via another paired t-test. Generally speaking, the results will be in agreement, but keep in mind that repeated statistical tests increase the probability of a false positive.
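The per-fold procedure described above can be sketched in plain Python. This is a rough illustration under stated assumptions, not the library's implementation: the losses, the fold indices, and the paired_t_statistic helper are all hypothetical, and only the t statistic and degrees of freedom are computed. In practice a function such as scipy.stats.ttest_rel would also return the p-value.

```python
import math
import statistics


def paired_t_statistic(losses_1, losses_2):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    diffs = [a - b for a, b in zip(losses_1, losses_2)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical per-sample losses (e.g. squared errors) of the two models.
losses_1 = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0]
losses_2 = [2.0, 2.5, 5.0, 4.5, 4.0, 3.5]
# Hypothetical test indices of a 2-fold split.
folds = [[0, 1, 2], [3, 4, 5]]

fold_results = []
fold_means_1, fold_means_2 = [], []
for test_idx in folds:
    fold_1 = [losses_1[i] for i in test_idx]
    fold_2 = [losses_2[i] for i in test_idx]
    # First level: one paired test per fold.
    fold_results.append(paired_t_statistic(fold_1, fold_2))
    fold_means_1.append(statistics.mean(fold_1))
    fold_means_2.append(statistics.mean(fold_2))

# Second level: paired test on the mean loss of each fold.
t_means, df_means = paired_t_statistic(fold_means_1, fold_means_2)
print(fold_results, (t_means, df_means))
```

With k folds this runs k + 1 tests, which is why the caveat about repeated testing applies: a correction such as Bonferroni is one option if the per-fold results are interpreted individually.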

For classification problems, the test is McNemar's test on the contingency table of the two models, which shows how many predictions were correct for both models and how many were misclassified by either or both models. If the predictions are probabilities, an error_margin is used to define a correct prediction.

Parameters:

error_margin – float, margin of error for probability predictions.
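A minimal sketch of the classification case, under two assumptions that the description above does not confirm: a probability counts as correct when its distance from the true 0/1 label is below error_margin, and the exact (binomial) form of McNemar's test is used. The helper names and sample data are hypothetical.

```python
from math import comb


def correct_from_probabilities(y_true, y_prob, error_margin=0.49):
    """Assumption: a probability is 'correct' if it lies within
    error_margin of the true 0/1 label."""
    return [abs(p - y) < error_margin for y, p in zip(y_true, y_prob)]


def discordant_counts(correct_1, correct_2):
    """Return (b, c): cases only model 1 got right, cases only model 2 got right."""
    b = sum(1 for c1, c2 in zip(correct_1, correct_2) if c1 and not c2)
    c = sum(1 for c1, c2 in zip(correct_1, correct_2) if c2 and not c1)
    return b, c


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value based on the discordant pairs only."""
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical labels and predicted probabilities of two classifiers.
y_true = [1, 0, 1, 1, 0, 0]
prob_1 = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8]
prob_2 = [0.6, 0.7, 0.3, 0.2, 0.3, 0.9]

correct_1 = correct_from_probabilities(y_true, prob_1)
correct_2 = correct_from_probabilities(y_true, prob_2)
b, c = discordant_counts(correct_1, correct_2)
p_value = mcnemar_exact_p(b, c)
print(b, c, p_value)
```

Only the discordant pairs enter the test: observations on which both models are right, or both are wrong, carry no information about which model is better.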