Error Analysis

This section describes methods for analyzing the errors of a model. The approach trains a surrogate model to predict where the original model is wrong. The results can then be inspected numerically and visually.

class tubesml.error_analysis.ErrorAnalyzer(data, prediction_column=None, surrogate_model=None, error_column=None, true_label=None, param_grid=None, regression=True, error_class_idx=1, fidelity_threshold=0.9, probability_threshold=0.5, n_leaves=3, random_state=None)

This class trains a surrogate model to explain the error made by a generic model on tabular data.

The data must be ready to be used for a prediction with a sklearn model. Use the processing module to prepare the data if necessary. The data must contain a true label column and a corresponding prediction column. Optionally, it can also contain a column of binary values flagging whether each observation is an error. If the error column is not provided, it is calculated from the data.

This class works for both regression and classification problems, but ultimately solves a classification problem (is this observation an error? Why is the model wrong in this case?).

The result is a description of how the model makes wrong predictions. It comes as a summary of the clearest decisions the surrogate model took to predict an error, and of how those decisions depend on the provided features.

Parameters:

data – pandas DataFrame with the features we think are responsible for the mistakes we want to explain, a true label, and a prediction. Optionally, the data can also contain an error column.
prediction_column – string. Name of the column with the model predictions.
surrogate_model – (optional) model object. Use this to interpret the results with a model other than the default decision tree. This will reduce the level of interpretability.
error_column – (optional) string. Name of the column signaling whether the prediction was correct or not. If not provided, it will be calculated. The surrogate model will try to predict this column.
true_label – string. Name of the column with the true labels
param_grid – (optional) dictionary. If provided, the surrogate model will be tuned with a GridSearch. The dictionary must then contain the parameters we want to try in the search for a DecisionTreeClassifier
regression – boolean. Useful only if the error column is not provided. It determines if the error column must be determined for a regression problem or a classification one.
error_class_idx – integer, default=1. Index of the class indicating the error.
fidelity_threshold – float. We trust a surrogate model that has a fidelity higher than this threshold. Fidelity is defined as 1 - |actual_accuracy - estimated_accuracy|
probability_threshold – float. If the error column has to be determined for a classification problem, this is the threshold used to consider an observation an error.
n_leaves – int. Number of leaves for which we want a summary of the decisions. These are always the leaves with the most predicted errors.
random_state – int. Random state of the surrogate model. We recommend setting this to reproduce your results.
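
The fidelity formula given for fidelity_threshold can be worked through with a few lines of Python. This is a minimal sketch and not the library's code: the function names fidelity and is_trusted are hypothetical, and the two accuracy values are taken as given inputs rather than computed from a model.

```python
def fidelity(actual_accuracy: float, estimated_accuracy: float) -> float:
    """Fidelity as defined above: 1 - |actual_accuracy - estimated_accuracy|."""
    return 1.0 - abs(actual_accuracy - estimated_accuracy)


def is_trusted(actual_accuracy: float, estimated_accuracy: float,
               fidelity_threshold: float = 0.9) -> bool:
    """Hypothetical helper: trust the surrogate if fidelity exceeds the threshold."""
    return fidelity(actual_accuracy, estimated_accuracy) > fidelity_threshold


# Accuracies that differ by 0.05 give a fidelity of about 0.95,
# which is above the default threshold of 0.9.
close_case = is_trusted(0.80, 0.75)

# Accuracies that differ by 0.30 give a fidelity of about 0.70,
# which is below the default threshold.
far_case = is_trusted(0.90, 0.60)
```

With the default threshold, a surrogate whose estimated accuracy is within 0.1 of the actual accuracy would be trusted under this reading of the definition.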

fit(X=None, y=None): Main method of the class that produces all the results. It prepares a surrogate model (a decision tree) to explain the error of the main model. Then it provides an analysis of the terminal nodes (leaves) of the surrogate model, the feature importance, and the SHAP values.
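
The idea behind fit can be illustrated with plain scikit-learn. The sketch below is not the ErrorAnalyzer implementation: the synthetic data, the hand-picked threshold epsilon, and the tree depth are all assumptions made for the example. It flags regression errors, trains a decision tree on the flag, and then looks at the leaves and the feature importance.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data (assumption): two features and a regression target.
n = 500
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y_true = 2.0 * x1 + x2

# Pretend the primary model is badly off whenever x1 > 7.
y_pred = y_true + np.where(x1 > 7, 5.0, 0.1)

# Flag errors with a hand-picked threshold (assumption, not get_epsilon).
epsilon = 1.0
is_error = (np.abs(y_true - y_pred) > epsilon).astype(int)

# The surrogate solves a classification problem: is this observation an error?
X = np.column_stack([x1, x2])
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, is_error)

# Error rate inside each leaf of the surrogate tree.
leaf_ids = surrogate.apply(X)
leaf_error_rate = {
    int(leaf): float(is_error[leaf_ids == leaf].mean())
    for leaf in np.unique(leaf_ids)
}
worst_leaf = max(leaf_error_rate, key=leaf_error_rate.get)

# Which features the surrogate used to separate errors from correct predictions.
importances = surrogate.feature_importances_
```

In this toy setup the errors depend only on x1, so the tree isolates them in a single leaf and assigns all the importance to the first feature.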

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ErrorAnalyzer

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED. Metadata routing for the sample_weight parameter in score.

Returns:

self – object. The updated object.

tubesml.error_analysis.format_float(number, decimals)

Format a number to have the required number of decimals. Ensure no trailing zeros remain.

Parameters:

number – float or integer. The number to format
decimals – integer. The number of decimals required

Returns:

The formatted number as a string.
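
One way to obtain the behavior described above is shown in the following sketch. It is an assumption about how such a helper could work, not the tubesml source, and the name format_float_sketch is hypothetical.

```python
def format_float_sketch(number, decimals):
    """Format with a fixed number of decimals, then drop trailing zeros."""
    text = f"{number:.{decimals}f}"
    # Only strip when there is a decimal part, so that 100 does not become 1.
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text


format_float_sketch(3.14159, 2)  # returns "3.14"
format_float_sketch(2.5, 3)      # returns "2.5"
format_float_sketch(4, 2)        # returns "4"
format_float_sketch(100, 0)      # returns "100"
```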

tubesml.error_analysis.get_epsilon(difference)

Compute the threshold used to decide whether a prediction is wrong or correct (for regression tasks).

Parameters:

difference – 1D-array. The absolute differences between the true target values and the predicted ones (from the primary model).

Returns:

float. The threshold value used to decide whether a regression prediction is wrong or correct.

class tubesml.visualize_error.VisualizeError(analysis, original_feature_columns=None, show=False)

Methods to visualize the error analysis performed by the ErrorAnalyzer class. Using the feature importance from the analysis, it shows the error rates and partial dependence plots for the features that are most important in determining the error of the model.

plot_error_rates(features=None, n=4, bins=20)

Plots histograms of the error rates, that is, the percentage of observations flagged as errors against a set of features. If the features are not provided, the most important ones are plotted. The plot adapts its size to the number of features displayed.

Parameters:

features – (optional) list of features to display.
n – int. If no features are provided, the number of most important features to display.
bins – integer. Number of bins to use in the histogram.
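
The quantity shown in these histograms, the percentage of flagged observations per bin of a feature, can be computed as in the sketch below. This is an illustration in plain Python with equal-width bins and made-up data, not the plotting code of the library. The name error_rate_per_bin is hypothetical.

```python
def error_rate_per_bin(values, is_error, bins=20):
    """Return the % of observations flagged as error in each equal-width bin."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 if all values are identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    errors = [0] * bins
    for value, flag in zip(values, is_error):
        # The maximum value falls on the right edge, so clamp it into the last bin.
        idx = min(int((value - lo) / width), bins - 1)
        counts[idx] += 1
        errors[idx] += int(flag)
    # Empty bins have no defined rate and are reported as None.
    return [100.0 * e / c if c else None for e, c in zip(errors, counts)]


feature = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
flags = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
rates = error_rate_per_bin(feature, flags, bins=2)  # returns [0.0, 80.0]
```

Here the errors concentrate in the upper half of the feature range, which is the kind of pattern the histogram is meant to reveal.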

plot_feature_importance(n=-1, imp='shap')

Wrapper around tubesml.model_inspection.plot_feat_imp

Parameters:

n – integer, how many features you want to display in the plot.
imp – string. It can be either shap, standard, or bool

plot_pdp(features=None, n=4)

Wrapper around tubesml.model_inspection.plot_shap_values

Parameters:

features – (optional) list of features to display.
n – int. If no features are provided, the number of most important features to display.