Model Ensemble

class tubesml.stacker.Stacker(estimators, final_estimator, cv, lay1_kwargs=None, passthrough=False, verbose=False, bypass_importances=False)

Wrapper for stacking several estimators with a meta estimator. Each estimator creates out-of-fold predictions on the entire dataset via a KFold and is then re-fitted on the full dataset. The predictions are generated by tubesml.CrossValidate and thus can account for early stopping and other options.

The meta estimator trains on the predictions made by the estimators. It is possible to train the meta estimator on additional features from the original dataset.
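The two-layer idea described above can be sketched with plain scikit-learn. This is an illustration under stated assumptions, not tubesml's implementation: it uses sklearn.model_selection.cross_val_predict in place of tubesml.CrossValidate, and the synthetic data and the choice of estimators are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

estimators = [
    ("ridge", Ridge(alpha=1.0)),
    ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
]
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Layer 1: each row is predicted by a model that did not see it in training.
oof = np.column_stack(
    [cross_val_predict(est, X, y, cv=cv) for _, est in estimators]
)

# Layer 2: the meta estimator learns how to combine the out-of-fold predictions.
final_estimator = LinearRegression().fit(oof, y)

# Refit every first-layer estimator on the full dataset for use at predict time.
fitted = [est.fit(X, y) for _, est in estimators]

# New rows: first-layer predictions become the input of the meta estimator.
X_new = rng.normal(size=(5, 4))
meta_new = np.column_stack([est.predict(X_new) for est in fitted])
preds = final_estimator.predict(meta_new)
```

Training the meta estimator on out-of-fold predictions, rather than on in-sample ones, keeps it from rewarding first-layer models that merely overfit the training rows.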

Parameters:

estimators – list of tuples (name, estimator). These estimators generate the first layer of predictions.
final_estimator – estimator or str. Estimator with fit and predict (or predict_proba) methods. If set to "blend", no final estimator is fitted and the first-layer predictions are simply averaged.
cv – KFold-like generator. Cross-validation scheme used to generate the first layer of predictions.
lay1_kwargs – dict, optional. Dictionary of settings passed to tubesml.CrossValidate for the first layer. Keys must match the names in estimators.
passthrough – bool or list, default False. If True, all features used in the first layer are passed to the final estimator. If a list, only those features are passed. Invalid input defaults to False.
verbose – bool, default False. If True, warns the user when the correlation between first-layer predictions exceeds 0.9.
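How passthrough and final_estimator="blend" shape the data seen by the second layer can be sketched with pandas. The helper build_meta_dataset and the column names are hypothetical and only mirror the behavior described in the parameter list; they are not part of tubesml.

```python
import pandas as pd


def build_meta_dataset(first_layer_preds, X, passthrough=False):
    """Hypothetical helper: join first-layer predictions with passthrough features."""
    meta = pd.DataFrame(first_layer_preds, index=X.index)
    if passthrough is True:
        cols = list(X.columns)   # pass every original feature
    elif isinstance(passthrough, list):
        cols = passthrough       # pass only the listed features
    else:
        cols = []                # False or invalid input: pass nothing
    return pd.concat([meta, X[cols]], axis=1)


X = pd.DataFrame({"age": [30, 40, 50], "income": [1.0, 2.0, 3.0]})
first_layer_preds = {"ridge": [0.2, 0.4, 0.9], "tree": [0.4, 0.4, 0.7]}

meta_all = build_meta_dataset(first_layer_preds, X, passthrough=True)
meta_some = build_meta_dataset(first_layer_preds, X, passthrough=["age"])
meta_none = build_meta_dataset(first_layer_preds, X, passthrough="oops")

# final_estimator="blend": no meta estimator, only the row-wise mean
# of the first-layer predictions.
blended = meta_none.mean(axis=1)
```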

Attributes:

meta_importances_ – pandas.DataFrame. Feature importances (or coefficients) of the final estimator. If the final estimator does not expose coef_ or feature_importances_, this attribute will not be available.
corr_ – pandas.DataFrame. Correlation matrix of the first-layer predictions.
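A correlation matrix like corr_ and the 0.9 check behind verbose can be reproduced with pandas. This is a minimal sketch with made-up prediction columns; how tubesml computes and reports the warning internally is not shown in this page.

```python
import numpy as np
import pandas as pd

# Made-up first-layer predictions: "ridge" and "lasso" share the same signal,
# "tree" is unrelated noise.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
preds = pd.DataFrame(
    {
        "ridge": base + rng.normal(scale=0.05, size=500),
        "lasso": base + rng.normal(scale=0.05, size=500),
        "tree": rng.normal(size=500),
    }
)

# Pairwise Pearson correlation of the prediction columns.
corr = preds.corr()

# Collect each pair of estimators whose predictions correlate above 0.9.
threshold = 0.9
high = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if corr.loc[a, b] > threshold
]
```

Highly correlated first-layer predictions add little new information for the final estimator, which is why such pairs are worth flagging.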

fit(X, y)

First, this method uses tubesml.cv_score to create out-of-fold predictions from each estimator provided in estimators.

Next, it fits the final_estimator on a dataset that contains these predictions and any features specified by passthrough. Each of the estimators is then refit on the entire dataset.

If an estimator in the first layer was trained with early stopping, the refit trains it for a number of iterations equal to the mean number of iterations across the folds used to generate the first layer of predictions. Make sure that the early stopping attribute of the estimator is early_stopping_round.
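The iteration count used for the refit can be illustrated with a small calculation. The per-fold values are invented, and rounding the mean to the nearest integer is an assumption, since the text does not say how a fractional mean is handled.

```python
from statistics import mean

# Hypothetical best iteration reached by early stopping in each of 5 folds.
best_iterations = [118, 131, 124, 127, 120]

# Mean across folds, rounded to an integer (rounding rule is an assumption).
n_iterations_refit = int(round(mean(best_iterations)))  # 124
```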

If verbose=True, the user is warned when the correlation between any of the predictions exceeds 0.9. The corr_ attribute is created by this method.

Parameters:

X – pandas DataFrame of shape (n_samples, n_features). The training input samples.
y – array-like of shape (n_samples,) or (n_samples, n_outputs). The target values (class labels) as integers or strings.

predict(X, y=None)

Generates the final predictions. First, it builds the meta dataset from the predictions of the first-layer estimators.

If an estimator's first-layer predictions were produced with predict_proba during fit, predict_proba is used again here; otherwise that estimator's predict method is used.

The final prediction is generated by using the predict method of the final_estimator.

Parameters:

X – pandas DataFrame of shape (n_samples, n_features). The input samples.
y – array-like of shape (n_samples,) or (n_samples, n_outputs). Not used. The target values (class labels) as integers or strings.

Returns:

preds – ndarray of shape (n_samples,) or (n_samples, n_outputs). Predicted targets.

predict_proba(X, y=None)

Generates the final class probabilities. First, it builds the meta dataset from the predictions of the first-layer estimators.

If an estimator's first-layer predictions were produced with predict_proba during fit, predict_proba is used again here; otherwise that estimator's predict method is used.

The final prediction is generated by using the predict_proba method of the final_estimator.

Parameters:

X – pandas DataFrame of shape (n_samples, n_features). The input samples.
y – array-like of shape (n_samples,) or (n_samples, n_outputs). Not used. The target values (class labels) as integers or strings.

Returns:

preds – ndarray of shape (n_samples, n_classes) or list of ndarray of shape (n_outputs,). The class probabilities of the input samples.
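Both predict and predict_proba rebuild the meta dataset before calling the final estimator. The sketch below shows that step with scikit-learn classifiers. The helper first_layer_column, the used_proba flag, and the choice of the positive-class column for binary problems are assumptions made for the illustration, not tubesml internals.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# (name, fitted estimator, whether its column was built with predict_proba)
first_layer = [
    ("logit", LogisticRegression(max_iter=1000).fit(X, y), True),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y), False),
]


def first_layer_column(est, X, used_proba):
    """Hypothetical helper: reuse the method that produced the training column."""
    if used_proba:
        return est.predict_proba(X)[:, 1]  # positive-class probability
    return est.predict(X)                  # hard class labels


meta_X = np.column_stack(
    [first_layer_column(est, X, used_proba) for _, est, used_proba in first_layer]
)

# For brevity the meta estimator is fitted on in-sample predictions here;
# a real stacker fits it on out-of-fold predictions.
final_estimator = LogisticRegression().fit(meta_X, y)

final_preds = final_estimator.predict(meta_X)        # what predict returns
final_proba = final_estimator.predict_proba(meta_X)  # what predict_proba returns
```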

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → Stacker

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED. Metadata routing for sample_weight parameter in score.

Returns:

self – object. The updated object.