Model Ensemble
- class tubesml.stacker.Stacker(estimators, final_estimator, cv, lay1_kwargs=None, passthrough=False, verbose=False, bypass_importances=False)
Wrapper for stacking several estimators with a meta estimator. Each estimator creates out-of-fold predictions on the entire dataset via a KFold and is then re-fitted on the full dataset. The predictions are generated by tubesml.CrossValidate and can therefore account for early stopping and other options. The meta estimator trains on the predictions made by the estimators. It is possible to train the meta estimator on additional features from the original dataset.
- Parameters:
estimators – list of tuples (name, estimator). These estimators generate the first layer of predictions.
final_estimator – estimator or str. Estimator with fit and predict (or predict_proba) methods. If set to "blend", no final estimator is fitted and the first-layer predictions are simply averaged.
cv – KFold-like generator. Cross-validation scheme used to generate the first layer of predictions.
lay1_kwargs – dict, optional. Dictionary of settings passed to tubesml.CrossValidate for the first layer. Keys must match the names in estimators.
passthrough – bool or list, default False. If True, all features used in the first layer are passed to the final estimator. If a list, only those features are passed. Invalid input defaults to False.
verbose – bool, default False. If True, warns the user when the correlation between first-layer predictions exceeds 0.9.
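The stacking procedure described above (out-of-fold predictions for each named estimator, a refit of each estimator on the full data, and a meta estimator trained on the stacked predictions plus optional passthrough columns) can be sketched with plain scikit-learn. This is a minimal illustration under stated assumptions, not tubesml's implementation: stack_fit is a hypothetical helper, KFold and clone stand in for tubesml.CrossValidate, and lay1_kwargs and early stopping are ignored.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor


def stack_fit(estimators, final_estimator, cv, X, y, passthrough=False):
    """Hypothetical helper: out-of-fold stacking sketch, not tubesml's code."""
    oof = pd.DataFrame(index=X.index)
    fitted = {}
    for name, est in estimators:
        preds = np.zeros(len(X))
        # Each fold model predicts only the rows it was not trained on
        for train_idx, valid_idx in cv.split(X):
            model = clone(est).fit(X.iloc[train_idx], y[train_idx])
            preds[valid_idx] = model.predict(X.iloc[valid_idx])
        oof[name] = preds
        # Re-fit the first-layer estimator on the full dataset
        fitted[name] = clone(est).fit(X, y)

    # passthrough=True adds every original feature, a list adds only those
    if passthrough is True:
        meta_X = pd.concat([oof, X], axis=1)
    elif isinstance(passthrough, list):
        meta_X = pd.concat([oof, X[passthrough]], axis=1)
    else:  # False or any other value: predictions only
        meta_X = oof

    if isinstance(final_estimator, str) and final_estimator == "blend":
        meta = None  # no meta model; predictions are averaged instead
    else:
        meta = clone(final_estimator).fit(meta_X, y)
    return fitted, meta, meta_X


X_arr, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(4)])
layer1 = [("lr", LinearRegression()), ("tree", DecisionTreeRegressor(max_depth=3))]
cv = KFold(n_splits=5, shuffle=True, random_state=0)

fitted, meta, meta_X = stack_fit(layer1, Ridge(), cv, X, y, passthrough=["f0"])
print(list(meta_X.columns))  # ['lr', 'tree', 'f0']
```

With final_estimator="blend", this sketch fits no meta model and the final prediction would simply be the row-wise mean of the first-layer columns, e.g. meta_X[["lr", "tree"]].mean(axis=1).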
- Attributes:
meta_importances_ – pandas.DataFrame. Feature importances (or coefficients) of the final estimator. If the final estimator does not expose coef_ or feature_importances_, this attribute is not available.
corr_ – pandas.DataFrame. Correlation matrix of the first-layer predictions.
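As a rough illustration of what these two attributes contain, the sketch below builds a correlation matrix from made-up first-layer predictions and turns a linear meta model's coef_ into a table. The data, the helper importances_table, and its column names ("feature", "importance") are assumptions for illustration; tubesml's own column names may differ.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def importances_table(model, feature_names):
    """Hypothetical helper: final-estimator coefficients or importances as a DataFrame."""
    if hasattr(model, "coef_"):
        values = np.ravel(model.coef_)  # single-output / binary case
    elif hasattr(model, "feature_importances_"):
        values = model.feature_importances_
    else:
        return None  # mirrors meta_importances_ not being available
    return pd.DataFrame({"feature": feature_names, "importance": values})


# Made-up out-of-fold predictions: model_a and model_b share most of their signal
rng = np.random.default_rng(0)
signal = rng.normal(size=100)
oof = pd.DataFrame({
    "model_a": signal + rng.normal(scale=0.1, size=100),
    "model_b": signal + rng.normal(scale=0.1, size=100),
    "model_c": rng.normal(size=100),
})
y = signal + rng.normal(scale=0.2, size=100)

corr = oof.corr()  # analogous to corr_
meta = LinearRegression().fit(oof, y)
meta_imp = importances_table(meta, list(oof.columns))  # analogous to meta_importances_
print(meta_imp)
```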
- fit(X, y)
This method uses tubesml.cv_score to create out-of-fold predictions from each of the estimators provided in estimators. Next, it fits the final_estimator on a dataset that contains these predictions and any features specified by passthrough. Finally, each of the estimators is refit on the entire dataset.
If an estimator in the first layer was trained with early stopping, the refit uses a number of iterations equal to the mean number of iterations across the folds used to generate the first layer of predictions. Make sure that the early stopping attribute of the estimator is named early_stopping_round.
If verbose=True, the user is warned if any pair of predictions is correlated above 0.9. This method also creates the corr_ attribute.
- Parameters:
X – pandas DataFrame of shape (n_samples, n_features). The training input samples.
y – array-like of shape (n_samples,) or (n_samples, n_outputs). The target values (class labels) as integers or strings.
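Two details of fit can be sketched in isolation: choosing the refit size from the per-fold early-stopping results, and warning about highly correlated first-layer predictions. Both helpers below are hypothetical; in particular, rounding the mean to the nearest integer is an assumption of this sketch, since the text above only says the mean is used.

```python
import warnings

import numpy as np
import pandas as pd


def refit_iterations(best_iterations):
    """Mean best iteration across folds; rounding to nearest int is an assumption."""
    return int(round(float(np.mean(best_iterations))))


def warn_high_correlation(oof, threshold=0.9):
    """Warn about each pair of first-layer prediction columns correlated above threshold."""
    corr = oof.corr()
    cols = list(corr.columns)
    flagged = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                flagged.append((cols[i], cols[j]))
                warnings.warn(f"{cols[i]} and {cols[j]} are correlated above {threshold}")
    return flagged


# Hypothetical best iterations reported by early stopping in each of three folds
print(refit_iterations([120, 135, 128]))  # 128

oof = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]})
print(warn_high_correlation(oof))  # [('a', 'b')], plus one UserWarning
```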
- predict(X, y=None)
Method to generate the final predictions. First, it generates the meta dataset from the predictions of the estimators. For each estimator whose first-layer predictions were generated with predict_proba, predict_proba is used again; otherwise, the predict method of that estimator is used. The final prediction is generated by the predict method of the final_estimator.
- Parameters:
X – pandas DataFrame of shape (n_samples, n_features). The input samples.
y – array-like of shape (n_samples,) or (n_samples, n_outputs), default None. Not used.
- Return preds:
ndarray of shape (n_samples,) or (n_samples, n_outputs). Predicted targets.
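The prediction path can be sketched as follows: each refit first-layer model contributes one column, taken from predict_proba if that is how its out-of-fold predictions were made and from predict otherwise, and the final estimator predicts from those columns. meta_features and used_proba are hypothetical names, and keeping only the positive-class probability assumes a binary target.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def meta_features(fitted, used_proba, X):
    """Hypothetical helper: build the meta dataset for new rows.

    fitted: name -> first-layer estimator already refit on the full training data
    used_proba: name -> True if its out-of-fold predictions came from predict_proba
    """
    cols = {}
    for name, est in fitted.items():
        if used_proba[name]:
            cols[name] = est.predict_proba(X)[:, 1]  # positive class (binary case assumed)
        else:
            cols[name] = est.predict(X)
    return pd.DataFrame(cols, index=X.index)


X_arr, y = make_classification(n_samples=300, n_features=6, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])
X_train, y_train, X_new = X.iloc[:250], y[:250], X.iloc[250:]

fitted = {
    "logit": LogisticRegression().fit(X_train, y_train),
    "tree": DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train),
}
used_proba = {"logit": True, "tree": False}

# Shortcut for brevity: a real stacker fits the final estimator on out-of-fold predictions
final = LogisticRegression().fit(meta_features(fitted, used_proba, X_train), y_train)
preds = final.predict(meta_features(fitted, used_proba, X_new))
print(preds.shape)  # (50,)
```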
- predict_proba(X, y=None)
Method to generate the final class probabilities. First, it generates the meta dataset from the predictions of the estimators. For each estimator whose first-layer predictions were generated with predict_proba, predict_proba is used again; otherwise, the predict method of that estimator is used. The final probabilities are generated by the predict_proba method of the final_estimator.
- Parameters:
X – pandas DataFrame of shape (n_samples, n_features). The input samples.
y – array-like of shape (n_samples,) or (n_samples, n_outputs), default None. Not used.
- Return preds:
ndarray of shape (n_samples, n_classes), or a list of n_outputs such arrays. The class probabilities of the input samples.
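To show the shape of the returned probabilities, the sketch below fits a logistic-regression meta model on made-up first-layer probabilities and calls its predict_proba. The data generator fake_probs and the column names are illustrative assumptions; the last line shows, for comparison, what a plain "blend" average of the same columns looks like.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)


def fake_probs(y, noise):
    """Made-up first-layer probabilities: a noisy signal of the true class."""
    return np.clip(0.5 + (y - 0.5) * 0.6 + rng.normal(scale=noise, size=len(y)), 0, 1)


y_train = rng.integers(0, 2, size=200)
meta_train = pd.DataFrame({"model_a": fake_probs(y_train, 0.2), "model_b": fake_probs(y_train, 0.3)})
final = LogisticRegression().fit(meta_train, y_train)

y_new = rng.integers(0, 2, size=20)
meta_new = pd.DataFrame({"model_a": fake_probs(y_new, 0.2), "model_b": fake_probs(y_new, 0.3)})

proba = final.predict_proba(meta_new)  # shape (20, 2): one column per class in final.classes_
blend = meta_new.mean(axis=1)  # "blend" alternative: plain average, no meta model
```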
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → Stacker
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the scikit-learn User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in scikit-learn version 1.3.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED. Metadata routing for the sample_weight parameter in score.
- Returns:
self – object. The updated object.