Utility classes

This section covers classes that support the rest of your pipeline. In particular, these classes let you

select part of the data based on its data type
unify multiple pipelines into one transformer

class tubesml.utility.DtypeSel(dtype='numeric')

This transformer selects either numerical or categorical features. In this way we can build separate pipelines for separate data types.

Parameters: dtype – str, default='numeric'. The type of data to select. Allowed values: 'numeric', 'category'.
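As a rough illustration of what selecting by data type means, the sketch below uses plain pandas rather than tubesml itself. It assumes that 'numeric' corresponds to number-typed columns and 'category' to everything else; the actual rule used by DtypeSel may differ. The DataFrame and its column names are made up for the example.

```python
import pandas as pd

# Hypothetical data: two numeric columns and one text column.
df = pd.DataFrame(
    {
        "age": [25, 32, 47],
        "city": ["Rome", "Oslo", "Lima"],
        "income": [30.5, 52.0, 41.2],
    }
)

# Stand-in for dtype='numeric' (assumption: number-typed columns).
numeric_part = df.select_dtypes(include="number")

# Stand-in for dtype='category' (assumption: all non-numeric columns).
category_part = df.select_dtypes(exclude="number")

print(list(numeric_part.columns))
print(list(category_part.columns))
```

Each part can then feed its own pipeline, which is the use case described above.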

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → DtypeSel

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED. Metadata routing for sample_weight parameter in score.

Returns:

self – object. The updated object.

transform(X, y=None)

Selects columns based on their type.

It populates the columns attribute with the columns of the output data.

Parameters:

X – pandas DataFrame of shape (n_samples, n_features). The input samples.
y – array-like of shape (n_samples,) or (n_samples, n_outputs), not used. The target values (class labels) as integers or strings.

Returns:

pandas DataFrame with columns of the selected type

class tubesml.utility.FeatureUnionDf(transformer_list, n_jobs=None, transformer_weights=None, verbose=False, verbose_feature_names_out=False)

Wrapper of FeatureUnion that returns a DataFrame. The column order follows the concatenation done by FeatureUnion.

Parameters:

transformer_list – list of (string, transformer) tuples. List of transformer objects to be applied to the data. The first element of each tuple is the name of the transformer.
n_jobs – int, default=None. Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
transformer_weights – dict, default=None. Multiplicative weights for features per transformer. Keys are transformer names, values are the weights. Raises ValueError if a key is not present in transformer_list.
verbose – bool, default=False. If True, the time elapsed while fitting each transformer is printed as it completes.
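To illustrate the column order of the result, here is a small stand-in written with plain pandas. It is not tubesml's implementation: union_as_dataframe is a hypothetical helper, and plain functions take the place of fitted transformers. It only shows that the output columns follow the order of the transformer list rather than the order of the input columns.

```python
import pandas as pd


def union_as_dataframe(X, transformer_list):
    """Apply each (name, function) pair to X and join the results column-wise."""
    parts = [func(X) for _name, func in transformer_list]
    return pd.concat(parts, axis=1)


# Hypothetical data with the text column placed between the numeric ones.
df = pd.DataFrame({"age": [25, 32], "city": ["Rome", "Oslo"], "income": [30.5, 52.0]})

# Hypothetical branches: one per data type.
branches = [
    ("num", lambda X: X.select_dtypes(include="number")),
    ("cat", lambda X: X.select_dtypes(exclude="number")),
]

result = union_as_dataframe(df, branches)
print(list(result.columns))
```

Swapping the two entries of branches would put the text column first in the output.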

fit(X, y=None)

Fits all the transformers.

It also resets the columns attribute.

Parameters:

X – pandas DataFrame of shape (n_samples, n_features). The training input samples.
y – array-like of shape (n_samples,) or (n_samples, n_outputs). The target values (class labels) as integers or strings.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep – bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – dict. Parameter names mapped to their values.
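The effect of deep can be seen on scikit-learn's own FeatureUnion, used here as a stand-in on the assumption that FeatureUnionDf, being a wrapper, behaves the same way. The step names std and minmax are arbitrary.

```python
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import MinMaxScaler, StandardScaler

union = FeatureUnion([("std", StandardScaler()), ("minmax", MinMaxScaler())])

# Only the parameters of the union itself.
shallow = union.get_params(deep=False)

# Also the contained transformers and their parameters, keyed as <name>__<param>.
deep = union.get_params(deep=True)

print(sorted(shallow))
print("std__with_mean" in deep)
```

The nested keys are the ones a grid search would use to tune a parameter of a single branch.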

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → FeatureUnionDf

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED. Metadata routing for sample_weight parameter in score.

Returns:

self – object. The updated object.

transform(X, y=None)

Calls the transform method of every transformer in transformer_list.

It populates the columns attribute with the columns of the output data.

Parameters:

X – pandas DataFrame of shape (n_samples, n_features). The input samples.
y – array-like of shape (n_samples,) or (n_samples, n_outputs), not used. The target values (class labels) as integers or strings.

Returns:

pandas DataFrame with all the transformations applied in the order provided