Utility classes

This section covers classes that support the rest of your pipeline. In particular, with these classes you can

  • select part of the data based on its data type

  • unify multiple pipelines into one transformer

class tubesml.utility.DtypeSel(dtype='numeric')

This transformer selects either numerical or categorical features. In this way we can build separate pipelines for separate data types.

Parameters:

dtype – str, the type of data to select, default='numeric'. Allowed values: 'numeric', 'category'

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → DtypeSel

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters

sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self – object

The updated object.

transform(X, y=None)

Method to select columns based on their type.

It also populates the columns attribute with the column names of the output data.

Parameters:
  • X – pandas DataFrame of shape (n_samples, n_features). The input samples.

  • y – array-like of shape (n_samples,) or (n_samples, n_outputs). Not used. The target values (class labels) as integers or strings.

Returns:

pandas DataFrame with columns of the selected type
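The behaviour described above can be sketched with plain pandas. This is not the library's implementation: DtypeSelectorSketch is a hypothetical class, and it assumes that 'category' means every non-numeric column.

```python
import pandas as pd


class DtypeSelectorSketch:
    """Hypothetical stand-in that mimics the documented behaviour of DtypeSel."""

    def __init__(self, dtype="numeric"):
        if dtype not in ("numeric", "category"):
            raise ValueError("dtype must be 'numeric' or 'category'")
        self.dtype = dtype
        self.columns = []

    def fit(self, X, y=None):
        # Nothing to learn: the selection depends only on the column dtypes.
        return self

    def transform(self, X, y=None):
        if self.dtype == "numeric":
            out = X.select_dtypes(include="number")
        else:
            # Assumption: "category" selects every non-numeric column.
            out = X.select_dtypes(exclude="number")
        # Record the names of the columns that survive the selection.
        self.columns = list(out.columns)
        return out


df = pd.DataFrame({"age": [31, 45], "city": ["Rome", "Oslo"], "income": [2.5, 3.1]})
numeric = DtypeSelectorSketch("numeric").fit(df).transform(df)
categorical = DtypeSelectorSketch("category").fit(df).transform(df)
```

Each selector can then feed its own pipeline, for example scaling for the numeric branch and encoding for the categorical one.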

class tubesml.utility.FeatureUnionDf(transformer_list, n_jobs=None, transformer_weights=None, verbose=False, verbose_feature_names_out=False)

Wrapper of FeatureUnion that returns a DataFrame. The column order follows the concatenation done by FeatureUnion.

It returns pandas dtypes.

Parameters:
  • transformer_list – list of (string, transformer) tuples. List of transformer objects to be applied to the data. The first element of each tuple is the name of the transformer.

  • n_jobs – int, default=None. Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

  • transformer_weights – dict, default=None. Multiplicative weights for features per transformer. Keys are transformer names, values are the weights. Raises ValueError if a key is not present in transformer_list.

  • verbose – bool, default=False. If True, the time elapsed while fitting each transformer will be printed as it is completed.
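To make transformer_list and transformer_weights concrete, here is a small sketch that uses scikit-learn's FeatureUnion directly, on the assumption that the wrapper forwards these parameters unchanged. The transformer names "raw", "squared", and "missing" are made up for the example.

```python
import numpy as np
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer

X = np.array([[1.0, 2.0], [3.0, 4.0]])

union = FeatureUnion(
    transformer_list=[
        ("raw", FunctionTransformer()),  # identity: passes X through
        ("squared", FunctionTransformer(np.square)),
    ],
    # Only the output of "squared" is multiplied by 0.5.
    transformer_weights={"squared": 0.5},
)
weighted = union.fit_transform(X)

# A weight whose key matches no transformer name is rejected when fitting.
bad_union = FeatureUnion(
    transformer_list=[("raw", FunctionTransformer())],
    transformer_weights={"missing": 2.0},
)
try:
    bad_union.fit(X)
    raised = False
except ValueError:
    raised = True
```

The output columns follow the order of transformer_list: first the untouched features, then the weighted squares.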

fit(X, y=None)

Method to fit all the transformers.

It also resets the columns attribute.

Parameters:
  • X – pandas DataFrame of shape (n_samples, n_features). The training input samples.

  • y – array-like of shape (n_samples,) or (n_samples, n_outputs). The target values (class labels) as integers or strings.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep – bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – dict

Parameter names mapped to their values.
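The effect of deep can be seen on scikit-learn's own FeatureUnion, which the wrapper builds on. The sketch assumes the wrapper inherits this behaviour; the names "std" and "minmax" are arbitrary.

```python
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import MinMaxScaler, StandardScaler

union = FeatureUnion([("std", StandardScaler()), ("minmax", MinMaxScaler())])

# deep=False: only the constructor arguments of the union itself.
shallow = union.get_params(deep=False)

# deep=True: also the contained transformers and their own parameters,
# addressed with the "<name>__<parameter>" convention.
deep = union.get_params(deep=True)
```

The nested keys, such as std__with_mean, are the ones to use in a parameter grid.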

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → FeatureUnionDf

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters

sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns

self – object

The updated object.

transform(X, y=None)

Method to call all the transform methods in the transformer_list.

It also populates the columns attribute with the column names of the output data.

Parameters:
  • X – pandas DataFrame of shape (n_samples, n_features). The input samples.

  • y – array-like of shape (n_samples,) or (n_samples, n_outputs). Not used. The target values (class labels) as integers or strings.

Returns:

pandas DataFrame with all the transformations applied in the order provided
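The idea of joining several branches into one DataFrame, in the order provided, can be sketched with plain pandas. The helper union_to_frame and the two branch functions are hypothetical and only illustrate the column-wise concatenation; they are not how the library is implemented.

```python
import pandas as pd


def union_to_frame(named_steps, X):
    """Hypothetical helper: apply each (name, function) step and join the outputs."""
    # Each step receives the same input and returns its own DataFrame.
    parts = [func(X) for _, func in named_steps]
    # Concatenate side by side, keeping the order of the steps.
    out = pd.concat(parts, axis=1)
    return out, list(out.columns)


df = pd.DataFrame({"age": [30, 50], "city": ["Rome", "Oslo"]})
steps = [
    ("num", lambda X: X[["age"]] / 10),
    ("cat", lambda X: X[["city"]].apply(lambda s: s.str.upper())),
]
result, columns = union_to_frame(steps, df)
```

Because the result is a DataFrame, the column names survive the union and can be inspected after the transformation.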