Utility classes
This section covers classes that support the rest of your pipeline. In particular, with these classes you can:
- select part of the data based on its data type
- unify multiple pipelines into one transformer
- class tubesml.utility.DtypeSel(dtype='numeric')
This transformer selects either numerical or categorical features. In this way we can build separate pipelines for separate data types.
- Parameters:
dtype – str, the type of data to select, default=’numeric’. Allowed values: ‘numeric’, ‘category’
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → DtypeSel
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- transform(X, y=None)
Method to select columns based on their type.
It populates the columns attribute with the columns of the output data.
- Parameters:
X – pandas DataFrame of shape (n_samples, n_features). The input samples.
y – array-like of shape (n_samples,) or (n_samples, n_outputs). Not used. The target values (class labels) as integers or strings.
- Returns:
pandas DataFrame with columns of the selected type
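The selection described above can be illustrated with a minimal sketch. The class DtypeSelSketch below is a hypothetical stand-in written with pandas and scikit-learn base classes; it is not the tubesml implementation. In particular, treating every non-numeric column as 'category' is an assumption made for this example.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DtypeSelSketch(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for DtypeSel; not the tubesml implementation."""

    def __init__(self, dtype="numeric"):
        self.dtype = dtype

    def fit(self, X, y=None):
        # Nothing to learn: the selection depends only on the dtypes of X.
        return self

    def transform(self, X, y=None):
        if self.dtype == "numeric":
            selected = X.select_dtypes(include="number")
        elif self.dtype == "category":
            # Assumption: every non-numeric column counts as categorical.
            selected = X.select_dtypes(exclude="number")
        else:
            raise ValueError("dtype must be 'numeric' or 'category'")
        # Record the output columns, mirroring the documented columns attribute.
        self.columns = list(selected.columns)
        return selected


df = pd.DataFrame(
    {
        "age": [31, 45, 27],
        "income": [3.2, 5.1, 2.8],
        "city": ["Rome", "Oslo", "Lima"],
    }
)

numeric_sel = DtypeSelSketch(dtype="numeric")
numeric = numeric_sel.fit_transform(df)

categorical_sel = DtypeSelSketch(dtype="category")
categorical = categorical_sel.fit_transform(df)
```

Because each selector returns a DataFrame, it can sit at the start of a per-type pipeline, with the steps that follow it only seeing columns of one kind.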
- class tubesml.utility.FeatureUnionDf(transformer_list, n_jobs=None, transformer_weights=None, verbose=False, verbose_feature_names_out=False)
Wrapper of FeatureUnion that returns a DataFrame. The column order follows the concatenation done by FeatureUnion.
The returned DataFrame now uses pandas dtypes.
- Parameters:
transformer_list – list of (string, transformer) tuples. List of transformer objects to be applied to the data. The first half of each tuple is the name of the transformer.
n_jobs – int, default=None. Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
transformer_weights – dict, default=None. Multiplicative weights for features per transformer. Keys are transformer names, values the weights. Raises ValueError if key not present in transformer_list.
verbose – bool, default=False. If True, the time elapsed while fitting each transformer will be printed as it is completed.
- fit(X, y=None)
Method to fit all the transformers.
It also resets the columns attribute.
- Parameters:
X – pandas DataFrame of shape (n_samples, n_features) The training input samples.
y – array-like of shape (n_samples,) or (n_samples, n_outputs). The target values (class labels) as integers or strings.
- get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : dict
Parameter names mapped to their values.
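As a rough illustration of the deep flag, the sketch below calls get_params on scikit-learn's own FeatureUnion, which FeatureUnionDf wraps. It assumes the wrapper inherits this behaviour unchanged; the step name "scale" is chosen only for the example.

```python
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# A one-step union; "scale" is an arbitrary name picked for this example.
union = FeatureUnion([("scale", StandardScaler())])

# deep=False: only the constructor parameters of the union itself.
shallow = union.get_params(deep=False)

# deep=True: also the contained transformers and their own parameters,
# addressed as <step name>__<parameter name>.
deep = union.get_params(deep=True)

# Keys that appear only when nested estimators are expanded.
nested_only = sorted(set(deep) - set(shallow))
```

The double-underscore keys in nested_only are the names accepted by set_params and by hyperparameter search grids.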
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → FeatureUnionDf
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Parameters
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sample_weight parameter in score.
Returns
- self : object
The updated object.
- transform(X, y=None)
Method to call all the transform methods in the transformer_list.
It populates the columns attribute with the columns of the output data.
- Parameters:
X – pandas DataFrame of shape (n_samples, n_features). The input samples.
y – array-like of shape (n_samples,) or (n_samples, n_outputs). Not used. The target values (class labels) as integers or strings.
- Returns:
pandas DataFrame with all the transformations applied in the order provided
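The idea of a union that returns a DataFrame can be sketched without tubesml. The helper union_to_frame and the two branch functions below are hypothetical names introduced for illustration; they only mimic the documented behaviour (apply each transformer, concatenate the results column-wise in the order given) and are not the library's code.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def numeric_branch(X):
    # Keep numeric columns and fill missing values with the column median.
    num = X.select_dtypes(include="number")
    return num.fillna(num.median())


def categorical_branch(X):
    # Keep non-numeric columns and fill missing values with a placeholder.
    cat = X.select_dtypes(exclude="number")
    return cat.fillna("missing")


def union_to_frame(transformer_list, X):
    """Hypothetical helper: fit/transform each branch, then concatenate."""
    frames = [transformer.fit_transform(X) for _, transformer in transformer_list]
    # Column order follows the order of transformer_list, not the input order.
    return pd.concat(frames, axis=1)


df = pd.DataFrame(
    {
        "city": ["Rome", None, "Lima"],
        "age": [31.0, None, 27.0],
    }
)

transformer_list = [
    ("num", FunctionTransformer(numeric_branch)),
    ("cat", FunctionTransformer(categorical_branch)),
]

result = union_to_frame(transformer_list, df)
```

Here result keeps the column names and a separate dtype per column, whereas stacking the same outputs into a single NumPy array would discard the names and force one common dtype.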