Feature selection in Python with the DropFeatures, RecursiveFeatureElimination, and SelectByShuffling classes of Feature-engine
Preparation:
1. Make sure a Python environment is installed.
2. Install the Feature-engine library with the following command:
pip install feature-engine
3. Import the required classes and functions:
python
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from feature_engine.selection import DropFeatures, RecursiveFeatureElimination, SelectByShuffling
Dependencies:
- Feature-engine: provides feature engineering methods, including feature selection, transformation, and discretization.
Sample data:
In this example, we use the Boston housing price dataset from scikit-learn as sample data. The dataset contains 13 feature variables for predicting housing prices. Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the code here requires an earlier scikit-learn version.
python
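import pandas as pd
# Load the dataset
boston = load_boston()
# Build a DataFrame and a Series so that features can be referenced by name
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = pd.Series(boston.target, name='MEDV')
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)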
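For readers on scikit-learn 1.2 or later, where load_boston is no longer available, the same preparation steps can be sketched with a different bundled dataset. The choice of load_diabetes here is our assumption, not part of the original example; it is a regression dataset with 10 features, and as_frame=True returns pandas objects so that column names are preserved.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Assumption: load_diabetes stands in for the Boston data on newer scikit-learn.
# as_frame=True returns a DataFrame / Series, so features keep their names.
X, y = load_diabetes(return_X_y=True, as_frame=True)

# Split into training and test sets, as in the example above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.columns.tolist())
```

With this substitution, any column names passed to DropFeatures would have to come from this dataset (for example 'age' or 'sex') rather than from the Boston feature names.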
Complete code example:
python
import pandas as pd
from sklearn.datasets import load_boston  # requires scikit-learn < 1.2
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from feature_engine.selection import DropFeatures, RecursiveFeatureElimination, SelectByShuffling
# Load the dataset
boston = load_boston()
# Build a DataFrame and a Series: Feature-engine selects features by column name
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = pd.Series(boston.target, name='MEDV')
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Drop specific features by name with DropFeatures
drop_feats = DropFeatures(features_to_drop=['CRIM', 'ZN', 'INDUS'])
X_train_drop = drop_feats.fit_transform(X_train)
# Recursive feature elimination with RecursiveFeatureElimination.
# The class has no n_features_to_select parameter: a feature is removed
# when dropping it lowers the cross-validated score by less than threshold.
# scoring must be set explicitly because the default ('roc_auc') is for classification.
rfe = RecursiveFeatureElimination(estimator=LinearRegression(), scoring='r2', cv=3, threshold=0.01)
X_train_rfe = rfe.fit_transform(X_train, y_train)
# Feature selection by shuffling with SelectByShuffling
sel_shuffle = SelectByShuffling(estimator=LinearRegression(), scoring='r2', cv=10, random_state=42)
X_train_shuffle = sel_shuffle.fit_transform(X_train, y_train)
# Print the features that each selector keeps
print("Features kept by DropFeatures:", X_train_drop.columns.tolist())
print("Features kept by RecursiveFeatureElimination:", X_train_rfe.columns.tolist())
print("Features kept by SelectByShuffling:", X_train_shuffle.columns.tolist())
# Train a model on each feature subset and evaluate it on the test set
lr = LinearRegression()
lr.fit(X_train_drop, y_train)
y_pred = lr.predict(drop_feats.transform(X_test))
print("R-Squared (DropFeatures):", lr.score(drop_feats.transform(X_test), y_test))
lr.fit(X_train_rfe, y_train)
y_pred = lr.predict(rfe.transform(X_test))
print("R-Squared (RecursiveFeatureElimination):", lr.score(rfe.transform(X_test), y_test))
lr.fit(X_train_shuffle, y_train)
y_pred = lr.predict(sel_shuffle.transform(X_test))
print("R-Squared (SelectByShuffling):", lr.score(sel_shuffle.transform(X_test), y_test))
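To make the recursive elimination step less of a black box, the idea can be sketched by hand with scikit-learn alone. This is a simplified illustration, not Feature-engine's implementation: the toy data, the column names, the helper cv_r2, and the threshold value are all hypothetical, and features are ranked by the absolute value of their linear coefficients (reasonable here only because all columns share the same scale).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 400

# Hypothetical toy data: y depends on "a" and "b"; the noise columns are irrelevant.
X = pd.DataFrame({
    "a": rng.normal(size=n),
    "b": rng.normal(size=n),
    "noise1": rng.normal(size=n),
    "noise2": rng.normal(size=n),
})
y = 3.0 * X["a"] + 3.0 * X["b"] + rng.normal(scale=0.5, size=n)


def cv_r2(columns):
    """Mean 5-fold cross-validated R² of a linear model on the given columns."""
    scores = cross_val_score(LinearRegression(), X[columns], y, cv=5, scoring="r2")
    return scores.mean()


threshold = 0.01  # hypothetical: largest drop in R² we are willing to accept

# Rank features from least to most important using |coefficient|.
full_model = LinearRegression().fit(X, y)
importance = dict(zip(X.columns, np.abs(full_model.coef_)))
ranking = sorted(X.columns, key=lambda c: importance[c])

kept = list(X.columns)
baseline = cv_r2(kept)
for col in ranking:
    candidate = [c for c in kept if c != col]
    if not candidate:
        break
    score = cv_r2(candidate)
    # Remove the feature only if the model barely gets worse without it.
    if baseline - score < threshold:
        kept = candidate
        baseline = score

print("Kept features:", kept)
```

The point of the sketch is that the number of surviving features is a consequence of the threshold rather than something fixed in advance.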
Summary:
In this article, we used the DropFeatures, RecursiveFeatureElimination, and SelectByShuffling classes of the Feature-engine library to perform feature selection. DropFeatures deletes the features named by the user. RecursiveFeatureElimination removes features one at a time, starting with the least important, and keeps a feature only if removing it degrades model performance beyond a threshold. SelectByShuffling shuffles the values of each feature in turn and drops the features whose shuffling barely changes model performance. Comparing the test scores of models trained on each subset helps us choose a feature subset that simplifies the model and may improve its performance.
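The shuffling idea described above can also be sketched by hand. This is a minimal illustration under stated assumptions, not Feature-engine's code: the toy data and column names are hypothetical, a single train/test split replaces cross-validation, and using the mean performance drift as the cut-off is a choice made here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical toy data: y depends on "a" and "b"; "noise" is irrelevant.
X = pd.DataFrame({
    "a": rng.normal(size=500),
    "b": rng.normal(size=500),
    "noise": rng.normal(size=500),
})
y = 3.0 * X["a"] + 3.0 * X["b"] + rng.normal(scale=0.1, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit once on the untouched training data and record the baseline score.
model = LinearRegression().fit(X_tr, y_tr)
baseline = r2_score(y_te, model.predict(X_te))

drifts = {}
for col in X_te.columns:
    X_shuffled = X_te.copy()
    # Permute one column so its values no longer line up with the target.
    X_shuffled[col] = rng.permutation(X_shuffled[col].to_numpy())
    drifts[col] = baseline - r2_score(y_te, model.predict(X_shuffled))

# Assumption: keep the features whose shuffling costs more than the mean drift.
cutoff = np.mean(list(drifts.values()))
selected = [c for c, d in drifts.items() if d > cutoff]

print("Performance drifts:", drifts)
print("Selected features:", selected)
```

Shuffling an informative column destroys most of the model's R², while shuffling the noise column leaves it almost unchanged, which is why the noise column falls below the cut-off.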