Accelerate and Optimize GA algorithm as feature selection for Regression Problem

Asked by medo0070 on 11/8/2023 · Modified 11/8/2023 · Viewed 15 times

Q:

I am trying to compare ANOVA against a GA algorithm for feature selection, then apply the selected feature sets to various ML models and compare them in terms of MAE, RMSE, and R2. I am using the GA for feature selection on a regression problem. My dataset contains 78 features, 1 target, and 1016 rows. I am facing 3 problems:

  1. The program takes a very long time to process a single GA generation.
  2. Since I am new to GA, I am not sure whether the fitness function I am using is good.
  3. When comparing with ANOVA for the ML models in terms of MAE, RMSE, and R2, it gets worse results than the GA results.

Any suggestions on a fitness function for the problems above? Thanks in advance.
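(For reference, the ANOVA side of the comparison is not included in the code below; a minimal sketch of what such a baseline typically looks like with scikit-learn's univariate F-test, where the value of k is an illustrative assumption rather than the question's actual setting:

from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical ANOVA baseline, not from the question's code:
# rank features by the univariate F-statistic and keep the top k
selector = SelectKBest(score_func=f_regression, k=20)  # k=20 is illustrative
X_train_anova = selector.fit_transform(X_train, y_train)
X_test_anova = selector.transform(X_test)

The same ML models would then be fit on X_train_anova and scored on X_test_anova for the comparison.)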

Here is part of my code:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import power_transform
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.svm import LinearSVR
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor, AdaBoostRegressor, BaggingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import StackingRegressor
from deap import base, creator, tools, algorithms
import random
import warnings
import os
import tensorflow as tf
from multiprocessing import Pool


# Initialize TPU (Colab-specific; note that scikit-learn and DEAP run on the
# CPU, so this TPU strategy does not accelerate the GA evaluation below)
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

warnings.filterwarnings('ignore')

# Load the dataset
df = pd.read_csv("CBECS_Office_Subset.csv")
original_feature_names = df.columns[:-1]  # Exclude the target variable

# Normalize the data
scaler = MinMaxScaler()

# Transform the data and ignore warnings during this process
with warnings.catch_warnings():
    warnings.filterwarnings("ignore")
    df = power_transform(df, method='yeo-johnson')
    df = scaler.fit_transform(df)

X = df[:, :-1]
y = df[:, -1]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Models
models = {
   "Linear Regression": LinearRegression(),
   "Support Vector Machine": LinearSVR(),
   "Random Forest": RandomForestRegressor(),
   "Extra Trees Regressor": ExtraTreesRegressor(),
   "Adaboost Regressor": AdaBoostRegressor(),
   "MLP Regressor": MLPRegressor(),
   "Bagging Regressor": BaggingRegressor(),
   "Stacking Regressor": StackingRegressor(estimators=[
         ('lr', LinearRegression()),
         ('svm', LinearSVR()),
         ('rf', RandomForestRegressor(n_estimators=100, random_state=42)),
         ('etr', ExtraTreesRegressor(n_estimators=100, random_state=42)),
         ('ada', AdaBoostRegressor(n_estimators=100, random_state=42)),
         ('mlp', MLPRegressor()),
   ], final_estimator=LinearRegression())
   }

# Define the GA optimization function
# Create a fitness function that minimizes the error metric (MAE)
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))  # negative weight so DEAP minimizes
creator.create("Individual", list, fitness=creator.FitnessMin)

# Define genetic operators
toolbox = base.Toolbox()
toolbox.register("attr_bool", random.randint, 0, 1)  # Binary representation for feature selection
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_bool, n=len(X[0]))
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

# Define the evaluation function (fitness function)
def evaluate_individual(individual, model):
    selected_features = [i for i, bit in enumerate(individual) if bit]

    # Guard against individuals that select no features at all
    if not selected_features:
        return (np.inf,)

    X_train_subset = X_train[:, selected_features]
    X_test_subset = X_test[:, selected_features]

    model.fit(X_train_subset, y_train)
    y_pred = model.predict(X_test_subset)

    mae = np.mean(np.abs(y_test - y_pred))
    rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
    r2 = 1.0 - (np.sum((y_test - y_pred) ** 2) / np.sum((y_test - np.mean(y_test)) ** 2))

    # Note: MAPE can divide by zero here, since MinMax scaling maps the
    # smallest target value to 0
    mape = np.mean(np.abs((y_test - y_pred) / y_test)) * 100

    # Aggregation method: currently the fitness is MAE alone
    fitness = mae

    return fitness,
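Only part of the code is shown, so the GA driver loop is missing. Below is a minimal sketch of how the remaining pieces are typically wired up in DEAP, assuming the names defined above; it also illustrates the two usual remedies for problems 1 and 2 (parallel fitness evaluation across a process pool, and a feature-count penalty in the fitness). The evaluator model, alpha, and all GA hyperparameters here are illustrative assumptions, not the question's actual settings:

# --- Sketch only: a typical DEAP driver for the setup above ---
def evaluate_with_penalty(individual, model, alpha=0.01):
    # Base error from the evaluation function defined above
    (mae,) = evaluate_individual(individual, model)
    # Penalize large subsets so the GA prefers compact feature sets
    n_selected = sum(individual)
    return (mae + alpha * n_selected / len(individual),)

toolbox.register("evaluate", evaluate_with_penalty, model=LinearRegression())
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=3)

if __name__ == "__main__":
    # The per-generation cost is dominated by population-size independent
    # model fits, so distributing them over a process pool speeds up each
    # generation roughly linearly with the core count
    pool = Pool()
    toolbox.register("map", pool.map)

    population = toolbox.population(n=50)
    population, logbook = algorithms.eaSimple(
        population, toolbox, cxpb=0.7, mutpb=0.2, ngen=20, verbose=True)

    pool.close()
    pool.join()

    best = tools.selBest(population, k=1)[0]
    selected = [name for name, bit in zip(original_feature_names, best) if bit]
    print("Best feature subset:", selected)

With the minimizing fitness (weights=(-1.0,)), the penalty term trades a small amount of MAE for fewer features; alpha controls that trade-off.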
python machine-learning regression genetic-algorithm feature-selection



A: No answers yet