How to use a histogram or DataFrame table as a predictor in linear regression?

Question:

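I am trying to build a model that can estimate the concentration of dark matter particles in a halo. Software packages and methods already exist that generate particles on a radial grid according to a specific density distribution called the NFW profile. I turn each realization of particles into a histogram, and I want to use that histogram as my predictor: each histogram in the predictor set corresponds to one response, the concentration parameter c. This should be easy, since I have done similar work before, but here I ran into a problem. Every histogram in the set is divided into 100 bins, but the bin size differs from histogram to histogram. For example, this histogram corresponds to c=5: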

    Bin Edges  Histogram Values
0    0.000486                21
1    0.002544                39
2    0.004602                73
3    0.006660                60
4    0.008718                64
..        ...               ...
95   0.195999                83
96   0.198057                64
97   0.200115                63
98   0.202173                74
99   0.204231                70

And this histogram corresponds to c=20:

    Bin Edges  Histogram Values
0    0.000085                76
1    0.002147               188
2    0.004209               205
3    0.006271               216
4    0.008333               230
..        ...               ...
95   0.195968                40
96   0.198030                36
97   0.200092                45
98   0.202154                40
99   0.204215                42

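For clarity, I converted the histograms into tables. As you can see, the bin edges differ between predictors, so I cannot reduce this 2-dimensional data to a single dimension.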
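One way to see why the edges differ, and one possible way around it, can be sketched as follows. When `np.histogram` is given only a bin count, it derives the edges from each sample's own minimum and maximum, so two realizations almost never share edges. Passing one fixed array of edges to every call gives histograms whose counts are directly comparable, and each histogram then reduces to a single row of 100 numbers. This is only a sketch under stated assumptions: the uniform samples are stand-ins for the NFW draws, and `r_max` and `common_edges` are hypothetical names and values that do not appear in the original code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in samples (NOT NFW draws): two sets of radii with slightly different ranges
radii_a = rng.uniform(0.0005, 0.2043, size=5000)
radii_b = rng.uniform(0.0001, 0.2042, size=5000)

# Default binning: the edges depend on each sample's own min and max
_, edges_a = np.histogram(radii_a, bins=100)
_, edges_b = np.histogram(radii_b, bins=100)
print("same edges with bins=100:", np.allclose(edges_a, edges_b))

# Shared binning: one fixed set of edges reused for every realization
r_max = 0.21  # assumed upper bound that covers all samples
common_edges = np.linspace(0.0, r_max, 101)  # 101 edges -> 100 bins
hist_a, _ = np.histogram(radii_a, bins=common_edges)
hist_b, _ = np.histogram(radii_b, bins=common_edges)

# Each histogram is now one row of a 2-D feature matrix
X = np.vstack([hist_a, hist_b])
print("feature matrix shape:", X.shape)
```

With shared edges, the bin positions carry no per-sample information, so only the counts need to be kept as features.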

My attempt looks like this:

import numpy as np
from halotools import empirical_models
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd


# Define the range for concentration parameter (c) and radius (r)
min_c = 4.0
max_c = 40.0
M = 1E15  # Halo mass (a mass, not Mpc; halotools assumes Msun/h)
z = 0.0  # Redshift
num_samples = 300  # Number of samples
num_bins = 100  # Number of bins in each histogram


# Generate random concentration values (c) within the specified range
concentration_values = np.random.uniform(min_c, max_c, num_samples)

# Initialize lists to store features (X) and target values (y)
X = []
y = []

# Generate NFW density profiles and extract relevant information
for c in concentration_values:
    # Generate a realization of particles
    nfw_profile = empirical_models.NFWProfile()
    nfw_radial_positions = nfw_profile.mc_generate_nfw_radial_positions(halo_mass=M, conc=c)
    # Make a histogram from the realization, divided into 100 bins
    nfw_hist, bin_edges = np.histogram(nfw_radial_positions, num_bins)
    # Create a DataFrame from the histogram values and bin edges
    hist_table = pd.DataFrame({'Bin Edges': bin_edges[:-1], 'Histogram Values': nfw_hist})
    X.append(hist_table)
    y.append(c)


# Split the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)
model = LinearRegression()
# Train the model on the training data
model.fit(X_train, y_train)

# Predict concentration parameters on the validation set
y_val_pred = model.predict(X_val)

# Evaluate the model's performance on the validation set
mse = mean_squared_error(y_val, y_val_pred)
mae = mean_absolute_error(y_val, y_val_pred)
r2 = r2_score(y_val, y_val_pred)

# Print the evaluation metrics
print("Mean Squared Error (MSE):", mse)
print("Mean Absolute Error (MAE):", mae)
print("R-squared (R^2) Score:", r2)

But I keep getting this error:

ValueError: Found array with dim 3. LinearRegression expected <= 2.

Any ideas?

Edit: here is the full error

Traceback (most recent call last):
  File "C:\Users\Raeed\AppData\Local\Programs\Python\Python39\lib\contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Users\Raeed\PycharmProjects\NFW_profile\lib\site-packages\sklearn\_config.py", line 353, in config_context
    yield
  File "C:\Users\Raeed\PycharmProjects\NFW_profile\lib\site-packages\sklearn\base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "C:\Users\Raeed\PycharmProjects\NFW_profile\lib\site-packages\sklearn\linear_model\_base.py", line 678, in fit
    X, y = self._validate_data(
  File "C:\Users\Raeed\PycharmProjects\NFW_profile\lib\site-packages\sklearn\base.py", line 622, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "C:\Users\Raeed\PycharmProjects\NFW_profile\lib\site-packages\sklearn\utils\validation.py", line 1146, in check_X_y
    X = check_array(
  File "C:\Users\Raeed\PycharmProjects\NFW_profile\lib\site-packages\sklearn\utils\validation.py", line 951, in check_array
    raise ValueError(
ValueError: Found array with dim 3. LinearRegression expected <= 2.
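The "dim 3" in this error can be reproduced without halotools. A list of N DataFrames, each of shape (100, 2), converts to an array of shape (N, 100, 2), while scikit-learn estimators expect a 2-D matrix of shape (n_samples, n_features). The sketch below shows both shapes and then fits on a 2-D matrix built from the counts alone. It rests on assumptions that are not in the original code: `rng.beta` is a toy stand-in for the NFW draw and is not a physical profile, and `common_edges` is a hypothetical shared bin grid.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
num_samples, num_bins = 50, 100
common_edges = np.linspace(0.0, 1.0, num_bins + 1)  # assumed shared bin grid

tables, counts, y = [], [], []
for c in rng.uniform(4.0, 40.0, num_samples):
    # Toy stand-in for the NFW draw (NOT a physical profile): values in [0, 1]
    radii = rng.beta(1.0, c, size=2000)
    hist, edges = np.histogram(radii, bins=common_edges)
    # Same table layout as in the question
    tables.append(pd.DataFrame({'Bin Edges': edges[:-1], 'Histogram Values': hist}))
    counts.append(hist)
    y.append(c)

# A list of (100, 2) tables becomes a 3-D array, which triggers the error
X_3d = np.asarray(tables)
print("list of DataFrames ->", X_3d.shape)

# Keeping only the counts gives one row per sample: a 2-D matrix
X_2d = np.vstack(counts)
print("stacked counts     ->", X_2d.shape)

# The 2-D matrix passes input validation
model = LinearRegression().fit(X_2d, y)
predictions = model.predict(X_2d)
print("predictions        ->", predictions.shape)
```

Whether a linear model on raw counts predicts c well is a separate question; the sketch only addresses the input shape.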

python machine-learning scikit-learn linear-regression

Answer: No answers yet
