Machine learning airport airside prediction KPI

Question:

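I have a dataset representing the airside of airports, with 23 independent variables and 1 target value: the percentage of regulated aircraft (delayed by more than 15 minutes). I used the code shown in this post, which fits several regression models (random forest, linear regression, SVR, GradientBoostingRegressor, XGBoost regression), and got the predictions below. Can I make them better?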

These are my results on the test set.

Model  MSE    R2     Within 10% error (test, %)
LR     0.39   0.42   27.65
RF     0.13   0.81   49.16
SVR    0.19   0.72   42.46
GB     0.19   0.72   44.13
GBX    0.14   0.79   51.12

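I think the problem is in my data: some features are multicollinear, and most numeric features are not scaled. I have 1788 rows (78 airports over 2 years, i.e. 24 months), but most of an airport's values are identical across the 24 months (for example the number of runways, the number of parking stands, etc.). An example of my data: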

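• Slot coordination level - 2

• Number of parking stands - 176

• Total departures (TAD) - 10000

• Terminal capacity (millions) - 14

• Global yearly movement capacity (movements/year) - 210000

• Number of runway configurations - 2

• Number of runways - 2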

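Runway configuration (RC) type, encoded as 0 or 1:

·RC Crossing - 0

·RC Crossing-parallel - 0

·RC Parallel <=1000 - 0

·RC Parallel >1000 - 0

·RC Single - 0 (1 if the airport has a single runway)

·RC V-shaped - 0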

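Number of ILS systems:

·Instrument landing system (ILS) CAT I - 0

·Instrument landing system (ILS) CAT II - 2 (2 runways equipped with 2 ILS systems)

·Instrument landing system (ILS) CAT III - 0

·Approach separation (nautical miles) - 5

·Hourly runway capacity (a/h) - 35

·Season_Summer (SS) - 0 (4 months are winter)

·Season_Winter (SW) - 1

·Number of turnaround aircraft (heavy) - 430

·Number of turnaround aircraft (heavy)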
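One way to check the multicollinearity suspicion is to list feature pairs with a very high absolute correlation. The sketch below is only an illustration on synthetic data: the column names, the 0.9 threshold, and the helper `highly_correlated_pairs` are assumptions, not part of the original dataset or code. It includes a pair of complementary season flags, similar to Season_Summer and Season_Winter above, to show how such a pair shows up.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the feature table; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
runways = rng.integers(1, 5, size=n)
season_summer = rng.integers(0, 2, size=n)
X = pd.DataFrame({
    "runways": runways,
    # Built to depend almost linearly on the number of runways.
    "runway_capacity_per_hour": 30 * runways + rng.normal(0, 1, size=n),
    "season_summer": season_summer,
    "season_winter": 1 - season_summer,  # exact complement of season_summer
    "parking_stands": rng.integers(10, 200, size=n),
})

def highly_correlated_pairs(features, threshold=0.9):
    """Return (col_a, col_b, r) for every column pair with |Pearson r| >= threshold."""
    corr = features.corr()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], round(float(r), 3)))
    return pairs

pairs = highly_correlated_pairs(X)
for col_a, col_b, r in pairs:
    print(f"{col_a} ~ {col_b}: r = {r}")
```

When two columns carry the same information, as the two season flags do here, dropping one of them is a common first step; this matters mostly for linear regression, since tree-based models are usually less sensitive to correlated inputs.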

The target values are given as percentages (e.g. 23%, 45%); I converted them to numbers.
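For reference, a minimal sketch of such a conversion, assuming the values arrive as strings like "23%". The sample values are illustrative only, and whether to keep the 0 to 100 scale or divide by 100 is a choice the original post does not state.

```python
import pandas as pd

# Illustrative values only, not taken from the real dataset.
raw = pd.Series(["23%", "45%", "7.5%"], name="PDP")

# Strip the trailing percent sign and parse as float (0-100 scale).
pdp_percent = raw.str.rstrip("%").astype(float)

# Optional: rescale to a 0-1 fraction.
pdp_fraction = pdp_percent / 100.0

print(pdp_percent.tolist())
```

The choice of scale changes the MSE (by a factor of 100 squared between the two scales) but not R2 or a relative-error metric.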

My code is as follows:

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import xgboost as xgb

# Load data from Excel
data = pd.read_excel("PDPSAMO1.xlsx")

# Drop rows with NaN values
data = data.dropna()

# Separate features and target variable
X = data.drop(columns=['PDP'])
y = data['PDP']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features by removing the mean and scaling to unit variance
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

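# Custom evaluation metric: share of predictions within a 10% relative error
def within_10_percent_error(y_true, y_pred, error_margin=0.10):
    """Return the percentage of predictions with |y_true - y_pred| / |y_true| <= error_margin.

    Rows where y_true is 0 have an undefined relative error (inf or NaN)
    and are therefore counted as misses.
    """
    relative_error = abs(y_true - y_pred) / abs(y_true)
    return (relative_error <= error_margin).mean() * 100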

# Models to evaluate
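models = {
    'Linear Regression': LinearRegression(),
    # random_state is fixed so that repeated runs give the same scores
    'Random Forest': RandomForestRegressor(random_state=42),
    'SVR': SVR(),
    'Gradient Boosting': GradientBoostingRegressor(random_state=42),
    'XGBoost': xgb.XGBRegressor(random_state=42)
}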

# Dictionary to store the best models and their respective scores
best_models = {}
scores = {
    'MSE': mean_squared_error,
    'R2': r2_score,
    'Within 10% Error': within_10_percent_error
}

# Grid search for hyperparameter tuning and model selection
for model_name, model in models.items():
    print(f"Training {model_name}...")
    param_grid = {}  # Define hyperparameters grid for each model if needed

    grid_search = GridSearchCV(estimator=model, param_grid=param_grid, 
                               scoring='neg_mean_squared_error', cv=5)
    grid_search.fit(X_train_scaled, y_train)
    
    best_model = grid_search.best_estimator_
    best_models[model_name] = best_model

    # Evaluate model performance on test set
    y_pred = best_model.predict(X_test_scaled)
    print(f"Model: {model_name}")
    for metric_name, metric_func in scores.items():
        # All three metrics share the (y_true, y_pred) signature
        score = metric_func(y_test, y_pred)
        print(f"{metric_name}: {score:.2f}")
    print("-------------------")

# Compare models and print the best model
best_accuracy = 0
best_model_name = ''
for model_name, model in best_models.items():
    y_pred = model.predict(X_test_scaled)
    accuracy = within_10_percent_error(y_test, y_pred)
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_model_name = model_name

print(f"Best Model: {best_model_name} with {best_accuracy:.2f}% Accuracy Within 10% Error Margin")
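Because each airport contributes up to 24 rows whose static features are identical, a random `train_test_split` can place rows from the same airport in both the training and the test set, which may make the test scores look better than they would be on an airport the model has never seen. A possible alternative, sketched below on synthetic data, is a group-aware split. The `airport` identifier column is an assumption; the original post does not say whether the dataset has one.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in: 10 airports x 24 months. 'airport' is a hypothetical ID column.
rng = np.random.default_rng(42)
n_airports, n_months = 10, 24
df = pd.DataFrame({
    "airport": np.repeat(np.arange(n_airports), n_months),
    # Static feature: the same value for all 24 rows of an airport.
    "runways": np.repeat(rng.integers(1, 5, size=n_airports), n_months),
    "departures": rng.integers(1000, 20000, size=n_airports * n_months),
    "PDP": rng.uniform(5, 60, size=n_airports * n_months),
})

X = df.drop(columns=["PDP", "airport"])
y = df["PDP"]
groups = df["airport"]

# Hold out whole airports instead of individual rows.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

train_airports = set(groups.iloc[train_idx])
test_airports = set(groups.iloc[test_idx])
print(f"{len(train_airports)} airports in train, {len(test_airports)} in test")
```

The same idea carries over to the cross-validation step: `GridSearchCV` accepts a `GroupKFold` instance as `cv`, and the group labels are passed through `fit(X, y, groups=...)`.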

machine-learning prediction data-preprocessing

Answer: No answers yet.
