Machine learning airport airside prediction KPI

Asked by: Edvin Simic; asked: 11/8/2023; last edited by: David Makogon, Edvin Simic; updated: 11/9/2023; views: 24

Question:

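I have a dataset describing airport airside operations, with 23 independent variables and 1 target: the share of regulated flights delayed by more than 15 minutes. Using the code below, which compares several regression models (Random Forest, Linear Regression, SVR, GradientBoostingRegressor, XGBRegressor), I get the results shown below. Can I make the predictions better?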

These are my results on the test set:

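     MSE    R2     Within 10% error (% of test predictions)
LR   0.39   0.42   27.65
RF   0.13   0.81   49.16
SVR  0.19   0.72   42.46
GB   0.19   0.72   44.13
GBX  0.14   0.79   51.12

(LR = LinearRegression, RF = RandomForestRegressor, SVR = SVR, GB = GradientBoostingRegressor, GBX = XGBRegressor)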

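I think the problem lies in my data: some features are multicollinear, and most numerical features are not scaled. I have 1788 rows (78 airports over 2 years, i.e. 24 months), but for a given airport most values stay the same across all 24 months (e.g. number of runways, number of parking stands). An example of my data: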

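• Slot coordination level - 2

• Number of parking stands - 176

• Total departures (TAD) - 10000

• Terminal capacity (millions) - 14

• Global annual movement capacity (movements/year) - 210000

• Number of runway configurations - 2

• Number of runways - 2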

Runway configuration type, encoded as 0 or 1:

• RC Cross - 0

• RC Cross-parallel - 0

• RC Parallel <=1000 - 0

• RC Parallel >1000 - 0

• RC Single - 0 (if the airport has a single runway)

• RC V-shape - 0
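If the runway configuration starts out as a single categorical column, pandas can produce these 0/1 indicators directly. A minimal sketch, assuming a hypothetical `runway_config` column whose labels are taken from the list above. If every airport has exactly one configuration, the indicators in each row sum to 1, and together with an intercept a full set of them is perfectly collinear for `LinearRegression`; the sketch therefore also shows `drop_first=True`.

```python
import pandas as pd

# Hypothetical raw column holding one runway-configuration label per row
raw = pd.DataFrame({"runway_config": ["Single", "Parallel <=1000", "Cross", "Single"]})

# One 0/1 indicator column per configuration, prefixed "RC" like the features above
dummies = pd.get_dummies(raw["runway_config"], prefix="RC", dtype=int)
print(dummies)

# For linear models, drop one level: a full set of indicators that always sums to 1
# is perfectly collinear with the intercept (the "dummy variable trap")
dummies_lr = pd.get_dummies(raw["runway_config"], prefix="RC", dtype=int, drop_first=True)
print(dummies_lr.columns.tolist())
```

Tree-based models (random forest, gradient boosting, XGBoost) do not need the dropped level, so the full set can be kept for them.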

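Number of ILS systems per category:

• Instrument Landing System (ILS) CAT I - 0

• Instrument Landing System (ILS) CAT II - 2 (2 runways, equipped with 2 ILS systems)

• Instrument Landing System (ILS) CAT III - 0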

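• Approach separation (NM) - 5

• Runway capacity per hour (a/h) - 35

• Season_Summer (SS) - 0 (4 of the months are winter)

• Season_Winter (SW) - 1

• Number of turnaround aircraft (heavy) - 430

• Number of turnaround aircraft (heavy)

• Number of turnaround aircraft (heavy)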
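Some of these features may be redundant by construction: the number of runways and the number of runway configurations can be closely related, and if every month is either summer or winter, Season_Summer and Season_Winter always sum to 1. A quick pairwise-correlation scan can flag such pairs. A minimal sketch with pandas; the column names and the toy data are hypothetical, and the 0.9 threshold is an arbitrary assumption.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.9):
    """List feature pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = df.select_dtypes(include="number").corr().abs()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if r > threshold:
                pairs.append((cols[i], cols[j], float(r)))
    # Strongest correlations first
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy rows with hypothetical column names modelled on the list above
rng = np.random.default_rng(0)
n = 200
runways = rng.integers(1, 5, size=n)
summer = rng.integers(0, 2, size=n)
toy = pd.DataFrame({
    "n_runways": runways,
    "n_runway_configs": runways,  # same value as n_runways in this toy example
    "runway_capacity_ah": runways * 30 + rng.normal(0, 3, size=n),
    "Season_Summer": summer,
    "Season_Winter": 1 - summer,  # exact complement of Season_Summer
    "approach_separation_nm": rng.choice([3, 4, 5], size=n),
})

for a, b, r in highly_correlated_pairs(toy):
    print(f"{a} ~ {b}: |r| = {r:.2f}")
```

Pairwise correlation misses dependencies spread over three or more columns; variance inflation factors (`variance_inflation_factor` in `statsmodels.stats.outliers_influence`) cover that case. Dropping one column of each flagged pair matters most for linear regression; tree ensembles are less affected in accuracy, although correlated columns still split feature importance between them.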

The target value is given in percent (e.g. 23%, 45%); I converted it to a number.
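If the target column is read from Excel as text such as "23%", one way to convert it is sketched below; `raw_target` is a hypothetical stand-in for the `PDP` column. If the Excel cells are formatted as percentages instead, `pd.read_excel` normally already returns fractions such as 0.23 and no conversion is needed.

```python
import pandas as pd

# Hypothetical raw target values as text, e.g. read from Excel
raw_target = pd.Series(["23%", "45%", " 7.5 %"])

# Remove the percent sign and surrounding whitespace, then scale to a fraction in [0, 1]
target = raw_target.str.replace("%", "", regex=False).str.strip().astype(float) / 100
print(target.tolist())
```

Whichever scale is used, MSE values are only comparable between models trained on the same scale: the MSE on a 0-100 scale is 10,000 times the MSE on a 0-1 scale.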

My code is as follows:

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import xgboost as xgb

# Load data from Excel
data = pd.read_excel("PDPSAMO1.xlsx")

# Drop rows with NaN values
data = data.dropna()

# Separate features and target variable
X = data.drop(columns=['PDP'])
y = data['PDP']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features by removing the mean and scaling to unit variance
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define custom evaluation function for 10% error margin
def within_10_percent_error(y_true, y_pred):
    error_margin = 0.10
    # Relative error; rows where y_true == 0 give inf/NaN and are counted as misses
    correct_predictions = sum(abs(y_true - y_pred) / y_true <= error_margin)
    total_predictions = len(y_true)
    return (correct_predictions / total_predictions) * 100

# Models to evaluate
models = {
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(),
    'SVR': SVR(),
    'Gradient Boosting': GradientBoostingRegressor(),
    'XGBoost': xgb.XGBRegressor()
}

# Dictionary to store the best models and their respective scores
best_models = {}
scores = {
    'MSE': mean_squared_error,
    'R2': r2_score,
    'Within 10% Error': within_10_percent_error
}

# Grid search for hyperparameter tuning and model selection
for model_name, model in models.items():
    print(f"Training {model_name}...")
    param_grid = {}  # Empty grid: only each model's default hyperparameters are cross-validated

    grid_search = GridSearchCV(estimator=model, param_grid=param_grid, 
                               scoring='neg_mean_squared_error', cv=5)
    grid_search.fit(X_train_scaled, y_train)
    
    best_model = grid_search.best_estimator_
    best_models[model_name] = best_model

    # Evaluate model performance on test set
    y_pred = best_model.predict(X_test_scaled)
    print(f"Model: {model_name}")
    for metric_name, metric_func in scores.items():
        score = metric_func(y_test, y_pred)
        print(f"{metric_name}: {score:.2f}")
    print("-------------------")

# Compare models and print the best model
best_accuracy = 0
best_model_name = ''
for model_name, model in best_models.items():
    y_pred = model.predict(X_test_scaled)
    accuracy = within_10_percent_error(y_test, y_pred)
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_model_name = model_name

print(f"Best Model: {best_model_name} with {best_accuracy:.2f}% Accuracy Within 10% Error Margin")

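
The empty `param_grid` in the code above means `GridSearchCV` only cross-validates each model's default settings. Below is a sketch of a non-empty grid under several assumptions: the scaler sits in a `Pipeline` so it is re-fit on each training fold, and the folds are grouped by a hypothetical airport ID with `GroupKFold`, so the 24 largely identical monthly rows of one airport never land on both sides of a split. The data is synthetic (`make_regression`), the grid values are illustrative rather than recommendations, and only the random forest is shown.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real table: 78 airports x 24 months, 23 features
n_airports, n_months = 78, 24
X, y = make_regression(n_samples=n_airports * n_months, n_features=23,
                       noise=10.0, random_state=0)
# Hypothetical airport ID for every row (the real data would need such a column)
groups = np.repeat(np.arange(n_airports), n_months)

# The scaler is fitted on each CV training fold only, never on the validation fold
# (tree models do not need scaling, but SVR is sensitive to feature scale)
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestRegressor(random_state=0)),
])

# Illustrative grid; parameters of the pipeline step are addressed as "model__<name>"
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [None, 10],
}

# GroupKFold keeps all rows of one airport in the same fold
search = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error",
                      cv=GroupKFold(n_splits=5))
search.fit(X, y, groups=groups)
print(search.best_params_)
print(f"CV MSE: {-search.best_score_:.2f}")
```

The other entries in `models` follow the same pattern with their own `model__...` keys (for example `model__C` for `SVR` or `model__learning_rate` for `GradientBoostingRegressor`). With airport-grouped folds the scores will likely be lower than with a random split, but closer to what can be expected for an airport the model has not seen.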

Tags: machine-learning, prediction, data-preprocessing
