Asked by: Edvin Simic | Asked: 11/8/2023 | Last edited by: David Makogon, Edvin Simic | Updated: 11/9/2023 | Views: 24
Machine learning airport airside prediction KPI
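Q:
I have a dataset representing the airport airside, with 23 independent variables and 1 target value: the percentage of regulated aircraft delayed by more than 15 minutes. I used the code described in the post, which fits several regression models (Random Forest, Linear Regression, GradientBoostingRegressor, XGB regression), and obtained the predictions below. Can I make them better?
These are my results on the test set.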
Model   MSE    R2     Within 10% error (test, %)
LR      0.39   0.42   27.65
RF      0.13   0.81   49.16
SVR     0.19   0.72   42.46
GB      0.19   0.72   44.13
GBX     0.14   0.79   51.12
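I think the problem lies in my data: some features are multicollinear, and most numeric features are not scaled. I have 1788 rows (78 airports over 2 years, i.e. 24 months), but for any one airport most of the values are identical across all 24 months (e.g. number of runways, number of parking stands, etc.). An example of my data: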
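• Slot coordination level - 2
• Number of parking stands - 176
• Total departures (TAD) - 10000
• Terminal capacity (millions) - 14
• Global yearly movement capacity (movements/year) - 210000
• Number of runway configurations - 2
• Number of runways - 2
Runway configuration type, encoded as 0 and 1:
· RC Cross 0
· RC Cross-parallel 0
· RC Parallel <=1000 0
· RC Parallel >1000 0
· RC Single 0 (if the airport has a single runway)
· RC V-formation 0
Number of ILS systems:
· Instrument landing system (ILS) CAT I 0
· Instrument landing system (ILS) CAT II 2 (2 runways equipped with 2 ILS systems)
· Instrument landing system (ILS) CAT III 0
· Approach separation (NM) - 5
· Runway capacity per hour (a/h) - 35
· Season_Summer (SS) - 0 (4 months are winter)
· Season_Winter (SW) - 1
· Number of turnaround aircraft (heavy) 430
· Number of turnaround aircraft (heavy)
· Number of turnaround aircraft (heavy)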
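The multicollinearity concern can be checked directly before any modelling. The following is a minimal sketch, not part of the original code: it lists feature pairs whose absolute Pearson correlation exceeds a threshold. The helper name `highly_correlated_pairs`, the 0.9 threshold, and the toy columns `Runways` and `RunwayCapacity` are assumptions made for illustration. `Season_Summer` and `Season_Winter` appear to be complementary dummies in the example row, so they would be expected to show up as a perfectly correlated pair.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(features: pd.DataFrame, threshold: float = 0.9):
    """Return (col_a, col_b, |corr|) for feature pairs with |corr| >= threshold."""
    # Constant columns have an undefined correlation, so drop them first.
    varying = features.loc[:, features.nunique() > 1]
    corr = varying.corr().abs()
    # Keep only the strict upper triangle so each pair is reported once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    # NaN entries (masked cells) never satisfy the comparison and are skipped.
    return [(a, b, float(c)) for (a, b), c in pairs.items() if c >= threshold]


# Toy data only; the column names are hypothetical stand-ins for the real ones.
toy = pd.DataFrame({
    "Season_Summer": [1, 0, 1, 0, 1, 0],
    "Season_Winter": [0, 1, 0, 1, 0, 1],
    "Runways": [2, 2, 1, 1, 3, 3],
    "RunwayCapacity": [35, 36, 20, 21, 60, 58],
})
pairs = highly_correlated_pairs(toy)
for col_a, col_b, value in pairs:
    print(f"{col_a} ~ {col_b}: |r| = {value:.2f}")
```

Dropping one column from each reported pair is one common response for linear models. Tree ensembles are usually less sensitive to correlated inputs, so this may matter more for the Linear Regression row than for the others.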
The target value is given in percent (e.g. 23%, 45%); I converted it to a number.
My code is:
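The question does not show how the percent values were converted, so the following is only a sketch of one way to do it with pandas. The helper name `percent_to_fraction` and the choice to divide by 100 are assumptions; keeping the values on a 0 to 100 scale would work equally well as long as it is applied consistently.

```python
import pandas as pd


def percent_to_fraction(column: pd.Series) -> pd.Series:
    """Convert strings such as '23%' to floats such as 0.23."""
    # Remove the percent sign, then trim any surrounding whitespace.
    cleaned = column.astype(str).str.replace("%", "", regex=False).str.strip()
    # errors="coerce" turns unparseable entries into NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce") / 100.0


raw = pd.Series(["23%", "45%", " 7.5 %", "n/a"])
converted = percent_to_fraction(raw)
print(converted)
```

The scale matters when reading the metrics: MSE changes by a factor of 100^2 between the two scales, whereas R2 and the within-10% relative-error measure are unaffected by it.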
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import xgboost as xgb
# Load data from Excel
data = pd.read_excel("PDPSAMO1.xlsx")
# Drop rows with NaN values
data = data.dropna()
# Separate features and target variable
X = data.drop(columns=['PDP'])
y = data['PDP']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize features by removing the mean and scaling to unit variance
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Define custom evaluation function for 10% error margin
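def within_10_percent_error(y_true, y_pred):
    """Return the percentage of predictions whose relative error is at most 10%."""
    error_margin = 0.10
    # Relative error is undefined when y_true is 0; such rows never count as correct
    correct_predictions = sum(abs(y_true - y_pred) / y_true <= error_margin)
    total_predictions = len(y_true)
    return (correct_predictions / total_predictions) * 100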
# Models to evaluate
models = {
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(),
    'SVR': SVR(),
    'Gradient Boosting': GradientBoostingRegressor(),
    'XGBoost': xgb.XGBRegressor()
}
# Dictionary to store the best models and their respective scores
best_models = {}
scores = {
    'MSE': mean_squared_error,
    'R2': r2_score,
    'Within 10% Error': within_10_percent_error
}
# Grid search for hyperparameter tuning and model selection
for model_name, model in models.items():
    print(f"Training {model_name}...")
    # An empty grid means GridSearchCV only cross-validates the default hyperparameters
    param_grid = {}  # Define hyperparameters grid for each model if needed
    grid_search = GridSearchCV(estimator=model, param_grid=param_grid,
                               scoring='neg_mean_squared_error', cv=5)
    grid_search.fit(X_train_scaled, y_train)
    best_model = grid_search.best_estimator_
    best_models[model_name] = best_model
    # Evaluate model performance on test set
    y_pred = best_model.predict(X_test_scaled)
    print(f"Model: {model_name}")
    # Every metric takes (y_true, y_pred), so no special-casing is needed
    for metric_name, metric_func in scores.items():
        score = metric_func(y_test, y_pred)
        print(f"{metric_name}: {score:.2f}")
    print("-------------------")
# Compare models and print the best model
best_accuracy = 0
best_model_name = ''
for model_name, model in best_models.items():
    y_pred = model.predict(X_test_scaled)
    accuracy = within_10_percent_error(y_test, y_pred)
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_model_name = model_name
print(f"Best Model: {best_model_name} with {best_accuracy:.2f}% Accuracy Within 10% Error Margin")
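Two points in the code above could be explored further: `param_grid` is empty, so no hyperparameters are actually tuned, and the random row-wise split can place months of the same airport in both the training and the test set. The sketch below shows one possible setup under stated assumptions, not a definitive fix. It runs on synthetic data, the column names and grid values are hypothetical, and it assumes an airport identifier is available to use as the `groups` argument.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 12 airports x 6 months; all names are hypothetical.
rng = np.random.default_rng(0)
airports = np.repeat(np.arange(12), 6)
X = pd.DataFrame({
    "Runways": rng.integers(1, 4, size=12)[airports],  # constant per airport
    "Departures": rng.normal(10_000, 2_000, size=airports.size),
})
y = 0.05 * X["Runways"] + rng.normal(0.2, 0.02, size=airports.size)

# Scaling lives inside the pipeline so it is refit on each training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestRegressor(random_state=0)),
])

# A small, non-empty grid; the values are illustrative, not tuned.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [3, None],
}

search = GridSearchCV(
    pipeline,
    param_grid=param_grid,
    scoring="neg_mean_squared_error",
    cv=GroupKFold(n_splits=3),
)
# Passing groups keeps all rows of one airport inside the same fold.
search.fit(X, y, groups=airports)
print(search.best_params_)
```

Grouped folds typically give lower but more honest scores when the goal is to predict for airports the model has not seen. If the goal is instead to predict future months for known airports, a time-based split may be the more appropriate choice.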
A: No answers yet