Data shape changes after applying one-hot encoding using to_categorical

Asked by: Biche | Asked: 3/9/2023 | Last edited by: JosefZ, Biche | Updated: 3/9/2023 | Views: 37

Question:

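I defined a function, get_data, that randomly picks two digits from the MNIST dataset and returns train and test subsets of a given size. I then apply to_categorical to one-hot encode the labels. However, the shape of the encoded labels changes every time I run the function, and I don't understand why. Since there are only two classes, I expected the shapes to be (100, 2) and (20, 2), but I get different values. Please give a detailed explanation, as I am new to machine learning.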
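For context: when num_classes is not passed, the Keras documentation states that to_categorical infers it as max(y) + 1, so the number of columns follows the largest label value, not the number of distinct classes. Below is a minimal NumPy sketch of that documented behavior. The helper one_hot_inferred is hypothetical and is not a Keras function.

```python
import numpy as np


def one_hot_inferred(labels):
    """Hypothetical NumPy sketch of to_categorical(labels) with num_classes left as None.

    The number of columns is inferred as max(labels) + 1, so the width depends on
    the largest label value, not on how many distinct labels are present.
    """
    labels = np.asarray(labels, dtype=int)
    num_classes = int(labels.max()) + 1
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]


# Only two classes, but the labels are the raw digits 3 and 7
y = np.array([3, 7, 7, 3, 7])
encoded = one_hot_inferred(y)
print(encoded.shape)  # (5, 8): columns 0-7, only columns 3 and 7 are ever set
```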

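import numpy as np
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from sklearn.model_selection import train_test_split


def get_data(train_size, test_size):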
    # Load the MNIST dataset
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # Generate two random digits between 0 and 9
    generate_digits = np.random.choice(np.arange(10), size=2, replace=False)

    # Get a sub dataset only with the generated digits
    train_digits = np.isin(y_train, generate_digits)
    test_digits = np.isin(y_test, generate_digits)
    x_train_sub, y_train_sub = x_train[train_digits], y_train[train_digits]
    x_test_sub, y_test_sub = x_test[test_digits], y_test[test_digits]

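    # Split the two-digit training subset into new train and test sets
    # (this overwrites x_test_sub and y_test_sub built from the MNIST test set above)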
    x_train_sub, x_test_sub, y_train_sub, y_test_sub = train_test_split(
        x_train_sub, y_train_sub, train_size=train_size, test_size=test_size,
        random_state=0, stratify=y_train_sub)

    # One-hot encode the labels
    y_train_sub = keras.utils.to_categorical(y_train_sub)
    y_test_sub = keras.utils.to_categorical(y_test_sub)

    return x_train_sub, y_train_sub, x_test_sub, y_test_sub


train_size = 100
test_size = 20
x_train_sub, y_train_sub, x_test_sub, y_test_sub = get_data(train_size, test_size)


print(y_train_sub.shape)
print(y_test_sub.shape)
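If the goal is a fixed (n, 2) label shape, one approach is to map the two raw digits to class indices 0 and 1 before one-hot encoding. This is a minimal sketch under that assumption. The helper encode_two_digits is hypothetical and is not part of Keras or the original code.

```python
import numpy as np


def encode_two_digits(y_sub, digits):
    """Hypothetical helper: one-hot encode labels drawn from exactly two digits.

    The raw digit labels are first mapped to class indices 0 and 1, so the
    output always has 2 columns regardless of which digits were drawn.
    Column 0 stands for the smaller digit, column 1 for the larger one.
    """
    classes = np.sort(np.asarray(digits))
    # Index of each label within the sorted pair (assumes every label is in `digits`)
    y_binary = np.searchsorted(classes, y_sub)
    return np.eye(len(classes), dtype="float32")[y_binary]


digits = np.array([7, 3])             # e.g. the generate_digits drawn in get_data
y_train_sub = np.array([3, 7, 7, 3])  # raw digit labels after filtering
y_onehot = encode_two_digits(y_train_sub, digits)
print(y_onehot.shape)  # (4, 2)
```

Inside get_data, the two to_categorical calls would become encode_two_digits(y_train_sub, generate_digits) and encode_two_digits(y_test_sub, generate_digits). To stay with Keras instead, calling keras.utils.to_categorical(y_binary, num_classes=2) on the remapped labels gives the same (n, 2) shape. Passing num_classes=2 without remapping would not give a correct encoding, because raw labels such as 7 fall outside the column range 0 to 1.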

One sample result:

(100, 10)
(20, 10)

Another sample result:

(100, 6)
(20, 6)
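Both sample outputs are consistent with the documented to_categorical rule that, when num_classes is omitted, the width is max(label) + 1: (100, 10) means one of the drawn digits was 9, and (100, 6) means the larger digit was 5. Train and test get the same width because stratify=y_train_sub keeps both digits in both splits. The sketch below enumerates every pair that np.random.choice(np.arange(10), size=2, replace=False) can return. Since each pair is equally likely, the 10-column shape appears for 9 of the 45 pairs (20%), and the expected (n, 2) only for the pair 0 and 1.

```python
from collections import Counter
from itertools import combinations

# Assumes the documented to_categorical rule: with num_classes omitted,
# the number of columns is max(label) + 1.
widths = Counter(max(pair) + 1 for pair in combinations(range(10), 2))

for width in sorted(widths):
    print(f"(n, {width}) for {widths[width]} of 45 digit pairs")
```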

I have tried many things, but nothing worked.

Tags: encoding, shape, mnist, classification


Answers: none yet.