tf.data.experimental.CsvDataset ignores my feature specification (tensorflow)

Asked by: Hack-R Asked: 3/20/2023 Updated: 3/20/2023 Views: 28

Q:

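tf.data.experimental.CsvDataset seems to ignore my feature specification. I hit the same problem when using record_defaults generated programmatically from a SchemaGen schema. The data is synthetic, to rule out the data itself as a potential source of the error.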

I generate the data like this:

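import os
import random
import string

# Assumption: location_list is not shown in the original question;
# any list of comma-free strings will do here.
location_list = ['New York', 'London', 'Tokyo']

# Generate the dataset
data = []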
for i in range(100):
    row = {}
    row['label'] = random.randint(1, 10)
    row['age'] = random.randint(50, 100)
    row['Location'] = random.choice(location_list) # string categorical feature
    row['text'] = ''.join(random.choices(string.ascii_letters + string.digits, k=10)).encode()
    data.append(row)

# Write the dataset to a CSV file
# Create the directory if it doesn't exist
if not os.path.exists('temp123'):
    os.mkdir('temp123')
else:
    # Remove existing files if the directory already exists
    for filename in os.listdir('temp123'):
        file_path = os.path.join('temp123', filename)
        try:
            if os.path.isfile(file_path) or os.path.islink(file_path):
                os.unlink(file_path)
        except Exception as e:
            print(f'Failed to delete {file_path}. Reason: {e}')

with open('temp123/data.csv', 'w') as f:
    f.write('label,age,Location,text\n')
    for row in data:
        f.write(f"{row['label']},{row['age']},{row['Location']},{row['text'].decode()}\n")

I load the data like this:

import tensorflow as tf

# Define the feature specification for the CSV file
feature_types = {
    'label': tf.int32,
    'age': tf.int32,
    'Location': tf.string,
    'text': tf.string
}

# Load the CSV file  
train_dataset = tf.data.experimental.CsvDataset(
    ['temp123/data.csv'],
    feature_types,
    header=True
)

Regardless, everything is loaded as a string:

<CsvDatasetV2 element_spec=(
TensorSpec(shape=(), dtype=tf.string, name=None), 
TensorSpec(shape=(), dtype=tf.string, name=None), 
TensorSpec(shape=(), dtype=tf.string, name=None), 
TensorSpec(shape=(), dtype=tf.string, name=None))>
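
For what it's worth, here is a minimal sketch of one thing that might be worth trying. It assumes the cause is that `record_defaults` is documented as a list with one entry per column, so a dict passed in its place is likely iterated by key, which would turn the column names themselves into string defaults. The temp-file path, the two sample rows, and the names `typed_dataset` and `column_dtypes` are made up for illustration and are not from the original code.

```python
import os
import tempfile

import tensorflow as tf

# Write a tiny CSV with the same header as in the question (rows are made up).
csv_path = os.path.join(tempfile.mkdtemp(), 'data.csv')
with open(csv_path, 'w') as f:
    f.write('label,age,Location,text\n')
    f.write('3,57,Tokyo,aB3dE5fG7h\n')
    f.write('9,81,London,Zx9Yw8Vu7T\n')

# Same feature specification as in the question.
feature_types = {
    'label': tf.int32,
    'age': tf.int32,
    'Location': tf.string,
    'text': tf.string,
}

# Iterating a dict yields its keys, not its values.
print(list(feature_types))

# Pass only the dtypes, as a list, in the same order as the CSV columns.
record_defaults = list(feature_types.values())

typed_dataset = tf.data.experimental.CsvDataset(
    [csv_path],
    record_defaults=record_defaults,
    header=True,
)

# Collect the per-column dtypes from the dataset's element_spec.
column_dtypes = [spec.dtype for spec in typed_dataset.element_spec]
print(column_dtypes)
```

Note that this relies on the dict's insertion order matching the column order in the file, and that a bare dtype in `record_defaults` marks the column as required, so an empty field in that column would raise an error when the dataset is read.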
Python TensorFlow

A: No answers yet