RuntimeError:预期的 2D(未批处理)或 3D(批处理)输入到 conv1d,但输入大小:[1, 1, 2, 160000]

RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [1, 1, 2, 160000]

提问人:Yunus Koç 提问时间:11/9/2023 最后编辑:tdyYunus Koç 更新时间:11/11/2023 访问量:48

问:

我有一种火炬训练脚本来转录音频文件。我有 2 个音频文件,其中一个有 [1 1 16000]形状,另一个有 [1 1 16000]。两个文件都是 10 秒。我无法处理错误。如果您对音频有所了解,也许会有所帮助。

文件在一个路径中。 在此之前,我解析它们,如 10sec+10sec+10sec+10sec+...+10sec,如果你想看到我可以分享完整的代码。 当我使用“Zorlu.m4a”运行代码时,它正在工作。但是当我使用 bell.m4a 运行代码时,它会给我错误。

"""------------------------------blocks of code--------------------"""
# Define the path where the .wav files are located
wav_directory = os.getcwd()+"/wav"  # Add your wav dir

# Define the output file name
output_file = os.path.join(wav_directory, "metadata.txt")

# Define the range of .wav files (1 to 165)
# Change the 165 to what ever amount of wav files you have if 100 the change it to 101
wav_files_range = range(1, int(len(ses)/parca_uzunlugu+1))

# Initialize the list to store file paths and transcripts
file_and_transcripts = []

# Initialize the wav2vec model and processor
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-960h")
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h")

# Iterate through the .wav files
for i in wav_files_range:
    wav_file = os.path.join(wav_directory, f"{i}.wav")

    # Check if the .wav file exists
    if os.path.exists(wav_file):
        # Recognize the speech in the .wav file
        try:
            waveform, sample_rate = torchaudio.load(wav_file)
            print("waveform",type(waveform),waveform.shape)
            waveform = waveform.squeeze()  # Squeeze the batch dimension
            print("waveform",type(waveform),waveform.shape)
            resampler = torchaudio.transforms.Resample(
                orig_freq=sample_rate, new_freq=16000)
            print("waveform",type(waveform),waveform.shape)
            waveform = resampler(waveform)
            print("waveform",type(waveform),waveform.shape)
            input_values = processor(
                waveform, return_tensors="pt", sampling_rate=16000).input_values
            print("waveform",type(waveform),waveform.shape)
            logits = model(input_values).logits
            predicted_ids = torch.argmax(logits, dim=-1)
            transcript = processor.decode(predicted_ids[0])
        except FileNotFoundError:
            print(f"File not found: {wav_file}")
            continue

        # Append the desired path format and transcript to the list
        file_and_transcripts.append(
            f"{wav_directory}/{i}.wav|{transcript.capitalize()}.|{transcript.capitalize()}.")
    else:
        print(f"File not found: {wav_file}")

# Write the file paths and transcripts to the output file
with open(output_file, "w") as f:
    for line in file_and_transcripts:
        f.write(f"{line}\n")

print(f"File '{output_file}' created successfully.")
shutil.move(wav_directory+'/metadata.txt',wav_directory[:-4]+'/metadata.txt')


"""------------------------------blocks of code--------------------"""


Erors:
waveform <class 'torch.Tensor'> torch.Size([2, 441000])
waveform <class 'torch.Tensor'> torch.Size([2, 441000])
waveform <class 'torch.Tensor'> torch.Size([2, 441000])
waveform <class 'torch.Tensor'> torch.Size([2, 160000])
waveform <class 'torch.Tensor'> torch.Size([2, 160000])
--------i add a print line on torch/nn/modules/conv.py line 307------- <class 'torch.Tensor'> torch.Size([1, 1, 2, 160000])
Traceback (most recent call last):
  File "/home/yunus/Desktop/piper/datasets/preprocess.py", line 87, in <module>
    logits = model(input_values).logits
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 1962, in forward
    outputs = self.wav2vec2(
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 1547, in forward
    extract_features = self.feature_extractor(input_values)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 459, in forward
    hidden_states = conv_layer(hidden_states)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 362, in forward
    hidden_states = self.conv(hidden_states)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 311, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/yunus/anaconda3/envs/ljpreprocess/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 307, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [1, 1, 2, 160000]```
python 输入 pytorch m4a

评论

0赞 Zephyr 11/11/2023
这回答了你的问题吗?如何在 Python 中将 WAV 从立体声转换为单声道?

答: 暂无答案