Asked by: user17230397 Asked: 5/14/2023 Last edited by: user17230397 Updated: 5/14/2023 Views: 83
ValueError: 0 is not in range when slicing data into train and validate
Q:
This is how the data is loaded from disk:
import pandas as pd
import torch

wt_emb = torch.load("train/train_wt.pt")
mut_emb = torch.load("train/train_mut.pt")
df = pd.read_csv("train/train.csv")
This is how I slice the data:
train_wt_emb = wt_emb[int(size*0.2):]
train_mut_emb = mut_emb[int(size*0.2):]
train_df = df[int(size*0.2):]
valid_wt_emb = wt_emb[:int(size*0.2)]
valid_mut_emb = mut_emb[:int(size*0.2)]
valid_df = df[:int(size*0.2)]
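One thing worth checking at this step is what the slice does to the row labels. The sketch below uses toy data (the column values and `size = 10` are assumptions, not the real files) to show that a sliced DataFrame keeps its original index labels, whereas a sliced tensor is always addressed by position starting at 0.

```python
import pandas as pd

# Toy stand-in for train.csv (assumed contents, for illustration only).
size = 10
df = pd.DataFrame({"ddg": [float(i) for i in range(size)]})

split = int(size * 0.2)  # 2 rows go to validation

train_df = df[split:]
valid_df = df[:split]

# The training slice inherits labels split..size-1, so label 0 is gone.
train_labels = list(train_df.index)
valid_labels = list(valid_df.index)
train_has_label_zero = 0 in train_df.index

print(train_labels)
print(valid_labels)
print(train_has_label_zero)
```

If the real `train_df` behaves the same way, its labels start at `int(size*0.2)` rather than 0, which matters for any code that later looks up rows by label.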
This is the class that builds the dataset:
class EmbeddingDataset(torch.utils.data.Dataset):
    def __init__(self, mut_pt, wt_pt, data_df):
        self.pt_mut = mut_pt
        self.pt_wt = wt_pt
        self.df = data_df

    def __len__(self):
        return self.pt_mut.shape[0]

    def __getitem__(self, index):
        o1 = self.pt_mut[index, :]
        o2 = self.pt_wt[index, :]
        if "ddg" in self.df:
            df_out = torch.Tensor([self.df["ddg"][index]])
        else:
            df_out = torch.Tensor([self.df["ID"][index]])
        return o1, o2, df_out
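In `__getitem__`, `self.df["ddg"][index]` looks a row up by index label, while the DataLoader passes positions 0..len-1. A minimal sketch of a positional variant follows; the class name `PositionalEmbeddingDataset`, the `target_col` attribute, and the toy tensors are hypothetical and only illustrate the idea of using `.iloc`.

```python
import pandas as pd
import torch


class PositionalEmbeddingDataset(torch.utils.data.Dataset):
    """Hypothetical variant of EmbeddingDataset that reads rows by position."""

    def __init__(self, mut_pt, wt_pt, data_df):
        self.pt_mut = mut_pt
        self.pt_wt = wt_pt
        self.df = data_df
        # Pick the target column once instead of on every lookup.
        self.target_col = "ddg" if "ddg" in data_df.columns else "ID"

    def __len__(self):
        return self.pt_mut.shape[0]

    def __getitem__(self, index):
        # .iloc is positional: index 0 is the first row of the slice,
        # whatever labels the slice inherited from the full DataFrame.
        value = self.df[self.target_col].iloc[index]
        target = torch.tensor([value], dtype=torch.float32)
        return self.pt_mut[index, :], self.pt_wt[index, :], target


# Toy data with assumed shapes, used only to exercise the class.
size = 10
split = int(size * 0.2)
mut_emb = torch.randn(size, 4)
wt_emb = torch.randn(size, 4)
df = pd.DataFrame({"ddg": [float(i) for i in range(size)]})

train_dataset = PositionalEmbeddingDataset(mut_emb[split:], wt_emb[split:], df[split:])
mut0, wt0, target0 = train_dataset[0]
print(target0)
```

With this variant the tensors and the DataFrame are both addressed by position, so the first training sample pairs the first kept embedding row with the first kept `ddg` value.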
This creates the training/validation datasets and dataloaders:
# create the training dataset
train_dataset = EmbeddingDataset(train_wt_emb, train_mut_emb, train_df)
# prepare a dataloader for training
train_dataloader = torch.utils.data.dataloader.DataLoader(
    train_dataset,
    batch_size=32,
    shuffle=False,
    num_workers=2,
)
# create the validation dataset
valid_dataset = EmbeddingDataset(valid_wt_emb, valid_mut_emb, valid_df)
# prepare a dataloader for validation
valid_dataloader = torch.utils.data.dataloader.DataLoader(
    valid_dataset,
    batch_size=32,
    shuffle=False,
    num_workers=2,
)
This is where I get the error:
for i in range(1):
    epoch_loss = 0
    for batch_idx, (data_mut, data_wt, target) in tqdm(enumerate(train_dataloader)):
        # extract input from the dataloader
        x1 = data_wt.to(device)
        x2 = data_mut.to(device)
        y = target.to(device)
        # make prediction
        y_pred = model(x1, x2)
        # calculate loss and run optimizer
        loss = torch.sqrt(criterion(y, y_pred))
        loss.backward()
        optimizer.step()
        epoch_loss += loss
        print(batch_idx)
    print("epoch_", i, " = ", epoch_loss / len(train_dataloader))
The error is:
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/range.py", line 391, in get_loc
    return self._range.index(new_key)
ValueError: 0 is not in range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "<ipython-input-31-dbf63f9f6ed2>", line 15, in __getitem__
    df_out=torch.Tensor([self.df["ddg"][index]])
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/series.py", line 981, in __getitem__
    return self._get_value(key)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/series.py", line 1089, in _get_value
    loc = self.index.get_loc(label)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/range.py", line 393, in get_loc
    raise KeyError(key) from err
KeyError: 0
When I train without slicing, it runs fine, and the sizes of the slices look correct, so I think the DataLoader is what causes this error.
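The traceback points at pandas' `get_loc` rather than at the DataLoader itself, so it may be worth reproducing the lookup in isolation. The sketch below uses a made-up five-row DataFrame (an assumption, not the real data) to trigger the same `KeyError` with no DataLoader involved, and then shows one possible remedy, `reset_index(drop=True)`, which renumbers the labels from 0.

```python
import pandas as pd

# Made-up values standing in for the "ddg" column.
df = pd.DataFrame({"ddg": [0.5, -1.2, 0.3, 2.0, -0.7]})
train_df = df[2:]  # rows keep their labels 2, 3, 4

# A label-based lookup for 0 fails on the slice.
try:
    train_df["ddg"][0]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# After renumbering, label 0 exists again and matches position 0.
train_df_reset = train_df.reset_index(drop=True)
first_value = train_df_reset["ddg"][0]

print(raised_key_error)
print(first_value)
```

This would also be consistent with the unsliced run working: the full DataFrame, like the validation slice `df[:int(size*0.2)]`, has labels starting at 0, so label and position happen to agree there.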
A: No answers yet