ValueError: 0 is not in range when slicing data into train and validate

Asked by: user17230397 Asked: 5/14/2023 Last edited by: user17230397 Updated: 5/14/2023 Views: 83

Q:

This is the data loaded from disk:

import pandas as pd
import torch

wt_emb = torch.load("train/train_wt.pt")
mut_emb = torch.load("train/train_mut.pt")
df = pd.read_csv("train/train.csv")

This is how I slice the data:

train_wt_emb = wt_emb[int(size*0.2):]
train_mut_emb = mut_emb[int(size*0.2):]
train_df = df[int(size*0.2):] 

valid_wt_emb = wt_emb[:int(size*0.2)]
valid_mut_emb = mut_emb[:int(size*0.2)]
valid_df = df[:int(size*0.2)]
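
For reference, slicing a DataFrame with `df[n:]` selects rows by position but keeps their original index labels, so the training slice no longer has a row labelled 0. The sketch below illustrates this on a hypothetical toy frame; the column values and `size = len(df)` are assumptions, since the question does not show how `size` is defined.

```python
import pandas as pd

# Hypothetical toy frame standing in for train/train.csv (values are made up).
df = pd.DataFrame({"ddg": [float(i) for i in range(10)]})
size = len(df)          # assumption: `size` is the number of rows
cut = int(size * 0.2)   # 2 rows go to validation

train_df = df[cut:]     # positional slice, original labels are kept
valid_df = df[:cut]

train_labels = list(train_df.index)   # labels start at `cut`, not at 0
valid_labels = list(valid_df.index)   # labels still start at 0

# Integer keys on a Series with an integer index are treated as labels,
# so asking for label 0 in the training slice raises KeyError.
try:
    train_df["ddg"][0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional access works regardless of the labels.
first_by_position = train_df["ddg"].iloc[0]

# Alternatively, renumber the labels from 0 after slicing.
train_df_reset = train_df.reset_index(drop=True)
first_after_reset = train_df_reset["ddg"][0]
```

The validation slice still starts at label 0, which would explain why only the training loader fails.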

This is the class that builds the dataset:

class EmbeddingDataset(torch.utils.data.Dataset):
  def __init__(self,mut_pt, wt_pt, data_df):
    self.pt_mut = mut_pt
    self.pt_wt = wt_pt
    self.df = data_df
  
  def __len__(self):
      return self.pt_mut.shape[0]

  def __getitem__(self, index):
    o1=self.pt_mut[index,:]
    o2=self.pt_wt[index,:]
    if "ddg" in self.df:
      df_out=torch.Tensor([self.df["ddg"][index]])
    else:
      df_out=torch.Tensor([self.df["ID"][index]])
    return  self.pt_mut[index,:],self.pt_wt[index,:],df_out 

This creates the training/validation datasets and dataloaders:

# creating training dataset and dataloader
train_dataset = EmbeddingDataset(train_wt_emb, train_mut_emb, train_df)
# preparing a dataloader for the training
train_dataloader = torch.utils.data.dataloader.DataLoader(
        train_dataset,
        batch_size=32,
        shuffle=False,
        num_workers=2,
    )

# creating validation dataset and dataloader
valid_dataset = EmbeddingDataset(valid_wt_emb, valid_mut_emb, valid_df)
# preparing a dataloader for the validation
valid_dataloader = torch.utils.data.dataloader.DataLoader(
        valid_dataset,
        batch_size=32,
        shuffle=False,
        num_workers=2,
    )

This is where I get the error:

for i in range(1):
  epoch_loss = 0
  for batch_idx, (data_mut,data_wt , target) in tqdm(enumerate(train_dataloader)):
      # extract input from dataloader
      x1 = data_wt.to(device)
      x2 = data_mut.to(device)
      y = target.to(device)
      # make prediction
      y_pred = model(x1,x2)
      # calculate loss and run optimizer
      loss = torch.sqrt(criterion(y, y_pred))
      loss.backward()
      optimizer.step()
      epoch_loss += loss
      print(batch_idx)
  print("epoch_",i," = ", epoch_loss/len(train_dataloader))

The error is:

KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/range.py", line 391, in get_loc
    return self._range.index(new_key)
ValueError: 0 is not in range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "<ipython-input-31-dbf63f9f6ed2>", line 15, in __getitem__
    df_out=torch.Tensor([self.df["ddg"][index]])
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/series.py", line 981, in __getitem__
    return self._get_value(key)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/series.py", line 1089, in _get_value
    loc = self.index.get_loc(label)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/range.py", line 393, in get_loc
    raise KeyError(key) from err
KeyError: 0

When I train without slicing, it runs fine, and the sizes of the slices seem correct, so I think the dataloader is causing this error.
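
Judging from the traceback, the `KeyError` is raised inside `__getitem__` by the pandas lookup `self.df["ddg"][index]`, and the DataLoader worker only re-raises it. One possible fix is to read the target by position with `.iloc`, so the dataset no longer depends on the index labels of the DataFrame slice it receives. The following is a minimal sketch under that assumption, using synthetic tensors and a made-up `ddg` column in place of the files on disk; it also assumes the target column is numeric.

```python
import pandas as pd
import torch


class EmbeddingDataset(torch.utils.data.Dataset):
    def __init__(self, mut_pt, wt_pt, data_df):
        self.pt_mut = mut_pt
        self.pt_wt = wt_pt
        self.df = data_df

    def __len__(self):
        return self.pt_mut.shape[0]

    def __getitem__(self, index):
        column = "ddg" if "ddg" in self.df else "ID"
        # .iloc selects by position, so it works on a slice whose
        # index labels do not start at 0.
        value = float(self.df[column].iloc[index])
        target = torch.tensor([value], dtype=torch.float32)
        return self.pt_mut[index, :], self.pt_wt[index, :], target


# Synthetic stand-ins for the tensors and CSV loaded from disk (assumptions).
size = 10
wt_emb = torch.randn(size, 4)
mut_emb = torch.randn(size, 4)
df = pd.DataFrame({"ddg": [float(i) for i in range(size)]})

cut = int(size * 0.2)
# Arguments follow the constructor's order: mutant first, then wild type.
train_dataset = EmbeddingDataset(mut_emb[cut:], wt_emb[cut:], df[cut:])

# Index 0 of the training set maps to row `cut` of the original data.
mut_row, wt_row, target = train_dataset[0]
```

Note that the calls in the question pass the wild-type embeddings first, while the constructor expects the mutant embeddings first; the sketch follows the constructor's order. Calling `reset_index(drop=True)` on each slice before building the datasets would be an alternative that leaves the class unchanged.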

pandas dataframe deep-learning pytorch slice

A: No answers yet