Why `torch.profiler` catches no cuda operation when co-running with ncu

Asked by rd142857 on 11/17/2023 · Edited by talonmies, rd142857 · Updated 11/17/2023 · Viewed 28 times

Q:

I have moved my model and input to CUDA:

import torch
import torchvision

x = torch.randint(low=0, high=256, size=(1, 3, 224, 224), dtype=torch.float32).to(device="cuda:0")
model = torchvision.models.googlenet()
inputs = (x,)
model = model.to(device="cuda:0").eval()

I profile the model with `torch.profiler`:

    # trace_handler, warm_ups and iterations are defined elsewhere, e.g.:
    # trace_handler = torch.profiler.tensorboard_trace_handler("./log")
    # warm_ups, iterations = 5, 10
    with torch.profiler.profile(
        on_trace_ready=trace_handler,
        activities=[
            torch.profiler.ProfilerActivity.CPU,
            torch.profiler.ProfilerActivity.CUDA,
        ],
        with_stack=True,
    ) as p:
        with torch.no_grad():
            for _ in range(warm_ups):
                model(*inputs)  # warm-up runs, not timed
                p.step()
            for i in range(iterations):
                y = model(*inputs)
                p.step()

Then I profile the script above (`tmp.py`, which contains the `torch.profiler` code) with `ncu`, simply invoking it as:

ncu -o <output> python tmp.py

But when I check the exported profile report, I find that all the traced operations are `cpu_op`, for example:

  {
    "ph": "X", "cat": "cpu_op", "name": "aten::conv2d", "pid": 9832, "tid": 9832,
    "ts": 1700192834976494, "dur": 1091759,
    "args": {
      "External id": 1,"Ev Idx": 0
    }
  },

Strangely, if I just run `tmp.py` on its own, I do get the proper kernel events, such as

  {
    "ph": "X", "cat": "kernel", "name": "void cask_cudnn::computeOffsetsKernel<false, false>(cask_cudnn::ComputeOffsetsParams)", "pid": 0, "tid": 7,
    "ts": 1700193716776597, "dur": 3,
    "args": {
      "External id": 2040,
      "queued": 0, "device": 0, "context": 1,
      "stream": 7, "correlation": 2040,
      "registers per thread": 16,
      "shared memory": 0,
      "blocks per SM": 1.0416666,
      "warps per SM": 8.333333,
      "grid": [50, 1, 1],
      "block": [256, 1, 1],
      "est. achieved occupancy %": 26
    }
  },
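The difference between the two runs shows up in the `cat` field of the exported Chrome-trace events. A minimal stdlib sketch to check which categories a trace contains (the event dicts below are trimmed copies of the snippets above; the helper name `category_counts` is my own):

```python
# Sketch: count event categories in a trace exported by torch.profiler
# (Chrome trace format). Field names follow the snippets above.
def category_counts(trace_events):
    counts = {}
    for ev in trace_events:
        cat = ev.get("cat", "?")
        counts[cat] = counts.get(cat, 0) + 1
    return counts

# Trimmed events from the two runs shown above:
events = [
    {"ph": "X", "cat": "cpu_op", "name": "aten::conv2d"},
    {"ph": "X", "cat": "kernel",
     "name": "void cask_cudnn::computeOffsetsKernel<false, false>(...)"},
]
print(category_counts(events))  # {'cpu_op': 1, 'kernel': 1}
```

Running this over the `traceEvents` list of the report produced under `ncu` would show only `cpu_op` entries, while the standalone run also yields `kernel` entries.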

Why is that?

Tags: pytorch, cuda, profiler, nsight-compute

Comments

1 upvote · Robert Crovella · 11/18/2023
I'm not familiar with the torch profiler, but in general it is not possible for two separate tools to use the CUDA profiling library at the same time. In order for the torch profiler to pick up information such as stream numbers, it must use the CUDA profiling library.
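Following that reasoning, one workaround is to make sure only one tool requests the CUDA profiling library (CUPTI) per run. A minimal sketch, assuming a hypothetical `UNDER_NCU` environment variable that you set yourself when launching through `ncu` (activity names are shown as plain strings for brevity):

```python
import os

# Sketch of the workaround implied by the comment above: let only one
# tool talk to CUPTI per run. UNDER_NCU is a hypothetical variable you
# would set yourself, e.g.:
#   UNDER_NCU=1 ncu -o <output> python tmp.py
def choose_activities(env):
    """Return which profiler activities tmp.py should request."""
    activities = ["CPU"]
    if env.get("UNDER_NCU") != "1":
        # ncu is not attached, so torch.profiler may use CUPTI itself.
        activities.append("CUDA")
    return activities

print(choose_activities(dict(os.environ)))
```

In `tmp.py` the returned names would map to `torch.profiler.ProfilerActivity.CPU` and `torch.profiler.ProfilerActivity.CUDA`; under `ncu` the script would then request CPU activity only and leave the GPU counters to `ncu`.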

A: No answers yet.