Asked by: rd142857 Asked: 11/17/2023 Last edited by: talonmiesrd142857 Updated: 11/17/2023 Views: 28
Why does `torch.profiler` capture no CUDA operations when run under ncu?
Q:
I have moved my model and its input to CUDA:
import torch
import torchvision

x = torch.randint(low=0, high=256, size=(1, 3, 224, 224), dtype=torch.float32).to(device="cuda:0")
model = torchvision.models.googlenet().eval()
inputs = (x,)
model = model.to(device="cuda:0").eval()
I use torch.profiler to profile the model:
with torch.profiler.profile(
    on_trace_ready=trace_handler,
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ],
    with_stack=True,
) as p:
    with torch.no_grad():
        for _ in range(warm_ups):
            model(*inputs)  # don't record time
            p.step()
        for i in range(iterations):
            y = model(*inputs)
            p.step()
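The snippet above does not show how trace_handler, warm_ups, or iterations are defined. A minimal sketch of what they might look like follows. The directory name, the file naming scheme, and the loop counts are assumptions made for illustration; only `step_num` and `export_chrome_trace` are taken from the `torch.profiler.profile` object that is passed to the callback.

```python
import os

# Hypothetical values; the question does not state the real ones.
warm_ups = 3
iterations = 10


def make_trace_handler(out_dir):
    """Build a callback that can be passed as on_trace_ready."""

    def trace_handler(prof):
        # Create the output directory lazily, on the first exported trace.
        os.makedirs(out_dir, exist_ok=True)
        # step_num and export_chrome_trace are provided by the profiler object.
        path = os.path.join(out_dir, f"trace_step_{prof.step_num}.json")
        prof.export_chrome_trace(path)
        return path

    return trace_handler


# "traces" is an assumed directory name, not one from the question.
trace_handler = make_trace_handler("traces")
```

Each exported file is a Chrome-trace JSON document like the fragments quoted in this question.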
Then I profile the file above, tmp.py, which contains the torch.profiler code, with ncu by simply invoking it:
ncu -o <output> python tmp.py
But when I check the exported profiler trace, I find that every traced operation is a cpu_op, for example:
{
  "ph": "X", "cat": "cpu_op", "name": "aten::conv2d", "pid": 9832, "tid": 9832,
  "ts": 1700192834976494, "dur": 1091759,
  "args": {
    "External id": 1, "Ev Idx": 0
  }
},
Strangely, if I just run tmp.py on its own, I get the expected kernel events, such as:
},
{
  "ph": "X", "cat": "kernel", "name": "void cask_cudnn::computeOffsetsKernel<false, false>(cask_cudnn::ComputeOffsetsParams)", "pid": 0, "tid": 7,
  "ts": 1700193716776597, "dur": 3,
  "args": {
    "External id": 2040,
    "queued": 0, "device": 0, "context": 1,
    "stream": 7, "correlation": 2040,
    "registers per thread": 16,
    "shared memory": 0,
    "blocks per SM": 1.0416666,
    "warps per SM": 8.333333,
    "grid": [50, 1, 1],
    "block": [256, 1, 1],
    "est. achieved occupancy %": 26
  }
},
Why is that?
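To compare the two runs without reading the JSON by hand, one option is to count the events per "cat" value in each exported trace. The sketch below assumes the events sit either in a top-level list or under a "traceEvents" key, the two layouts the Chrome trace format allows. The function names are hypothetical, and the sample events are shortened versions of the ones quoted above.

```python
import json
from collections import Counter


def count_categories(trace):
    """Count events per "cat" value in a Chrome-trace-style structure."""
    # Accept both a bare list of events and an object with "traceEvents".
    if isinstance(trace, dict):
        events = trace.get("traceEvents", [])
    else:
        events = trace
    return Counter(e.get("cat", "<none>") for e in events if isinstance(e, dict))


def count_categories_in_file(path):
    """Load a trace file from disk and count its event categories."""
    with open(path) as f:
        return count_categories(json.load(f))


if __name__ == "__main__":
    # Shortened sample events, for illustration only.
    sample = {"traceEvents": [
        {"ph": "X", "cat": "cpu_op", "name": "aten::conv2d"},
        {"ph": "X", "cat": "kernel", "name": "computeOffsetsKernel"},
    ]}
    counts = count_categories(sample)
    print("cpu_op:", counts["cpu_op"], "kernel:", counts["kernel"])
```

A trace in which `counts["kernel"]` is 0 would match the symptom described here for the run under ncu.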
A: No answers yet