Asked by: Rumblerock  Asked: 11/6/2023  Last edited by: Rumblerock  Updated: 11/8/2023  Views: 54
Tensorflow: How to fix ResourceExhaustedError?
Q:
I am trying to recreate these: Hugging Face: Question answering task and Hugging Face: Question answering NLP course.
I am getting this ResourceExhaustedError at the model.fit() step.
---------------------------------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
Cell In[14], line 1
----> 1 model.fit(x=tf_train_set, batch_size=16, validation_data=tf_validation_set, epochs=3, callbacks=[callback])
ResourceExhaustedError: Graph execution error:
Detected at node 'tf_distil_bert_for_question_answering/distilbert/transformer/layer_._4/attention/dropout_14/dropout/random_uniform/RandomUniform' defined at (most recent call last):
*a long list of files here*
Node: 'tf_distil_bert_for_question_answering/distilbert/transformer/layer_._4/attention/dropout_14/dropout/random_uniform/RandomUniform'
OOM when allocating tensor with shape[16,12,384,384] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node tf_distil_bert_for_question_answering/distilbert/transformer/layer_._4/attention/dropout_14/dropout/random_uniform/RandomUniform}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_train_function_9297]
I have already tried lowering the batch_size: model.fit(x=tf_train_set, batch_size=16, validation_data=tf_validation_set, epochs=3, callbacks=[callback])
I have also tried limiting the GPU's memory growth: Limiting GPU memory growth
Here are the Colab notebooks: Colab: Question answering task and Colab: Question answering NLP course
A:
0 votes
Signo
11/6/2023
#1
It means the GPU's memory cannot hold your batch size or input data size, so try reducing the batch size or the input data size.
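As a rough sanity check on this answer, the tensor named in the OOM message has shape [16, 12, 384, 384] in float32, i.e. one per-layer attention matrix already costs over 100 MiB, before activations and gradients are counted. (Note also that Keras documents `batch_size` as ignored when `x` is a `tf.data.Dataset`, so the effective batch size is the one baked into the dataset, not the one passed to `model.fit`.) A minimal sketch of the arithmetic, in pure Python with no TensorFlow required:

```python
def tensor_bytes(shape, bytes_per_elem=4):
    """Memory footprint of a dense tensor: product of dims times element size."""
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem

# Shape from the OOM message: [batch, heads, seq_len, seq_len], float32 (4 bytes).
batch, heads, seq_len = 16, 12, 384
mib = tensor_bytes([batch, heads, seq_len, seq_len]) / 2**20
print(f"{mib:.0f} MiB per attention matrix")  # 108 MiB
```

Halving the batch halves this one tensor, but the fixed per-model costs (weights, optimizer slots) stay, which is consistent with the comment below that even batch size 1 still ran out of memory.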
Comments
0 votes
Rumblerock
11/6/2023
I reduced the batch_size to 16, 4, and even 1. The error still occurred.
0 votes
Rumblerock
11/8/2023
#2
I added these lines at the beginning:
import os
os.environ["TF_GPU_ALLOCATOR"]="cuda_malloc_async"
Note: limiting the GPU's memory growth and lowering the batch_size were not necessary.
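This works because TF_GPU_ALLOCATOR is read when TensorFlow first initializes the GPU, so the variable must be set before the first `import tensorflow` in the notebook. A minimal sketch of the placement (the model construction and `model.fit()` call from the question are elided):

```python
import os

# Select the CUDA async allocator before TensorFlow touches the GPU;
# setting this after `import tensorflow` has no effect.
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"

# ...only now: import tensorflow as tf, build the model, and call model.fit()
# exactly as in the question.
```

In a Colab notebook this means putting the two lines in the very first cell, or restarting the runtime if TensorFlow was already imported.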
Comments