【TensorFlow】InternalError: Failed copying input tensor

⚠ TensorFlow-GPU 执行模型训练时报错:

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

解决方案:『TensorFlow: Dst tensor is not initialized - Stack Overflow』

主要原因在于 batch_size 太大,内存无法负载,将 batch_size 适当调小即可正常运行。

【注】默认情况下,TF 会尽可能地多分配占用 GPU 内存,通过调整 GPUConfig 可以设置为按需分配内存,参考『TensorFlow 文档』『TensorFlow 代码』。 


另外,使用 Jupyter Notebook 进行长期模型训练时,可能由于 GPU 内存无法及时释放导致该报错。参考『此答案』可以解决此问题,定义如下函数:

from keras.backend import set_session
from keras.backend import clear_session
from keras.backend import get_session
import gc

# Reset Keras Session
def reset_keras():
    sess = get_session()
    clear_session()
    sess.close()
    sess = get_session()

    try:
        del classifier # this is from global space - change this as you need
    except:
        pass

    print(gc.collect()) # if it does something you should see a number as output

    # use the same config as you used to create the session
    config = tf.compat.v1.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 1
    config.gpu_options.visible_device_list = "0"
    set_session(tf.compat.v1.Session(config=config))

需要清除 GPU 内存时,直接调用 reset_keras 函数即可。例如:

dense_layers = [0, 1, 2]
layer_sizes = [32, 64, 128]
conv_layers = [1, 2, 3]

for dense_layer in dense_layers:
    for layer_size in layer_sizes:
        for conv_layer in conv_layers:
            reset_keras()
            # training your model here
posted @ 2021-11-06 16:50  harman-chen  阅读(9474)  评论(1编辑  收藏  举报