II. Approaches that failed

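For PyTorch, multi-GPU training might look like a workable option:
net = torch.nn.DataParallel(model, device_ids=[xxx])
However, the script also loads tf models, which DataParallel does not manage, so this did not solve the problem: the run still failed with CUDA OOM.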
III. Solutions
 
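1. Reduce batch_size
Try lowering batch_size all the way to 1. If that still fails, the second option is the only one left.
2. Manually place different models on different GPUs
For tf: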
with tf.device('/gpu:2'):
    com, rec = ComCNN(), RecCNN()
    com.summary()
    rec.summary()
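For PyTorch (either of the two forms works):
    model = model.to(torch.device('cuda:X'))  # X is the GPU index; no space after the colon
    model = model.cuda(X)  # X is the GPU index
Note that tensors can only be combined in an operation if they live on the same GPU. To place a tensor on a specific GPU:
    data = data.cuda(X)  # X is the GPU index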
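To make the same-device rule concrete, here is a minimal sketch of two models placed on different devices, with the intermediate tensor moved between them. The two nn.Linear layers are stand-ins I made up for illustration, and pick_device is a hypothetical helper that falls back to the CPU when the requested GPU does not exist, so the snippet also runs on a machine with fewer GPUs.

```python
import torch
import torch.nn as nn


def pick_device(index: int) -> torch.device:
    """Return cuda:<index> if that GPU exists, otherwise fall back to the CPU."""
    if torch.cuda.is_available() and index < torch.cuda.device_count():
        return torch.device(f"cuda:{index}")
    return torch.device("cpu")


# Two stand-in models (hypothetical; replace with the real networks).
dev_a, dev_b = pick_device(0), pick_device(1)
model_a = nn.Linear(8, 4).to(dev_a)
model_b = nn.Linear(4, 2).to(dev_b)

x = torch.randn(3, 8).to(dev_a)    # the input must live on model_a's device
hidden = model_a(x)                # the result stays on dev_a
out = model_b(hidden.to(dev_b))    # move it before feeding model_b

print(out.shape, out.device)
```

Forgetting the hidden.to(dev_b) step on a real two-GPU machine typically raises a device-mismatch RuntimeError, which is the situation the note above warns about.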
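PyTorch training sometimes runs out of GPU memory because too much has been loaded. In some of these cases CUDA's cache-clearing call can recover enough memory to continue; if the model itself is simply too large, it cannot help.
Use torch.cuda.empty_cache() to release cached GPU memory that is no longer in use. Example code: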
    output = model(input)
except RuntimeError as exception:
    if "out of...
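Since the snippet above is cut off, here is a minimal sketch of the same idea under my own assumptions: catch the RuntimeError, check that it is an out-of-memory error, call torch.cuda.empty_cache(), and retry the batch in two halves. The function name forward_with_oom_retry and the split-in-half retry policy are illustrative choices, not taken from the original post.

```python
import torch


def forward_with_oom_retry(model, batch):
    """Run model(batch); on a CUDA OOM, free cached memory and retry in two halves."""
    try:
        return model(batch)
    except RuntimeError as exc:
        if "out of memory" not in str(exc):
            raise                      # unrelated error: do not swallow it
    # Only reached after an OOM. Cleanup runs outside the except block so the
    # exception object no longer keeps the failed tensors alive.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()       # return unused cached blocks to the driver
    if batch.shape[0] < 2:
        raise RuntimeError("out of memory even with a single sample")
    half = batch.shape[0] // 2
    return torch.cat([model(batch[:half]), model(batch[half:])])
```

Note that empty_cache() only releases memory PyTorch has cached but is not using; tensors that are still referenced stay allocated, so drop those references first if you want them freed.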
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 768.50 MiB already allocated; 16.39 MiB free; 9.50 MiB cached)
Solution:
If this error appears while using the GPU, try reducing batch_size.
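When running a program on a server, this error usually means the program needs more GPU memory than is available. There are two ways to fix it:
1. batch_size is set too large; make it smaller.
2. The CUDA device is occupied by another job, so select a different one. One way: in the program write device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu'), and on the server set CUDA_VISIBLE_DEVICES=X when launching. Another way: set os.environ["CUDA_VISIBLE_DEVICES"] = "2,3" directly in the program.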
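A small sketch of the in-program variant, assuming the variable is set before any CUDA library is initialized (safest is before importing torch or tensorflow). The helper name restrict_visible_gpus is hypothetical.

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Expose only the given physical GPUs to CUDA libraries in this process."""
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Must run before the first CUDA call in the process.
restrict_visible_gpus([2, 3])
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

After this, the visible devices are renumbered from zero inside the process, so physical GPU 2 is addressed as cuda:0 and physical GPU 3 as cuda:1.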
The reported free memory looks sufficient, yet a CUDA out of memory error is still raised.
For example (illustrative only): RuntimeError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 10.73 GiB total capacity; 9.55 GiB already allocated; 199 MiB free; 19.44 MiB cached)
RuntimeError: cuDNN error: CUDNN_STATUS_I
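When comparing "requested" against "free" in messages like these, it can help to pull the numbers out programmatically. The following is a rough sketch that assumes the message layout shown in the examples on this page; parse_oom_message is a hypothetical helper, and other PyTorch versions may word the message differently.

```python
import re

_UNITS = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}  # convert everything to MiB

_FIELDS = {
    "requested": r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)",
    "total": r"([\d.]+) (KiB|MiB|GiB) total capacity",
    "allocated": r"([\d.]+) (KiB|MiB|GiB) already allocated",
    "free": r"([\d.]+) (KiB|MiB|GiB) free",
}


def parse_oom_message(message):
    """Extract the sizes (in MiB) reported in a PyTorch CUDA OOM message."""
    sizes = {}
    for name, pattern in _FIELDS.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * _UNITS[match.group(2)]
    return sizes


msg = ("RuntimeError: CUDA out of memory. Tried to allocate 26.00 MiB "
       "(GPU 0; 10.73 GiB total capacity; 9.55 GiB already allocated; "
       "199 MiB free; 19.44 MiB cached)")
info = parse_oom_message(msg)
print(info)
```

For the example message, the request (26 MiB) is smaller than the reported free memory (199 MiB), which is exactly the puzzling case described above.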
				RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 3.94 GiB total capacity; 3.36 GiB already allocated; 13.06 MiB free; 78.58 MiB cached)
Reduce batch_size (credit to the original poster for this tip).
				RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 10.92 GiB total capacity; 9.65 GiB already allocated; 29.00 MiB free; 10.37 GiB reserved in total by PyTorch)
Error when running a model on the GPU:
RuntimeError: CUDA out of memory. Tried to allocate 38.00 MiB (GPU 
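While loading a trained model for testing, I hit RuntimeError: Cuda error: out of memory.
This was surprising: the model is not that large, so why did it exhaust GPU memory?
I later found the answer on the PyTorch forum: when loading the model, first load it onto the CPU through the map_location argument of torch.load(), and only then move it to the GPU.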
def load_model(model, model_save_path, use_state_dict=True):
    print(f"[i] load model fro
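The function above is cut off, so here is a minimal sketch of the load-on-CPU-first pattern rather than a reconstruction of it. It assumes the checkpoint is a plain state_dict; load_weights_via_cpu and the tiny nn.Linear model are stand-ins for illustration only.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_weights_via_cpu(model, path, device):
    """Load a state_dict onto the CPU first, then move the model to `device`."""
    # map_location="cpu" keeps tensors off the GPU they were saved from.
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state)
    return model.to(device)


# Round-trip demo with a tiny stand-in model (not the author's network).
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
path = os.path.join(tempfile.mkdtemp(), "demo.pt")
torch.save(nn.Linear(4, 2).state_dict(), path)

model = load_weights_via_cpu(nn.Linear(4, 2), path, device)
print(next(model.parameters()).device)
```

Without map_location, torch.load() restores each tensor to the device it was saved from, which can land everything on an already busy GPU.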
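The project uses TensorFlow models and PyTorch models at the same time. All the models together are about 1 GB and the GPU has 11 GB of memory, yet the run fails with: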
CUDA out of memory
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 10.91 GiB total capacity; 856.79 MiB already allocated; 17.38 MiB free; 71.21 MiB cached)
RuntimeError: CUDA out of memory. Tried to allocate 36.00 MiB (GPU 0; 2.00 GiB total capacity; 859.04 MiB already allocated; 6.46 MiB free; 904.00 MiB reserved in total by PyTorch)
Solution:
Press Win+R, type cmd, then run nvidia-smi to check GPU usage.
You will see several running processes; type taskkill -PID 
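There are two causes of the PyTorch "CUDA out of memory" error:
1. The GPU you want to use is already occupied, so there is not enough memory left and your training command cannot run.
Fixes:
1. Switch to another GPU.
2. Kill the other program occupying the GPU. (Use with care: that program may belong to someone else. Only kill it if it is your own and unimportant.)
Enter the following command in the terminal to see the programs currently running on the GPU:
nvidia-smi
Then, using the PID of a running program shown above, enter the following process-inspection command to see details about the process, including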
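
As a rough sketch, the same check can be scripted by asking nvidia-smi for its compute-process list in CSV form and parsing the rows. The helper names and the sample rows below are made up for illustration; the script returns an empty list on machines where nvidia-smi is missing or fails.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' CSV rows into dicts."""
    apps = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue                      # skip blank or unexpected rows
        pid, name, mem = parts
        apps.append({"pid": int(pid), "name": name, "used_mib": mem})
    return apps


def list_gpu_processes():
    """Return the processes currently using a GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_compute_apps(result.stdout)


sample = "4321, python, 9779\n5150, /usr/bin/python3, 812\n"  # made-up rows
print(parse_compute_apps(sample))
print(list_gpu_processes())
```

The memory column is kept as a string because nvidia-smi can report a non-numeric placeholder for it on some setups.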