Two Fixes for TensorFlow Model Training That Gets Slower and Slower

Published: 2020-10-03 08:31:32  Source: 腳本之家  Views: 382  Author: xdq101  Category: Development

1 Solutions

[Solution 1]

Load the model structure at global scope, that is, outside the TensorFlow session.

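import tensorflow as tf  # TensorFlow 1.x API

# Load the model structure: the key step, done once outside the session
saver = tf.train.Saver()
# Create the session
with tf.Session() as sess:
    for i in range(STEPS):
        # Run one training step
        _, loss_1, acc, summary = sess.run([train_op_1, train_loss, train_acc, summary_op], feed_dict=feed_dict)
        # Save the model; global_step appends the step number to the checkpoint name
        saver.save(sess, save_path="./model/path", global_step=i)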

[Solution 2]

Building on Solution 1, also place the model structure (the graph definition) outside the session.

import tensorflow as tf  # TensorFlow 1.x API

# Predictions
train_logits = network_model.inference(inputs, keep_prob)
# Loss
train_loss = network_model.losses(train_logits)
# Optimization op
train_op = network_model.train(train_loss, learning_rate)
# Accuracy
train_acc = network_model.evaluation(train_logits, labels)
# Model inputs
feed_dict = {inputs: x_batch, labels: y_batch, keep_prob: 0.5}
# Load the model structure: create the Saver once
saver = tf.train.Saver()
# Create the session
with tf.Session() as sess:
    for i in range(STEPS):
        # Run one training step (summary_op is defined elsewhere in the program)
        _, loss_1, acc, summary = sess.run([train_op, train_loss, train_acc, summary_op], feed_dict=feed_dict)
        # Save the model; global_step appends the step number to the checkpoint name
        saver.save(sess, save_path="./model/path", global_step=i)

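2 Timing Tests

Timing the training program under the different approaches gives different results. If the graph structure is loaded again on every training step, the time per step increases from one step to the next. The more steps there are, the slower the later steps become, and eventually the graph can grow so large that training is terminated.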
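The per-step times in the logs below can be collected with a small timing harness. This is a minimal sketch, not the author's code: `time_steps` and `run_step` are hypothetical names, and `run_step` stands in for whatever performs one training step (for example a function that wraps `sess.run`).

```python
import time

def time_steps(run_step, steps):
    """Call run_step(i) once per step and return the wall-clock cost of each call."""
    costs = []
    for i in range(steps):
        start = time.perf_counter()
        run_step(i)  # one training step; supplied by the caller
        cost = time.perf_counter() - start
        costs.append(cost)
        print("step: {}, time cost: {}".format(i, cost))
    return costs
```

Calling `time_steps(my_step, STEPS)` and comparing the first and last entries of the returned list is enough to see whether the cost per step is drifting upward.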

[Accumulating time]

2019-05-15 10:55:29.009205: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
step: 0, time cost: 1.8800880908966064
step: 1, time cost: 1.592250108718872
step: 2, time cost: 1.553826093673706
step: 3, time cost: 1.5687050819396973
step: 4, time cost: 1.5777575969696045
step: 5, time cost: 1.5908267498016357
step: 6, time cost: 1.5989274978637695
step: 7, time cost: 1.6078357696533203
step: 8, time cost: 1.6087186336517334
step: 9, time cost: 1.6123006343841553
step: 10, time cost: 1.6320762634277344
step: 11, time cost: 1.6317598819732666
step: 12, time cost: 1.6570467948913574
step: 13, time cost: 1.6584930419921875
step: 14, time cost: 1.6765813827514648
step: 15, time cost: 1.6751370429992676
step: 16, time cost: 1.7304580211639404
step: 17, time cost: 1.7583982944488525

[Steady time]

2019-05-15 13:03:49.394354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7048 MB memory) -> physical GPU (device: 1, name: Tesla P4, pci bus id: 0000:00:0d.0, compute capability: 6.1)
step: 0, time cost: 1.9781079292297363
loss1:6.78, loss2:5.47, loss3:5.27, loss4:7.31, loss5:5.44, loss6:6.87, loss7: 6.84
Total loss: 43.98, accuracy: 0.04, steps: 0, time cost: 1.9781079292297363
step: 1, time cost: 0.09688425064086914
step: 2, time cost: 0.09693264961242676
step: 3, time cost: 0.09671926498413086
step: 4, time cost: 0.09688210487365723
step: 5, time cost: 0.09646058082580566
step: 6, time cost: 0.09669041633605957
step: 7, time cost: 0.09666872024536133
step: 8, time cost: 0.09651994705200195
step: 9, time cost: 0.09705543518066406
step: 10, time cost: 0.09690332412719727

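3 Cause Analysis

(1) TensorFlow builds computations as a graph made of nodes (operations) and edges (the tensors that flow between them). Each computation either adds nodes and edges to the graph or reuses the graph structure that already exists.

(2) The graph structure is a double-edged sword: it can speed up computation and make design more efficient, but a poorly designed program turns it into a liability and makes training slower and slower.

(3) Training gets slower because every training step, in addition to calling sess.run, adds nodes to the graph or loads the graph structure again, so the graph holds more and more nodes and edges and the work per step keeps growing.

(4) tf.train.Saver() is the class that loads the graph structure: constructing it adds save and restore nodes to the graph. If the training program constructs it on every update, the graph inevitably grows and training inevitably slows down.

(5) Therefore, put that class at global scope so the graph structure is loaded only once. After that, training only updates the parameters in the graph, and the original training speed is maintained.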
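The graph growth described in points (3) to (5) can be observed by counting the operations in the graph after each step. The following is a minimal sketch rather than the author's code: it assumes the `tensorflow.compat.v1` module is available (TensorFlow 1.14 or later, or 2.x), and the helper `op_counts` and the variable `w` are hypothetical names used only for illustration.

```python
import tensorflow.compat.v1 as tf

def op_counts(steps, saver_in_loop):
    """Return the number of graph operations recorded after each step."""
    graph = tf.Graph()
    counts = []
    with graph.as_default():
        # A single variable so that the Saver has something to save
        tf.get_variable("w", shape=[2], initializer=tf.zeros_initializer())
        if not saver_in_loop:
            tf.train.Saver()  # constructed once, before the loop
        for _ in range(steps):
            if saver_in_loop:
                tf.train.Saver()  # each construction adds new save/restore ops
            counts.append(len(graph.get_operations()))
    return counts

growing = op_counts(3, saver_in_loop=True)
steady = op_counts(3, saver_in_loop=False)
print("Saver inside the loop: ", growing)
print("Saver outside the loop:", steady)
```

With the Saver inside the loop the count should rise at every step, while with the Saver outside the loop it should stay constant, which matches the two timing patterns shown earlier.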

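4 Summary

(1) When designing the training program, load the graph structure only once.

(2) tf.train.Saver() is the class that loads the graph structure. Instantiating it at global scope, that is, outside the session and its training loop, fixes training that gets slower and slower.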
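One way to enforce the "load once" rule is to freeze the graph before the training loop, so that any later attempt to add nodes fails immediately instead of quietly slowing training down. This is a sketch that assumes `tensorflow.compat.v1` is available; `Graph.finalize()` is a standard TensorFlow call, while the variable `w` is only a stand-in for a real model.

```python
import tensorflow.compat.v1 as tf

graph = tf.Graph()
with graph.as_default():
    # Stand-in for the real model definition
    tf.get_variable("w", shape=[2], initializer=tf.zeros_initializer())
    saver = tf.train.Saver()  # constructed once, before the graph is frozen
    graph.finalize()          # the graph is read-only from here on

    try:
        tf.train.Saver()      # a second Saver would have to add nodes
        blocked = False
    except RuntimeError as err:
        blocked = True
        print("Rejected:", err)
```

In a real program the training loop would follow `graph.finalize()`, and an accidental graph-building call inside the loop would then raise an error on the first step.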

