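MXNet supports distributed training, either through Horovod or through its built-in parameter server (the distributed KVStore).
To run distributed training with Horovod, follow these steps: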
pip install horovod
import mxnet as mx
import horovod.mxnet as hvd
hvd.init()
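# Shard the input so each worker reads a different part of the dataset;
# ImageRecordIter takes num_parts and part_index for this purpose.
train_data = mx.io.ImageRecordIter(..., num_parts=hvd.size(), part_index=hvd.rank())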
net = mx.gluon.nn.Sequential()
net.add(mx.gluon.nn.Dense(128))
net.add(mx.gluon.nn.Activation('relu'))
net.add(mx.gluon.nn.Dense(10))
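# Bind this process to one GPU, chosen by its local rank (use mx.cpu() on CPU-only hosts).
ctx = mx.gpu(hvd.local_rank())
net.initialize(ctx=ctx)
# Start every worker from rank 0's weights.
hvd.broadcast_parameters(net.collect_params(), root_rank=0)
opt = mx.optimizer.SGD(learning_rate=0.1)
# DistributedTrainer allreduces gradients across workers before each update.
trainer = hvd.DistributedTrainer(net.collect_params(), opt)
for batch in train_data:
    data = batch.data[0].as_in_context(ctx)
    with mx.autograd.record():
        ...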
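To make the idea behind the Horovod steps concrete, here is a toy, pure-Python simulation of synchronous data-parallel SGD. It is only a sketch under stated assumptions: the helpers `shard`, `local_gradient`, and `allreduce_mean` are hypothetical names invented for illustration, the "workers" are simulated in a single process, and none of this reflects how Horovod itself is implemented.

```python
# Toy model of synchronous data-parallel SGD (not Horovod code).
# Each simulated worker computes a gradient on its own shard, the gradients
# are averaged (the role allreduce plays), and all workers apply the same update.

def shard(samples, num_parts, part_index):
    """Hypothetical helper: return the samples owned by one worker."""
    return samples[part_index::num_parts]

def local_gradient(w, shard_samples):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in shard_samples) / len(shard_samples)

def allreduce_mean(values):
    """Stand-in for allreduce: every worker ends up with the average."""
    return sum(values) / len(values)

samples = [(float(x), 3.0 * x) for x in range(1, 9)]  # generated with weight 3.0
num_workers = 4
w = 0.0
for step in range(200):
    grads = [local_gradient(w, shard(samples, num_workers, rank))
             for rank in range(num_workers)]
    w -= 0.01 * allreduce_mean(grads)

print(w)
```

Because every worker applies the same averaged gradient, all replicas stay identical after each step, which is why the real script only needs to broadcast the initial weights once.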
To run distributed training with the parameter server, follow these steps:
pip install mxnet
import mxnet as mx
from mxnet import kv
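# 'dist_sync' selects the synchronous distributed KVStore; 'dist_async' is the asynchronous variant.
ps = kv.create('dist_sync')
# The worker count is fixed when the job is launched, so read it from the KVStore.
num_workers = ps.num_workers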
net = mx.gluon.nn.Sequential()
net.add(mx.gluon.nn.Dense(128))
net.add(mx.gluon.nn.Activation('relu'))
net.add(mx.gluon.nn.Dense(10))
net.initialize()
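opt = mx.optimizer.SGD(learning_rate=0.1)
# Pass the KVStore to a Gluon Trainer; it pushes gradients to the servers and pulls updated weights.
trainer = mx.gluon.Trainer(net.collect_params(), opt, kvstore=ps)
with mx.autograd.record():
    ...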
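The push/pull pattern behind the parameter-server steps can be sketched with a small in-memory stand-in. This is an illustration under assumptions, not MXNet's KVStore: the class `ToyParameterServer`, its methods, and the key name `dense0_weight` are hypothetical, and averaging the gradients is a simplification chosen for the example.

```python
class ToyParameterServer:
    """In-memory stand-in for a synchronous parameter server (not MXNet's KVStore)."""

    def __init__(self, num_workers, learning_rate):
        self.num_workers = num_workers
        self.learning_rate = learning_rate
        self.weights = {}
        self.pending = {}

    def init(self, key, value):
        self.weights[key] = list(value)
        self.pending[key] = []

    def push(self, key, grad):
        """Buffer one worker's gradient; update once every worker has pushed."""
        self.pending[key].append(list(grad))
        if len(self.pending[key]) == self.num_workers:
            summed = [sum(column) for column in zip(*self.pending[key])]
            self.weights[key] = [
                w - self.learning_rate * g / self.num_workers
                for w, g in zip(self.weights[key], summed)
            ]
            self.pending[key] = []

    def pull(self, key):
        """Return a copy of the current weights for this key."""
        return list(self.weights[key])


server = ToyParameterServer(num_workers=2, learning_rate=0.1)
server.init("dense0_weight", [1.0, 2.0])
server.push("dense0_weight", [1.0, 0.0])   # gradient from worker 0
before = server.pull("dense0_weight")      # unchanged: worker 1 has not pushed yet
server.push("dense0_weight", [3.0, 2.0])   # gradient from worker 1
after = server.pull("dense0_weight")       # updated with the averaged gradient
```

In synchronous mode the update waits for every worker, which is what `before` and `after` illustrate; an asynchronous server would instead apply each gradient as soon as it arrives.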