1-bit Stochastic Gradient Descent (1-bit SGD)

1-bit Stochastic Gradient Descent is a technique from Microsoft Research aimed at increasing the data parallelism of deep neural network training. The authors describe the technique in the paper 1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs.

They accelerate the training of neural networks with stochastic gradient descent by:

  1. splitting up the computation for each minibatch across many nodes in a distributed system.
  2. reducing the bandwidth requirements for communication between nodes by exchanging gradients (instead of model parameters) and quantizing those gradients all the way to just 1 bit.
  3. feeding the quantization error from Step 2 back into the next minibatch's gradient before it is quantized (see the sketch after this list).
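
To make Steps 2 and 3 concrete, the following is a minimal NumPy sketch of 1-bit gradient quantization with error feedback. The function name, the mean-magnitude reconstruction values, and the toy loop are illustrative assumptions, not the paper's exact formulation or CNTK's implementation.

```python
import numpy as np

def one_bit_quantize(gradient, error_feedback):
    """Quantize a gradient to 1 bit per value, carrying the error forward."""
    # Step 3: add the quantization error left over from the previous minibatch.
    corrected = gradient + error_feedback

    # Step 2: keep only the sign of each value; reconstruct each sign with the
    # mean magnitude of the positive / negative entries (the paper does this
    # per column of each gradient matrix; one pair per tensor keeps this short).
    positive = corrected >= 0
    pos_value = corrected[positive].mean() if positive.any() else 0.0
    neg_value = corrected[~positive].mean() if (~positive).any() else 0.0
    quantized = np.where(positive, pos_value, neg_value)

    # Remember what was lost so it can be folded into the next minibatch.
    new_error_feedback = corrected - quantized
    return quantized, new_error_feedback

# Toy example: quantize the gradients of two consecutive "minibatches".
rng = np.random.default_rng(0)
error = np.zeros(8)
for step in range(2):
    grad = rng.normal(size=8)
    quantized, error = one_bit_quantize(grad, error)
    print(step, quantized)
```

Exchanging 1 bit per gradient value instead of a 32-bit float cuts the communication volume by roughly a factor of 32, which is what makes the data-parallel split in Step 1 pay off.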

1-bit Stochastic Gradient Descent is available as a technique in Microsoft’s Cognitive Toolkit (CNTK).
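
A minimal sketch of how this might be enabled through CNTK 2.x's Python API, assuming a CNTK build that includes the 1-bit SGD component and a script launched under MPI (for example with mpiexec); the tiny model and random data are placeholders for illustration.

```python
import cntk as C
import numpy as np

# Placeholder model: a single dense layer over 2-dimensional inputs.
features = C.input_variable(2)
labels = C.input_variable(2)
model = C.layers.Dense(2)(features)
loss = C.cross_entropy_with_softmax(model, labels)
metric = C.classification_error(model, labels)

# Ordinary SGD learner, then wrapped so gradients are exchanged between MPI
# workers quantized down to 1 bit (error feedback is handled internally).
lr = C.learning_rate_schedule(0.01, C.UnitType.minibatch)
local_learner = C.sgd(model.parameters, lr)
distributed_learner = C.train.distributed.data_parallel_distributed_learner(
    local_learner,
    num_quantization_bits=1,   # 1-bit SGD; 32 would mean no quantization
    distributed_after=0)       # distribute from the first sample onward

trainer = C.Trainer(model, (loss, metric), [distributed_learner])

# One random minibatch, just to show the call pattern.
x = np.random.randn(16, 2).astype(np.float32)
y = np.eye(2, dtype=np.float32)[np.random.randint(0, 2, 16)]
trainer.train_minibatch({features: x, labels: y})

C.train.distributed.Communicator.finalize()
```

Running the same script on several MPI ranks is what realizes the data-parallel split described in Step 1.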