Microsoft just announced the adoption of Z-code Mixture of Experts in its Azure Translator Cognitive Service, making it the first publicly available service on Azure to leverage this approach.
What is Mixture of Experts?
Z-code models employ a design known as Mixture of Experts (MoE), in which different parts of the model learn to perform different tasks, and the models learn to translate between numerous languages simultaneously. A Z-code MoE model has many more parameters overall, but it dynamically determines which of them to apply to a particular input. During training, subsets of the parameters specialize as "experts," and at runtime the model routes each input only to the relevant experts, which is far more computationally efficient than activating all of the model's parameters.
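To make the routing idea concrete, here is a minimal sketch of an MoE layer with top-k gating, written in PyTorch. It is an illustrative toy, not Microsoft's Z-code implementation; all names (`MoELayer`, `num_experts`, `top_k`) are assumptions chosen for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """Simplified Mixture-of-Experts layer with top-k routing (illustrative only)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent feed-forward sub-network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])
        # The gate scores every token against every expert.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)             # (tokens, experts)
        weights, indices = scores.topk(self.top_k, dim=-1)   # pick top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    # Only the selected experts run for each token, so compute
                    # grows with top_k, not with the total number of experts.
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Usage: 16 tokens with model dimension 512; only 2 of the 8 experts run per token.
tokens = torch.randn(16, 512)
layer = MoELayer(d_model=512, d_hidden=2048)
print(layer(tokens).shape)  # torch.Size([16, 512])
```

The key point the sketch illustrates is that total parameter count scales with the number of experts, while per-token compute scales only with the number of experts actually selected.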
Training huge models with billions of parameters is difficult. The Translator team worked with Microsoft DeepSpeed to create a high-performance system for training large-scale Z-code MoE models, allowing them to efficiently scale and deploy Z-code models for translation.
To make this possible, Microsoft partnered with NVIDIA to optimize engines for running the new models at inference time, which yielded a substantial 27x improvement over the previous implementation.
To read more about Microsoft Translator and Z-code Mixture of Experts, head over to the official announcement post from Microsoft.