Microsoft makes its Distributed Machine Learning Toolkit open source

Michael Cottuli

If you’re a tech geek, there’s a good chance you’ve spent many a long night thinking about artificial intelligence, or at least about machines that can learn new things and apply that knowledge to tasks. That field is known as machine learning, and the team at Microsoft Research Asia has decided to share some of its machine learning tools with the public.
Those tools, collectively called the Distributed Machine Learning Toolkit (DMTK), have been published on GitHub as fully open-source software. If you’re interested in training machine learning models at scale, you can now use the same tools as Microsoft’s best and brightest. More components will be added in the future, but for now the DMTK ships with the following:

  • DMTK framework: A parameter server that can store a model built from hybrid data structures, plus a client SDK that schedules large-scale model training on the client side and maintains a local model cache kept in sync with the model on the parameter server.
  • LightLDA: A new, highly efficient algorithm for training topic models that can handle large-scale data and models even on a modest computer cluster.
  • Distributed Word Embedding: Word embedding is a popular technique in natural language processing, and the toolkit offers distributed implementations of two word-embedding algorithms: the standard Word2vec algorithm and a multi-sense algorithm that learns multiple embedding vectors for polysemous words.
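To make the parameter-server idea in the first bullet a little more concrete, here is a minimal, hypothetical Python sketch of the general pattern: a server holds a "hybrid" model (a dense list plus a sparse dictionary), and each client trains against its own local cache, then periodically pushes its accumulated changes and pulls a fresh copy. The class and method names (`ParameterServer`, `Worker`, `push`, `pull`, `sync`) are illustrative assumptions, not the DMTK client SDK, and everything runs in a single process rather than across a real cluster.

```python
from collections import defaultdict


class ParameterServer:
    """Toy server holding the global model (illustrative, not DMTK's API).

    The "hybrid" model is assumed to be a dense list (e.g. per-topic totals)
    plus a sparse dict (e.g. counts for rarely seen words).
    """

    def __init__(self, dense_size):
        self.dense = [0.0] * dense_size
        self.sparse = defaultdict(float)

    def push(self, dense_delta, sparse_delta):
        # Apply additive updates sent by a client.
        for i, d in enumerate(dense_delta):
            self.dense[i] += d
        for key, d in sparse_delta.items():
            self.sparse[key] += d

    def pull(self):
        # Return a copy of the current global model.
        return list(self.dense), dict(self.sparse)


class Worker:
    """Client that trains against a local model cache."""

    def __init__(self, server):
        self.server = server
        self.dense, self.sparse = server.pull()
        self._reset_deltas()

    def _reset_deltas(self):
        self.dense_delta = [0.0] * len(self.dense)
        self.sparse_delta = defaultdict(float)

    def update(self, index, key, amount):
        # Stand-in for one training step: modify the local cache and
        # record the change so it can be sent to the server later.
        self.dense[index] += amount
        self.dense_delta[index] += amount
        self.sparse[key] = self.sparse.get(key, 0.0) + amount
        self.sparse_delta[key] += amount

    def sync(self):
        # Push accumulated changes, then refresh the local cache.
        self.server.push(self.dense_delta, self.sparse_delta)
        self._reset_deltas()
        self.dense, self.sparse = self.server.pull()


if __name__ == "__main__":
    server = ParameterServer(dense_size=2)
    a, b = Worker(server), Worker(server)
    a.update(0, "w1", 1.0)
    b.update(0, "w2", 2.0)
    a.sync()
    b.sync()
    print(server.dense, dict(server.sparse))
    # [3.0, 0.0] {'w1': 1.0, 'w2': 2.0}
```

Note that after both syncs, worker `a` still holds a stale cache (`[1.0, 0.0]`) until its next sync. Tolerating that staleness in exchange for less communication is the basic trade-off a local model cache makes.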

If you’re the sort of person who can sink your teeth into software like this, you should absolutely grab it from GitHub. By releasing the toolkit in the open, the Microsoft Research Asia team hopes to work with more researchers and practitioners to improve the DMTK and make it useful for a wider range of applications.