
November 13, 2015

Microsoft open sources Distributed Machine Learning Toolkit for more efficient big data research


Researchers at the Microsoft Asia research lab this week made the Microsoft Distributed Machine Learning Toolkit openly available to the developer community.

The toolkit is designed for distributed machine learning, which uses multiple computers in parallel to solve a complex problem. It contains a parameter server-based programming framework, which makes machine learning tasks on big data highly scalable, efficient and flexible. It also contains two distributed machine learning algorithms, which can be used to train the fastest and largest topic model and the largest word-embedding model in the world.
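To make the parameter-server idea concrete, here is a minimal single-process sketch in Python. It is not DMTK's API: the names `ParameterServer`, `pull`, `push` and `worker_step` are hypothetical, and the "model" is just a table of word counts. It only illustrates the general pattern in which workers process their own shard of data and exchange updates with a shared store.

```python
# Toy, single-process illustration of the parameter-server pattern.
# All names here are hypothetical and are NOT part of DMTK.
from collections import defaultdict


class ParameterServer:
    """Holds the shared model as a sparse table of key -> value."""

    def __init__(self):
        self.table = defaultdict(float)

    def pull(self, keys):
        # Workers fetch only the parameters they ask for.
        return {k: self.table[k] for k in keys}

    def push(self, deltas):
        # Workers send updates, which the server adds into the model.
        for key, delta in deltas.items():
            self.table[key] += delta


def worker_step(server, shard):
    """Count words in one data shard and push the counts as updates."""
    local = defaultdict(float)
    for word in shard:
        local[word] += 1.0
    server.push(local)


server = ParameterServer()
shards = [["big", "data", "big"], ["data", "model"]]  # one shard per worker
for shard in shards:  # on a real cluster the workers would run in parallel
    worker_step(server, shard)

counts = server.pull(["big", "data", "model"])
print(counts)
```

In a real deployment the server table would itself be spread across machines and the workers would run concurrently. The sketch keeps everything in one process so the pull and push steps are easy to follow.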

The toolkit offers rich and easy-to-use APIs to lower the barrier to distributed machine learning, so researchers and developers can focus on core machine learning concerns such as data, models and training.

The toolkit is unique, the researchers said, because it goes beyond system innovations and also offers machine learning advances. With the toolkit, they said, developers can tackle big-data, big-model machine learning problems much faster and with smaller clusters of computers than previously required.

For example, using the toolkit one can train a topic model with one million topics and a 20-million word vocabulary, or a word-embedding model with 1000 dimensions and a 20-million word vocabulary, on a web document collection with 200 billion tokens utilizing a cluster of just 24 machines. That workload would previously have required thousands of machines.
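A rough back-of-the-envelope calculation shows the scale these figures imply. The sketch below assumes a dense topic-by-word table, a single embedding vector per vocabulary word, and 4 bytes per value. The article states none of these, so the results are only an order-of-magnitude illustration.

```python
# Model-size arithmetic for the figures quoted in the article.
# Assumptions (not from the article): dense storage, one embedding
# vector per word, 4 bytes (32 bits) per parameter.
TOPICS = 1_000_000
VOCAB = 20_000_000
EMBED_DIMS = 1_000
MACHINES = 24
BYTES_PER_PARAM = 4

topic_params = TOPICS * VOCAB      # entries in a topic-by-word table
embed_params = EMBED_DIMS * VOCAB  # entries in a word-embedding table


def terabytes(n_params):
    """Dense storage size in terabytes under the assumptions above."""
    return n_params * BYTES_PER_PARAM / 1e12


print(f"topic model:    {topic_params:.1e} parameters, "
      f"{terabytes(topic_params):.2f} TB if stored densely")
print(f"word embedding: {embed_params:.1e} parameters, "
      f"{terabytes(embed_params):.2f} TB if stored densely")
print(f"topic model per machine on {MACHINES} machines: "
      f"{terabytes(topic_params) / MACHINES:.2f} TB")
```

Under these assumptions the topic model has about 20 trillion entries and the embedding table about 20 billion.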


In addition to supporting topic models and word embeddings, the toolkit also has the potential to more quickly handle other complex tasks involving computer vision, speech recognition and textual understanding.

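Specifically, the toolkit's key components are the parameter server-based framework and the two distributed algorithms, one for topic modeling and one for word embedding.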

In the future, the researchers said, more components will be added to new DMTK versions. Microsoft researchers are hoping that, by open sourcing DMTK, they can work with machine learning researchers and practitioners to enrich the algorithm set and make it applicable to more applications.

More information: More information about the DMTK is available here:

Provided by Microsoft
