Maluuba, a Canadian AI startup acquired by Microsoft, has developed a new way for artificial intelligence to interact with data to simulate human capabilities. Using the classic 1980s version of Ms. Pac-Man, the team's ‘reinforcement learning’ approach, built on a divide-and-conquer method, taught an AI not only to play the game but to reach the maximum possible score without breaking the game: 999,990.
The achievement far surpasses what most humans manage, and the team is being praised for the breakthrough, largely because Ms. Pac-Man has been ‘difficult to crack’, according to Microsoft. Like many AI developers, Maluuba uses video games as a testing ground, because games make it easy to compare an algorithm's behaviour with the way humans play. Ms. Pac-Man was built for the arcade to be difficult but not impossible, keeping gamers coming back with the hint that they could do better next time. The Maluuba team broke down everything that makes Ms. Pac-Man tick and assigned each task to a different AI component, encouraging each one to do its own job in service of the larger picture. The result was a success for both science and gaming.
The method is called Hybrid Reward Architecture. Put simply, each AI agent was rewarded for its own narrow goal, such as eating a pellet or successfully avoiding a ghost, and a higher-level ‘manager’ AI took each agent's suggestion into consideration when choosing a move. Agents that failed at their goal were not rewarded.
The top agent took into account how many agents advocated for a given direction, but it also weighed how strongly each one wanted to make that move. For example, if 100 agents wanted to go right because that was the best path to their pellets, but three wanted to go left because a deadly ghost lay to the right, the top agent would give more weight to the three that had noticed the ghost and go left.
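The difference between counting votes and weighing them can be sketched in a few lines of Python. This is a minimal illustration, not Maluuba's code: it assumes each agent reports a numeric score for every direction and that the manager simply adds those scores up and picks the highest total. The published Hybrid Reward Architecture combines its agents' learned value estimates in a broadly similar additive way, but the agents, scores, and function names below (`aggregate`, `majority_vote`) are hypothetical.

```python
from collections import Counter

# Hypothetical action set for a single maze junction.
ACTIONS = ("left", "right", "up", "down")


def aggregate(agent_preferences):
    """Weighted aggregation: sum every agent's score for each action.

    Each agent is a dict mapping action -> score; a missing action scores 0.
    Larger magnitudes mean stronger preferences, so a few agents with very
    negative scores can outweigh many agents with mild positive ones.
    """
    totals = {a: 0.0 for a in ACTIONS}
    for prefs in agent_preferences:
        for a in ACTIONS:
            totals[a] += prefs.get(a, 0.0)
    best = max(ACTIONS, key=lambda a: totals[a])
    return best, totals


def majority_vote(agent_preferences):
    """Naive alternative: each agent casts one vote for its favourite action."""
    favourites = (max(ACTIONS, key=lambda a: p.get(a, 0.0)) for p in agent_preferences)
    return Counter(favourites).most_common(1)[0][0]


# 100 hypothetical pellet agents mildly prefer "right" (their pellet is that way).
pellet_agents = [{"right": 1.0} for _ in range(100)]
# 3 hypothetical ghost agents strongly oppose "right" (a ghost is there) and favour fleeing left.
ghost_agents = [{"right": -50.0, "left": 5.0} for _ in range(3)]
agents = pellet_agents + ghost_agents

print(majority_vote(agents))  # right
best, totals = aggregate(agents)
print(best)    # left
print(totals)  # {'left': 15.0, 'right': -50.0, 'up': 0.0, 'down': 0.0}
```

A plain head count sends Ms. Pac-Man right into the ghost, while summing the scores lets the three strongly opposed agents win: right totals 100 × 1 − 3 × 50 = −50, and left totals 3 × 5 = 15.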
With the help of Ms. Pac-Man, Maluuba’s ‘reinforcement learning’ research, which teaches computers through trial and error, is setting a path for further AI work. Beyond games, the approach could eventually help businesses make more accurate sales predictions and allocate their efforts more efficiently. Or the team could simply tackle more complex games.