Microsoft Research holds Data Science Summer School in New York with real-world hopes

Kareem Anderson


The division behind creative projects and research such as temporary touchpad tattoos, security measures to enhance cloud data transfers, DNA data storage and much more has hosted its third annual Data Science Summer School.

For 2016, Microsoft Research held an intensive eight-week, hands-on course that, for the first time, attempted to use machine learning algorithms to predict actual outcomes, and that tracked more than 13 million Yellow Cab taxi rides in New York City in search of a more efficient carpooling system.

According to Jake Hofman, a Microsoft Research instructor, “We’re really hoping to give them a flavor of solving a research problem that hasn’t yet been solved.”

In the first of the two research projects, Airbnb: Predicting Loyalty, a diverse, student-led group of researchers used decision tree learning techniques to help Airbnb tweak specific factors to encourage future bookings or incentivize current hosts to continue participating.

Airbnb team, left to right: Shawndra Hill (MSR, mentor), Chris Riederer (Columbia, teaching assistant/mentor), Erica Ram (student), Louise Lai (student), Jacqueline Curran (student), Kaciny Calixte (student), Fernando Diaz (MSR, mentor), Amit Sharma (MSR, mentor)

The researchers used decision trees to identify patterns of high and low repeat-customer probability, drawing on datasets collected by InsideAirbnb. Perhaps confirming what may seem like common sense, the group concluded from its various predictive models that incentivizing stays at higher-rated properties would produce an almost immediate boost in the return rate of Airbnb’s first-time users and guests.
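For readers curious what a decision tree does at each step, here is a minimal sketch of a single split, the building block of such a tree. It is not the students' code: the feature names (rating, nightly price), the toy records and the helper names `gini` and `best_split` are all invented for illustration, and a real analysis would use far more data and a full tree-learning library.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Try every feature/threshold pair and return the one with the lowest
    weighted Gini impurity as (feature_index, threshold, score)."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put records on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


# Invented toy records: (listing rating, nightly price in dollars)
rows = [(4.9, 120), (4.8, 200), (4.7, 90), (4.0, 110), (3.8, 150), (3.5, 95)]
# Invented labels: 1 = guest booked again, 0 = guest did not return
labels = [1, 1, 1, 0, 0, 0]

feature, threshold, score = best_split(rows, labels)
print(feature, threshold, score)
```

On this made-up data the split lands on the rating column, which only mirrors how the toy labels were constructed and says nothing about the real InsideAirbnb data. A full decision tree repeats the same search inside each resulting group.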

The second research project followed a similar path, using big data to produce an immediate financial and service benefit. Project Fare Share: Flow and Efficiency in NYC’s Taxi System is, as its name suggests, an effort to use large swaths of tracking data to produce rideshare projections that could cut commuter time and save the taxi system more than $8.5 million.

The research project focused on a single month of data, which included more than 13 million rides, for an average of 420,000 trips per day, driven by over 32,000 different drivers.

Unlike a similar project in 2009 that yielded largely inconclusive results, this year’s study zeroed in on longer trips to specific destinations, revealing that large numbers of taxis ferried just a single passenger on various popular commute and transit routes. The study notes that on “weekday mornings around 7 a.m., there are roughly 25 redundant trips from Port Authority to Rockefeller Center that take place every five minutes for the duration of rush hour.”
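The idea of a "redundant trip" can be sketched in a few lines. The following is a rough illustration, not the Fare Share team's method: it assumes each ride is reduced to a pickup zone, a dropoff zone and a pickup time, and it treats every ride beyond the first on the same route in the same five-minute window as redundant. The function names and the sample records, including the dates, are invented.

```python
from collections import Counter
from datetime import datetime

WINDOW_MINUTES = 5  # window length taken from the study's quote


def window_start(ts):
    """Round a timestamp down to the start of its five-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % WINDOW_MINUTES,
                      second=0, microsecond=0)


def redundant_trips(trips):
    """Count rides per (pickup, dropoff, window) and report the extras.

    `trips` is an iterable of (pickup_zone, dropoff_zone, pickup_time).
    Assumption: every ride after the first in a group could have been shared.
    """
    counts = Counter((p, d, window_start(t)) for p, d, t in trips)
    return {key: n - 1 for key, n in counts.items() if n > 1}


# Invented sample rides, for illustration only
sample = [
    ("Port Authority", "Rockefeller Center", datetime(2016, 7, 11, 7, 1)),
    ("Port Authority", "Rockefeller Center", datetime(2016, 7, 11, 7, 3)),
    ("Port Authority", "Rockefeller Center", datetime(2016, 7, 11, 7, 4)),
    ("Port Authority", "Rockefeller Center", datetime(2016, 7, 11, 7, 6)),
    ("Penn Station", "Wall Street", datetime(2016, 7, 11, 7, 2)),
]

result = redundant_trips(sample)
for (pickup, dropoff, start), extra in result.items():
    print(pickup, "->", dropoff, start.strftime("%H:%M"), "redundant:", extra)
```

A real analysis would also have to decide how close two pickups or dropoffs must be to count as the same route, and how many passengers a shared cab can hold, neither of which this sketch attempts.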

Fare Share team left to right: Chris Riederer (Columbia, teaching assistant/mentor), Abraham Neuwirth (student), Jai Punjwani (student), Fatima Chebchoub (student), Marieme Toure (student), Ashton Anderson (MSR, mentor), Sid Sen (MSR, mentor), Jake Hofman (MSR, mentor)

Both research projects, as well as the annual Data Science Summer School hosted by Microsoft Research, are another effort to promote diversity in computer science and in technological interpretation.

To find out more about the projects, the men and women behind them or their conclusions in greater detail, visit the Microsoft Research Blog or check out their source code on GitHub.