Microsoft researchers study lung cancer risks by analyzing web search data

Microsoft is searching for preventative measures to battle cancer with the help of its machine learning technology. Throughout the second half of the year, we’ve reported on Microsoft Research using search queries and anonymized data to link potential risks, health symptoms, and the probability of diagnosis, paving a path for healthcare professionals.

With 20% of lung cancer diagnoses made in non-smokers, Microsoft Research is determined to help find the nontraditional causes affecting the population. The team used an algorithm similar to the one in its pancreatic cancer research, examining approximately 5 million anonymous searchers in this particular study. Reportedly, the model can identify 1.5% to 40% of people nearly a year before they confirm a lung cancer diagnosis. Even though it’s been tweaked to limit false positives, that’s still a significant amount of lead time that could potentially save a life.

Eric Horvitz, the Managing Director of Microsoft Research in Redmond, co-authored a paper in JAMA Oncology presenting the findings.

The statistical classifier predicting the future appearance of landmark web queries based on search log signals identified searchers who later input queries consistent with a lung carcinoma diagnosis, with a true-positive rate ranging from 3% to 57% for false-positive rates ranging from 0.00001 to 0.001, respectively. The methods can be used to identify people at highest risk up to a year in advance of the inferred diagnosis time.
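To make the trade-off in that excerpt concrete, here is a minimal sketch of how a true-positive rate can be read off at a capped false-positive rate. The scores and labels below are made-up toy values, and the helper name `tpr_at_fpr` is hypothetical; this is not the study's classifier or data, only an illustration of the metric.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Best true-positive rate reachable while keeping FPR <= max_fpr.

    scores: classifier outputs, higher means more suspicious (toy values).
    labels: 1 if the searcher later matched the diagnosis, else 0.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Rank searchers from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        # Handle tied scores together: one threshold cannot split them.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / negatives > max_fpr:
            break  # lowering the threshold further exceeds the FPR cap
        best_tpr = tp / positives
    return best_tpr


# Toy data: 4 searchers who later matched a diagnosis, 6 who did not.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for cap in (0.0, 0.2, 0.5):
    print(f"FPR <= {cap}: TPR = {tpr_at_fpr(scores, labels, cap)}")
```

As in the excerpt, allowing a looser false-positive cap lets the threshold drop and the true-positive rate climb; a stricter cap flags fewer healthy searchers but catches fewer future diagnoses.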

The results also identified five key risk factors for lung cancer that surfaced in search queries: family history, age, radon levels, primary location, and occupation. The reason evidence of smoking wasn’t on the list? It’s hard to determine a user’s smoking habits from search history alone.

Microsoft Research believes that these queries may not be able to pinpoint specific causes on their own. According to the blog post, the work is the start of a hypothesis that can lead to more studies and research in the healthcare industry.

How do you feel about your anonymized data being used for improvements in the healthcare industry?