We've all been there: you open your email inbox, only to be fazed by the astonishing amount of spam, not just in the junk folder but in the inbox itself. Microsoft's Outlook.com web email service fights spam with the company's Smartscreen filtering system, a machine-learning algorithm that adapts to keep spam out of your inbox and to protect against malware and other such attacks.
In a post on MSDN, a member of the Outlook.com team explains how they run a "Spam Fighters" program that aims to help their systems learn what is - and isn't - spam. The way it works is fairly simple: an email message that Smartscreen would normally filter out is sent to you. When you open it, you are asked to look at the message and choose whether it is junk or not. The buttons act as a 'vote' and tell the system whether or not it was initially correct in blocking the message.
When you vote, your choice is matched against the filter's verdict to determine certain characteristics, such as which IPs are sending spam:
- First, your choice of spam or non-spam is recorded for that particular message
- Your choice is then compared to what the spam filter said for that message when it was scanned through the filter:
a) Did you say it was spam and the filter agreed? Then everything is good
b) Did you say it was non-spam and the filter agreed? Then everything is good
c) Did you say it was spam but the filter said it was non-spam? Then we have a false negative (missed spam)
d) Did you say it was non-spam but the filter said it was spam? Then we have a false positive (good email classified as spam)
- Your vote is then compared against the votes of all other users receiving similar email. Does everyone overwhelmingly agree with you? Or disagree with you? Or are the votes split up?
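The comparison in the list above can be sketched in a few lines of Python. This is only an illustration of the four cases and of the "agree, disagree, or split" check, not Microsoft's implementation. The names `Outcome`, `classify_vote` and `consensus`, and the 0.8 agreement threshold, are assumptions made up for this example.

```python
from enum import Enum


class Outcome(Enum):
    """The four cases (a)-(d) from the list above."""
    AGREED_SPAM = "a"       # user: spam, filter: spam
    AGREED_NON_SPAM = "b"   # user: non-spam, filter: non-spam
    FALSE_NEGATIVE = "c"    # user: spam, filter: non-spam (missed spam)
    FALSE_POSITIVE = "d"    # user: non-spam, filter: spam (good email blocked)


def classify_vote(user_says_spam: bool, filter_says_spam: bool) -> Outcome:
    """Compare one user's vote with the filter's verdict for a message."""
    if user_says_spam and filter_says_spam:
        return Outcome.AGREED_SPAM
    if not user_says_spam and not filter_says_spam:
        return Outcome.AGREED_NON_SPAM
    if user_says_spam:
        return Outcome.FALSE_NEGATIVE
    return Outcome.FALSE_POSITIVE


def consensus(votes: list[bool], threshold: float = 0.8) -> str:
    """Summarise votes (True = spam) from users who received similar email.

    The threshold is an arbitrary value chosen for this sketch; the source
    does not say what counts as "overwhelming" agreement.
    """
    if not votes:
        return "no votes"
    spam_share = sum(votes) / len(votes)
    if spam_share >= threshold:
        return "spam"
    if spam_share <= 1 - threshold:
        return "non-spam"
    return "split"
```

For instance, `classify_vote(True, False)` gives `Outcome.FALSE_NEGATIVE`, the missed-spam case, and `consensus([True, False])` reports a split vote.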
These votes from all the users across the entire Spam Fighters program are combined, and the messages are gathered into a corpus. Smartscreen then learns from numerous features within a message – sending IP, sending domains, authentication status, headers, body of message, attachments, encodings, and so forth. This feeds into our IP reputation, and into the Smartscreen spam filtering algorithm. This algorithm is what does the filtering for spam, malware, and phishing as well as legitimate email. It’s updated multiple times per day.
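To give a rough feel for how a voted corpus could feed something like IP reputation, here is a toy Python sketch that computes, for each sending IP, the share of its messages that users voted as spam. The post does not describe how Smartscreen actually scores IPs, so the function `ip_spam_rates` and its input format are purely hypothetical, and the addresses are reserved documentation IPs.

```python
from collections import defaultdict


def ip_spam_rates(labeled_messages):
    """Return {sending_ip: fraction of its messages voted spam}.

    `labeled_messages` is an iterable of (sending_ip, is_spam) pairs,
    a stand-in for the corpus built from user votes.
    """
    totals = defaultdict(int)
    spam = defaultdict(int)
    for ip, is_spam in labeled_messages:
        totals[ip] += 1
        if is_spam:
            spam[ip] += 1
    return {ip: spam[ip] / totals[ip] for ip in totals}


# Made-up corpus: two messages from each of two senders.
corpus = [
    ("192.0.2.1", True),
    ("192.0.2.1", True),
    ("198.51.100.7", True),
    ("198.51.100.7", False),
]
rates = ip_spam_rates(corpus)
```

A real filter would weigh this alongside the other features mentioned above, such as sending domains, authentication status, headers and attachments. A single ratio per IP is only the simplest possible signal.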
There's no opt-in for this program, as users are chosen randomly to vote on an email message.
Have you been asked to vote on an email as part of the Spam Fighters program? Let us know in the comments!