Search Engine Land’s Jason Barnard continued his Bing Series with the second of five interviews with the technical team leads at Bing. Part 2 introduces Fabrice Canel, Bing’s Principal Program Manager, who takes us on a deep dive into Bingbot and gives us an inside look at how it discovers, crawls, extracts and indexes data and makes it available for Bing searches.
Bingbot vs Googlebot
As mentioned in the first article, Jason Barnard’s look at Bing was partially driven by his desire to learn more about how Google’s search engine works. During the first interview, we took a deep dive into blue links and the algorithms that are used to create the Search Engine Results Page (SERP). In this interview, we go further under the hood to understand how Bing obtains and uses the data to make it available in Bing search. Barnard suggests that it is safe to assume that Googlebot works in a similar way to Bingbot:
- The process used is the same: discover, crawl, extract and index
- The content they are indexing is the same
- The problems faced by both search engines are the same
- The technology they use is the same
Those similarities, together with the fact that Microsoft is now collaborating on Chromium and on standardizing crawling and rendering, make Canel’s thoughts insightful for anyone who wants to understand how search engines work. If you are interested in trying out Bingbot for your site, you can visit Microsoft’s Bing Webmaster Tools to learn more.
As we discover in Barnard’s interview, the foundation of any search engine, and of the data it uses, is the following process: discover, crawl, extract and index. While it seems obvious, Barnard paints a picture in his article that demonstrates something many search marketers already know: Bing’s process rewards well-structured and well-presented content and allows it to rise to the top in a mechanical way.
Discovering and Crawling
The process of discovering and crawling data is fascinating and complex. Barnard mentions that on any given day Bingbot finds more than 70 billion URLs that it has never seen before. During this discovery phase, Bingbot follows all the links it finds and continues to crawl the pages those links lead to. Fetching is important because, until a page has been fetched, Bingbot has no idea whether its content is useful for Bing users. Sounds complicated? It is mostly a matter of finding new pages and verifying what is on those pages.
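The discover-and-crawl loop described above can be illustrated with a minimal sketch. It assumes a breadth-first frontier and a caller-supplied `fetch_links` function; both names are hypothetical, and nothing here reflects how Bingbot is actually implemented.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seeds: Iterable[str],
          fetch_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> List[str]:
    """Breadth-first discovery: fetch a page, queue every unseen link.

    `fetch_links` is a hypothetical callback that returns the links
    found on a page; a real crawler would do an HTTP fetch here.
    """
    frontier = deque(dict.fromkeys(seeds))  # de-duplicated, order kept
    seen = set(frontier)                    # URLs already discovered
    fetched = []                            # URLs actually crawled

    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        fetched.append(url)
        for link in fetch_links(url):
            if link not in seen:            # a URL never seen before
                seen.add(link)
                frontier.append(link)
    return fetched
```

For example, with an in-memory link graph `{"a": ["b", "c"], "b": ["c", "a"], "c": ["d"], "d": []}` and `fetch_links=lambda u: graph.get(u, [])`, calling `crawl(["a"], ...)` visits `a`, `b`, `c`, `d` in that order. The `max_pages` cap stands in for the crawl budget any real system would need.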
During the interview, Canel shares some details on how prefiltering content works. It’s easy to imagine that no system can realistically consume 70 billion URLs a day, so Bingbot instead focuses on efficiency to save money, help reduce carbon emissions, and ultimately generate better results. To do this, Bingbot prefilters content to judge whether it is likely to offer value, looking at signals such as URL structure, URL length, the number of variables, inbound link quality, and so on.
Barnard summarizes the extraction of data as a process that comes down to a few simple facts. Unless you are a major company with a large budget, Barnard and Canel suggest that sticking to a popular template on a common CMS (Joomla, Drupal, etc.) will often be a good choice for most companies and websites. Because these templates are common, they are natively understood by all search engines, and it is easy for the bots to extract the necessary data from them. The key is that your content should be unique; you can still completely change the visual presentation using simple CSS.
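To make the idea of prefiltering concrete, here is a rough sketch of a URL scoring heuristic built on the signals listed above. The thresholds, the weights, and the `inbound_link_quality` input are all assumptions chosen for illustration; the interview does not disclose how Bing actually weighs these signals.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical thresholds and weights, for illustration only.
MAX_URL_LENGTH = 200
MAX_QUERY_PARAMS = 3
MAX_PATH_DEPTH = 6


def prefilter_score(url: str, inbound_link_quality: float = 0.0) -> float:
    """Return a rough score in [0, 1]; higher means more worth fetching.

    `inbound_link_quality` is an assumed, externally computed value
    in [0, 1]; how it would be derived is out of scope here.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return 0.0  # not a crawlable web URL

    structure = 1.0
    if len(url) > MAX_URL_LENGTH:                      # very long URL
        structure -= 0.3
    n_params = len(parse_qsl(parts.query, keep_blank_values=True))
    if n_params > MAX_QUERY_PARAMS:                    # many variables
        structure -= 0.3
    depth = len([seg for seg in parts.path.split("/") if seg])
    if depth > MAX_PATH_DEPTH:                         # deeply nested path
        structure -= 0.2

    quality = min(1.0, max(0.0, inbound_link_quality))
    score = 0.7 * structure + 0.3 * quality            # assumed weighting
    return min(1.0, max(0.0, score))
```

A short, shallow URL with good inbound links scores higher than a deeply nested URL carrying ten query parameters, so a crawler could fetch the former first or skip the latter entirely.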
Indexing and Storing
Barnard points out that the way Bingbot stores information is critical for all the algorithms, with everything relying on the quality of Bingbot’s indexing. Canel suggests that the key to this is annotating the data that they store. For example, they add a rich descriptive layer to HTML and label the parts: heading, paragraph, media, table, aside, footer, and so on. In doing so, the data can be easily consumed and placed on the SERP (Search Engine Results Page).
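The labelling step can be pictured with a small sketch that walks an HTML document and tags each recognized element with one of the labels named above. It uses Python's standard `html.parser`; the `TAG_LABELS` mapping and the `annotate` helper are hypothetical and only illustrate the idea, not Bing's annotation layer.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to the labels named in the text.
TAG_LABELS = {
    "h1": "heading", "h2": "heading", "h3": "heading",
    "p": "paragraph",
    "img": "media",
    "table": "table",
    "aside": "aside",
    "footer": "footer",
}


class BlockAnnotator(HTMLParser):
    """Collect (label, text) pairs for recognized elements."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished (label, text) pairs
        self._stack = []   # open labelled elements: (tag, label, parts)

    def handle_starttag(self, tag, attrs):
        label = TAG_LABELS.get(tag)
        if label is None:
            return
        if tag == "img":   # void element: no end tag, use alt text
            self.blocks.append((label, dict(attrs).get("alt") or ""))
        else:
            self._stack.append((tag, label, []))

    def handle_data(self, data):
        # Text is credited to the innermost open labelled element.
        if self._stack:
            self._stack[-1][2].append(data)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1][0] == tag:
            _, label, parts = self._stack.pop()
            text = " ".join("".join(parts).split())  # collapse whitespace
            self.blocks.append((label, text))


def annotate(html: str):
    parser = BlockAnnotator()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Calling `annotate('<h1>Title</h1><p>Hello <b>world</b></p><img src="a.png" alt="logo"><footer>Contact</footer>')` returns `[("heading", "Title"), ("paragraph", "Hello world"), ("media", "logo"), ("footer", "Contact")]`. The cleaner and more conventional the markup, the less guessing a labelling pass like this has to do.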
Handles and Content
Canel points out that if a website’s HTML uses a known system with easy-to-identify handles, then the labelling of the different elements is more accurate and more usable. Canel references Cindy Krum’s Fraggles story and admits that HTML “handles” on your content will give it a head start and make it easier for the algorithms to use in their candidate sets.
For a deeper dive into Bingbot, be sure to check out Search Engine Land’s article to learn more about the role algorithms and machine learning play. Next week, Barnard will release part 3 of his series, focusing on how the Q&A/Featured Snippet Algorithm works, in his interview with Ali Alvi, Principal Lead Program Manager AI Products, Bing. Let us know what you think and what you would like to see in future Bing articles.