BigDaddy Means Big Changes at Google
One of the most popular forms of exercise among many search engine optimizers—both the third-party firms that do it for others and the advertisers who spiff up their own Web pages for better natural search rankings—is a periodic workout called “chasing the algorithm”. The race begins when Google or Yahoo! updates some portion of the software that determines how they look at Web pages and decide which are most relevant and valuable to a searcher. The engine makes that change; Web operators see their rankings rise or fall as a result; and they, or their outside search engine optimization (SEO) firm, scramble to get back the old rank by providing the new elements the search engine now needs. After a few months, the engines make another change, and it’s off to the races again.
Well, optimizers on Google are lacing up their running shoes for another race. Only this one promises to be more of a marathon than the usual sprint. Google is testing a new data center infrastructure, a change much bigger and more comprehensive than an algorithm update. Dubbed “BigDaddy” both in the search marketing blogs and forums and by the friendly folks at Google, this new data center, still in shakedown mode, will reportedly add new ground-level capabilities to Google search and drive those powers deep into all the algorithms with which Google searches, studies and indexes the Web.
First, a bit of big-picture talk. Google’s examination of the Web relies on a global network of data centers with different IP addresses. These decentralized servers speed the job of sending specialized Google services to users in different regions; they also share the workload of spidering the Web and comparing those discoveries to Web pages that are already in Google’s index.
The new BigDaddy data center contains new code for examining and sorting the Web and, once it has been fully tested, will become the default source for Web results, according to Google engineer Matt Cutts. In a January 4 post on his blog, Cutts said that might happen in early February or March of this year.
But what is BigDaddy intended to do? According to Rob Sullivan, head organic search strategist at search marketing firm Enquiro, “If an algorithm update is like putting new tires on a car or installing a new stereo system, this BigDaddy is like putting in a whole new motor. They’re totally revamping how Google works and resolving some long-standing issues with getting sites indexed properly.”
One of those issues is “canonicalization”. That’s a fancy Google word for how a search engine decides which of a series of related URLs is the proper one to put in its index. Say your Web site has a number of different home page URLs, including “stuff.com”, “www.stuff.com”, “www.stuff.com/index.html” and “stuff.com/home.asp”. This can come about because Web servers are often set up to accept aliases for Web pages, and to know that a request for “stuff.com” means someone’s looking for “www.stuff.com”. That’s a concession to users who get tired of getting error messages when they don’t type in “www”.
The problem is that while these URLs may pull up the same page content, to a search engine they’re technically four different pages. That could skew the page count Google gets for the Web site, so that a site with 1,000 pages, each reachable at two different URLs, might look twice its real size to Google.
It’s also possible that those aliases could inadvertently serve different content or attract different incoming links. In that case Google, which weighs the value of the content and the quality of the links, could give those four pages different rankings.
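Google has not said how BigDaddy canonicalizes URLs, but the basic idea can be sketched in Python. The alias rules below (prefer the “www” host, drop a trailing index.html or home.asp, assume http for bare hostnames) and the function name canonicalize are assumptions chosen to match the stuff.com example, not Google’s method.

```python
from urllib.parse import urlsplit

# Hypothetical alias rules for illustration; Google's real logic is not public.
DEFAULT_PAGES = {"index.html", "home.asp"}


def canonicalize(url: str) -> str:
    """Collapse common home-page aliases into one canonical URL."""
    # Bare strings like "stuff.com" have no scheme; assume http so that
    # urlsplit puts the host in netloc instead of path.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host  # assume the "www" form is the preferred one
    path = parts.path
    # Drop a trailing default page name such as /index.html.
    last = path.rsplit("/", 1)[-1]
    if last.lower() in DEFAULT_PAGES:
        path = path[: len(path) - len(last)]
    path = path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme}://{host}{path}{query}"


aliases = ["stuff.com", "www.stuff.com", "www.stuff.com/index.html", "stuff.com/home.asp"]
print({canonicalize(u) for u in aliases})  # {'http://www.stuff.com/'}

# A 1,000-page site where every page answers at two different URLs.
site_urls = [f"{h}/page{i}.html" for i in range(1000) for h in ("stuff.com", "www.stuff.com")]
print(len(site_urls), len({canonicalize(u) for u in site_urls}))  # 2000 1000
```

With these rules the four stuff.com variants collapse to a single URL, and the 1,000-page site stops looking like 2,000 pages.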
Finally, a Google search that turns up multiple entries for what is essentially the same content makes the results page that much less valuable to users. Better to select one of the URLs as the most representative and make room for other results.
“If you want to go to the Seattle Seahawks page on the NFL Web site, you’ll get this long, horrendous URL,” Sullivan says. “But the site also has another URL that’s just ‘Seattle Seahawks’. It pulls the content from the first page and just displays it under a prettier URL. So Google wants to be able to say that second page is the one people really want, and they’ll attribute all the traffic, links and value to the shorter URL.”
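Sullivan’s example amounts to grouping URLs that serve the same content and crediting just one of them. A minimal Python sketch, assuming exact-match content hashing and a “shortest URL wins” rule (both illustrative choices; pick_representatives and the example.com URLs are hypothetical):

```python
import hashlib
from collections import defaultdict


def pick_representatives(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group URLs with identical content and choose one URL per group.

    `pages` maps URL -> page body. The result maps each group's chosen URL to
    every URL in that group, so links and traffic could be credited to one address.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Assumed heuristic: the shortest URL wins; ties are broken alphabetically.
    return {min(urls, key=lambda u: (len(u), u)): sorted(urls) for urls in groups.values()}


pages = {
    "http://example.com/teams/team.jsp?id=SEA&section=home&lang=en": "<h1>Seattle Seahawks</h1>",
    "http://example.com/seahawks": "<h1>Seattle Seahawks</h1>",
    "http://example.com/about": "<h1>About us</h1>",
}
for rep, members in pick_representatives(pages).items():
    print(rep, "<-", members)
```

Exact hashing only catches byte-identical pages; a real engine would presumably also need near-duplicate detection, which this sketch leaves out.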
BigDaddy is also intended to solve another long-standing Google problem: page hijacking through “302 redirects”. A 302 is the HTTP status code a Web server returns to say that a page has moved temporarily, and it has plenty of legitimate uses. But nefarious Webmasters can use it to “hijack” a page: a 302 on the hijacker’s site points at the victim’s page, and the search engine may end up listing the hijacker’s URL in place of the correct page. The searcher sees what looks like the correct result, but when clicked, the listing can redirect the searcher to any page the hijacker wants, including adult content or false storefronts set up to capture personal information. If a Web site suffers enough hijackings, Google will consider all its pages contaminated and drop the site from the index.
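Google has not disclosed how BigDaddy handles 302s. The Python sketch below shows one plausible crawler-side policy, purely as an assumption: follow the redirect chain, but when a temporary redirect crosses to a different host, credit the content to the destination URL rather than to the page that redirected. The Response class, index_url_for and the example hosts are hypothetical, and a dictionary stands in for real HTTP fetches.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Response:
    status: int         # HTTP status code, e.g. 200, 301 or 302
    location: str = ""  # absolute redirect target for 3xx responses
    body: str = ""      # page content for 200 responses


def index_url_for(start: str, web: dict[str, Response], max_hops: int = 5) -> tuple[str, str]:
    """Follow redirects from `start`; return (URL credited with the content, content).

    Assumed policy: a 301 (permanent) always moves credit to the target. A 302
    (temporary) keeps the current credited URL only when it stays on the same
    host; a 302 to another host credits the destination, so an outside page
    cannot claim someone else's content.
    """
    credited = url = start
    for _ in range(max_hops):
        resp = web[url]  # simulated fetch; a real crawler would make an HTTP request
        if resp.status == 200:
            return credited, resp.body
        if resp.status in (301, 302):
            target = resp.location
            same_host = urlsplit(target).netloc == urlsplit(url).netloc
            if resp.status == 301 or not same_host:
                credited = target
            url = target
            continue
        raise ValueError(f"unexpected status {resp.status} for {url}")
    raise RuntimeError("too many redirects")


web = {
    "http://hijacker.example/go?id=1": Response(302, "http://victim.example/products"),
    "http://victim.example/sale": Response(302, "http://victim.example/products"),
    "http://victim.example/products": Response(200, body="Victim's product page"),
}
print(index_url_for("http://hijacker.example/go?id=1", web))
print(index_url_for("http://victim.example/sale", web))
```

Under this policy the hijacker’s URL is never credited with the victim’s page, while a same-site temporary redirect keeps its original address.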
“302 redirects are a big hole in the system,” Sullivan says. “People are using 302 redirects to hijack content and pages and many other things. By fixing this, Google will be eliminating a lot of problems.”
Of course, how BigDaddy will fix these issues is a closely held secret. As with many other questions surrounding the compiling and ranking of its index, Google refuses to be specific for fear that too much information will only teach the bad guys how to get around the system.
And there’s something else new about BigDaddy. Search optimizers have often been able to track down a Google test data center and check how the pages they’re working on are being crawled and indexed, but the IP addresses of those test centers change often, sometimes more than once a day.
But for BigDaddy, Google’s thrown open the doors. In early January, Cutts published a pair of IP addresses (66.249.93.104 and 64.233.179.104, for those who want a look) and actively called for feedback from Webmasters about problems and issues they perceived with the new system and its indexing.
Some of these changes will bring Google’s indexing technology up to par with its competitors; for example, Yahoo! and MSN have been handling 302 redirects for a year or more, although perhaps not as effectively as BigDaddy will eventually do. But other aspects of BigDaddy will help position Google to measure up to the search requirements of the future in some interesting ways, Sullivan says.
“This will lay the groundwork for more advanced algorithms, larger databases, and being able to index different types of content more effectively,” he says. For example, Google has also begun using a search crawler built on a Mozilla browser. The new search bot is more flexible, seems faster and can read non-text content more readily; that should mean that in time, it will be able to read links within images and even within Flash video, content that gets ignored by bots that can’t speak Javascript.
“As Web technology develops and we get richer and more interactive Web sites, [the search engines] can’t stick with just indexing hyperlinks and text,” Sullivan says. “They’re going to have to do everything.”
© 2010 Penton Media Inc.