Search Engine Bias and the Demise of Search Engine Utopianism
- Search Engines Make Editorial Choices
- Search Engine Editorial Choices Create Biases
- Search Engine Bias Is Necessary and Desirable
- Technological Evolution Will Make Search Engine Bias Moot
- Conclusion
- References
In the past few years, search engines have emerged as a major force in our information economy, helping searchers to perform hundreds of millions (or even billions) of searches per day. [1] With this broad reach, search engines have significant power to shape searcher behavior and perceptions. In turn, the choices that search engines make about how to collect and present data can have significant social implications.
Typically, search engines automate their core operations, including the processes they use to aggregate their databases and then sort and rank the data for presentation to searchers. This automation gives search engines a veneer of objectivity and credibility. [2] Machines, not humans, appear to make the crucial judgments, creating the impression that search engines bypass the structural biases and skewed data presentations inherent in any human-edited media. [3] Search engines’ marketing disclosures typically reinforce this perception of objectivity.
Unfortunately, this romanticized view of search engines doesn’t match reality. Search engines are media companies. Like other media companies, search engines make editorial choices designed to satisfy their audiences. [4] These choices systematically favor certain types of content over others, producing a phenomenon called search engine bias.
Search engine bias sounds scary, but this article explains why such bias is both necessary and desirable. I’ll also explain how emerging personalization technology will soon ameliorate many concerns about search engine bias.
Search Engines Make Editorial Choices
Search engines frequently claim that their core operations are completely automated and free from human intervention, [5] but this characterization is false. Instead, humans make numerous editorial judgments about what data to collect and how to present that data. [6]
Indexing
Search engines don’t index every scrap of data available on the Internet. Deliberately or accidentally, search engines omit some web pages entirely [7] or may incorporate only part of a web page. [8]
During indexing, search engines are designed to associate third-party metadata (data about data) with the indexed web page. For example, search engines may use and display third-party descriptions of the web site in the search results. [9] Search engines may also index anchor text (the text that third parties use in hyperlinking to a web site), [10] which can cause a web site to appear in search results for a term the web site never used (and to which the site owner may object). [11]
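The anchor-text effect can be illustrated with a toy inverted index. This is a minimal sketch under stated assumptions, not a description of how any actual search engine builds its index: the function name build_index, the sample URLs, and the whitespace tokenization are all hypothetical.

```python
from collections import defaultdict


def build_index(pages, links):
    """Build a toy inverted index mapping each term to the URLs it retrieves.

    pages: dict of URL -> the page's own text (hypothetical sample data)
    links: list of (source_url, target_url, anchor_text) tuples
    """
    index = defaultdict(set)
    # Index each page under the words it actually contains.
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    # Also index each link target under the words third parties used to link to it.
    for _source, target, anchor in links:
        for term in anchor.lower().split():
            index[term].add(target)
    return index


pages = {
    "example.org/widgets": "we sell quality widgets",
    "example.net/blog": "a review of widget vendors",
}
links = [("example.net/blog", "example.org/widgets", "overpriced junk")]

index = build_index(pages, links)
print(sorted(index["junk"]))  # prints ['example.org/widgets']
```

In this sketch the widget seller is retrieved for "junk," a word that appears nowhere on its own page, solely because a third party chose that word when linking to it.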
Finally, search engines may choose to remove web pages from their indexes after indexing them, for reasons ranging from violations of quasi-objective search engine technical requirements [12] to simple capriciousness. [13]
Ranking
To determine the order of search results, search engines use complex proprietary ranking algorithms. Ranking algorithms obviate the need for humans to make individualized ranking decisions for the millions of search terms used by searchers, but such algorithms don’t lessen the role of human editorial judgment in the process. Instead, the choice of which factors to include in the ranking algorithm—and how to weight such factors—reflects the search engine operator’s editorial judgments about what makes content "valuable." Indeed, to ensure that these judgments produce the desired results, search engines manually inspect search results [14] and make adjustments accordingly.
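A simple weighted-scoring sketch shows how the choice of weights, rather than any per-query human decision, can determine the outcome. Actual ranking algorithms are proprietary and far more complex; the factor names (inbound_links, freshness), the weights, and the sample pages below are assumptions chosen purely for illustration.

```python
def rank(pages, weights):
    """Return page names ordered by weighted score, highest first."""
    def score(features):
        # A page's score is the weighted sum of its factor values.
        return sum(weights[name] * features.get(name, 0.0) for name in weights)

    return sorted(pages, key=lambda name: score(pages[name]), reverse=True)


# Hypothetical factor values for two pages that match the same query.
pages = {
    "encyclopedia": {"inbound_links": 0.9, "freshness": 0.2},
    "news_site": {"inbound_links": 0.4, "freshness": 0.9},
}

# Two hypothetical operators who weight the same factors differently.
favor_links = {"inbound_links": 0.8, "freshness": 0.2}
favor_fresh = {"inbound_links": 0.2, "freshness": 0.8}

print(rank(pages, favor_links))  # prints ['encyclopedia', 'news_site']
print(rank(pages, favor_fresh))  # prints ['news_site', 'encyclopedia']
```

The pages and the query are identical in both runs. Only the operator's weighting differs, and that alone reverses the order of the results.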
Additionally, search engines claim that they don’t modify algorithmically generated search results, but there is some evidence to the contrary. Search engines allegedly make manual adjustments to some web publishers’ overall rankings. [15] Also, search engines occasionally modify search results presented in response to particular keyword searches. Consider the following instances:
- Some search engines blocked certain search terms containing the keyword phpBB. [16]
- In response to the search term Jew, for a period of time (including, at minimum, November 2005, when this author observed the phenomenon), Google displayed a special result in the sponsored link, saying "Offensive Search Results: We’re disturbed about these results as well. Please read our note here." The link led to a page explaining the results. [17]
- Reportedly, Ask.com blocked search results for certain terms such as pedophile, bestiality, sex with children, and child sex. [18]
- Google removed some web sites from its index in response to a 512(c)(3) take-down demand from the Church of Scientology. However, Google displayed the following legend at the bottom of affected search results pages (such as search results for scientology site:xenu.net): "In response to a complaint we received under the U.S. Digital Millennium Copyright Act, we have removed two result(s) from this page. If you wish, you may read the DMCA complaint that caused the removal(s) at ChillingEffects.org." [19]
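Interventions like those listed above can be thought of as a manual override layer applied after the algorithm has produced its results. The following is a minimal sketch of that idea, not any search engine's actual implementation: the function apply_overrides, the blocked term, the URLs, and the wording of the notice are all hypothetical.

```python
def apply_overrides(query, results, blocked_terms, removed_urls):
    """Apply manual editorial overrides to algorithmically ranked results.

    Returns (results_to_show, notice); notice is None when nothing was removed.
    """
    # Term-level block: show nothing if the query contains a blocked word.
    query_words = query.lower().split()
    if any(term in query_words for term in blocked_terms):
        return [], None
    # URL-level removal: drop specific pages and disclose how many were dropped.
    kept = [url for url in results if url not in removed_urls]
    removed = len(results) - len(kept)
    notice = f"{removed} result(s) removed from this page." if removed else None
    return kept, notice


# Hypothetical algorithmic output for some query.
ranked = ["a.example/1", "b.example/2", "c.example/3"]

shown, notice = apply_overrides(
    "some query",
    ranked,
    blocked_terms={"blockedword"},
    removed_urls={"b.example/2"},
)
print(shown)   # prints ['a.example/1', 'c.example/3']
print(notice)  # prints 1 result(s) removed from this page.
```

Both the block list and the removal list are maintained by people, so every entry in them is an individualized editorial decision layered on top of the automated ranking.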
Conclusion
Search engines hold a dual view of themselves, and this duality creates considerable confusion. [20] Search engines perceive themselves as objective and neutral because they let automated technology do most of the hard work. However, in practice, search engines make editorial judgments just like any other media company. Principally, these editorial judgments are instantiated in the parameters set for the automated operations, but search engines also make individualized judgments about what data to collect and how to present it. These manual interventions may be the exception and not the rule, but the exceptions reinforce the fact that search engines actively shape their users’ experiences when necessary to accomplish their editorial goals.