Bias in Search Engine: the Case of Google and a Workshop Solution

The search engine (SE) is a senseless artificial program. An SE matches the user's information demands with the input information and then provides an ordered list of answers. However, the outputs are frequently subject to bias, which can affect the depiction of issues such as gender inequality. Studies have shown that search engines may unconsciously inherit biases from their creators and users throughout their life cycle. In this paper, focusing on Google as our research case, we evaluate and summarize the factors that can lead to bias. These factors span the computer science and social domains. In response to these causes, we propose a workshop idea to raise awareness of the problem of search engine discrimination, especially regarding gender issues. Based on our current workshop solution, we also list some potential improvements.


Introduction
In 1990, Archie, the first Internet search engine, was developed, in which the information on sites was not indexed. Eight years later, mainly aiming to market search services, Google was launched. A patented algorithm called PageRank [1], which aids in ranking online sites that match a particular search term, played a significant role in Google's rise to prominence. Even today, according to statistics, Google sites are the most popular multi-platform online properties, with a little over 270 million U.S. visitors as of January 2022 [2]. People frequently invest a lot of time in search engines (SE). Google Trends, a largely unfiltered sample of real Google search requests, gives a visualized view of how search volume changes and, behind the figures, of how many people are searching in real time.
To discover why so many people invest time in SE, we note the connection with Artificial Intelligence (AI), which is typically perceived as a neutral, objective tool, a perception that contributes to its position of absolute power. Relying heavily on AI to function, the SE platform dominates society with its 'mask of fairness' as a source of certainty that is believed to deliver authentic truth. Unquestionably, SE disseminates a patriarchal normative order and a long-established gender discourse. As US government statistics show, 20 percent of the female workforce is employed in the IT industry as software developers, while few hold leadership positions. From a social perspective, information on search engines is limited to mainstream ideology about classification schemes due to a lack of diversity in the industry, which ties human knowledge to history. The search engine then turns into a representation of the normative social order that conceals such unequal power distribution from the public. Thus, innovations have developed into products of history that unintentionally code the past into the future through their repeated computational power [3].
As a consequence, individual users unquestioningly fall into consensus with search engine 'misrepresentation', which in turn reproduces taken-for-granted rationality as part of the present. To break down the myth of search engines as an objective hegemony, an educational workshop is organized to examine how inherited gender-related expectations are obscurely implemented in the algorithm, with the purpose of raising social awareness and seeking future adjustments to technological dominance.
To fill the gap, this paper analyzes the possible causes of gender bias and proposes a workshop that could be used to address the problem. Using Google as an example, we first summarize and analyze the possible causes of bias through an interdisciplinary approach. We then present the idea of a workshop, which aims to make people aware of the problems with search engines and to take the initiative to solve them.

Factors that cause bias
To meet a user's requirements, a search engine should locate and filter the most relevant information corresponding to the user's search query, and then provide that information to the user. These requirements are generally expressed as several keywords entered by the user. The process seems neutral since each search is executed automatically without human intervention. However, negative reports about bias in search engines have reached the public from time to time over the past few years. The causes of search engine bias still lie mainly in the integrated algorithms.
This section first introduces the development of the search engine (Sec. 2.1), and then shows Google's algorithm bias (Sec. 2.2). Finally, we discuss the social and other factors (Sec. 2.3).

An Overview of the development process
In the early stages of search engine development, search engines made decisions based on the strength of relevance between pieces of information. To measure the strength of this correlation, information is modeled based on the principles of information theory [4]. But these models do not give a completely realistic picture of the strength of the correlation. They are only approximate estimates that facilitate quantitative calculations in engineering. Such estimates do not exist only in the models. If we call the links between information entities distances, then the formulae for measuring the length of these distances are also estimates. Despite the efforts to get estimates as close to the true value as possible, anomalies inevitably occur. For example, because not enough keywords are matched, truly valuable information is overwhelmed by spam that matches more keywords. Alternatively, a wrong combination of keywords can cause the search engine to come up with completely irrelevant answers. In summary, these technically unavoidable errors are part of the reason for search engine bias.
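The keyword-matching anomaly described above can be illustrated with a minimal Python sketch. The documents, the query, and the scoring rule (counting distinct matched keywords) are hypothetical and chosen only for illustration; they are not the relevance model of any actual search engine.

```python
def keyword_score(query, document):
    """Count how many distinct query keywords appear in the document."""
    doc_words = set(document.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in doc_words)


def rank(query, documents):
    """Return document names ordered from highest to lowest keyword score."""
    return sorted(
        documents,
        key=lambda name: keyword_score(query, documents[name]),
        reverse=True,
    )


# Hypothetical corpus: one valuable article and one keyword-stuffed page.
documents = {
    "expert_article": "a careful study of women engineers in industry",
    "spam_page": "best women engineers jobs best jobs women engineers salary jobs",
}
query = "best women engineers jobs salary"
ranking = rank(query, documents)
print(ranking)
```

Under this assumed scoring rule, the keyword-stuffed page matches more query terms than the article and is therefore ranked first, even though it carries less valuable information.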
Search engines not only consider the relevance of pieces of information to each other but have also developed in a more personalized direction, painting a portrait of the user by analyzing the user's various search habits [5].
Personalizing a search engine involves adding the user's interests to the retrieval process [6]. In a personalized system, a user profile is constructed. When the user enters query keywords, the system generates more personalized expansion words, which can assist the search engine in retrieving information for the user based on his or her implicit search intentions [7]. Although this tailored integration may offer a novel means of enhancement, there are numerous obstacles, such as the huge datasets required and the computational complexity, which make it hard to draw a complete portrait of the user [8]. All of these factors can be sources of, and opportunities for, misrepresentation and bias generation.
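As a rough illustration of the query expansion described above, the following Python sketch builds a user profile from past queries and appends the most frequent profile terms to a new query. The function names, the search history, and the frequency-based expansion rule are assumptions made for this example and are not taken from the systems cited in [6] and [7].

```python
from collections import Counter


def build_profile(search_history):
    """Build a user profile as term frequencies over past queries."""
    profile = Counter()
    for past_query in search_history:
        profile.update(past_query.lower().split())
    return profile


def expand_query(query, profile, k=2):
    """Append the k most frequent profile terms that are not already in the query."""
    query_terms = query.lower().split()
    candidates = [
        (term, count) for term, count in profile.items() if term not in query_terms
    ]
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return query_terms + [term for term, _ in candidates[:k]]


# Hypothetical history of a user who mostly searches for animals.
history = ["big cats habitat", "cats diet", "rainforest habitat animals"]
profile = build_profile(history)
expanded = expand_query("jaguar", profile, k=2)
print(expanded)
```

For this user, the ambiguous query "jaguar" is steered toward the animal rather than the car brand. The same mechanism shows how a profile built from limited or skewed history can narrow what a user is shown.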

Case on Google's algorithm bias
To further illustrate that algorithms can carry inherent bias, and can be an initial factor that leads to bias, we take Google's algorithms and models as our focus case. The PageRank Algorithm (PA) is one among more than 200 ranking factors used by Google [9]. The Google search algorithm aggregates PageRank scores and delivers results based on the scores of the requested sites.
To determine the page score, one must consider a "surfer" who may choose to follow any link from any page [10]. The random surfer model serves as the foundation for the PageRank algorithm and determines an appropriate score for each website, in order to mitigate the possibility that all links contribute equally to a page's authority signals. The model attempts to best depict the behavior of website users and determines the chance that a random person will visit a webpage [10]. However, the surfer may be led to a page that has no outgoing links.
Apart from PA, and also to deal with the random surfer, Google uses mathematical models, namely the Markov chain model and the Hidden Markov model [11], to predict behaviors and transitions from one state to another in a state space. Using random-walk theory in this model, and relying on mathematical assumption and induction, a PageRank evaluation experiment is proposed. Consider a simple primitive matrix: the main idea is that if a matrix is primitive, every node converges to a particular value. This is of the utmost importance since it implies that each page's rank will ultimately settle on a fixed value regardless of how many times the procedure is repeated [12]. The result of this analysis is that the matrix used by PR in Google is imprimitive, which suggests that the algorithm is not completely fair and thus implies some bias in the algorithm itself.
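The random surfer model and the case of a page without outgoing links can be sketched as a small power iteration in Python. This is a minimal sketch under stated assumptions: the four-page graph is hypothetical, and the damping factor of 0.85 and the uniform redistribution of rank from pages without outgoing links are common textbook conventions rather than details taken from the source or from Google's production system.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for the random surfer model.

    links maps each page to the list of pages it links to. Every linked
    page must itself be a key of links.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling_mass = sum(rank[page] for page in pages if not links[page])
        new_rank = {
            page: (1.0 - damping) / n + damping * dangling_mass / n
            for page in pages
        }
        # Each page passes its rank in equal shares along its outgoing links.
        for page in pages:
            for target in links[page]:
                new_rank[target] += damping * rank[page] / len(links[page])
        change = sum(abs(new_rank[page] - rank[page]) for page in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical graph: D has no outgoing links and nobody links to it.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": []}
ranks = pagerank(links)
for page, score in sorted(ranks.items()):
    print(page, round(score, 4))
```

In this toy graph the scores settle on fixed values that sum to one, and page D, which receives no links, ends up with the lowest score. The example also makes visible that the final ranking depends on modeling choices such as the damping factor and the treatment of pages without outgoing links.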

Social and other factors
The problems and the Google case discussed above do not cover all of the shortcomings in the algorithmic parts of the search engine itself. It could even be argued that the existence of these problems has contributed to the development of search engines. Regarded as an objective and fair piece of machinery, the search engine is structured to imitate the human brain so that it can make decisions on its own based on the data it is given. The pre-existing biases in the data are thereby structured into the system as it processes the collected data for the sake of efficiency. Although the biases within search engines cannot be completely eliminated, they are being reduced little by little.
The core purpose of a search engine as a commercial product is still to generate commercial value. However, some companies engage in this practice in ways that are not justified and may even be detrimental to the users' interests. Google, for instance, has been known to manipulate the logic of its algorithms to rank its products higher in search results. This practice undermines the neutrality of search engines significantly. It transforms the unconscious behavior of machines into a manifestation of human bias.
On the other hand, in the development stages under a business background, the user profile is developed with a limited target audience, which is mostly tied to the mainstream majority. Data reflects society as it is now. It is deployed by developer teams with no intention of revising the pre-existing biases, and it is subject to the power of stakeholders' interests. As stated in [3], "the assumption of culture is not given in any society but is socially constructed and manipulated by particular groups with the economic and political power to do so, and those who draw on 'natural' features of society to explain its culture are subconsciously disguising the 'constructed' nature of society".
There is an unnoticed bias in this particular society, which is inherited and reproduced by technologies such as search engines. The bias is thus hidden by the search engine. Through reproduction by search engines, a certain type of prejudice can be spread more deeply.
The hidden prejudices in the society described above are in fact influenced by the mainstream culture, which is an important part of the social fabric. The influence of the mainstream culture goes far beyond the flaws of the algorithm itself, because it is difficult to be aware of one's own prejudices. In conclusion, a multi-factorial problem requires multi-disciplinary measures to solve it. We can address the challenge of search engine bias from the fields of both computer science and social science.

Workshop Solution
To raise participants' awareness of the discrimination present in themselves as well as in the search engine, we propose a workshop solution that contains 3 main sessions: IAT test (Sec. 3.1), WordCloud (Sec. 3.2), and role play (Sec. 3.3).

Session 1 -IAT Test
The Implicit Association Test (IAT) was proposed by Greenwald in 1998. It is a computerized categorization task that measures the closeness of automatic associations between two types of words (conceptual words and attribute words), using reaction time as an indicator, and subsequently measures implicit social cognition such as the individual's implicit attitudes [13]. This test is based on a physiological model called the neural network model. The model assumes that information is stored in a series of nodes of neural connections organized hierarchically according to semantic relationships. The connection between two concepts can thus be measured by measuring the distance between them on such neural connections.

Purpose of IAT.
This test is used to assess the degree of subconscious bias towards gender in the participants' minds. Asking whether a person tends to be sexist is hardly productive with conventional questionnaires. The key reason is that such questions are extremely sensitive, and subjects find it difficult to confront their subconscious answers and may consciously alter the results. Therefore, the results obtained conventionally are not reliably authentic. The IAT takes an indirect and clever approach rather than asking for answers directly. The core principle is based on the assumption that the longer the response time to an idea, the longer the mental processing, and the greater the difference between that idea and the subconscious one. By cleverly setting up the task, test takers can obtain quantitative attitude-behavior consistency results. Thus, although the test taker's answer may not be what they really think, we can still infer what they think from the reaction time. Although in theory the IAT can give a quantitative result, the test itself still has non-negligible flaws. The key to the test is how each question is designed. Participants are likely to infer the appropriate answer from the questions themselves and the connections between them. In that case, the test would be no different from a traditional direct questionnaire.
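The reaction-time principle described above can be sketched in Python as a simplified effect-size score: the difference between the mean latencies of the two pairing conditions, divided by the standard deviation of all trials. The function name and the latency values are hypothetical, and this sketch omits the trial filtering and error penalties that full IAT scoring procedures apply.

```python
from statistics import mean, stdev


def iat_d_score(congruent_ms, incongruent_ms):
    """Simplified IAT effect size from two lists of latencies in milliseconds.

    A positive value means responses were slower in the incongruent block,
    which is read as a stronger association for the congruent pairing.
    """
    pooled_sd = stdev(congruent_ms + incongruent_ms)
    return (mean(incongruent_ms) - mean(congruent_ms)) / pooled_sd


# Hypothetical latencies for one participant.
congruent = [600, 650, 700, 620, 680]
incongruent = [800, 850, 900, 780, 870]
score = iat_d_score(congruent, incongruent)
print(round(score, 2))
```

Dividing by the pooled standard deviation makes scores comparable between participants who respond at different overall speeds, which is useful when the facilitator aggregates the results of a whole group.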
In short, we want the participants to be aware of their own biases in this session. Not only that, but the participants will also realize that these biases are not easily detectable. These biases may have become habitual or common knowledge in everyday life, which in turn forms what is called the mainstream culture. Protected by the mainstream culture, these biases influence all aspects of society, and the search engines we use are no exception. Seemingly neutral programs can inherit biased code from their developers. And systems such as search engines, which improve from user feedback, can also be influenced by biased users.

IAT and Result Analysis.
In the first part of the workshop, participants take an IAT which focuses on gender bias. The test takes the form of an online questionnaire and participants will be able to access the given website using an electronic device such as a mobile phone. The results of the test are available immediately after the participants have completed the questionnaire independently within the time limit.
Once the majority of participants have completed the test and received their results, the workshop facilitator will briefly explain the rationale for the test and collect the results. After completing the statistics, the facilitator can carry out a brief analysis based on the actual situation.

Session 2 -WordCloud
At a glance: Participants will start to think about the reasons behind different answers and learn to summarize and extract ideas using the concept of abstraction. Here we give 2 example questions of the kind that are commonly entered into search engines. After participants have their own answers, they will post them on a WordCloud-based co-editing platform, which presents a visual depiction of the words in which more frequent words appear larger [14]. After all the participants have submitted their answers, the distribution will be shown. Seeing the distribution, participants can find out how their answers vary and how they differ from those of others. We expect participants to be a little confused and to start thinking about why they have different answers.
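The way a word cloud turns answer frequency into display size can be sketched in Python as follows. The answers, the font-size range, and the linear scaling rule are assumptions made for illustration and do not describe the platform cited in [14].

```python
from collections import Counter


def word_sizes(answers, min_size=12, max_size=48):
    """Map each distinct answer to a font size that grows with its frequency."""
    counts = Counter(a.strip().lower() for a in answers if a.strip())
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    sizes = {}
    for word, count in counts.items():
        if high == low:
            # All answers are equally frequent, so they share one size.
            sizes[word] = max_size
        else:
            scale = (count - low) / (high - low)
            sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes


# Hypothetical answers posted by participants to one question.
answers = ["nurse", "Engineer", "nurse", "teacher", "nurse", "engineer"]
sizes = word_sizes(answers)
print(sizes)
```

The most common answer dominates the picture while rare answers shrink, which gives participants an immediate view of how the group's answers are distributed.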

Group Discussion.
In the group discussion, participants will be divided into small groups. Based on the questions we asked in 2.1, each group needs to settle on a single answer after the discussion. During the discussion, participants can try to convince others of their own thoughts. By persuading one another and ranking ideas, participants extract their own ideas and their most important parts. This is an abstracted simulation of how search engines work, since search engines determine the representation of content through algorithms such as indexing and ranking. This step also reveals the ideation concept of abstraction: to be more convincing and to arrive at a single answer, participants will try to tuck away the complexities of their arguments and to summarize and generalize ideas.

Explore Search Engine.
After the group discussion, participants will be led to the search engines. In the case proposed here, they will be led to Google to search for answers to the same question. They will then compare the search results with the individual and group outcomes from session 2.1 and session 2.2. The individual and group answers have been carefully considered by the participants, who have even tried to convince others of them. At this point, we therefore expect participants to think about whether search engines, here Google, can always give the answers they really want.

Abstraction Activity.
After the search engine part, participants will have a short discussion about abstraction. The concept of abstraction is itself abstract enough that it might not be easily understood. To make it easier to grasp, a simple concept will be introduced, and participants will gain a better understanding through some simple concept mapping activities [15]. Moreover, abstraction is an important concept in programming, and the search engine is also a programming product. A leading question lets participants make comparisons and think about how the search engine uses abstraction to reduce a selection to a single idea. As a conclusion to the previous activities, this gives participants a frame for adopting a novel mindset.

Session 3 -Role Play
At a glance: Participants will be exposed to an 'artificial' IT industry environment to act out the process of bias reproduction
Duration: 20 minutes
Techniques: Scenario building and Role-play
Purpose: To show participants how answers vary when searching for something, to simulate the way of abstracting many ideas into a single idea
Purpose: To bring the theory into the present by getting participants to "act" identities in the power dynamics of consumer, programmer and leadership
Outcomes:
e. Equip participants with an understanding of the invisible process behind biases in technological products
f. Simulate the real environment of the IT industry for a future contribution

General Outline.
This session aims to recreate the present IT industry system in order to grasp why such a widely believed 'objective' search engine is in fact bias-oriented. We divide all participants into three groups that are usually involved in the process of product development and circulation. One group is categorized as clients, the agents seeking professional technological help to maximize their benefits. The other two groups represent the division of labor within a corporation. The leadership mostly takes the initiative to set a collective agenda for the whole organization to work on. The developers are the main workforce in production, who apply their professional skills to implement any requests from the leadership and clients. We want the participants to take on their respective identities and act as if they held those positions. Clients' intentions will mostly affect the entire production, and their demands are sometimes beyond the company's capacities. Thus, the leadership will negotiate with the clients to shift the project on the basis of interests and to conduct the project within the company's limits. The developers, however, are usually alienated from their IT products, as in reality there is no room for them to make any difference to the settled narrative agreed upon by leadership and clients.

Reasons of bias production.
During the role-play, we can clearly see the conflict between clients and corporations over maximized commercial interests. However, both of them subconsciously profit from the status quo in order to attract individual users to their products. Technologies show up with a 'mask of fairness' that can be used to present objectivity and to nurture social justice. The algorithms behind digital capitalism are bias-filled and shaped by the dominating value of 'power-knowledge' in relation to the majority culture and a male-centered gender regime [16]. This activity gives participants an authentic experience of the production process, which can pave the way for them to relate it to their own contexts and to take more feasible individual action in the future.

Result Analysis.
Role-play aims to provide the participants with a real picture of the product development process. The mainstream culture has legitimized certain narratives as natural and has blinded individual actors from questioning them. Through the prepared activities, participants are expected to gain a refreshed perspective on these structural beliefs.

Solution conclusion and future improvement.
The current workshop idea was originally discussed in The University of Tokyo's "Global Unit Courses" (UTokyo GUC). As members of the educational approach team, we needed to raise a social issue and then discuss a solution. After that discussion, and based on the comments given by Prof. Yuko Itatsu, we revised and finalized our workshop ideas related to the topic.
After the 3 workshop sessions, participants can become aware of their unconscious biases, learn about the concept of abstraction, and learn to think about power distribution and different structures of problem-thinking methods. The workshop can also be improved: we can establish a deeper connection between the separate parts, and we need to rearrange some activities to make it easier for participants to understand the concepts.
Moreover, we could design and model our workshop so that its format can be standardized and propagated.

Conclusion
In this paper, we present the bias issue underlying search engines. We analyze and summarize many aspects in the computer science and social domains that can contribute to bias, focusing on Google as our analysis case. In response to these issues, we offer an awareness-raising workshop idea. In summary, the workshop design meets the basic requirements and can convey the fundamentals of the search engine. By participating in this workshop, participants can start to think about and analyze their unconscious biases and can also observe the underlying biases in search engines.